Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF-8-encoded text file (so it appears there as the bytes C2 A0), and I want to match strings that contain this character.
I write a script (itself encoded in UTF-8) in Perl 5.10.0 (on OS X 10.6.5) with:

    use encoding 'utf8';
    use charnames ':full';

The script opens the file with:

    open FH, '<:utf8', 'filename.txt';

It reads lines in with:

    while (<FH>) {}

Then, in a regular expression in the script, I can match the NO-BREAK SPACE with any of these patterns:

1. /\N{NO-BREAK SPACE}/
2. / / (where the character between slashes looks like a space but is a no-break space)
3. /[\x7f-\x80]/

Patterns 1 and 2 make sense, but pattern 3 is mysterious to me, because the range it specifies includes DELETE (U+007F) and an unnamed control character (U+0080) but does not include NO-BREAK SPACE.

Moreover, I expect to be able to match the NO-BREAK SPACE with these patterns, but I cannot:

4. /[\xa0]/
5. /\xa0/

In the related documentation, I have not found anything explaining why pattern 3 works or why patterns 4 and 5 do not. I have replicated these anomalies in Perl 5.8.8 under Red Hat Enterprise Linux 5.

I would be delighted to receive explanations or references to documentation that I have overlooked or misunderstood.
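For anyone who wants to try this, the fragments above can be assembled into one test script. This is only a minimal sketch under some assumptions: the file name `nbsp-test.txt` and the helper `matching_patterns` are hypothetical, and the script writes its own test file instead of using my real data. Pattern 2 is left out because a literal no-break space is easy to lose when copying code. The `use encoding 'utf8';` line is commented out so that the script also runs on newer Perls, where that pragma is deprecated (since 5.18) and, as far as I know, no longer supported in recent releases. Uncomment it on 5.8 or 5.10 to reproduce my setup.

```perl
use strict;
use warnings;
use charnames ':full';
# use encoding 'utf8';   # pragma from the setup above; uncomment on Perl 5.8/5.10

# Return the numbers of the patterns (numbered as in the lists above)
# that match the given line. Pattern 2 is intentionally omitted.
sub matching_patterns {
    my ($line) = @_;
    my %patterns = (
        1 => qr/\N{NO-BREAK SPACE}/,
        3 => qr/[\x7f-\x80]/,
        4 => qr/[\xa0]/,
        5 => qr/\xa0/,
    );
    return grep { $line =~ $patterns{$_} } sort { $a <=> $b } keys %patterns;
}

# Write a throwaway file containing the raw bytes C2 A0 (hypothetical name).
my $file = 'nbsp-test.txt';
open my $out, '>:raw', $file or die "Cannot write $file: $!";
print {$out} "foo\xC2\xA0bar\n";
close $out;

# Read it back through the same decoding layer as in the original script.
open my $fh, '<:utf8', $file or die "Cannot read $file: $!";
while (my $line = <$fh>) {
    my @hits = matching_patterns($line);
    print "matching patterns: @hits\n";
}
close $fh;
unlink $file;
```

Running it once with the pragma and once without should show whether the `encoding` pragma is involved in the behavior I am seeing.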