On Wed, 10 Aug 2005 14:06:56 +0530, Sastry <[EMAIL PROTECTED]> wrote
>
> > > As suggested by you, I ran the following script which resulted in
> > > substituting all the characters with X irrespective of the "special
> > > case" [i-j].
> > >
> > > ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ s/[\x89-\x91]/X/g;
> > > is($a, "XXXXXXXX");
> > +++quote begin
> > REGULAR EXPRESSION DIFFERENCES
> > As of perl 5.005_03 the letter range regular expression such as [A-Z]
> > and [a-z] have been especially coded to not pick up gap characters.
> > For example, characters such as o WITH CIRCUMFLEX that lie between I
> > and J would not be matched by the regular expression range /[H-K]/.
> > This works in the other direction, too, if either of the range end
> > points is explicitly numeric: [\x89-\x91] will match \x8e, even though
> > \x89 is i and \x91 is j, and \x8e is a gap character from the alphabetic
> > viewpoint.
> If I specify [\x89-\x91] it just matches the end characters (i,j)
> and doesn't match any of the gapped characters( including \x8e),
> unlike what you had mentioned.
> Is this correct?
> -Sastry

According to the above statement in perlebcdic.pod, s/[\x89-\x91]/X/g
must substitute \x8e with X. However, that statement says nothing about
whether tr/\x89-\x91/X/ would substitute \x8e with X, since tr/// does
not use brackets, [ ]. I think ranges in [ ] and ranges in tr/// should
behave the same, so tr/\x89-\x91/X/ should also substitute \x8e with X,
but that is just my opinion. I don't know whether it is actually the case.

By the way, when you say "If I specify [\x89-\x91]", do you mean
s/[\x89-\x91]/X/g or tr/\x89-\x91/X/ ? I'm confused. You first told us
that gapped characters are not substituted with X by tr/\x89-\x91/X/.
And you said s/[\x89-\x91]/X/g substituted all the characters, including
gapped characters, with X, didn't you? If so, I assume that the
[\x89-\x91] which doesn't match any of the gapped characters refers to
tr/\x89-\x91/X/.

The following is a part of the current core tests from op/pat.t.
I believe they should pass.

Regards,
SADAHIRO Tomoyuki

+++begin
# The 242 and 243 go with the 244 and 245.
# The trick is that in EBCDIC the explicit numeric range should match
# (as also in non-EBCDIC) but the explicit alphabetic range should not match.

if ("\x8e" =~ /[\x89-\x91]/) {
    print "ok 242\n";
} else {
    print "not ok 242\n";
}

if ("\xce" =~ /[\xc9-\xd1]/) {
    print "ok 243\n";
} else {
    print "not ok 243\n";
}

# In most places these tests would succeed since \x8e does not
# in most character sets match 'i' or 'j' nor would \xce match
# 'I' or 'J', but strictly speaking these tests are here for
# the good of EBCDIC, so let's test these only there.
if (ord('i') == 0x89 && ord('J') == 0xd1) { # EBCDIC
    if ("\x8e" !~ /[i-j]/) {
        print "ok 244\n";
    } else {
        print "not ok 244\n";
    }
    if ("\xce" !~ /[I-J]/) {
        print "ok 245\n";
    } else {
        print "not ok 245\n";
    }
} else {
    for (244..245) {
        print "ok $_ # Skip: only in EBCDIC\n";
    }
}
---end
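
To settle which of the two operators leaves the gap characters alone, it
may help to run both on the same string and compare. The following is
only a minimal sketch, not part of op/pat.t: the variable names and the
helper untouched() are made up for illustration, and the string
deliberately includes \x8e (and all other bytes from \x89 to \x91). On
an ASCII platform both lines should report that everything was replaced;
the interesting run is the one on the EBCDIC machine.

```perl
use strict;
use warnings;

# All nine characters \x89 .. \x91. On EBCDIC \x89 is 'i' and \x91 is 'j';
# the seven characters in between are gap characters.
my $src = join '', map { chr($_) } 0x89 .. 0x91;

# Same explicit numeric range, once in a bracketed class, once in tr///.
(my $by_subst = $src) =~ s/[\x89-\x91]/X/g;
(my $by_tr    = $src) =~ tr/\x89-\x91/X/;

# Hypothetical helper: list (in \xNN form) the characters that were
# NOT turned into X.
sub untouched {
    my ($result) = @_;
    return map { sprintf '\\x%02x', ord($_) }
           grep { $_ ne 'X' } split //, $result;
}

for my $case (['s/[\x89-\x91]/X/g', $by_subst],
              ['tr/\x89-\x91/X/',   $by_tr]) {
    my ($label, $result) = @$case;
    my @left = untouched($result);
    print "$label: ",
          (@left ? "left unchanged: @left" : "all 9 characters replaced"),
          "\n";
}
```

If the tr/// line lists \x8a .. \x90 as unchanged while the s/// line
does not, that would confirm that the two kinds of range disagree on
that platform.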