Re: Transliteration operator(tr//)on EBCDIC platform

SADAHIRO Tomoyuki Wed, 10 Aug 2005 17:48:20 -0700

On Wed, 10 Aug 2005 14:06:56 +0530, Sastry <[EMAIL PROTECTED]> wrote
> 
> > > As suggested by you, I ran the following script which resulted in
> > > substituting all the characters with X irrespective of the "special
> > > case" [i-j].
> > >
> > > ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ s/[\x89-\x91]/X/g;
> > > is($a, "XXXXXXXX");


> > +++quote begin
> > REGULAR EXPRESSION DIFFERENCES
> > As of perl 5.005_03 the letter range regular expression such as [A-Z]
> > and [a-z] have been especially coded to not pick up gap characters.
> > For example, characters such as o WITH CIRCUMFLEX that lie between I
> > and J would not be matched by the regular expression range /[H-K]/.
> > This works in the other direction, too, if either of the range end
> > points is explicitly numeric: [\x89-\x91] will match \x8e, even though
> > \x89 is i and \x91 is j, and \x8e is a gap character from the alphabetic
> > viewpoint.
> If I specify  [\x89-\x91]  it just matches the end characters (i,j)
> and doesn't match any of the gapped characters( including \x8e),
> unlike what you had mentioned.
> Is this correct? 
> -Sastry

According to the above statement in perlebcdic.pod,
s/[\x89-\x91]/X/g must substitute \x8e with X.
But that statement says nothing about whether tr/\x89-\x91/X/ should
substitute \x8e with X, since tr/// does not use brackets, [ ].

I think ranges in [ ] and ranges in tr/// should behave the same way,
so I agree that tr/\x89-\x91/X/ should substitute \x8e with X.
But that is just my opinion.
I don't know whether it is actually correct.

By the way, when you say "If I specify  [\x89-\x91]", do you
mean s/[\x89-\x91]/X/g or tr/\x89-\x91/X/ ?  I'm confused.

You first told us that gapped characters are not
substituted with X by tr/\x89-\x91/X/.
Then you said that s/[\x89-\x91]/X/g substituted all the characters,
including the gapped characters, with X, didn't you?
If so, I assume that the [\x89-\x91] which doesn't match any of
the gapped characters actually refers to tr/\x89-\x91/X/.
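
To remove the ambiguity, a small script along the following lines could
run both forms on the same input and print the results side by side.
It is only a sketch: the variable names and the hexdump() helper are
made up for illustration, and it reports what happens rather than
asserting what should happen. On an ASCII platform the two results
agree; the EBCDIC result is the open question here.

```perl
use strict;
use warnings;

# Bytes 0x89 .. 0x91. On EBCDIC, 0x89 is 'i' and 0x91 is 'j';
# the bytes in between are the "gap" characters.
my $bytes = join '', map { chr($_) } 0x89 .. 0x91;

# Same explicit numeric range, once in a bracketed character class
# and once in tr///.
(my $by_subst = $bytes) =~ s/[\x89-\x91]/X/g;
(my $by_tr    = $bytes) =~ tr/\x89-\x91/X/;

# Hypothetical helper: show each byte of a string as two hex digits.
sub hexdump {
    my ($str) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $str;
}

print "s///: ", hexdump($by_subst), "\n";
print "tr//: ", hexdump($by_tr), "\n";
print "ranges ", ($by_subst eq $by_tr ? "agree" : "differ"), "\n";
```

Any byte in the "tr//" line that is still in the range 89 to 91 was
left untouched by tr///.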

The following is part of the current core tests from op/pat.t.
I believe they should pass.

Regards,
SADAHIRO Tomoyuki

+++begin
# The 242 and 243 go with the 244 and 245.
# The trick is that in EBCDIC the explicit numeric range should match
# (as also in non-EBCDIC) but the explicit alphabetic range should not match.

if ("\x8e" =~ /[\x89-\x91]/) {
  print "ok 242\n";
} else {
  print "not ok 242\n";
}

if ("\xce" =~ /[\xc9-\xd1]/) {
  print "ok 243\n";
} else {
  print "not ok 243\n";
}

# In most places these tests would succeed since \x8e does not
# in most character sets match 'i' or 'j' nor would \xce match
# 'I' or 'J', but strictly speaking these tests are here for
# the good of EBCDIC, so let's test these only there.
if (ord('i') == 0x89 && ord('J') == 0xd1) { # EBCDIC
  if ("\x8e" !~ /[i-j]/) {
    print "ok 244\n";
  } else {
    print "not ok 244\n";
  }
  if ("\xce" !~ /[I-J]/) {
    print "ok 245\n";
  } else {
    print "not ok 245\n";
  }
} else {
  for (244..245) {
    print "ok $_ # Skip: only in EBCDIC\n";
  }
}
---end
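
The tests above cover only the regular expression ranges. As a
companion, the following sketch probes each byte from 0x89 to 0x91
against the numeric and alphabetic ranges in both a character class and
tr///, and prints what it finds. The probe() function and its output
layout are assumptions made for this illustration, not part of
op/pat.t, and nothing here asserts what tr/// ought to do on EBCDIC.

```perl
use strict;
use warnings;

# Hypothetical helper: for one character code, return four flags:
# match by /[\x89-\x91]/, match by /[i-j]/,
# count from tr/\x89-\x91//, count from tr/i-j//.
sub probe {
    my ($code) = @_;
    my $ch = chr($code);
    my $re_numeric = ($ch =~ /[\x89-\x91]/) ? 1 : 0;
    my $re_alpha   = ($ch =~ /[i-j]/)       ? 1 : 0;
    # tr/// with an empty replacement list only counts; it does not modify $ch.
    my $tr_numeric = ($ch =~ tr/\x89-\x91//);
    my $tr_alpha   = ($ch =~ tr/i-j//);
    return ($re_numeric, $re_alpha, $tr_numeric, $tr_alpha);
}

for my $code (0x89 .. 0x91) {
    printf "0x%02x  re-num=%d re-alpha=%d tr-num=%d tr-alpha=%d\n",
        $code, probe($code);
}
```

If ranges in [ ] and in tr/// coincide, the re-num and tr-num columns
are identical on every row, and so are re-alpha and tr-alpha.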

Re: Transliteration operator(tr//)on EBCDIC platform
