On Mon, 8 Aug 2005 15:36:40 +0100, Nicholas Clark <[EMAIL PROTECTED]> wrote
> On Thu, Aug 04, 2005 at 11:42:54AM +0530, Sastry wrote:
> > Hi
> >
> > I am trying to run this script on an EBCDIC platform using perl-5.8.6
> >
> > ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
> > is($a, "XXXXXXXX");
> >
> >
> > The result I get is
> >
> > 'X«»ðý±°X'
> >
> > a) Is this happening since \x8a\x8b\x8c\x8d\x8f\x90 are the gapped
> > characters in EBCDIC ?
>
> I think so. In that \x89 is 'i' and \x91 is 'j'.
>
> > b) Should all the bytes in $a change to X?
>
> I don't know. It seems to be some special case code in regexec.c:
>
> #ifdef EBCDIC
>     /* In EBCDIC [\x89-\x91] should include
>      * the \x8e but [i-j] should not. */
>     if (literal_endpoint == 2 &&
>         ((isLOWER(prevvalue) && isLOWER(ceilvalue)) ||
>          (isUPPER(prevvalue) && isUPPER(ceilvalue))))
>     {
>         if (isLOWER(prevvalue)) {
>             for (i = prevvalue; i <= ceilvalue; i++)
>                 if (isLOWER(i))
>                     ANYOF_BITMAP_SET(ret, i);
>         } else {
>             for (i = prevvalue; i <= ceilvalue; i++)
>                 if (isUPPER(i))
>                     ANYOF_BITMAP_SET(ret, i);
>         }
>     }
>     else
> #endif
>
>
> which I assume is making [i-j] in a regexp leave a gap, but [\x89-\x91] not.
> I don't know where ranges in tr/// are parsed, but given that I grepped
> for EBCDIC and didn't find any analogous code, it looks like tr/\x89-\x91//
> is treated as tr/i-j// and in turn i-j is treated as letters and always
> "special cased"

S_scan_const() in toke.c seems to expand ranges in tr///,
while S_regclass() in regcomp.c (what I assume you mean)
copes with those in [].
++++++++ from toke.c, line 1419

#ifdef EBCDIC
    if ((isLOWER(min) && isLOWER(max)) ||
        (isUPPER(min) && isUPPER(max))) {
        if (isLOWER(min)) {
            for (i = min; i <= max; i++)
                if (isLOWER(i))
                    *d++ = NATIVE_TO_NEED(has_utf8,i);
        } else {
            for (i = min; i <= max; i++)
                if (isUPPER(i))
                    *d++ = NATIVE_TO_NEED(has_utf8,i);
        }
    }
    else
#endif

The former has nothing like the literal_endpoint check in the latter;
thus tr/// does not seem to tell literals from metacharacters in ranges,
and tr/\x89-\x91/X/ will not replace \x8e in EBCDIC.

Hmm, this may be an inconsistency in the EBCDIC case.

Sastry, would you please run the following codelet on your EBCDIC platform?

    ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ s/[\x89-\x91]/X/g;
    is($a, "XXXXXXXX");

Does it behave the same way as yours?

    ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
    is($a, "XXXXXXXX");

Regards,
SADAHIRO Tomoyuki
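
For anyone without an EBCDIC machine at hand, the difference between the two excerpts can be tried with a small stand-alone sketch. This is not perl source: expand_range(), ebcdic_is_lower(), ebcdic_is_upper() and the honour_literal flag are hypothetical names made up for illustration, and the letter ranges assume the common EBCDIC layout (a-i at 0x81-0x89, j-r at 0x91-0x99, s-z at 0xA2-0xA9, and likewise for upper case).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for perl's isLOWER/isUPPER on an EBCDIC build (assumed layout:
 * letters sit in three runs with non-letter gaps between them). */
static int ebcdic_is_lower(unsigned c)
{
    return (c >= 0x81 && c <= 0x89) || (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

static int ebcdic_is_upper(unsigned c)
{
    return (c >= 0xC1 && c <= 0xC9) || (c >= 0xD1 && c <= 0xD9) ||
           (c >= 0xE2 && c <= 0xE9);
}

/* Expand the range min..max into out[] and return the number of bytes
 * written. out must have room for max - min + 1 bytes.
 *
 * literal_endpoints: nonzero if both endpoints were written as literal
 *                    letters (as in i-j), zero if written as escapes
 *                    (as in \x89-\x91).
 * honour_literal:    nonzero imitates the regcomp.c excerpt, which skips
 *                    the gaps only for literal endpoints; zero imitates
 *                    the toke.c excerpt, which skips the gaps whenever
 *                    both endpoints are letters of the same case. */
size_t expand_range(unsigned min, unsigned max, int literal_endpoints,
                    int honour_literal, unsigned char *out)
{
    size_t n = 0;
    unsigned i;
    int both_lower = ebcdic_is_lower(min) && ebcdic_is_lower(max);
    int both_upper = ebcdic_is_upper(min) && ebcdic_is_upper(max);
    int skip_gaps = (both_lower || both_upper) &&
                    (!honour_literal || literal_endpoints);

    assert(min <= max && max <= 0xFF);

    for (i = min; i <= max; i++) {
        /* When skipping gaps, keep only letters of the endpoints' case. */
        if (skip_gaps &&
            !(both_lower ? ebcdic_is_lower(i) : ebcdic_is_upper(i)))
            continue;
        out[n++] = (unsigned char)i;
    }
    return n;
}
```

With honour_literal set to 0, expand_range(0x89, 0x91, 0, 0, buf) writes only the two bytes 0x89 and 0x91, which is consistent with the 'X«»ðý±°X' result reported for tr///. With honour_literal set to 1 and escape-style endpoints, the same range yields all nine bytes from 0x89 to 0x91.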