On Mon, 8 Aug 2005 15:36:40 +0100, Nicholas Clark <[EMAIL PROTECTED]> wrote:

> On Thu, Aug 04, 2005 at 11:42:54AM +0530, Sastry wrote:
> > Hi
> > 
> > I am trying to run this script on an EBCDIC platform using perl-5.8.6
> >  
> > ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
> > is($a, "XXXXXXXX");
> > 
> > 
> > The result I get is:
> > 
> >  'X«»ðý±°X'
> > 
> > a) Is this happening because \x8a\x8b\x8c\x8d\x8f\x90 are the gapped
> > characters in EBCDIC?
> 
> I think so, in that \x89 is 'i' and \x91 is 'j'.
> 
> 
> > b) Should all the bytes in $a change to X?
> 
> I don't know. It seems to be some special case code in regexec.c:
> 
> #ifdef EBCDIC
>               /* In EBCDIC [\x89-\x91] should include
>                * the \x8e but [i-j] should not. */
>               if (literal_endpoint == 2 &&
>                   ((isLOWER(prevvalue) && isLOWER(ceilvalue)) ||
>                    (isUPPER(prevvalue) && isUPPER(ceilvalue))))
>               {
>                   if (isLOWER(prevvalue)) {
>                       for (i = prevvalue; i <= ceilvalue; i++)
>                           if (isLOWER(i))
>                               ANYOF_BITMAP_SET(ret, i);
>                   } else {
>                       for (i = prevvalue; i <= ceilvalue; i++)
>                           if (isUPPER(i))
>                               ANYOF_BITMAP_SET(ret, i);
>                   }
>               }
>               else
> #endif
> 
> 
> which I assume is making [i-j] in a regexp leave a gap, but [\x89-\x91] not.
> I don't know where ranges in tr/// are parsed, but given that I grepped
> for EBCDIC and didn't find any analogous code, it looks like tr/\x89-\x91//
> is treated as tr/i-j// and in turn i-j is treated as letters and always
> "special cased".

S_scan_const() in toke.c seems to expand ranges in tr///,
while S_regclass() in regcomp.c (what I assume you mean) handles
ranges inside [] character classes.

++++++++ from toke.c, line 1419
#ifdef EBCDIC
                if ((isLOWER(min) && isLOWER(max)) ||
                    (isUPPER(min) && isUPPER(max))) {
                    if (isLOWER(min)) {
                        for (i = min; i <= max; i++)
                            if (isLOWER(i))
                                *d++ = NATIVE_TO_NEED(has_utf8,i);
                    } else {
                        for (i = min; i <= max; i++)
                            if (isUPPER(i))
                                *d++ = NATIVE_TO_NEED(has_utf8,i);
                    }
                }
                else
#endif

S_scan_const() has nothing like the literal_endpoint check in S_regclass();
thus tr/// does not seem to tell literal characters from escapes such as
\x89 in ranges, and tr/\x89-\x91/X/ will not replace \x8e (nor any other
non-letter between \x89 and \x91) in EBCDIC.

Hmm, this may be an inconsistency between tr/// and [] ranges on EBCDIC.
Sastry, would you please run the following snippet on your EBCDIC platform?

($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ s/[\x89-\x91]/X/g;
is($a, "XXXXXXXX");

Does it give the same result as your tr/// version?
($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
is($a, "XXXXXXXX");

Regards,
SADAHIRO Tomoyuki
