[PATCH] Re: Transliteration operator(tr//)on EBCDIC platform

SADAHIRO Tomoyuki Thu, 11 Aug 2005 11:32:07 -0700

On Wed, 10 Aug 2005 23:56:31 -0700 (PDT), rajarshi das <[EMAIL PROTECTED]> wrote


> Hi,
> This is Rajarshi expressing Sastry's viewpoints since he's on vacation. 
> 
> SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
> 
>> According to the above statement in perlebcdic.pod,
>> s/[\x89-\x91]/X/g must substitute \x8e with X.
>> But it doesn't concern whether tr/\x89-\x91/X/ would substitute \x8e
>> with X or not, since tr/// does not use brackets, [ ].
> 
>> Though I think ranges in [ ] and ranges in tr/// should coincide
>> and agree that tr/\x89-\x91/X/ should substitute \x8e with X,
>> that is just my opinion.
>> I don't know whether it is true and correct.
> Is there some way we can confirm whether this is correct (and expected
> behaviour), since there isn't any explicit documentation for the tr operator?

t/op/tr.t already has a test case for this (cf. Change 9038), the one
Sastry previously pointed out as failing on EBCDIC platforms, so I assume
that at least the pumpking at that time considered this behaviour correct.

>> By the way, when you say "If I specify [\x89-\x91]", does it
>> mean s/[\x89-\x91]/X/g or tr/\x89-\x91/X/ ? I'm confused.
> We mean tr/\x89-\x91/X/.
> 
> 
>> You first informed us that gapped characters are not
>> substituted with X by tr/\x89-\x91/X/.
>> And you said s/[\x89-\x91]/X/g substituted all the characters,
>> including gapped characters, with X, didn't you?
> 
> Yes.
>> If so, I assume that your [\x89-\x91], which doesn't match any of
>> the gapped characters, means tr/\x89-\x91/X/.
> That's correct. We mean tr/\x89-\x91/X/.
> 
> 
>> The following is a part of the current core tests from op/pat.t.
>> I believe they should pass.
> Yes, all the following tests pass. I think the following tests are in the
> context of the s/[]/X/ operator and hence pass.
> 
> Thanks,
> 
> Rajarshi.

OK. To me, this confirms that s/[]/X/ is fine and tr/// has a problem.
Since I don't have an EBCDIC machine, I can't verify that the following
patch actually works.

Regards,
SADAHIRO Tomoyuki

! t/op/tr.t, toke.c

diff -ur perl~/t/op/tr.t perl/t/op/tr.t
--- perl~/t/op/tr.t     Mon Aug 01 17:17:24 2005
+++ perl/t/op/tr.t      Thu Aug 11 23:41:22 2005
@@ -295,18 +295,15 @@
 # (i-j, r-s, I-J, R-S), [\x89-\x91] [\xc9-\xd1] has to match them,
 # from Karsten Sperling.
 
-# Not working in EBCDIC as of 12674.
 $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
 is($c, 8);
 is($a, "XXXXXXXX");
-   
-# Not working in EBCDIC as of 12674.
+
 $c = ($a = "\xc9\xca\xcb\xcc\xcd\xcf\xd0\xd1") =~ tr/\xc9-\xd1/X/;
 is($c, 8);
 is($a, "XXXXXXXX");
 
-
-SKIP: {   
+SKIP: {
     skip "not EBCDIC", 4 unless $Is_EBCDIC;
 
     $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/i-j/X/;
diff -ur perl~/toke.c perl/toke.c
--- perl~/toke.c        Mon Jul 18 04:31:02 2005
+++ perl/toke.c Thu Aug 11 22:55:18 2005
@@ -1368,6 +1368,9 @@
     I32  has_utf8 = FALSE;                     /* Output constant is UTF8 */
     I32  this_utf8 = UTF;                      /* The source string is assumed to be UTF8 */
     UV uv;
+#ifdef EBCDIC
+    UV literal_endpoint = 0;
+#endif
 
     const char *leaveit =      /* set of acceptably-backslashed characters */
        PL_lex_inpat
@@ -1417,8 +1420,9 @@
                 }
 
 #ifdef EBCDIC
-               if ((isLOWER(min) && isLOWER(max)) ||
-                   (isUPPER(min) && isUPPER(max))) {
+               if (literal_endpoint == 2 &&
+                   ((isLOWER(min) && isLOWER(max)) ||
+                    (isUPPER(min) && isUPPER(max)))) {
                    if (isLOWER(min)) {
                        for (i = min; i <= max; i++)
                            if (isLOWER(i))
@@ -1437,6 +1441,9 @@
                /* mark the range as done, and continue */
                dorange = FALSE;
                didrange = TRUE;
+#ifdef EBCDIC
+               literal_endpoint = 0;
+#endif
                continue;
            }
 
@@ -1455,6 +1462,9 @@
            }
            else {
                didrange = FALSE;
+#ifdef EBCDIC
+               literal_endpoint = 0;
+#endif
            }
        }
 
@@ -1788,6 +1798,10 @@
            s++;
            continue;
        } /* end if (backslash) */
+#ifdef EBCDIC
+       else
+           literal_endpoint++;
+#endif
 
     default_action:
        /* If we started with encoded form, or already know we want it
###END OF PATCH
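
For readers without an EBCDIC machine, the rule the toke.c change is meant
to enforce can be sketched in standalone C. This is only an illustration
under stated assumptions, not code from perl: expand_range(),
ebcdic_is_lower() and ebcdic_is_upper() are hypothetical helpers, the
letter positions are hard-coded from the usual EBCDIC layout (a-i, j-r,
s-z and A-I, J-R, S-Z in separate runs), and the literal_endpoints
argument stands in for the literal_endpoint counter in the patch. A range
whose two endpoints are both literal letters of the same case (i-j) skips
the gap characters, while a range written with escapes (\x89-\x91) covers
every code point.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed EBCDIC letter layout: each case sits in three separate runs,
 * with non-letter code points in the gaps between them. */
static int ebcdic_is_lower(unsigned c)
{
    return (c >= 0x81 && c <= 0x89) ||   /* a-i */
           (c >= 0x91 && c <= 0x99) ||   /* j-r */
           (c >= 0xA2 && c <= 0xA9);     /* s-z */
}

static int ebcdic_is_upper(unsigned c)
{
    return (c >= 0xC1 && c <= 0xC9) ||   /* A-I */
           (c >= 0xD1 && c <= 0xD9) ||   /* J-R */
           (c >= 0xE2 && c <= 0xE9);     /* S-Z */
}

/* Hypothetical helper (not part of toke.c): expand min..max into out[]
 * and return the number of code points written. literal_endpoints is the
 * number of endpoints that were written as literal characters rather
 * than as escapes such as \x89. */
static size_t expand_range(unsigned min, unsigned max,
                           int literal_endpoints, unsigned char *out)
{
    size_t n = 0;
    unsigned i;
    int same_case = (ebcdic_is_lower(min) && ebcdic_is_lower(max)) ||
                    (ebcdic_is_upper(min) && ebcdic_is_upper(max));

    assert(min <= max && max <= 0xFF);

    for (i = min; i <= max; i++) {
        /* Only a fully literal, same-case range skips the gap characters. */
        if (literal_endpoints == 2 && same_case) {
            int keep = ebcdic_is_lower(min) ? ebcdic_is_lower(i)
                                            : ebcdic_is_upper(i);
            if (!keep)
                continue;
        }
        out[n++] = (unsigned char)i;
    }
    return n;
}
```

With a 256-byte output buffer, expand_range(0x89, 0x91, 0, buf) yields all
nine code points including 0x8e, which is what tr/\x89-\x91/X/ needs, while
expand_range(0x89, 0x91, 2, buf) yields only 0x89 and 0x91, matching tr/i-j/X/.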

[PATCH] Re: Transliteration operator(tr//)on EBCDIC platform
