Hi Sadahiro

After incorporating the changes in doop.c and op.c, I get a lot of
failures; the test results are below. The first approach already fails
on tr//, and there will certainly be more failures when we run the
entire test suite, which uses these functions.

In the second approach, the change seems to affect only tr//. Please
let me know which changes you suggest I apply in S_scan_const(), so
that I can see whether it works.

regards
Sastry


# Failed at t/op/tr.t line 110
#      got 'š\''
Wide character in print at ./test.pl line 48.
# expected '΋\''
# Failed at t/op/tr.t line 209
Wide character in print at ./test.pl line 48.
#      got '¯œD–㯜D–ã'
Wide character in print at ./test.pl line 48.
# expected '¯œ¯Û–㯜¯Û–ã'
# Failed at t/op/tr.t line 219
#      got 'CDÚCDÚ'
Wide character in print at ./test.pl line 48.
# expected 'C¯Û–ãC¯Û–ã'
# Failed at t/op/tr.t line 224
Wide character in print at ./test.pl line 48.
#      got 'ED–ãED–㌨Føã'
Wide character in print at ./test.pl line 48.
# expected 'E¯Û[E¯Û[Œ¨Føã'
# Failed at t/op/tr.t line 234
Wide character in print at ./test.pl line 48.
#      got '¯Û¯Û¯Û¯Û¯Û¯Û'
Wide character in print at ./test.pl line 48.
# expected '¯ÛD¯Û¯ÛD¯Û'
# Failed at t/op/tr.t line 283
Wide character in print at ./test.pl line 48.
#      got '¯œD–㯥E–ã'
Wide character in print at ./test.pl line 48.
# expected '¯œ¯œ–㯥¯Û–ã'
# Failed at t/op/tr.t line 350
#      got '§ÿ'
Wide character in print at ./test.pl line 48.
# expected 'ΰÎ"'
1..99
ok 1 - uc
ok 2 - lc
ok 3 - partial uc
ok 4 - EBCDIC discontinuity
ok 5 - tr cancels IOK and NOK
ok 6 - harmless if explicitly not updating
ok 7 - harmless if implicitly not updating
ok 8 -     no error
ok 9 - handles UTF8
ok 10
ok 11
ok 12
ok 13
ok 14
ok 15
ok 16
ok 17 - changing UTF8 chars in a UTF8 string, same length
ok 18
ok 19 -     more bytes
ok 20
not ok 21 - Putting UT8 chars into a non-UTF8 string
ok 22
ok 23 - Removing UTF8 chars from UTF8 string
ok 24
ok 25 - Counting UTF8 chars in UTF8 string
ok 26 -          non-UTF8 chars in UTF8 string
ok 27 -          UTF8 chars in non-UTFs string
ok 28 - tr/a-z-9//
ok 29 - hyphens, leading
ok 30 -    trailing
ok 31 -    both
ok 32
ok 33
ok 34
ok 35 - reversed range check
ok 36 - cannot update read-only var
ok 37 - explicit read-only count
ok 38 -     no error
ok 39 - implicit read-only count
ok 40 -     no error
ok 41 - LHS of non-updating tr
ok 42 - LHS bad on updating tr
ok 43 - byte2byte transliteration
ok 44
ok 45
ok 46
not ok 47 - byte2wide transliteration
ok 48 -    wide2byte
ok 49 -    wide2wide
not ok 50 - byte2wide & wide2byte
not ok 51 - all together now!
ok 52 - transliterate and count
ok 53
not ok 54 - translit w/complement
ok 55
ok 56 - translit w/deletion
ok 57
ok 58 - translit w/squeeze
ok 59
ok 60
ok 61
ok 62
ok 63 - UTF range
not ok 64
ok 65
ok 66
ok 67
ok 68
ok 69
ok 70
ok 71
ok 72
ok 73
ok 74
ok 75
ok 76
ok 77
ok 78
ok 79
ok 80
ok 81
ok 82
not ok 83
ok 84
ok 85
ok 86
ok 87
ok 88 - pp_trans needs to unshare shared hash keys
ok 89 -    no error
ok 90 - implicit count on constant
ok 91 -    no error
ok 92 - implicit count outside array bounds, index negative
ok 93 -     doesn't extend the array
ok 94 - implicit count outside array bounds, index positive
ok 95 -     doesn't extend the array
ok 96 - implicit count outside hash bounds
ok 97 -     doesn't extend the hash
ok 98 - non-modifying tr/// on a scalar ref
ok 99 -     doesn't stringify its argument




On 9/14/05, SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
> 
> On Wed, 14 Sep 2005 16:50:26 +0530, Sastry <[EMAIL PROTECTED]> wrote
> 
> > Hi Sadahiro
> >
> > On 9/12/05, SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
> > >
> > > I attribute the failure in tr/\x{12c}-\x{130}/\xc0-\xc4/; to
> > > such an ambiguity of \xc0-\xc4. In this expression the left part
> > > \x{12c}-\x{130} parsed before coerces \xc0-\xc4 into Unicode,
> > > and results in the failure.
> > So this is still a problem on EBCDIC! Is there a way to fix this?
> 
> > > #test case B # On ASCII platform, of course successful
> > > $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x{100}\x89-\x91/X/;
> > > is($c, 8);
> > > is($a, "XXXXXXXX");
> > This test fails on EBCDIC.  In S_scan_const(), there is a statement below.
> > /* Insert oct or hex escaped character.
> >                * There will always enough room in sv since such
> >                * escapes will be longer than any UTF-8 sequence
> >                * they can end up as. */
> >
> >               /* We need to map to chars to ASCII before doing the tests
> >                  to cover EBCDIC
> >               */
> >               if (!UNI_IS_INVARIANT(NATIVE_TO_UNI(uv))) {
> >                                          if (!has_utf8 && uv > 255) {
> >
> > On an ASCII platform, the first if condition is true because uv is 137,
> > which falls in the variant range (uv > \x7F), whereas on EBCDIC the
> > condition is false. Can you explain why the behaviour differs?
> 
> See the "else" branch of this "if". The condition tests whether uv needs
> multiple octets in UTF-8/UTF-EBCDIC or only a single octet.
> "\x89" in Latin-1 corresponds to a double-octet representation
> in UTF-8, so the condition is true (multiple octets needed) on an
> ASCII platform.
> "\x89" in EBCDIC corresponds to a single-octet representation
> in UTF-EBCDIC, so the condition is false on an EBCDIC platform.
> 
> Where the "else" branch runs, there is no difference between ASCII and
> UTF-8, or between single-octet EBCDIC and UTF-EBCDIC.
> 
> > Also I found that the characters are expanded during runtime in
> > S_do_trans_simple_utf8()
> 
> If I understand it correctly, expansion of character ranges isn't
> performed in do_trans_simple_utf8(). It is performed in scan_const()
> for non-Unicode and pmtrans() for Unicode.
> 
> > Do you have any suggestion where the problem is?
> 
> (1) one way (I think worse)
> Perl should treat the range in the native order (not in Unicode one)
> through the parse time, the compile time, and the run time.
> 
> using uvchr_to_utf8() instead of uvuni_to_utf8(),
>      utf8n_to_uvchr() instead of utf8n_to_uvuni(),
> in op.c#pmtrans and doop.c#do_trans_simple_utf8 etc.
> 
> But swash_fetch() also needs change (the current swash does not
> know EBCDIC, only Unicode); changes of swash may lead to
> corruption of lc(), uc(), regular expression \p{something} etc.
> 
> (2) another way (I think better)
> No change of swash, pmtrans, do_trans_****.
> 
> Then all character ranges within 0..255 (not only for non-Unicode
> but also for Unicode) would be expanded in scan_const(),
> and pmtrans() would expand only uv >= 256.
> 
> I think this way requires only the change of toke.c#scan_const
> and influences only tr///.
> 
> But the change will be quite big, since the current scan_const()
> only expands non-Unicode and assumes a single octet encoding.
> The range 0..255 in UTF-8/UTF-EBCDIC includes double-octet characters.
> 
> I'm not sure whether such a change should be enclosed
> with #ifdef EBCDIC and #endif
> 
> Regards,
> SADAHIRO Tomoyuki
> 
> 
>
