On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:

> > For the code points being tested
> > ("\x{0442}\x{0435}\x{0441}\x{0442}")
> > does the perl source file contain the correct byte
> > sequence in UTF-EBCDIC?
> Yes it does, since I ran the test, 
> if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
> ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'}))
> print "ok\n";
> and the test ran fine, if that is what you mean by the
> source file containing the correct byte sequence. Or
> am I mistaken ?

You are mistaken, I'm afraid. A bareword means no quotes, so a test that
quotes the hash key doesn't exercise the bareword case.

In ASCII & UTF-8 land, the one-liner

$ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'

gives

3500


The 3 bytes in the source code between '{' and '}' are 224, 182 and 172,
which form the UTF-8 encoding of code point 3500.

My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?
If you put those bytes directly between the '{' and '}' characters in
the EBCDIC version of that one-liner, does it also print 3500?
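
One way to find those bytes might be something like the sketch below. It
relies on utf8::encode converting a string to perl's internal encoding,
which is UTF-8 on ASCII platforms; that it yields UTF-EBCDIC on an EBCDIC
build is my assumption from the documentation, and I have not run it there.
The name native_bytes is made up for this example.

```perl
use strict;
use warnings;

# Return the bytes of perl's internal encoding for one code point:
# UTF-8 on ASCII platforms and, by assumption, UTF-EBCDIC on EBCDIC ones.
sub native_bytes {
    my ($code_point) = @_;
    my $string = chr $code_point;
    utf8::encode($string);    # characters -> encoded bytes, in place
    return unpack 'C*', $string;
}

print join(' ', native_bytes(3500)), "\n";
```

On an ASCII platform this prints 224 182 172, matching the bytes above.
Whatever it prints on the EBCDIC machine is the sequence to paste between
the braces.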

> > If so, *that* would explain the failures, and be the
> > thing that needs
> > correcting. The test file would need if/else with a
> > different test on EBCDIC.
> what would you suggest be put in the if/ else ?

I think that the regression tests tended to do something like

if (ord 'A' == 65) {
  # Do the ASCII/UTF-8 version
} else {
  # Assume EBCDIC
}

Nicholas Clark
