Nicholas Clark <[EMAIL PROTECTED]> wrote:
On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:
> > For the code points being tested
> > ("\x{0442}\x{0435}\x{0441}\x{0442}")
> > does the perl source file contain the correct byte
> > sequence in UTF-EBCDIC?
> Yes it does, since I ran the test,
> if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
> ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'}))
> { print "ok\n"; }
> and the test ran fine, if that is what you mean by the
> source file containing the correct byte sequence. Or
> am I mistaken ?
You are mistaken, I'm afraid. bareword means no quotes.
In ASCII & UTF-8 land, the 1 liner
$ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'
gives
3500
The 3 bytes in the source code between '{' and '}' are 224, 182 and 172
which are the UTF-8 encoding for the code point 3500.
My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?
The equivalent bytes on UTF-EBCDIC are 186, 84 and 83.
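For anyone wanting to check the byte values quoted above, here is a minimal sketch. The helper name native_utf8_bytes is made up for illustration. It relies on utf8::encode, which on an ASCII platform produces UTF-8; on an EBCDIC build it is expected to produce the UTF-EBCDIC bytes instead, but that part is an assumption not tested here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes Perl's native "utf8" encoding
# uses for the code point $cp (UTF-8 on ASCII platforms).
sub native_utf8_bytes {
    my ($cp) = @_;
    my $str = chr $cp;      # one-character string holding the code point
    utf8::encode($str);     # convert in place to the encoded byte string
    return unpack 'C*', $str;
}

# On an ASCII/UTF-8 platform this prints: 224, 182, 172
print join(', ', native_utf8_bytes(3500)), "\n";
```

Running the same script on the EBCDIC machine would be one way to confirm the 186, 84 and 83 reported in this thread.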
If you put those 3 bytes directly between the '{' and '}' characters in
the EBCDIC version of that 1 liner, does it also print 3500?
I cannot put those three bytes into the 1-liner you mentioned above, because I have no way of producing the chars corresponding to those bytes (www.kostis.net/charsets/ebc1047.htm) on the command line.
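One possible way around the command-line problem is to let a small script write the test file, so the bytes never have to be typed. This is only a sketch: the file name bareword_test.pl and the helper names build_source and write_script are invented for illustration, and the byte values are simply the ones given in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build the source of the one-liner, with the
# given raw bytes placed directly between '{' and '}' as a bareword key.
sub build_source {
    my @bytes = @_;
    return 'use utf8; $a{' . pack('C*', @bytes) . '}++; print map {ord} keys %a;' . "\n";
}

# Hypothetical helper: write that source to $path without any
# encoding layer, so the bytes land in the file unchanged.
sub write_script {
    my ($path, @bytes) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    binmode $fh;
    print {$fh} build_source(@bytes);
    close $fh or die "Cannot close $path: $!";
    return;
}

# 186, 84, 83 are the UTF-EBCDIC bytes reported above (unverified here).
write_script('bareword_test.pl', 186, 84, 83);
```

On the EBCDIC machine the generated file could then be run with "perl -l bareword_test.pl" to see whether it prints 3500. On an ASCII machine those three bytes are not valid UTF-8, so the generated file is only meaningful on EBCDIC; pass 224, 182, 172 to get the ASCII equivalent.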
> > If so, *that* would explain the failures, and be the
> > thing that needs
> > correcting. The test file would need if/else with a
> > different test on EBCDIC.
> what would you suggest be put in the if/ else ?
I think that the regression tests tended to do something like
if (ord 'A' == 65) {
# Do the ASCII/UTF-8 version
} else {
# Assume EBCDIC
}
Thanks,
Rajarshi.
Nicholas Clark
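The if/else skeleton suggested above could be filled in roughly as follows. This is a sketch only, not the actual regression test: the helper names are invented, and the EBCDIC byte values are the ones reported in this thread and have not been checked independently.

```perl
use strict;
use warnings;

# Same platform check as in the skeleton: 'A' is 65 in ASCII.
sub is_ascii_platform { return ord('A') == 65 }

# Hypothetical helper: the bytes expected for code point 3500.
sub expected_key_bytes {
    return is_ascii_platform()
        ? (224, 182, 172)   # UTF-8
        : (186, 84, 83);    # UTF-EBCDIC as reported in this thread (unverified)
}

# Hypothetical helper: the bytes Perl itself produces for a code point.
sub native_bytes {
    my $str = chr shift;
    utf8::encode($str);
    return unpack 'C*', $str;
}

my @want = expected_key_bytes();
my @got  = native_bytes(3500);
print "@want" eq "@got" ? "ok\n" : "not ok\n";
```

Keeping the platform check in one small function means each test only has to supply its two sets of expected values.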