Dan Kogai <[EMAIL PROTECTED]> writes:
>> Dan, in case EBCDIC scares you (and it should :-), a quick intro:
>> basically, consider the whole low 256 characters being rearranged from
>> what they are in ASCII.  For example, ord("A") is 0xC1, not 0x41.  (The
>> pod/perlebcdic.pod has the full tables.)
>
>  Sure it does scare me.  I have to confess UTF-EBCDIC was totally out
>of mind.  But here I got a hint;  Like what perl used to be, CJK
>encodings are very, very ASCII-chauvinistic;  Its variable-length
>encoding heavily relies on the fact that ascii leaves MSB of the byte
>alone.  That way you can tell if a given byte is a whole (half-width)
>character or half of full-width character.
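To make the MSB trick concrete, here is a minimal sketch in Python, chosen only because its `euc_jp` and `cp037` codecs let the EBCDIC case be emulated on an ASCII machine. The helper name `high_bit_flags` is made up for illustration, and the check is deliberately naive: a real EUC-JP or Shift-JIS scanner needs lead-byte ranges as well.

```python
def high_bit_flags(data: bytes) -> list:
    """Return, for each byte, whether its most significant bit is set."""
    return [bool(b & 0x80) for b in data]

# EUC-JP: the ASCII letter keeps its MSB clear, while both bytes of the
# kana (U+3042, HIRAGANA LETTER A) have it set.
euc = "A\u3042".encode("euc_jp")
print(high_bit_flags(euc))

# EBCDIC code page 037: a plain "A" is 0xC1, so the same test would
# mistake it for half of a full-width character.
ebcdic = "A".encode("cp037")
print(high_bit_flags(ebcdic))
```

The second call shows why this assumption cannot be carried over to an EBCDIC byte stream as it stands.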
That is fine. While the data is in the CJK encodings it can stay ASCII-oid.
The problem comes when we convert to perl's internal form.
An ASCII 'A' in Shift-JIS or whatever will still become 0xC1 in an EBCDIC
perl, because that is "defined" to be EBCDIC perl's view of U+0041.
So if tests convert CJK into "internal" and then just do ord(), they will
fail for the range 0..255. There are some XS functions to map
native<->unicode numbers.

>  The shadow of ASCII casts even on ISO-2022, an escape-based encoding
>that is not supposed to be affected by MSB and such (Only \e was
>supposed to matter); in ISO-2022, most 2-byte characters are
>represented by either 96x96 or 94x94 grid, which is (7bit ascii -
>control characters) or (that - space (0x20) and DEL (\x7F)).
>  Obviously this will not work on EBCDIC....

Nor should it.

>  This one may be tougher than we think....
>  FYI I know something called 12-bit EBCDIC kanji also exists.  I know
>only of existence but is that in our support list?

If OS390 (or ICU given its history) has tables we can probably support them.

>
>> The test logs are attached: I would really appreciate if you could see
>> some pattern in the failures.
>
>  I will do the best I can but I will be away for this weekend and I
>won't be back online till Sunday at least.
>
>> --
>> $jhi++; # http://www.iki.fi/jhi/
>>         # There is this special biologist word we use for 'stable'.
>>         # It is 'dead'. -- Jack Cohen
>
>Dan the Unstable according to Jack Cohen

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
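
P.S. The ord() failure described above can be sketched as follows. This is an illustration in Python, not perl: the `cp037` codec stands in for an EBCDIC platform, and `unicode_to_native` / `native_to_unicode` are hypothetical helper names, not the XS functions mentioned in the message.

```python
NATIVE = "cp037"  # assumption: one EBCDIC code page standing in for "native"

def unicode_to_native(code_point: int, codec: str = NATIVE) -> int:
    """Map a Unicode code point to its number in a single-byte native charset."""
    return chr(code_point).encode(codec)[0]

def native_to_unicode(native: int, codec: str = NATIVE) -> int:
    """Map a native single-byte code number back to its Unicode code point."""
    return ord(bytes([native]).decode(codec))

# An ASCII 'A' inside Shift-JIS data is the byte 0x41 ...
sjis = "A".encode("shift_jis")
# ... and decodes to U+0041, whatever the platform.
code_point = ord(sjis.decode("shift_jis"))
# On the emulated EBCDIC side, that same character has native number 0xC1.
native = unicode_to_native(code_point)

print(hex(sjis[0]), hex(code_point), hex(native))
```

A test that compares the native number against the Unicode value without mapping one to the other will therefore disagree for characters in 0..255, which is the pattern to look for in the logs.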