Re: Encode test problems in EBCDIC

Dan Kogai Thu, 21 Feb 2002 19:57:01 -0800

jhi,

On 2002.02.22, at 07:54, Jarkko Hietaniemi wrote:
> Hi,
>
> the new JP test and the old Tcl test are doing "somewhat okay" in EBCDIC
> (I'm using an OS/390 mainframe).


   I wish I had access to it...

> Failed Test                       Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> ...
> ../ext/Encode/t/JP.t               255 65280    22   16  72.73%  7-22
> ../ext/Encode/t/Tcl.t              137 35072   632   34   5.38%  592-598 600
>                                                                  602 604 606
>                                                                  608 610 612-
>                                                                  632
>
> My problem is what to do about these failures.  Especially the Tcl.t
> is rather frustratingly close to success.  The JP.t might be a hard
> nut to crack.  Should I just skip the failing tests?  If so, we need
> to figure out what is the pattern of the failures (hardcoding by test
> numbers would feel really evil...)?  We might entertain the idea of
> completely skipping these tests, but the relatively high success rate
> seems to be saying that fixing this instead of ignoring this might be
> possible.


   I have yet to grok your test to the fullest extent, but this much I can 
tell:  don't let the high success rate fool you.  Remember that the 8-bit 
part is much smaller than the 16-bit part.  If your tests attempt something 
like "feed a UTF-EBCDIC string to a given encoding, decode it back and see 
if it matches the original", chances are that MOST of the iso-8859-1 part 
is failing.  But once again, I have yet to check in full detail.
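
For illustration only, here is a minimal Python sketch of the kind of round-trip check described above.  It is not the actual Tcl.t or JP.t logic.  The helper name round_trip_failures is hypothetical, and since Python has no UTF-EBCDIC codec, the EBCDIC code page cp037 stands in for the EBCDIC side.

```python
def round_trip_failures(encoding, text):
    """Return the characters of `text` that do not survive encode -> decode."""
    failures = []
    for ch in text:
        try:
            if ch.encode(encoding).decode(encoding) != ch:
                failures.append(ch)
        except UnicodeError:
            # The character cannot be represented in this encoding at all.
            failures.append(ch)
    return failures


if __name__ == "__main__":
    # The "8-bit part": every code point in the iso-8859-1 range.
    latin1 = "".join(chr(cp) for cp in range(0x100))
    for name in ("cp037", "euc_jp"):
        bad = round_trip_failures(name, latin1)
        print(name, len(bad), "of", len(latin1), "characters fail")
```

Counting failures per code-point range like this, rather than per test number, is one way to look for a pattern without hardcoding test numbers.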

>   Dan, in case EBCDIC scares you (and it should :-), a quick intro:
>   basically, consider the whole low 256 characters being rearranged from
>   what they are in ASCII.  For example, ord("A") is 0xC1, not 0x41. (The
>   pod/perlebcdic.pod has the full tables.)

   It sure does scare me.  I have to confess UTF-EBCDIC was totally out 
of mind.  But here I got a hint:  like perl used to be, CJK 
encodings are very, very ASCII-chauvinistic.  Their variable-length 
schemes rely heavily on the fact that ASCII leaves the MSB of each byte 
unused.  That way you can tell whether a given byte is a whole (half-width) 
character or half of a full-width character.
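
To make the MSB point concrete, here is a minimal Python sketch, assuming EUC-JP as the CJK encoding and the EBCDIC code page cp037 as the counter-example.  The function name split_by_msb is hypothetical.  It decides character boundaries purely from the high bit of each byte, which is exactly the assumption that breaks once letters themselves have the high bit set.

```python
def split_by_msb(data):
    """Split EUC-JP style bytes into characters using only the lead byte.

    MSB clear -> one-byte (ASCII) character.
    MSB set   -> lead byte of a two-byte character, except 0x8F (SS3),
                 which introduces a three-byte JIS X 0212 character.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead & 0x80 == 0:
            width = 1
        elif lead == 0x8F:
            width = 3
        else:
            width = 2
        chars.append(data[i:i + width])
        i += width
    return chars


if __name__ == "__main__":
    # ASCII-based input: "A" is 0x41, so the split is correct.
    print(split_by_msb("A\u3042".encode("euc_jp")))
    # EBCDIC input: "A" is 0xC1, so "AB" is mistaken for one 2-byte character.
    print(split_by_msb("AB".encode("cp037")))
```
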
   ASCII casts its shadow even on ISO-2022, an escape-based encoding 
that is not supposed to be affected by the MSB and such (only \e was 
supposed to matter).  In ISO-2022, most 2-byte characters are laid out on 
either a 96x96 or a 94x94 grid, whose byte range is (7-bit ASCII minus the 
control characters \x00-\x1F) or (that minus space (\x20) and DEL (\x7F)).
   Obviously this will not work on EBCDIC....
   This one may be tougher than we think....
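
A small Python sketch of the grid arithmetic above, assuming JIS X 0208 row/cell numbering as the 94x94 example.  The names GRID_96, GRID_94 and kuten_to_jis are hypothetical, chosen for illustration.

```python
# Byte ranges behind the two ISO-2022 grid sizes.
GRID_96 = range(0x20, 0x80)   # 7-bit ASCII minus the controls 0x00-0x1F
GRID_94 = range(0x21, 0x7F)   # the same, minus space (0x20) and DEL (0x7F)


def kuten_to_jis(row, cell):
    """Map a 1-based (row, cell) position on the 94x94 grid to two bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0x20 + row, 0x20 + cell])


if __name__ == "__main__":
    print(len(GRID_96), len(GRID_94))
    # Hiragana "a" sits at row 4, cell 2; its two bytes look like ASCII '$"'.
    pair = kuten_to_jis(4, 2)
    print(pair, pair in "\u3042".encode("iso2022_jp"))
    # On an EBCDIC code page the characters '$' and '"' are different bytes.
    print('$"'.encode("cp037"))
```

Both bytes of every grid position fall inside the printable ASCII range, so data that has been transcoded byte-for-byte from ASCII to EBCDIC no longer lines up with the grid.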
   FYI, I know something called 12-bit EBCDIC kanji also exists.  I only 
know that it exists; is it on our support list?

> The test logs are attached: I would really appreciate if you could see
> some pattern in the failures.

   I will do the best I can, but I will be away this weekend and I 
won't be back online till Sunday at least.

> --
> $jhi++; # http://www.iki.fi/jhi/
>         # There is this special biologist word we use for 'stable'.
>         # It is 'dead'. -- Jack Cohen

Dan the Unstable according to Jack Cohen
