Re: ICU's uconv vs Linux iconv and UTF-8

Dan Kogai Fri, 01 Feb 2002 07:18:44 -0800

On 2002.02.01, at 23:57, Mark Leisher wrote:
>     Dan> FYI I have reported this brain-dead mapping problem to Unicode
>     Dan> Consortium but never got an answer.  Well, they are not public
>     Dan> society in a way they charge for the membership to say anything.
>     Dan> One of the reasons so many Japanese love to hate Unicode...
>
> This kind of false information is why many Japanese continue to love to
> hate Unicode.  If you were actually on the Unicode mailing list, you
> wouldn't be repeating garbage like this.
>
> Sign up and send a message about the mapping tables.  You will get an
> answer.


   I signed up to [EMAIL PROTECTED] a long time ago, and I thought I was
still subscribed since I keep getting invitations to conferences and such.
But when I checked with [EMAIL PROTECTED], it subscribed my address again
instead of returning an error saying I was already subscribed.  Hmm....
Anyway, I have resubscribed, so here I go....
   Okay.  Here it is.  Let me begin with the original message.  Sorry for
the repetition, folks in [EMAIL PROTECTED]

> On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
>> As part of the mystery of CJK encodings I notice that IBM's ICU's uconv
>> and SuSE6.4 linux iconv differ as to the UTF-8 representation of
>> table.euc
>>
>> Both converters will round-trip with themselves and give byte exact
>> copy of table.euc
>>
>> Weirdly they differ in how they map '\' and '~' in ASCII space as
>> well as some spots in higher characters.
>
>   Oh, yes.  This is the problem with the original Unicode 2.x map: it is
> not ASCII-preserving.  I posted this problem to
> perl-[EMAIL PROTECTED] when I first released Jcode.  Several discussions
> later, I made Jcode preserve ASCII by default and added
> $Jcode::Unicode::PEDANTIC to change the behavior.
>   Here is the excerpt from Jcode::Unicode:
>
> VARIABLES
>        $Jcode::Unicode::PEDANTIC
>            When set to non-zero, x-to-unicode conversion becomes
>            pedantic.  That is, '\' (chr(0x5c)) is converted to
>            zenkaku backslash and '~' (chr(0x7e)) to JIS-x0212
>            tilde.
>
>            By Default, Jcode::Unicode leaves ascii ([0x00-0x7f])
>            as it is.
>
>> Linux iconv will not take ICU's UTF-8.
>> ICU's uconv will read the iconv output but does produce same as
>> original table.euc.
>
>   So far as I can see, Linux iconv is ASCII-preserving while ICU's is
> Unicode-strict.
>   From Perl's point of view, ASCII-preserving should be the default.
>   FYI I have reported this brain-dead mapping problem to Unicode 
> Consortium but never got an answer.  Well, they are not public society 
> in a way they charge for the membership to say anything.   One of the 
> reasons so many Japanese love to hate Unicode...
>
>> Our current euc-jp.ucm is compatible with Linux iconv.
>
>   Right choice.
>
> Dan the Man with So Many Charsets to Deal With

   Now let me repeat the question I asked a long time ago.  Why does the
Unicode - JISX2xxx map still fail to preserve the ASCII part, despite the
fact that most converters ignore the original map and leave the ASCII part
as is?
   One more question.  Where have the contents of
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ gone?
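
   To make the ASCII question concrete, here is a minimal sketch of the two
behaviors.  It is Python rather than Perl and it is not Jcode's code: the
names ascii_to_unicode and PEDANTIC_OVERRIDES are hypothetical, and the two
target code points are assumptions picked for illustration (U+FF3C for the
"zenkaku backslash", U+FF5E as a stand-in for the JIS X 0212 tilde), not
values taken from any published mapping table.

```python
# Illustration only. The override targets below are ASSUMED stand-ins,
# not entries copied from a real Unicode/JIS mapping table.
PEDANTIC_OVERRIDES = {
    0x5C: "\uFF3C",  # FULLWIDTH REVERSE SOLIDUS ("zenkaku backslash")
    0x7E: "\uFF5E",  # FULLWIDTH TILDE (assumed stand-in for the JIS X 0212 tilde)
}


def ascii_to_unicode(data: bytes, pedantic: bool = False) -> str:
    """Map bytes in the range 0x00-0x7F to Unicode characters.

    With pedantic=False every byte maps to the code point of the same
    value (ASCII-preserving). With pedantic=True, 0x5C and 0x7E are
    replaced by the entries in PEDANTIC_OVERRIDES.
    """
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the ASCII range")
        if pedantic and b in PEDANTIC_OVERRIDES:
            out.append(PEDANTIC_OVERRIDES[b])
        else:
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    sample = b"C:\\tmp ~user"
    for mode in (False, True):
        text = ascii_to_unicode(sample, pedantic=mode)
        # Show the UTF-8 bytes each mode would write out.
        print(mode, text.encode("utf-8"))
```

   Under these assumptions the two modes write different UTF-8 for the same
input: the preserved backslash stays one byte, while the overridden one
becomes three.  Each mode can still round-trip with itself, which would be
consistent with two converters disagreeing only on the UTF-8 form.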

_____  Dan Kogai
   __/ ____   CEO, DAN co. ltd.
  /__ /-+-/  2-8-14-418 Shiomi Koto-ku Tokyo 135-0052 Japan
    /--/--- mailto: [EMAIL PROTECTED] / http://www.dan.co.jp/ ---------
__/  /    Tel:+81 3-5665-6131   Fax:+81 3-5665-6132
          GPG Key: http://www.dan.co.jp/~dankogai/dankogai.gpg.asc
