Re: ICU's uconv vs Linux iconv and UTF-8

Nick Ing-Simmons Fri, 01 Feb 2002 07:21:25 -0800

Dan Kogai <[EMAIL PROTECTED]> writes:
>On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
>> As part of the mystery of CJK encodings I notice that IBM's ICU's uconv
>> and SuSE6.4 linux iconv differ as to the UTF-8 representation of
>> table.euc
>>
>> Both converters will round-trip with themselves and give byte exact
>> copy of table.euc
>>
>> Weirdly they differ in how they map '\' and '~' in ASCII space as
>> well as some spots in higher characters.
>
>   Oh, yes.  This is the problem with the original Unicode 2.x map: it is
>not ASCII-preserving.  I posted this problem to perl-
>[EMAIL PROTECTED] when I first released Jcode.  Several discussions
>later, I made Jcode so that it preserves ASCII by default and added
>$Jcode::Unicode::PEDANTIC to change the behavior.


Ah. I take your point. If we used ICU's pedantic form,
both UNIX ~/foo and MS C:\Foo get mangled.
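
To make the mangling concrete, here is a minimal Python sketch of the two behaviours for bytes below 0x80 only. It assumes the pedantic table follows JIS X 0201 Roman, where 0x5C is YEN SIGN (U+00A5) and 0x7E is OVERLINE (U+203E); the names decode_low, encode_low and PEDANTIC are hypothetical and come from neither ICU, iconv nor Jcode.

```python
# Hypothetical sketch: only the single-byte (below 0x80) range is modelled.
# Assumption: the "pedantic" table follows JIS X 0201 Roman for two bytes.
PEDANTIC = {
    0x5C: "\u00a5",  # YEN SIGN instead of REVERSE SOLIDUS
    0x7E: "\u203e",  # OVERLINE instead of TILDE
}
PEDANTIC_REVERSE = {ch: byte for byte, ch in PEDANTIC.items()}


def decode_low(data: bytes, pedantic: bool) -> str:
    """Decode bytes below 0x80 with either the pedantic or the ASCII table."""
    chars = []
    for byte in data:
        if byte >= 0x80:
            raise ValueError("this sketch covers only bytes below 0x80")
        if pedantic and byte in PEDANTIC:
            chars.append(PEDANTIC[byte])
        else:
            chars.append(chr(byte))
    return "".join(chars)


def encode_low(text: str, pedantic: bool) -> bytes:
    """Inverse of decode_low for the same table."""
    out = bytearray()
    for ch in text:
        if pedantic and ch in PEDANTIC_REVERSE:
            out.append(PEDANTIC_REVERSE[ch])
        elif ord(ch) < 0x80 and not (pedantic and ord(ch) in PEDANTIC):
            out.append(ord(ch))
        else:
            raise ValueError(f"no mapping for {ch!r} in this table")
    return bytes(out)


for raw in (b"~/foo", b"C:\\Foo"):
    for pedantic in (False, True):
        text = decode_low(raw, pedantic)
        # Each table round-trips with itself, byte for byte.
        assert encode_low(text, pedantic) == raw
        # The UTF-8 form differs between the two tables.
        print(pedantic, text.encode("utf-8"))
```

Each table is self-consistent, which is why both converters round-trip cleanly; the trouble only appears once the UTF-8 text is used as a path or compared against text produced with the other table.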

The other differences (having looked at the diff in yudit) seem to be
mapping ¢ (cent), £ (pound), ¬ (not) and one of the longer dashes to
different width variants (full width for ICU).
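
For reference, a small Python sketch of what the width difference looks like at the code point and UTF-8 level. It assumes the non-ICU side yields the ordinary Latin-1 code points (U+00A2, U+00A3, U+00AC), which the thread does not state outright. The dash is left out because it is not identified here. PAIRS and describe are hypothetical names.

```python
import unicodedata

# (narrow, fullwidth) pairs. The narrow side is an assumption about the
# non-ICU output; the fullwidth side is what ICU is said to produce.
PAIRS = [
    ("\u00a2", "\uffe0"),  # cent
    ("\u00a3", "\uffe1"),  # pound
    ("\u00ac", "\uffe2"),  # not
]


def describe(ch: str) -> str:
    """Code point, Unicode name and UTF-8 bytes (hex) for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} {ch.encode('utf-8').hex()}"


for narrow, wide in PAIRS:
    print(describe(narrow), "|", describe(wide))
    # The fullwidth forms are compatibility variants of the narrow ones,
    # so NFKC normalisation folds them back together.
    assert unicodedata.normalize("NFKC", wide) == narrow
```

The two forms are distinct code points with different UTF-8 lengths, so a byte-level diff of the converted files will flag every occurrence even though the glyphs look alike.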

I am going off ICU ...


>   So far as I see, Linux iconv is ASCII-preserving while ICU's is
>Unicode-strict.
>   From Perl's point of view ASCII-preserving should be the default.
>   FYI I have reported this brain-dead mapping problem to the Unicode
>Consortium but never got an answer.  Well, they are not a public society,
>in that they charge for membership before you can say anything.   One of the
>reasons so many Japanese love to hate Unicode...
>
>> Our current euc-jp.ucm is compatible with Linux iconv.
>
>   Right choice.
>
>Dan the Man with So Many Charsets to Deal With
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
