Dan Kogai <[EMAIL PROTECTED]> writes:
>On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
>> As part of the mystery of CJK encodings I notice that IBM's ICU's uconv
>> and SuSE6.4 linux iconv differ as to the UTF-8 representation of
>> table.euc
>>
>> Both converters will round-trip with themselves and give byte exact
>> copy of table.euc
>>
>> Weirdly they differ in how they map '\' and '~' in ASCII space as
>> well as some spots in higher characters.
>
> Oh, yes. This is the problem of the original Unicode 2.x map; it is
>not ASCII preservative. I have posted this problem to
>perl-[EMAIL PROTECTED] when I first released Jcode. Several discussions
>later, I made Jcode so that it preserves ASCII by default and added
>$Jcode::Unicode::PEDANTIC to change the behavior.

Ah. I take your point. If we used ICU's pedantic form, both UNIX ~/foo
and MS C:\Foo get mangled.

The other differences (having looked at the diff in yudit) seem to be
mapping ¢ (cent), £ (pound), ¬ (not) and one of the longer dashes
to different width variants (full width for ICU).

I am going off ICU ...

> So far as I see Linux iconv is ASCII-preservative while ICU's is
>Unicode-strict.
> From Perl's point of view ASCII preservative should be default.
> FYI I have reported this brain-dead mapping problem to Unicode
>Consortium but never got an answer. Well, they are not public society
>in a way they charge for the membership to say anything. One of the
>reasons so many Japanese love to hate Unicode...
>
>> Our current euc-jp.ucm is compatible with Linux iconv.
>
> Right choice.
>
>Dan the Man with So Many Charsets to Deal With

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
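
For anyone wanting to reproduce the comparison without yudit, here is a minimal Python sketch that lists the positions where two decodings of the same text disagree. The helper names (`mapping_differences`, `describe`) and the sample strings are hypothetical. The "strict" sample assumes the pedantic table follows the JIS X 0201 Roman assignments (0x5C as YEN SIGN, 0x7E as OVERLINE); the thread does not spell out the exact code points, so treat that as an assumption rather than a statement about ICU's table.

```python
import unicodedata


def mapping_differences(a, b):
    """Return (index, char_a, char_b) wherever the two decodings disagree."""
    if len(a) != len(b):
        raise ValueError("decodings differ in length; cannot compare position by position")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def describe(ch):
    """Format a character as 'U+XXXX NAME' for readable diff output."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# Hypothetical sample data standing in for the two converters' output.
ascii_preserving = "~/foo C:\\Foo"
# Assumption: the strict mapping turns '~' into OVERLINE and '\' into YEN SIGN.
strict = "\u203e/foo C:\u00a5Foo"

for index, x, y in mapping_differences(ascii_preserving, strict):
    print(index, describe(x), "vs", describe(y))
```

With real data, the two arguments would be the UTF-8 output files of uconv and iconv, each read and decoded as UTF-8, instead of the inline samples.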