Re: ICU's uconv vs Linux iconv and UTF-8

Nick Ing-Simmons Fri, 01 Feb 2002 10:06:20 -0800

Mark Davis <[EMAIL PROTECTED]> writes:
>>ICU's pedantic form
>
>The goal for ICU is to be charset neutral, and support all of the
>conversions that are in modern use. There are a large number of
>variants of character sets;



Fair enough - but as shipped (I downloaded it earlier this week)
it comes with a convrtrs.txt which maps MIME's EUC-JP onto
something it calls ibm-33722, which has the behaviour I reported at
the start of this thread.

>you can use the one you want. 

It is not a question of which _I_ want - it is a question of which one(s)
CJK perl users want/expect/need.

In so far as _I_ want any particular one it is the one which is going
to match the X11 font encoding so I can in my naive westerner's way 
see what it looks like - and I have not a clue which one that is ...

>See:
>
>http://oss.software.ibm.com/icu/charset/index.html

A huge list, and I don't see how to "grep" it for the provenance of
the tables (not that many seem to have any).

So can the experts - ideally native reading experts not theorists - tell 
me which ICU (or other open source) table(s) they want/expect/need,
or failing that which ones have proven troublesome.

There seem to be at least 4 EUC-JP mappings in that list 
AIX, Solaris, glibc and Java
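
One mechanical way to find the troublesome code points would be to diff
the tables pairwise once they are loaded. The following is only a sketch
in Python: diff_tables is a hypothetical helper, and the two toy tables
are invented for illustration, not taken from any of the four mappings
above.

```python
def diff_tables(a, b):
    """Map each byte sequence the two tables disagree on to (a's char, b's char).

    A sequence that is missing from one table is reported with None on that side.
    """
    out = {}
    for seq in sorted(set(a) | set(b)):
        if a.get(seq) != b.get(seq):
            out[seq] = (a.get(seq), b.get(seq))
    return out


# Toy tables (assumed data): bytes -> Unicode character.
table_a = {b"\xa4\xa2": "\u3042", b"\xa1\xc1": "\u301c"}
table_b = {b"\xa4\xa2": "\u3042", b"\xa1\xc1": "\uff5e", b"\xad\xa1": "\u2460"}

# Print one line per disagreement: byte sequence, then each table's code point.
for seq, (ca, cb) in diff_tables(table_a, table_b).items():
    print(seq.hex(), ca and f"U+{ord(ca):04X}", cb and f"U+{ord(cb):04X}")
```

A short list like that is probably easier for a native reader to judge
than four complete tables.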

If we cannot get any answers "quickly" then I think Dan is correct - 
we should un-bundle the whole CJK encoding stuff from the "core" into 
a family of CPAN modules.

Which gives me a design choice:

A. Bundle a "pragmatic" set of CJK encodings which is fast and causes the
   least build pain for non-CJK users (i.e. compact precompiled form)

B. Make it as easy as possible for end-user to drop in a new encoding
   from (say) a .ucm file.

I can obviously try for both - but they seem to be pulling in opposite
directions at present.
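
To give a feel for what option B involves, here is a minimal sketch in
Python of reading the CHARMAP section of a .ucm file into lookup tables.
It assumes the usual line form `<UXXXX> \xNN... |f`, with f=0 meaning
round-trip, 1 meaning Unicode-to-bytes only and 3 meaning bytes-to-Unicode
only. parse_ucm and the "toy-euc" sample are hypothetical, not anything
shipped with ICU or perl.

```python
import re

# One mapping line looks like: <U3042> \xA4\xA2 |0
_MAPPING = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?"
)


def parse_ucm(text):
    """Return (decode, encode) dicts built from the CHARMAP section."""
    decode, encode = {}, {}
    in_charmap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            in_charmap = False
            continue
        if not in_charmap:
            continue  # header lines such as <code_set_name> are ignored
        m = _MAPPING.match(line)
        if not m:
            continue
        char = chr(int(m.group(1), 16))
        seq = bytes.fromhex(m.group(2).replace("\\x", ""))
        flag = int(m.group(3) or 0)  # a missing flag is treated as round-trip
        if flag in (0, 1):
            encode[char] = seq
        if flag in (0, 3):
            decode[seq] = char
    return decode, encode


# Invented sample, not a real table.
SAMPLE = r"""<code_set_name> "toy-euc"
<mb_cur_max> 2
CHARMAP
<U0041> \x41 |0
<U3042> \xA4\xA2 |0
<UFF21> \x41 |1
END CHARMAP
"""

decode, encode = parse_ucm(SAMPLE)
```

Plain dicts like these are easy to drop in but slow and bulky, which is
exactly the tension with option A's compact precompiled form.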

Meanwhile I will go fix the bugs in the core's :encoding logic ...

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
