Autrijus Tang <[EMAIL PROTECTED]> writes:
>On Sat, Mar 02, 2002 at 11:12:42AM +0000, Nick Ing-Simmons wrote:
>> >This and euc-tw use 1, 2 or 4-byte encoding. Any points on how to use
>> >that functionality for Encode.pm?
>> The .ucm format can cope:
>
>Thanks! I'm done with conversion and tested against libiconv. Patch follows;
>files are available at <http://autrijus.org/ucm.tar.gz>.
>
>Libiconv's GB18030 table elicited some warnings from compile:
>
>    Unicode character 0xfdXX is illegal at ../compile line 81, <E> line 39659.

There are some other warnings when running compile without -Q, e.g. the
attached. It seems that some of these encodings are not round-trip safe.
One reason for preferring .ucm is that, by declaring one of the multiple
mappings for a character a fallback, one can get the "right" thing.
For example, for <U00F3>: is that 2B2E or 282E?

>
>The range in question is fdxx and ffxx. Is that anything to worry about?
>
>Also, the resulting file size is quite hefty:
>
>-rw-r--r-- 1 root 512 1688107 Mar 2 19:51 euc-tw.ucm
>-rw-r--r-- 1 root 512 1543333 Mar 2 19:51 gb18030.ucm
>
>And they add ~600k to the compressed perl distribution. Is that acceptable?
>
>The good news is there won't be anything else that big coming from the Chinese
>front; aside from HZ, perl's support could be considered complete.

Test case?

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/

/home/perl5/perlio/perl -I../../lib compile -So /tmp/Encode/iso-ir-165.ucm Encode/iso-ir-165.enc
D encoded iso-ir-165
U03B1 is 283B and 2641
UFF47 is 2840 and 2367
U1FB1 is 2B21 and 2821
U03AC is 2B22 and 2822
U1FB0 is 2B23 and 2823
U1F70 is 2B24 and 2824
U0113 is 2B25 and 2825
U00E9 is 2B26 and 2826
U011B is 2B27 and 2827
U00E8 is 2B28 and 2828
U012B is 2B29 and 2829
U00ED is 2B2A and 282A
U01D0 is 2B2B and 282B
U00EC is 2B2C and 282C
U014D is 2B2D and 282D
U00F3 is 2B2E and 282E
U01D2 is 2B2F and 282F
U00F2 is 2B30 and 2830
U016B is 2B31 and 2831
U00FA is 2B32 and 2832
U01D4 is 2B33 and 2833
U00F9 is 2B34 and 2834
U01D6 is 2B35 and 2835
U01D8 is 2B36 and 2836
U01DA is 2B37 and 2837
U01DC is 2B38 and 2838
U00FC is 2B39 and 2839
U00EA is 2B3A and 283A
U03B1 is 2B3B and 2641
U1E3F is 2B3C and 283C
U0144 is 2B3D and 283D
U0148 is 2B3E and 283E
U01F9 is 2B3F and 283F
UFF47 is 2B40 and 2367
34 mapping conflicts
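
For illustration only, here is a rough Python sketch (not the logic of the compile script itself) of how conflicts like those listed above can arise and how a fallback flag removes them. It assumes the usual .ucm CHARMAP line shape, `<UXXXX> \xHH\xHH |flag`, where |0 marks a round-trip mapping and |3 a reverse fallback (bytes to Unicode only). The function name `find_conflicts` and the two sample tables are made up for this sketch, and which of 2B2E and 282E should be the round-trip mapping for <U00F3> is picked arbitrarily here; the thread leaves that question open.

```python
import re
from collections import defaultdict

# One CHARMAP line, assumed shape: <UXXXX> \xHH\xHH |flag
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])"
)


def find_conflicts(lines):
    """Return {code point: [byte strings]} for characters that have
    more than one round-trip (|0) mapping."""
    roundtrip = defaultdict(list)
    for line in lines:
        m = UCM_LINE.match(line.strip())
        if not m:
            continue  # skip comments, headers, blank lines
        cp, raw, flag = m.groups()
        if flag != "0":
            continue  # fallbacks do not compete for the encode direction
        roundtrip[int(cp, 16)].append(raw.replace("\\x", "").upper())
    return {cp: seqs for cp, seqs in roundtrip.items() if len(seqs) > 1}


# Hypothetical table: both byte sequences claim to round-trip to U+00F3.
ambiguous = [
    r"<U00F3> \x2B\x2E |0",
    r"<U00F3> \x28\x2E |0",
]

# Same table with one mapping demoted to a reverse fallback (|3).
# Keeping 282E as the round-trip mapping is an arbitrary choice.
resolved = [
    r"<U00F3> \x28\x2E |0",
    r"<U00F3> \x2B\x2E |3",
]

for cp, seqs in find_conflicts(ambiguous).items():
    print("U%04X is %s" % (cp, " and ".join(seqs)))
print(len(find_conflicts(resolved)), "mapping conflicts")
```

With the second table both byte sequences can still decode to U+00F3, but only one of them is a candidate when encoding, so the ambiguity goes away.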