On Sat, Mar 02, 2002 at 11:12:42AM +0000, Nick Ing-Simmons wrote:
> >This and euc-tw use 1, 2 or 4-byte encoding. Any points on how to use
> >that functionality for Encode.pm?
> The .ucm format can cope:

Thanks! I'm done with the conversion and have tested it against libiconv.
Patch follows; files are available at <http://autrijus.org/ucm.tar.gz>.

Libiconv's GB18030 table elicited some warnings from compile:

    Unicode character 0xfdXX is illegal at ../compile line 81, <E> line 39659.

The ranges in question are fdxx and ffxx. Is that anything to worry about?

Also, the resulting file sizes are quite hefty:

    -rw-r--r--  1 root  512  1688107 Mar  2 19:51 euc-tw.ucm
    -rw-r--r--  1 root  512  1543333 Mar  2 19:51 gb18030.ucm

And they add ~600k to the compressed perl distribution. Is that acceptable?

The good news is there won't be anything else that big coming from the
Chinese front; aside from HZ, perl's support could be considered complete.

Thanks,
/Autrijus/

diff -ur Encode/CN/Makefile.PL Encode.new/CN/Makefile.PL
--- Encode/CN/Makefile.PL	Sat Mar  2 11:45:11 2002
+++ Encode.new/CN/Makefile.PL	Sat Mar  2 19:52:53 2002
@@ -6,6 +6,7 @@
               GBK          => ['gbk.enc'],
               GB2312       => ['gb2312.enc'],
               GB12345      => ['gb12345.enc'],
+              GB18030      => ['gb18030.ucm'],
               CP936        => ['cp936.enc'],
               'ISO-IR-165' => ['iso-ir-165.enc'],
              );
--- Encode/Encode.pm	Sat Mar  2 11:45:11 2002
+++ Encode.new/Encode.pm	Sat Mar  2 20:10:56 2002
@@ -170,7 +170,7 @@
 # TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
 # TODO: HP-UX '15' encodings japanese15 korean15 roi15
 # TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings GB18030 EUC-TW HZ
+# TODO: Chinese encodings HZ
 # TODO: Armenian encoding ARMSCII-8
 # TODO: Hebrew encoding ISO-8859-8-1
 # TODO: Thai encoding TCVN
diff -ur Encode/TW/Makefile.PL Encode.new/TW/Makefile.PL
--- Encode/TW/Makefile.PL	Sat Mar  2 11:45:11 2002
+++ Encode.new/TW/Makefile.PL	Sat Mar  2 19:52:46 2002
@@ -5,6 +5,7 @@
 my %tables = ('BIG5'       => ['big5.enc'],
               'BIG5-HKSCS' => ['big5-hkscs.enc'],
               'CP950'      => ['cp950.enc'],
+              'EUC-TW'     => ['euc-tw.ucm'],
              );
 
 my $name = 'TW';
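
A note on the "Unicode character 0xfdXX is illegal" warning quoted in the message: one way to see which table entries are involved is to scan the .ucm file for Unicode noncharacters (U+FDD0..U+FDEF, plus the last two code points of each plane, such as U+FFFE and U+FFFF). The sketch below is in Python rather than Perl and rests on assumptions: that the warnings really do come from noncharacters, and that mapping lines follow the usual `<UXXXX> \xNN... |n` layout. The function names and the sample byte sequence for U+FDD0 are hypothetical, not taken from the actual gb18030.ucm.

```python
import re

# Assumed layout of a .ucm mapping line, e.g.:  <U4E00> \xD2\xBB |0
UCM_LINE = re.compile(r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)")


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for code points ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(lines):
    """Return (line number, code point, byte string) for each noncharacter mapping."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        m = UCM_LINE.match(line.strip())
        if m is None:
            continue  # header, comment, CHARMAP marker, or blank line
        cp = int(m.group(1), 16)
        if is_noncharacter(cp):
            hits.append((lineno, cp, m.group(2)))
    return hits


if __name__ == "__main__":
    # Hypothetical sample; the bytes shown for U+FDD0 are made up for illustration.
    sample = [
        "CHARMAP",
        r"<U4E00> \xD2\xBB |0",
        r"<UFDD0> \x84\x30\x85\x35 |0",
        "END CHARMAP",
    ]
    for lineno, cp, raw in find_noncharacters(sample):
        print("line %d: U+%04X mapped from %s" % (lineno, cp, raw))
```

To check a real table, pass the open file instead of the sample list, for example `find_noncharacters(open("gb18030.ucm"))`. If every warning lines up with an entry reported here, the table is simply round-tripping noncharacters, and whether to keep those mappings becomes a policy decision rather than a conversion bug.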