Autrijus Tang <[EMAIL PROTECTED]> writes:
>> > - 'gb18030', used in glibc2.2, is a superset of gbk, which is a super
>> >   set of gb2312; we should use that instead of 'gbk' if we want gbk
>> >   support.
>
>This and euc-tw use 1, 2 or 4-byte encoding. Any points on how to use
>that functionality for Encode.pm?

The .ucm format can cope:

<code_set_name> "whatever"
<mb_cur_min> 1
<mb_cur_max> 4
<subchar> \x3F
#
CHARMAP
<U0000> \x00 |0 # <control>
<U0001> \x01 |0 # <control>
<U0002> \x02 |0 # <control>
<U0003> \x03 |0 # <control>
.....
<U2222> \x04\x05 |0 # two byte
.......
<U4444> \x06\x07\x08\x09 |0 # fourbyte
....
END CHARMAP

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
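
To make the mixed-width point concrete, here is a rough sketch of how a CHARMAP section like the one above could be read into a code point to byte-sequence table. It is written in Python purely for illustration and is not how Encode.pm or its build tools process .ucm files; the names parse_charmap, ENTRY and SAMPLE are made up for this example, and the sample data is just the placeholder mapping from the message, not real gb18030 or euc-tw data.

```python
import re

# One CHARMAP entry: <UXXXX> \xNN[\xNN...] |flag  [# comment]
# (hypothetical pattern, covering only the entry shape shown in the message)
ENTRY = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_charmap(text):
    """Return {code_point: bytes} for entries between CHARMAP and END CHARMAP."""
    table = {}
    in_map = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_map = True
            continue
        if line == "END CHARMAP":
            break
        if not in_map:
            continue  # header lines such as <mb_cur_max> are ignored here
        match = ENTRY.match(line)
        if match is None:
            continue  # skips blank lines and the "....." elisions
        code_point = int(match.group(1), 16)
        hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", match.group(2))
        table[code_point] = bytes(int(pair, 16) for pair in hex_pairs)
    return table


# Placeholder mapping copied from the message above (not a real charset).
SAMPLE = r"""
<code_set_name> "whatever"
<mb_cur_min> 1
<mb_cur_max> 4
<subchar> \x3F
#
CHARMAP
<U0000> \x00 |0 # <control>
<U0001> \x01 |0 # <control>
<U0002> \x02 |0 # <control>
<U0003> \x03 |0 # <control>
.....
<U2222> \x04\x05 |0 # two byte
.......
<U4444> \x06\x07\x08\x09 |0 # fourbyte
....
END CHARMAP
"""

table = parse_charmap(SAMPLE)
for cp, raw in sorted(table.items()):
    print(f"U+{cp:04X} -> {raw.hex()} ({len(raw)} byte(s))")
```

The point of the sketch is only that each entry carries its own byte sequence, so one-, two- and four-byte mappings can sit side by side in the same table as long as <mb_cur_min> and <mb_cur_max> bracket the lengths used.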