Dear Uni-encoders and -decoders,

Dirk Meyer from Adobe has put together an extensive summary of the Chinese GB 18030 
encoding standard that was published on 2000-mar-17. Ken Lunde and I assisted Dirk 
with reviews and comments.

The summary is on the web site of Ken's famous CJKV book "with the fish":
ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

To summarize the summary, we now have an English text describing the new encoding in 
detail. There are a few apparent errors, typos, and inconsistencies in the 
Chinese standard text that need to be resolved.

For implementers, there is enough information in the summary to describe the encoding 
structure and to prepare an implementation.
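
As a rough illustration of that structure, here is a small Python sketch that only splits a GB 18030 byte string into one-, two- and four-byte sequences and does no mapping to Unicode. The function name split_gb18030 is hypothetical. The sketch assumes these byte ranges: single bytes 0x00-0x7F; lead bytes 0x81-0xFE; two-byte trail bytes 0x40-0x7E and 0x80-0xFE; four-byte sequences of the form [81-FE][30-39][81-FE][30-39].

```python
from typing import List


def split_gb18030(data: bytes) -> List[bytes]:
    """Split GB 18030 bytes into one-, two- and four-byte sequences.

    Only the byte-range structure is checked; no mapping table is used.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b1 = data[i]
        if b1 <= 0x7F:
            # One-byte portion (ASCII).
            out.append(data[i:i + 1])
            i += 1
            continue
        # 0x80 and 0xFF are not valid lead bytes; a lone lead byte is truncated.
        if not 0x81 <= b1 <= 0xFE or i + 1 >= n:
            raise ValueError(f"invalid or truncated sequence at offset {i}")
        b2 = data[i + 1]
        if 0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE:
            # Two-byte portion (same structure as GBK).
            out.append(data[i:i + 2])
            i += 2
        elif 0x30 <= b2 <= 0x39:
            # Four-byte portion: lead byte, digit, lead-range byte, digit.
            if i + 3 >= n:
                raise ValueError(f"truncated four-byte sequence at offset {i}")
            b3, b4 = data[i + 2], data[i + 3]
            if not (0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
                raise ValueError(f"invalid four-byte sequence at offset {i}")
            out.append(data[i:i + 4])
            i += 4
        else:
            raise ValueError(f"invalid second byte at offset {i + 1}")
    return out
```

For example, split_gb18030(b"A\xb0\xa1\x81\x30\x81\x30") yields one ASCII byte, one two-byte code and one four-byte code.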

What is still missing - aside from the resolution of the issues mentioned here - is a 
precise mapping table between Unicode and at least the one-byte and two-byte 
portions of GB 18030, in both directions.
In theory, it should be almost the same as GBK, but to be sure, we need precise, 
complete, and machine-readable mappings.
Given the one-byte and two-byte portions and the description in the standard and in 
the summary, the four-byte portion can be derived with a little bit of Perl or similar.
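
A sketch of what that "little bit of Perl" might look like, written here in Python. The four-byte codes are treated as a linear index counted from 0x81308130, and supplementary code points U+10000 to U+10FFFF take one contiguous range starting at 0x90308130. For the BMP, the sketch assumes that every code point not covered by the one- and two-byte tables (surrogates excluded) receives a four-byte code consecutively in code point order. All function names are hypothetical, and the `covered` set stands in for whatever the real one- and two-byte mapping table provides.

```python
from typing import Dict, Set

# First four-byte code used for supplementary code points (U+10000).
SUPPLEMENTARY_START = b"\x90\x30\x81\x30"


def linear_index(seq: bytes) -> int:
    """Position of a four-byte sequence, counted from 0x81308130 (index 0)."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def from_linear_index(n: int) -> bytes:
    """Inverse of linear_index: rebuild the four bytes from a position."""
    n, d4 = divmod(n, 10)
    n, d3 = divmod(n, 126)
    d1, d2 = divmod(n, 10)
    if not 0 <= d1 <= 0xFE - 0x81:
        raise ValueError("index outside the four-byte code space")
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])


def encode_supplementary(cp: int) -> bytes:
    """Map U+10000..U+10FFFF onto the contiguous range from 0x90308130."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    return from_linear_index(linear_index(SUPPLEMENTARY_START) + cp - 0x10000)


def derive_bmp_four_byte(covered: Set[int]) -> Dict[int, bytes]:
    """Assign four-byte codes to the BMP code points missing from `covered`.

    ASSUMPTION: `covered` holds every code point mapped by the one- and
    two-byte tables, and the rest of the BMP (minus surrogates) is
    numbered consecutively in code point order from 0x81308130.
    """
    table: Dict[int, bytes] = {}
    n = 0
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF or cp in covered:
            continue
        table[cp] = from_linear_index(n)
        n += 1
    return table
```

As a sanity check under the same assumption: if the two-byte table assigns all 23,940 two-byte codes to distinct non-surrogate BMP code points, then 39,420 BMP code points remain, and the last derived entry should land on 0x8431A439.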

Anyone who needs to implement or know about GB 18030 should probably read this text.

If you can contribute precise mapping tables and/or help resolve the open 
issues, please do so.


Best regards,

markus
