We have published mapping data for Windows cp936 from the actual Windows 2000 
converter API. This is probably more up to date and complete than what is listed on 
the unicode.org site.
Of course, these tables also "only" show correspondences with Unicode, but
a) unlike the unicode.org tables, they also show unidirectional (one-way) mappings
b) many modern systems (Windows, MacOS, Java, etc.) always process text in 
Unicode, so anything that has no mapping to Unicode does not get processed (and you 
may not need to worry about it)

The GBK table derived from that data is available in the Unicode TR 22 XML format at 
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/windows-936-2000.xml?content-type=text/plain
and in the ICU-specific .ucm format at 
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/ucm/windows-936-2000.ucm?content-type=text/plain
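To give an idea of how the one-way mappings show up in the .ucm file, here is a minimal Python sketch that reads mapping lines and builds separate tables for each direction. It assumes the usual ICU .ucm line shape (`<Uxxxx> \xNN... |flag`) with flag 0 for a roundtrip mapping, 1 for a Unicode-to-codepage fallback and 3 for a codepage-to-Unicode fallback. The function names are made up for illustration, the two sample lines are hand-written from the mappings mentioned in this message (their flags are an assumption, not copied from the file), and lines with more than one code point are simply skipped.

```python
import re

# Assumed shape of a .ucm mapping line:  <U20AC> \x80 |0
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_ucm_line(line):
    """Return (code_point, byte_sequence, flag), or None for any other line."""
    m = UCM_LINE.match(line.strip())
    if m is None:
        return None  # header, comment, CHARMAP marker, multi-code-point line
    code_point = int(m.group(1), 16)
    raw = bytes(int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
    return code_point, raw, int(m.group(3))


def build_tables(lines):
    """Build (bytes -> code point) and (code point -> bytes) dictionaries."""
    to_unicode = {}
    from_unicode = {}
    for line in lines:
        parsed = parse_ucm_line(line)
        if parsed is None:
            continue
        code_point, raw, flag = parsed
        if flag in (0, 3):  # roundtrip, or one-way codepage -> Unicode
            to_unicode[raw] = code_point
        if flag in (0, 1):  # roundtrip, or one-way Unicode -> codepage
            from_unicode[code_point] = raw
    return to_unicode, from_unicode


if __name__ == "__main__":
    # Hand-written sample lines; the |0 flags are assumed for illustration.
    sample = [
        r"<U20AC> \x80 |0",
        r"<UE76C> \xA2\xE3 |0",
    ]
    to_unicode, from_unicode = build_tables(sample)
    for raw, code_point in to_unicode.items():
        print(raw.hex(), "-> U+%04X" % code_point)
```

A mapping that appears in only one of the two dictionaries is a unidirectional one.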

The main page for our repository is at http://oss.software.ibm.com/icu/charset/

As for your specific questions:

1. You can use the descriptions and properties of the equivalent Unicode characters 
according to the mapping, except for characters that map to private-use code points, 
which have no descriptions or properties of their own.
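As a rough illustration of point 1, the following Python sketch looks up the name and general category of a mapped code point with the standard `unicodedata` module and treats private-use code points (category `Co`) separately. The `describe` helper is a made-up name, and the code points are the ones discussed in this message; the result depends on the Unicode version bundled with your Python.

```python
import unicodedata


def describe(code_point):
    """Return (label, name, general category) for one Unicode code point."""
    ch = chr(code_point)
    label = "U+%04X" % code_point
    category = unicodedata.category(ch)
    if category == "Co":
        # Private-use code points carry no standard name or meaning.
        return label, "<private use>", category
    return label, unicodedata.name(ch, "<unnamed>"), category


if __name__ == "__main__":
    for cp in (0x20AC, 0x00A9, 0xE76C):
        print(*describe(cp))
```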

2. I don't know about actual tagging. The IANA list is at 
http://www.iana.org/assignments/character-sets
There is currently no registered name "GBK" in that list.

3. The Windows mappings show the Euro sign U+20AC at GBK 0x80. There is no mapping for 
the copyright sign U+00A9.
GBK 0xa2e3 is mapped to the private-use code point U+E76C.

Note that GBK is superseded by GB 18030. See the mapping table at 
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml?content-type=text/plain
There, U+20AC is mapped to GB 18030 0xa2e3.
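If you have Python at hand, a quick way to cross-check this is its built-in `gb18030` codec. Note that this codec is implemented independently of the tables linked above, so treat it as a second opinion rather than as the reference; the helper name is made up for this sketch.

```python
def gb18030_bytes(code_point):
    """Encode one code point with Python's built-in gb18030 codec."""
    return chr(code_point).encode("gb18030")


if __name__ == "__main__":
    encoded = gb18030_bytes(0x20AC)  # Euro sign
    print("U+20AC ->", encoded.hex())
    # Decode the same bytes again to see whether the mapping roundtrips.
    print(encoded.hex(), "-> U+%04X" % ord(encoded.decode("gb18030")))
```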

For further questions about what is mapped where, please check the tables linked above.

markus

