Kenneth Whistler wrote:

Frank,

> You don't need to explain to me
> the concept of GB18030. The question I have is about detailed mapping
> information.

Now, now, there's no need to get snippy with me. It sounded
like you were unclear from the kinds of questions you were
asking.

Sorry about that. I did not intend any flame in my message.
> I looked at
> http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .
>
> It is interesting that the mapping between U+10000 and U+10FFFF was
> checked in only 5 weeks ago, in version 1.3:
>
>   30910:   <range uFirst="10000" uLast="10FFFF"
>            bFirst="90 30 81 30" bLast="E3 32 9A 35"
>            bMin="81 30 81 30" bMax="FE 39 FE 39"/>
>

> Is the U+10000 - U+10FFFF mapping between Unicode and GB18030 specified
> in the GB18030 standard itself? Can someone fax me that page? Thanks.

Unfortunately, I don't have the revised and corrected version of
the standard to hand.

Would it be possible for you to fax me the old original version? My fax number is +1 650 937 5413. Thanks.
 
But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in
Chinese):

"From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond
to GB 13000's 16 supplementary planes..."

Thank you very much. This is the information I need. It clearly defines the mapping between GB18030 and the Unicode supplementary planes at the character level. With this information, we can implement the conversion between GB18030 and Unicode.
 
If you look at the ICU specification, bFirst="90 30 81 30" and
bLast="E3 32 9A 35" correspond to:

83 "groups" (90..E2) of GB 18030:           83 x 10 x 1260 = 1045800 code points
 2 "planes" (E3 30..31) of GB 18030:              2 x 1260 =    2520 code points
25 "rows"   (E3 32 81..99) of GB 18030:            25 x 10 =     250 code points
 6 "cells"  (E3 32 9A 30..35) of GB 18030:                         6 code points
                                                     Total   1048576 code points

And 1048576 code points = 16 x 65536 code points = 16 planes of 10646.
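
For anyone who wants to double-check the tally above, here is a small Python sketch of the same arithmetic. The constant and variable names are my own, chosen only for illustration; the byte ranges are the ones quoted from the ICU range element and from clause 7.3.

```python
# Sizes of the four-byte GB 18030 structure, using the terms from the table:
# a "row" is 10 cells (4th byte 0x30..0x39), a "plane" is 126 rows
# (3rd byte 0x81..0xFE), a "group" is 10 planes (2nd byte 0x30..0x39).
CELLS_PER_ROW = 10
ROWS_PER_PLANE = 126
PLANES_PER_GROUP = 10
CELLS_PER_PLANE = ROWS_PER_PLANE * CELLS_PER_ROW        # 1260
CELLS_PER_GROUP = PLANES_PER_GROUP * CELLS_PER_PLANE    # 12600

# Pieces of the range bFirst="90 30 81 30" .. bLast="E3 32 9A 35".
groups = 0xE2 - 0x90 + 1    # full groups 90..E2
planes = 0x31 - 0x30 + 1    # full planes E3 30..31
rows = 0x99 - 0x81 + 1      # full rows E3 32 81..99
cells = 0x35 - 0x30 + 1     # cells E3 32 9A 30..35

used = (groups * CELLS_PER_GROUP
        + planes * CELLS_PER_PLANE
        + rows * CELLS_PER_ROW
        + cells)

# The span 0x90308130..0xE339FE39 named in clause 7.3 covers 84 full groups.
reserved = (0xE3 - 0x90 + 1) * CELLS_PER_GROUP

print(used, reserved)
```

The first number is the count actually needed for the 16 supplementary planes; the second is the slightly larger span that clause 7.3 sets aside for them.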

So GB 18030 and ICU agree. Start at 0x90308130 and lay out all the
rest of the Unicode supplementary code points in order.
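
To make "lay out in order" concrete, here is a minimal Python sketch of the linear mapping, assuming nothing beyond the byte ranges above. The function names are hypothetical and this is not ICU's implementation, just one way to express the rule.

```python
def _linear(b1: int, b2: int, b3: int, b4: int) -> int:
    """Position of a four-byte GB 18030 sequence, counting from 81 30 81 30."""
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Position of 90 30 81 30, which corresponds to U+10000.
_BASE = _linear(0x90, 0x30, 0x81, 0x30)


def supplementary_to_gb18030(cp: int) -> bytes:
    """Map a code point in U+10000..U+10FFFF to its four GB 18030 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    n = _BASE + (cp - 0x10000)
    n, b4 = divmod(n, 10)     # cell within the row
    n, b3 = divmod(n, 126)    # row within the plane
    b1, b2 = divmod(n, 10)    # group, and plane within the group
    return bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30])


def gb18030_to_supplementary(seq: bytes) -> int:
    """Inverse mapping for sequences in 90 30 81 30 .. E3 32 9A 35."""
    if len(seq) != 4:
        raise ValueError("expected a four-byte sequence")
    b1, b2, b3, b4 = seq
    in_range = (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
                and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39)
    if not in_range:
        raise ValueError("not a well-formed four-byte GB 18030 sequence")
    cp = 0x10000 + _linear(b1, b2, b3, b4) - _BASE
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("outside the supplementary-plane range")
    return cp


print(supplementary_to_gb18030(0x10000).hex(" "))
print(supplementary_to_gb18030(0x10FFFF).hex(" "))
```

The two printed sequences should match the bFirst and bLast values in the ICU range element, which is a quick sanity check on the layout.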

--Ken
