Frank,

> You don't need to explain to me
> the concept of GB18030. The question I have is about details mapping
> information.

Now, now, there's no need to get snippy with me. It sounded
like you were unclear from the kinds of questions you were
asking.

> I looked at
> http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .
> 
> It is interesting that the mapping between U+10000 and U+10FFFF was
> checked in only 5 weeks ago, in version 1.3:
> 
>   <range uFirst="10000" uLast="10FFFF" bFirst="90 30 81 30" bLast="E3 32 9A 35"
>          bMin="81 30 81 30" bMax="FE 39 FE 39"/>
> 

> Is the U+10000 - U+10FFFF mapping between Unicode and GB18030 specified
> in the GB18030 standard itself? Can someone fax me that page? Thanks.

Unfortunately, I don't have the revised and corrected version of
the standard to hand.

But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in
Chinese):

"From 0x90308130 to 0xE339FE39, altogether 1058400 code points, corresponding
to GB 13000's 16 supplementary planes..."
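The 1058400 figure can be checked by treating a four-byte GB 18030 code as a mixed-radix number. A minimal sketch in Python, assuming the usual four-byte structure (first and third bytes run 0x81..0xFE, 126 values; second and fourth bytes run 0x30..0x39, 10 values). The function names `linear` and `count_codes` are made up for illustration:

```python
# Sketch: count GB 18030 four-byte codes between two endpoints, inclusive.
# Assumption: bytes 1 and 3 range over 0x81..0xFE (126 values),
# bytes 2 and 4 range over 0x30..0x39 (10 values).

def linear(code: int) -> int:
    """Map a four-byte code, written as a 32-bit integer, to a linear index."""
    b1, b2, b3, b4 = code.to_bytes(4, "big")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def count_codes(first: int, last: int) -> int:
    """Number of four-byte codes from first to last, inclusive."""
    return linear(last) - linear(first) + 1


print(count_codes(0x90308130, 0xE339FE39))  # 1058400
```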

If you look at the ICU specification, bFirst="90 30 81 30" and
bLast="E3 32 9A 35" correspond to:

83 "groups" (90..E2) of GB 18030:           83 x 10 x 1260 = 1045800 code points
 2 "planes" (E3 30..31) of GB 18030:              2 x 1260 =    2520 code points
25 "rows"   (E3 32 81..99) of GB 18030:            25 x 10 =     250 code points
 6 "cells"  (E3 32 9A 30..35) of GB 18030:                         6 code points
                                                     Total = 1048576 code points

And 1048576 code points = 16 x 65536 code points = 16 planes of ISO/IEC 10646.
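The same count falls out of the `<range>` element itself, if each byte position is read as an odometer digit running from its bMin byte to its bMax byte. That reading of bMin/bMax is an assumption here, not something quoted from the ICU data; `parse_bytes` and `byte_index` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

# The <range> element quoted above from gb-18030-2000.xml, re-joined onto one line.
RANGE = ('<range uFirst="10000" uLast="10FFFF" bFirst="90 30 81 30" '
         'bLast="E3 32 9A 35" bMin="81 30 81 30" bMax="FE 39 FE 39"/>')


def parse_bytes(attr: str) -> list:
    """Turn a space-separated hex byte string like "90 30 81 30" into ints."""
    return [int(b, 16) for b in attr.split()]


def byte_index(seq, bmin, bmax) -> int:
    """Read seq as a mixed-radix number: byte i runs from bmin[i] to bmax[i]."""
    index = 0
    for b, lo, hi in zip(seq, bmin, bmax):
        index = index * (hi - lo + 1) + (b - lo)
    return index


elem = ET.fromstring(RANGE)
bmin = parse_bytes(elem.get("bMin"))
bmax = parse_bytes(elem.get("bMax"))

# Byte sequences covered by bFirst..bLast, and code points covered by uFirst..uLast.
byte_count = (byte_index(parse_bytes(elem.get("bLast")), bmin, bmax)
              - byte_index(parse_bytes(elem.get("bFirst")), bmin, bmax) + 1)
code_point_count = int(elem.get("uLast"), 16) - int(elem.get("uFirst"), 16) + 1

print(byte_count, code_point_count)  # 1048576 1048576
```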

So GB 18030 and ICU agree: start at 0x90308130 and lay out all the
Unicode supplementary code points in order. They run out at 0xE3329A35,
and the remaining 9824 codes up to 0xE339FE39 are simply left over.
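Laid out that way, the mapping is pure arithmetic. A rough Python sketch, assuming the linear layout just described (fourth byte varies fastest); the names `encode_supplementary` and `decode_supplementary` are hypothetical and are not ICU API:

```python
def encode_supplementary(cp: int) -> bytes:
    """GB 18030 four-byte code for U+10000..U+10FFFF, assuming a linear layout from 0x90308130."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    n = cp - 0x10000
    n, b4 = divmod(n, 10)    # byte 4: 0x30..0x39
    n, b3 = divmod(n, 126)   # byte 3: 0x81..0xFE
    n, b2 = divmod(n, 10)    # byte 2: 0x30..0x39
    return bytes([0x90 + n, 0x30 + b2, 0x81 + b3, 0x30 + b4])  # byte 1 starts at 0x90


def decode_supplementary(code: bytes) -> int:
    """Inverse of encode_supplementary."""
    b1, b2, b3, b4 = code
    n = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    return 0x10000 + n


print(encode_supplementary(0x10000).hex(" "))   # 90 30 81 30
print(encode_supplementary(0x10FFFF).hex(" "))  # e3 32 9a 35
```

Consecutive code points land on consecutive codes, so U+10009 is 90 30 81 39 and U+1000A rolls over to 90 30 82 30.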

--Ken
