Re: Questions on Chinese collation, stroke

Mark Davis ☕ Thu, 07 Jun 2012 17:58:25 -0700

On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <[email protected]> wrote:


> Hi,
>
> I have two questions regarding the collation sequence defined in
> zh.xml, CLDR 21.0
>
> 1. Why is U+8303 (范)  counted as 9 strokes instead of 8 for <collation
> type="stroke">? As a reference, U+59DA (姚) is counted as 9 strokes but
> sorted before U+8303 (范).
>

CLDR now gets the stroke collation data from the kTotalStrokes property. The
values for that are in the file Unihan/Unihan_DictionaryLikeData.txt in the
Unicode Character Database.

There you find the line:

U+8303 kTotalStrokes 8

If that is in error, or if there is any other error in the kTotalStrokes
data, then please report the correct value according to
http://www.unicode.org/review/pri230/ so that it can be fixed.

As a related matter, CLDR now gets the pinyin collation data from
the kMandarin property. The values for that are in the
file Unihan/Unihan_Readings.txt in the Unicode Character Database. So if
any of those are in error, they should also be reported as per
http://www.unicode.org/review/pri230/ .
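
For anyone who wants to check a value before reporting it, here is a minimal
sketch of looking up a Unihan property. It assumes the usual Unihan layout of
one "code point, property name, value" record per line with # comment lines.
The function name load_unihan_property and the SAMPLE excerpt are only
illustrative: the U+8303 record is the one quoted above, and the U+59DA record
is taken from the stroke count given in the question, not checked against the
file.

```python
def load_unihan_property(lines, prop):
    """Return {code point: value string} for one Unihan property.

    `lines` is any iterable of text lines, for example an open file.
    """
    values = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Each record has three fields: code point, property name, value.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        code, name, value = parts
        if name == prop and code.startswith("U+"):
            values[int(code[2:], 16)] = value
    return values


# Illustrative excerpt only, not the full data file.
SAMPLE = [
    "# sample records",
    "U+59DA\tkTotalStrokes\t9",
    "U+8303\tkTotalStrokes\t8",
]

strokes = load_unihan_property(SAMPLE, "kTotalStrokes")
for ch in "姚范":
    print(f"U+{ord(ch):04X} {ch} kTotalStrokes={strokes.get(ord(ch))}")
```

To run it against the real data, pass an open file instead of SAMPLE, for
example open("Unihan_DictionaryLikeData.txt", encoding="utf-8") after
unpacking the zip. The same function should work for kMandarin with
Unihan_Readings.txt.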

The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/. The Unihan
data is currently in ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip
but as the beta proceeds, the d1 in the file name might change to d2, d3...


>
> 2. Does the collation type, stroke, apply to both Simplified and
> Traditional Chinese, as I do not see anything defined in zh_Hant.xml
> under "stroke"?
>

Let me look at that.


>
> Thanks,
> Matt
>
>
>
