On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <[email protected]> wrote:
> Hi,
>
> I have two questions regarding the collation sequence defined in
> zh.xml, CLDR 21.0
>
> 1. Why is U+8303 (范) counted as 9 strokes instead of 8 for <collation
> type="stroke">? As a reference, U+59DA (姚) is counted as 9 strokes but
> sorted before U+8303 (范).
>

CLDR now gets the stroke collation data from the kTotalStrokes property.
The values for that are in the file Unihan/Unihan_DictionaryLikeData.txt
in the Unicode Character Database. There you find the line:

U+8303 kTotalStrokes 8

If that is in error, or if there is any other error in the kTotalStrokes
data, then please report the correct value according to
http://www.unicode.org/review/pri230/ so that it can be fixed.

As a related matter, CLDR now gets the pinyin collation data from the
kMandarin property. The values for that are in the file
Unihan/Unihan_Readings.txt in the Unicode Character Database. So if any
of those are in error, they should also be reported as per
http://www.unicode.org/review/pri230/ .

The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/. Currently it
is in ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip but as the
beta proceeds, the d1 might change to d2, d3...

>
> 2. Does the collation type, stroke, apply to both Simplified and
> Traditional Chinese, as I do not see anything defined in zh_Hant.xml
> under "stroke"?
>

Let me look at that.

>
> Thanks,
> Matt
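
For anyone who wants to check a value before reporting it, here is a minimal Python sketch of how one property could be pulled out of a Unihan data file. It assumes the usual tab-separated layout (code point, property name, value) with `#` comment lines. The function name `parse_unihan_property` and the in-memory sample are hypothetical, made up for illustration; the kMandarin sample line is illustrative and was not copied from the data file.

```python
def parse_unihan_property(lines, prop):
    """Return {code point: value string} for one Unihan property.

    Assumes tab-separated lines of the form
    U+XXXX<TAB>property<TAB>value, with '#' comment lines skipped.
    """
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # not a data line in the assumed layout
        code, name, value = fields
        if name == prop:
            # "U+8303" -> 0x8303
            values[int(code[2:], 16)] = value
    return values


# Sample input: the kTotalStrokes line is the one quoted in this mail;
# the kMandarin line is illustrative only.
sample = [
    "# sample lines in the assumed Unihan layout",
    "U+8303\tkTotalStrokes\t8",
    "U+8303\tkMandarin\tfàn",
]

strokes = parse_unihan_property(sample, "kTotalStrokes")
print(strokes[ord("范")])
```

To run this against the real data, the same function could be given an open file object for the relevant .txt file extracted from the Unihan zip, for example `open(path, encoding="utf-8")`.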

