Re: UCA tertiary weight assignment vs. decomposition type definition in Unicode character database

2012-01-27 Thread Ken Whistler

On 1/27/2012 1:16 PM, Matt Ma wrote:

Hi,

There are a few characters that have no decomposition type defined in
UnicodeData.txt, but that were assigned tertiary weights in
allkeys.txt as if they had a decomposition type. Here are a
few examples (version 6.0.0):

...



U+A733, U+A732, and U+1F1E6 were given tertiary weights as if they were
<compat>, while U+31B4 was weighted as if it were <final>.


Yep, that is all done deliberately, to make the default sorting a bit 
more consistent.

The normative decompositions in UnicodeData.txt are only the starting point
for attempting to give more consistent default weights for collation.
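
For anyone who wants to find such cases themselves, here is a minimal Python sketch of the comparison. It assumes the allkeys.txt line layout "code points ; [.primary.secondary.tertiary...]..." and uses the standard unicodedata module in place of UnicodeData.txt, so the decomposition data reflects whatever Unicode version your Python build ships, not necessarily 6.0.0. The function names, the sample lines, and the set of "plain" tertiary weights are assumptions made for illustration, not anything defined by UCA.

```python
import re
import unicodedata

# One collation element, e.g. [.15D3.0020.0004.A733] or [*0209.0020.0002]
ELEMENT = re.compile(r"\[[.*]([0-9A-Fa-f.]+)\]")


def decomposition_type(code_point):
    """Return the compatibility tag (e.g. 'compat', 'final') or None.

    Canonical decompositions carry no tag, so they also yield None.
    """
    decomp = unicodedata.decomposition(chr(code_point))
    if decomp.startswith("<"):
        return decomp[1:decomp.index(">")]
    return None


def parse_allkeys_line(line):
    """Return (code_points, tertiary_weights), or None for non-entry lines."""
    body = line.split("#", 1)[0].strip()
    if not body or body.startswith("@") or ";" not in body:
        return None
    chars, _, elements = body.partition(";")
    code_points = [int(cp, 16) for cp in chars.split()]
    # The third dot-separated field of each element is the tertiary weight.
    tertiary = [int(m.group(1).split(".")[2], 16)
                for m in ELEMENT.finditer(elements)]
    return code_points, tertiary


def weighted_like_a_variant(line, plain=frozenset({0x0000, 0x0002, 0x0008})):
    """True if a single character has no decomposition type, yet some
    tertiary weight falls outside `plain`.

    `plain` is an assumption: 0000 (ignorable), 0002 (ordinary) and
    0008 (uppercase). Adjust it for scripts such as kana.
    """
    parsed = parse_allkeys_line(line)
    if parsed is None or len(parsed[0]) != 1:
        return False
    code_points, tertiary = parsed
    return (decomposition_type(code_points[0]) is None
            and any(t not in plain for t in tertiary))


# Hypothetical entry: the primary weight is made up, only the shape matters.
sample = "A733 ; [.1000.0020.0004.A733][.1000.0020.0004.A733] # illustrative"
print(weighted_like_a_variant(sample))
```

Running weighted_like_a_variant over every line of a real allkeys.txt should list the characters whose tertiary weights were adjusted by hand, subject to the caveats above.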



Is this something documented outside of UCA?


No, because it is only relevant *to* UCA. At least as far as documentation
written by the UTC is concerned.

Well, I suppose it is also relevant to CLDR, because CLDR bases its
collation tables on a tailoring of allkeys.txt from UCA. I don't know
what documentation there may or may not be about the default treatment
for tertiary weights in CLDR. Somebody involved in the details of CLDR
collation will have to answer that one.

--Ken





Re: UCA tertiary weight assignment vs. decomposition type definition in Unicode character database

2012-01-27 Thread Mark Davis ☕
CLDR doesn't modify anything but primaries in the root ordering. Particular
languages may modify any of the levels, but I don't think anything is
typically done except for primary and secondary (with the exception of
Japanese, which is quite complicated).

Mark
*— Il meglio è l’inimico del bene —*


