On Thu, 14 Mar 2013 19:13:43 -0700 Markus Scherer <[email protected]> wrote:
> On Thu, Mar 14, 2013 at 4:09 PM, Richard Wordingham <[email protected]> wrote:
>
> > On Thu, 14 Mar 2013 14:49:18 -0700
> > Markus Scherer <[email protected]> wrote:

> While variableTop="u2FD5" ...
> ... but a) this is a rarely used option and b) depends on the
> implementation, and c) it makes no practical sense to make letters
> ignorable.

That doesn't stop the LDML specification having the example locales
en-u-vt-0061 and en-u-vt-0061-0065.  (I don't see what collating
elements have variable weight in one but not the other.)

> "Fractional weights" is nothing other than the "large weights"
> mechanism applied to byte-based weights of all levels. The UCA is
> already "fractional" for implicit primaries.

Not quite.  The characterisation of variable weights knows nothing of
the concept, and that is the problem.  One can envisage a *remapping* of
DUCET such that all non-Han characters get 'large weights', reserving
the one-number primary weights for Han characters.  Changing the range
of variable weights parametrically (e.g. from up to symbols to up to
punctuation) in such a scheme could be a nightmare.

There appears to be an algebraic characterisation of what sequences of
primaries could be treated as variable.  Strings of primary weights of
collating elements can be decomposed into substrings of primary weights
of other collating elements, and an order-preserving change of the
irreducible substrings will preserve the order of the collating
elements.  This is a consequence of how humans (or just Unicode man?)
generate primary weights, and does not apply to collation elements in
general.  This decomposition is just a reflection of the sometimes
non-standard compatibility decomposition used when devising weights.
The irreducible substrings are the elements that can be treated as
variable.  However, the ability to decompose is fragile - it would not
work if the Tamil script had been encoded as a syllabary but collated as
consonants and vowels.
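
To make the decomposition concrete, here is a minimal Python sketch of
one reading of it.  The table PRIMARIES, its entry names and all its
numbers are invented for illustration and are not DUCET values; the
helper names segment and irreducible_sequences are likewise
hypothetical.  A sequence of primaries counts as reducible if it can be
split into the primary sequences of other collating elements; what is
left over are the irreducible sequences, i.e. the candidates for being
treated as variable.

```python
# Hypothetical primary-weight sequences (NOT real DUCET data).
PRIMARIES = {
    "a": (10,),
    "e": (20,),
    "ae_ligature": (10, 20),        # weighted like "a" followed by "e"
    "x": (30, 5),                   # a 'large weight': two numbers, one unit
    "ax_contraction": (10, 30, 5),  # weighted like "a" followed by "x"
}


def segment(seq, units):
    """Return one split of seq into pieces drawn from units, or None."""
    if not seq:
        return []
    for n in range(1, len(seq) + 1):
        head = seq[:n]
        if head in units:
            rest = segment(seq[n:], units)
            if rest is not None:
                return [head] + rest
    return None


def irreducible_sequences(table):
    """Sequences that cannot be built from the other sequences in table."""
    all_seqs = set(table.values())
    return {s for s in all_seqs if segment(s, all_seqs - {s}) is None}


if __name__ == "__main__":
    for seq in sorted(irreducible_sequences(PRIMARIES)):
        print(seq)
```

Note that (30, 5) comes out as irreducible even though it is two
numbers long, which is the point about 'large weights': the unit that
may be made variable is the irreducible sequence, not the individual
number.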

> And pinning the variable-top value to the next following end of a
> reordering group, and no higher than the end of the primary-weight
> range for currency symbols.

Is there a CLDR ticket for this change to the meaning of variableTop?
I couldn't find one.

Richard.

