On Wed, 13 Mar 2013 21:07:06 +0000 "Whistler, Ken" <[email protected]> wrote:
> Richard Wordingham wrote:
>
> > One of the changes from Version 6.1.0 to 6.2.0 of the UCA
> > (UTS#10) was to change weights from being 16 bits to just being
> > general non-negative integers.  Was this just to accommodate the
> > 4th weight in DUCET (scheduled for deletion in Version 6.3.0), or
> > is it intended to do away with the inconvenient concept of 'large
> > weights'?

> It has nothing to do with any putatively inconvenient concept of
> large weights.

'Large weights' make it difficult (I don't say impossible) to check
UCETs for well-formedness.

> It loosened up the spec, so that the spec itself didn't seem to be
> requiring that each of the first 3 levels had to be expressed with a
> full 16 bits in any collation element table.

I don't read it that way.  But it did allow the 4th weight to go up to
10FFFF!  (Last explicit weight in DUCET 6.2.0 is 2A600.)

> As a matter of convenience in generation and display, the DUCET has
> always been generated using a 4 digit hex notation for the first 3
> levels. So each could be conceived as a 16-bit number, as the
> original description of collation elements implied.
>
> But in practice (and by design), the range of secondary and tertiary
> weights were constrained. You only need 9 bits to express the
> secondary weights in the table and only 5 bits to express the
> tertiary weights.

DUCET and the CLDR root are not the only UCETs.  I recall nothing that
stops a tailoring needing more bits for the secondary and tertiary
weights.

> And no, nobody is "threatening" you or anybody else with "having to
> accommodate 36 bit weights".

But I can no longer turn round and say that a 36 bit weight is illegal.

> It might make sense to include a note somewhere to indicate that some
> aspects of the algorithm do implicitly assume that weights cannot
> exceed 16-bit values without requiring other adjustments to the
> algorithm.

I'm listing them at the moment.

> Section 6.2 Large Weight Values already addresses the
> approach one would take if one needs to deal with more than 64K
> primary weight values, in a way which does not break the rest of the
> algorithm.

You've just reminded me that 'escape hatch' is broken for secondary
weights.  It seems a shame to me that one can't parametrically tailor
DUCET to give a rhyming dictionary sort.

Richard.
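
For readers who want to check the bit widths mentioned in this thread,
here is a minimal Python sketch.  The two hex values (2A600 and 10FFFF)
and the 16/9/5 bit figures come from the messages above; the function
names and the packed 30-bit layout are hypothetical and purely
illustrative, not anything UTS #10 prescribes.  It also shows how a
tailoring whose secondary weight needs a 10th bit would no longer fit
such a layout.

```python
# Illustrative only: pack() / unpack() and the 16/9/5 layout are
# assumptions made for this sketch, not part of UTS #10 or DUCET.

def bits_needed(weight: int) -> int:
    """Bits needed to store a non-negative integer weight (at least 1)."""
    if weight < 0:
        raise ValueError("weights are non-negative integers")
    return max(1, weight.bit_length())

# Values quoted in the thread.
LAST_EXPLICIT_4TH_WEIGHT = 0x2A600   # DUCET 6.2.0, per the message
HIGHEST_4TH_WEIGHT = 0x10FFFF        # upper limit mentioned for level 4

# Widths quoted for the DUCET's first three levels.
PRIMARY_BITS, SECONDARY_BITS, TERTIARY_BITS = 16, 9, 5

def pack(primary: int, secondary: int, tertiary: int) -> int:
    """Pack three weights into one 30-bit integer, rejecting overflow."""
    fields = ((primary, PRIMARY_BITS),
              (secondary, SECONDARY_BITS),
              (tertiary, TERTIARY_BITS))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"weight {value:#x} does not fit in {width} bits")
    return ((primary << (SECONDARY_BITS + TERTIARY_BITS))
            | (secondary << TERTIARY_BITS)
            | tertiary)

def unpack(packed: int) -> tuple[int, int, int]:
    """Inverse of pack()."""
    tertiary = packed & ((1 << TERTIARY_BITS) - 1)
    secondary = (packed >> TERTIARY_BITS) & ((1 << SECONDARY_BITS) - 1)
    primary = packed >> (SECONDARY_BITS + TERTIARY_BITS)
    return primary, secondary, tertiary

if __name__ == "__main__":
    print(bits_needed(LAST_EXPLICIT_4TH_WEIGHT))  # more than 16
    print(bits_needed(HIGHEST_4TH_WEIGHT))
    print(unpack(pack(0x15D0, 0x0020, 0x0002)))
    try:
        pack(0x15D0, 0x0200, 0x0002)   # secondary weight needs 10 bits
    except ValueError as err:
        print("rejected:", err)
```

Under these assumptions a fixed 16/9/5 layout is fine for the DUCET as
generated, but any code relying on it has to reject (or otherwise
handle) a tailored table with wider secondary or tertiary weights.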

