Eli,

You're not missing anything. This is a bug in the documentation of
decomps.txt. Initially, added decompositions for the DUCET default
weights were all tagged as <sort>. This results in a distinct *tertiary*
weight in the initial collation weight values in DUCET. Later on,
cases turned up where an added decomposition for the DUCET
input worked better *without* a distinct tertiary weight. In
particular, this applies to the large collection of combining marks
whose secondary weights are now collapsed into a smaller set of
distinct values. It also applies to the o with stroke character you
cite below. The documentation for decomps.txt just needs to be
updated to reflect that new pattern.
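
To make the tagged/untagged distinction concrete, here is a minimal Python sketch of how a consumer of decomps.txt might separate the two kinds of entries. It assumes the three-field layout visible in the entries quoted below (code point; optional tag; decomposition, followed by a "#" comment). The names DecompEntry and parse_decomp_line are hypothetical, and this is not how the actual DUCET tooling is implemented.

```python
from typing import NamedTuple, Optional, Tuple


class DecompEntry(NamedTuple):
    code_point: int
    tag: Optional[str]        # e.g. "<sort>", or None when the tag field is empty
    mapping: Tuple[int, ...]  # code points of the decomposition


def parse_decomp_line(line: str) -> Optional[DecompEntry]:
    """Parse one line, assuming the layout 'XXXX;tag;YYYY ZZZZ # comment'."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None                       # blank or comment-only line
    fields = [f.strip() for f in data.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    cp, tag, mapping = fields
    return DecompEntry(
        code_point=int(cp, 16),
        tag=tag or None,  # an empty tag field means "no tag"
        mapping=tuple(int(x, 16) for x in mapping.split()),
    )


# The two entries from the question, both with an empty tag field.
SAMPLE = [
    "00F8;;006F 0338 # LATIN SMALL LETTER O WITH STROKE",
    "0142;;006C 0335 # LATIN SMALL LETTER L WITH STROKE",
]

entries = [parse_decomp_line(line) for line in SAMPLE]
untagged = [e for e in entries if e is not None and e.tag is None]
```

With this reading, an added decomposition carrying "<sort>" gets a distinct tertiary weight, while an added decomposition with an empty tag field, like the two above, does not.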

--Ken

On 2/21/2016 8:32 AM, Eli Zaretskii wrote:
   # 3. In some cases a new decomposition is added for a character which
   #    has no decomposition mapping in UnicodeData.txt. In this third case,
   #    a new decomposition tag "<sort>" is introduced, to distinguish these
   #    introduced decompositions from those derived from UnicodeData.txt.

However, I see in decomps.txt entries that seem to belong to none
of the 3 classes described above.  Here are 2 notable examples:

   00F8;;006F 0338 # LATIN SMALL LETTER O WITH STROKE => LATIN SMALL LETTER O + COMBINING LONG SOLIDUS OVERLAY
   0142;;006C 0335 # LATIN SMALL LETTER L WITH STROKE => LATIN SMALL LETTER L + COMBINING SHORT STROKE OVERLAY

In both these cases, UnicodeData.txt defines no decomposition
properties, but the "<sort>" tag I expected to see is absent from
decomps.txt.  Is there something I'm missing here?

