The decompositions are not needed for plain-text searches, which can use the collation data instead. With the collation data, you can unify differences such as capitalisation at the primary level and ignore diacritics, transform some groups of base letters into a single entry, or make a significant primary difference when there are diacritics (for example, in German, equating 'ae' and 'ä' at the primary level).
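As a rough illustration of primary-level matching, here is a minimal Python sketch using the standard `unicodedata` module. It is not the Unicode Collation Algorithm. It only approximates a primary-strength comparison by case folding, applying canonical decomposition (NFD) and dropping nonspacing marks. The helper names and the `GERMAN_EXPANSIONS` table are hypothetical and stand in for a real German tailoring. A real implementation would use the collation data, for example through ICU.

```python
import unicodedata
from typing import Mapping, Optional

# Hypothetical German-style tailoring, for illustration only: expand umlauts
# so that 'ä' and 'ae' get the same primary key.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def primary_key(text: str, tailoring: Optional[Mapping[str, str]] = None) -> str:
    """Rough stand-in for a primary-strength collation key (not the UCA)."""
    # Case is a tertiary difference: fold it away (casefold also maps ß to ss).
    folded = unicodedata.normalize("NFC", text.casefold())
    if tailoring:
        # Expand tailored letters before diacritics are stripped, so 'ä'
        # becomes 'ae' instead of collapsing to plain 'a'.
        folded = "".join(tailoring.get(ch, ch) for ch in folded)
    # Canonical decomposition splits base letters from combining marks...
    decomposed = unicodedata.normalize("NFD", folded)
    # ...and dropping nonspacing marks (category Mn) ignores diacritics.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def primary_contains(haystack: str, needle: str,
                     tailoring: Optional[Mapping[str, str]] = None) -> bool:
    """Plain-text search that ignores case and diacritics (primary level)."""
    return primary_key(needle, tailoring) in primary_key(haystack, tailoring)
```

For example, `primary_contains("Die Straße ist lang", "STRASSE")` is true. Without the tailoring, `primary_key("Ärger")` is `"arger"`, so the umlaut is simply ignored. With `GERMAN_EXPANSIONS`, it becomes `"aerger"`, so 'ä' matches 'ae' and no longer matches plain 'a'.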
Yes, collation must use the canonical decompositions, but it does not need to follow the compatibility decompositions for all locales (even if this is done for the root locale and the DUCET... with some exceptions, considering the rules for the most important language using an encoded letter and all its *canonical* equivalents).

Compatibility decompositions in the UCD have little use: the compatibility characters themselves should be preserved in encoded texts and in transformations of text, and the decompositions are just suggestions which *may* be useful:
- for rendering text (the most important use is in character mappings within fonts, or in fallback mappings implemented in the rendering engine),
- or for mappings to legacy encodings (e.g. when converting to GSM for SMS services, or converting for display on text-only devices and terminals using a limited OEM charset).

2015-02-19 12:59 GMT+01:00 Eli Zaretskii <[email protected]>:
> > Date: Thu, 19 Feb 2015 11:47:24 GMT
> > From: Julian Bradfield <[email protected]>
> >
> > In Arabic, the variant of a letter is determined entirely by its
> > position, so there is no compelling need to represent the forms separately
> > (as characters rather than glyphs) save for the existence of legacy
> > standards (and if there is, you can use the ZWJ/ZWNJ hacks). Thus the
> > forms would not have been encoded but for the legacy standards.
> > Whereas in Hebrew, non-final forms appear finally in certain contexts
> > in normal text; and in Greek, while Greek text may have a determinate
> > choice between σ and ς, there are many contexts where the two symbols
> > are distinguished (not least maths).
>
> Got it, thanks.
> _______________________________________________
> Unicode mailing list
> [email protected]
> http://unicode.org/mailman/listinfo/unicode
>
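To illustrate the difference between canonical and compatibility decompositions discussed in this message, here is a short Python sketch using the standard `unicodedata` module. Canonical mappings such as the one for 'é' are applied by both NFD and NFKD. Compatibility mappings, such as those for the 'ﬁ' ligature or the Arabic initial form U+FE91, are applied only by NFKD. Greek final sigma and the Hebrew final letters have no decomposition at all. The helper names `decomposition_kind` and `legacy_fallback` are hypothetical. The ASCII fallback only stands in for a limited legacy charset such as GSM or an OEM code page; it is not a real GSM converter.

```python
import unicodedata


def decomposition_kind(ch: str) -> str:
    """Classify the UCD decomposition mapping of a single character."""
    # unicodedata.decomposition() returns the raw UnicodeData.txt field,
    # e.g. "0065 0301" (canonical) or "<initial> 0628" (compatibility).
    # Caveat: Hangul syllables decompose algorithmically and return "" here.
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings carry a formatting tag such as <compat>, <font>,
    # <initial>, <final> or <circle>.
    return "compatibility" if mapping.startswith("<") else "canonical"


def legacy_fallback(text: str, charset: str = "ascii") -> str:
    """Hypothetical fallback for a limited charset (ASCII stands in for GSM/OEM)."""
    # NFKD applies both canonical and compatibility decompositions.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks that the target charset cannot represent.
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Anything still unrepresentable becomes '?'.
    return stripped.encode(charset, errors="replace").decode(charset)


if __name__ == "__main__":
    for ch in ["\u00e9", "\ufb01", "\ufe91", "\u03c2", "\u05da"]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomposition_kind(ch)}, "
              f"NFD changes it: {unicodedata.normalize('NFD', ch) != ch}, "
              f"NFKD changes it: {unicodedata.normalize('NFKD', ch) != ch}")
    print(legacy_fallback("ﬁle №① café"))
```

The fallback keeps the original text untouched and only derives a lossy display form. This matches the view that compatibility decompositions are suggestions for rendering or legacy conversion, not something to apply to the encoded text itself.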

