On 2015/02/20 05:17, Eli Zaretskii wrote:
From: Philippe Verdy <[email protected]>
Date: Thu, 19 Feb 2015 20:31:07 +0100
Cc: Julian Bradfield <[email protected]>,
unicode Unicode Discussion <[email protected]>
The decompositions are not needed for plain-text searches, which can use
the collation data. With the collation data, you can unify at the primary
level differences such as capitalisation and ignore diacritics, or transform
some base groups of letters into a single entry, or make some significant
primary difference when there are diacritics (for example, in German,
equating 'ae' and 'ä' at the primary level).
Sorry, I disagree. First, collation data is overkill for search,
since the order information is not required, so the weights are simply
wasting storage. Second, people do want to find, e.g., "²" when they
search for "2" etc. I'm not saying that they _always_ want that, but
sometimes they do. There's no reason a sophisticated text editor
shouldn't support such a feature, under user control.
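To make the "²" versus "2" case concrete, here is a minimal sketch in
Python. It assumes that NFKD normalization (from the standard unicodedata
module) is an acceptable stand-in for "apply the compatibility
decompositions"; the helper names compat_fold and find_compat are invented
for this illustration and do not come from any editor or library.

```python
import unicodedata


def compat_fold(text: str) -> str:
    # NFKD applies canonical and compatibility decompositions,
    # so SUPERSCRIPT TWO (U+00B2) becomes DIGIT TWO (U+0032).
    return unicodedata.normalize("NFKD", text)


def find_compat(needle: str, haystack: str) -> bool:
    # Fold both sides the same way, then do an ordinary substring test.
    return compat_fold(needle) in compat_fold(haystack)


if __name__ == "__main__":
    print("2" in "E = mc\u00b2")             # plain search misses the superscript
    print(find_compat("2", "E = mc\u00b2"))  # folded search finds it
```

Because the folding is a separate step, an editor could switch it on or off
under user control, which is the behaviour argued for above.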
Well, for cased scripts, search is usually case-insensitive, but case
conversions aren't given by compatibility decompositions.
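As a small illustration of that gap, the following Python sketch shows
that NFKD leaves case alone and that a separate case-folding step is
needed. It uses only str.casefold() and unicodedata.normalize() from the
standard library; the helper name caseless_find is made up for this
example.

```python
import unicodedata


def caseless_find(needle: str, haystack: str) -> bool:
    # str.casefold() applies Unicode case folding, a data set that is
    # separate from the compatibility decompositions.
    return needle.casefold() in haystack.casefold()


if __name__ == "__main__":
    # Compatibility decomposition does not touch case: "A" stays "A".
    print(unicodedata.normalize("NFKD", "A") == "A")
    # Case folding is what makes the caseless match work.
    print(caseless_find("unicode", "The UNICODE list"))
    # Full case folding also maps "ß" to "ss".
    print(caseless_find("strasse", "Stra\u00dfe"))
```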
If the question isn't "Why are there equivalences useful for search that
are not covered by compatibility decompositions?", but "Why doesn't
Unicode provide some data for final/non-final Hebrew letter
correspondence?", maybe the answer is that it hasn't been seen as a need
up to now because it's so easy to figure out.
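Just to show how little is involved, here is a rough sketch in Python that
derives the correspondence from the character names alone. It assumes that
every final form in the basic Hebrew letter range U+05D0..U+05EA is named
"HEBREW LETTER FINAL X" and has a counterpart named "HEBREW LETTER X"; the
names build_final_map and fold_finals are hypothetical helpers, not part
of any Unicode data file or library.

```python
import unicodedata


def build_final_map() -> dict:
    # Scan the basic Hebrew letters and pair each "FINAL" form
    # with the letter of the same name without "FINAL".
    mapping = {}
    for cp in range(0x05D0, 0x05EB):
        name = unicodedata.name(chr(cp))
        if " FINAL " in name:
            mapping[chr(cp)] = unicodedata.lookup(name.replace(" FINAL ", " "))
    return mapping


FINAL_TO_NONFINAL = build_final_map()
_TABLE = str.maketrans(FINAL_TO_NONFINAL)


def fold_finals(text: str) -> str:
    # Replace each final form by its non-final counterpart, so that a
    # search can compare folded needle against folded haystack.
    return text.translate(_TABLE)


if __name__ == "__main__":
    for final, nonfinal in sorted(FINAL_TO_NONFINAL.items()):
        print(f"U+{ord(final):04X} -> U+{ord(nonfinal):04X}")
```

The resulting table has five entries (kaf, mem, nun, pe, tsadi), and a
search would apply fold_finals to both the pattern and the text.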
Regards, Martin.