At 08:33 PM 7/9/2004, John Cowan wrote:
> I have just reviewed this list and found it odd that Hebrew presentation
> forms are included but Arabic ones are not.

The specification actually called only for Latin, Greek, and Cyrillic;
I added Hebrew as lagniappe, a little extra.  If someone wants to add Arabic, I
encourage them to do so.

> the Hebrew presentation forms but also most of the precomposed
> characters are redundant in this list.

True; however, the current list indicates the scope of what actually
happens, even if it is overlong.

I have taken the file from the server today and massaged it into a form suitable for inclusion in the next draft of TR#30, which will be issued in time for the UTC to review it in August.


Once the review issue opens for this draft, please comment on the review form, so that the UTC has formal input to evaluate.

My understanding is that this folding would be more aggressive in folding diacritics than the conventions of some languages would allow, so that it is useful in cross-language searching. For example, it should allow English users to search for words containing accented characters by typing the equivalent word spelled with base letters only.
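A minimal sketch of that kind of generic folding, in Python. It assumes, as a simplification, that stripping every character with a nonzero canonical combining class after NFD decomposition approximates diacritic folding; it is not the mapping list being prepared for TR#30, and the function names are hypothetical.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Approximate diacritic folding via canonical decomposition."""
    # NFD splits precomposed letters such as 'é' into 'e' + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop characters with a nonzero canonical combining class (the accents).
    stripped = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    # Recompose whatever remains (e.g. Hangul jamo) back to NFC.
    return unicodedata.normalize("NFC", stripped)

def matches(query: str, word: str) -> bool:
    """Accent- and case-insensitive comparison of a query against a word."""
    return fold_diacritics(query).casefold() == fold_diacritics(word).casefold()

print(fold_diacritics("café naïve résumé"))  # cafe naive resume
print(matches("Muller", "Müller"))           # True
```

Note that this alone leaves 'ø' untouched, since Unicode gives it no canonical decomposition.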

'i' has a dot, but there is no base letter more 'basic' than it: dotless i, while it does exist, is more specialized and not universally accessible from input devices.

O-slash can be analyzed as o plus slash, even though Unicode does not decompose it that way canonically. Allowing users outside Scandinavia to perform fuzzy searches for words containing this character is useful.
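The two cases above can be sketched by adding a small explicit table to decomposition-based folding. The table name EXTRA_FOLDS and its contents (only ø and Ø) are hypothetical illustrations. Plain 'i' needs no special handling: it has no decomposition, so it passes through, while accented forms such as 'í' fold to it.

```python
import unicodedata

# Hypothetical supplementary table for letters whose "diacritic" is not
# separated by canonical decomposition.
EXTRA_FOLDS = str.maketrans({"ø": "o", "Ø": "O"})

def fold(text: str) -> str:
    # Decompose first, so forms like U+01FF (o-slash with acute) become
    # 'ø' + U+0301 and are then caught by the table.
    decomposed = unicodedata.normalize("NFD", text).translate(EXTRA_FOLDS)
    # Strip combining marks, then recompose the remainder.
    stripped = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return unicodedata.normalize("NFC", stripped)

print(fold("Søren Kierkegaard"))  # Soren Kierkegaard
print(fold("ï í i"))              # i i i
```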

In this view of folding, language-specific fuzzy searches would be tailored (usually based on collation information rather than on generic diacritic folding).
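A rough sketch of what such a tailoring could look like. As a deliberate simplification it uses a hypothetical per-language set of letters to leave unfolded, rather than real collation data; the "da" entry reflects that å and ø are separate letters of the Danish alphabet. TAILORINGS, fold_char and fold are illustrative names only.

```python
import unicodedata
from typing import Optional

EXTRA_FOLDS = str.maketrans({"ø": "o", "Ø": "O"})

# Hypothetical tailoring data: letters a language treats as distinct
# letters of its alphabet, so a tailored search leaves them unfolded.
TAILORINGS = {"da": set("åÅøØ")}

def fold_char(ch: str) -> str:
    """Generic fold of one character: NFD, explicit table, strip marks."""
    decomposed = unicodedata.normalize("NFD", ch).translate(EXTRA_FOLDS)
    stripped = "".join(c for c in decomposed if unicodedata.combining(c) == 0)
    return unicodedata.normalize("NFC", stripped)

def fold(text: str, lang: Optional[str] = None) -> str:
    """Fold text generically, except for letters the tailoring protects."""
    keep = TAILORINGS.get(lang, set())
    return "".join(ch if ch in keep else fold_char(ch)
                   for ch in unicodedata.normalize("NFC", text))

print(fold("Århus smørrebrød café"))        # Arhus smorrebrod cafe
print(fold("Århus smørrebrød café", "da"))  # Århus smørrebrød cafe
```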

A./




