On 17/03/2004 11:30, Ernest Cline wrote:

...


Mixed Turkish and other European language documents that are without
language markup have the same problem, no matter where the burden
is placed.  Some I's will receive inappropriate glyphs when a casing rule
is applied.  The problem is just as pronounced with either method, and
the need to rewrite such documents to ensure proper casing is the same.

I will admit that my preferred solution has higher initial costs, but its lower
long-term costs cause me to favor it.  In any case, changing to my
preferred solution now would not be worth the confusion that would be
caused.  If there ever is a successor to Unicode, then it would be worth
examining this idea, but such an event is at least twenty years away.



Your preferred solution has an advantage only if the long-term costs are real. But how often is it necessary to apply casing rules to existing documents? Quite rarely, I would think. Search engines might want to, I agree, but I would expect a basic search engine to fold dotted and dotless i together, on the basis that they cannot be distinguished reliably. Under your solution the costs must be borne for every document before it moves to Unicode or its putative successor; under the solution Unicode chose, the costs need be borne only for the minority of documents that actually need casing rules applied to them.
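
To illustrate the kind of folding I have in mind, here is a minimal sketch in Python. The helper name search_key and the mapping table are my own illustrative assumptions, not anything taken from Unicode or from a real search engine; the sketch simply maps all four letters (I, i, U+0130, U+0131) to a plain "i" so that unmarked Turkish and non-Turkish text compare equal.

```python
# Hypothetical folding table: treat dotted capital I (U+0130) and
# dotless small i (U+0131) as a plain "i" for search purposes.
_I_FOLD = str.maketrans({
    "\u0130": "i",  # LATIN CAPITAL LETTER I WITH DOT ABOVE
    "\u0131": "i",  # LATIN SMALL LETTER DOTLESS I
})


def search_key(text: str) -> str:
    """Return a language-independent key in which all i variants coincide."""
    # Translate first, then lowercase: the default (non-Turkish) lowercasing
    # maps "I" to "i", so after this step every variant has become "i".
    return text.translate(_I_FOLD).lower()


if __name__ == "__main__":
    print(search_key("DİYARBAKIR") == search_key("diyarbakır"))
```

The key deliberately loses the dotted/dotless distinction, so it suits matching only; it does not recover the correct casing of the original text.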

We might hope that within twenty years almost all new documents will be marked with their language.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



