On 17/03/2004 07:12, Ernest Cline wrote:

Well, in the event that Unicode ever does add DOTTED J to go with
DOTLESS J, I sincerely hope that it does not follow the example of
DOTTED I and DOTLESS I.  It would have been better in my opinion
to have encoded upper and lower case forms of both characters
separate from the ordinary I.  That would have placed language
specific burdens not on the casing algorithm of Unicode but on the
transfer of data from legacy character sets.  It's probably too late
to change this for the I, but hopefully this can be avoided for J if
a distinct dotted J character is needed.



It was too late to change this one even before Unicode was dreamed up. In fact it was too late as soon as anyone started using legacy character sets to write Turkish, with the ordinary ASCII i and I standing in for Turkish dotted i and dotless I respectively. With separately encoded Turkish letters, any existing documents in mixed Turkish and other European languages, without explicit language markup, would be hopelessly messed up on conversion, since nothing in the data says which i and I are Turkish. The burden which you wanted to put "on the transfer of data from legacy character sets" would therefore have implied the need to rewrite all such documents.
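
To make the casing burden concrete, here is a minimal Python sketch of what Turkish-aware case conversion has to do when dotted i and dotless I share the ordinary ASCII letters. The function names turkish_lower and turkish_upper are hypothetical helpers written for this illustration, not part of any library, and the sketch assumes the caller already knows the text is Turkish, which is exactly the language information that unmarked legacy documents lack.

```python
# Hypothetical helpers for illustration only; they assume the whole
# string is known to be Turkish.

DOTTED_CAPITAL_I = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı  LATIN SMALL LETTER DOTLESS I


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish pairs I/ı and İ/i."""
    # Handle the two special pairs first, then let the default
    # (language-neutral) mapping take care of everything else.
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()


def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish pairs i/İ and ı/I."""
    text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()


if __name__ == "__main__":
    # Default casing treats I and i as a pair, as in English.
    print("I".lower())                 # i
    # Turkish casing pairs I with dotless ı instead.
    print(turkish_lower("ISPARTA"))    # ısparta
    print(turkish_upper("izmir"))      # İZMİR
```

The same input string therefore has two correct lowercase forms, depending on its language, which is why the choice cannot be made from the character codes alone.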

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



