"Philippe Verdy" <[EMAIL PROTECTED]> writes: > It's hard to create a general model that will work for all scripts > encoded in Unicode. There are too many differences. So Unicode just > appears to standardize a higher level of processing with combining > sequences and normalization forms that are better approaching the > linguistic and semantic of the scripts. Consider this level as an > intermediate tool that will help simplify the identification of > processing units.
While rendering and user input may use evolving rules with complex
specifications and implementations that depend on the environment and
the user's configuration (there is really no other choice: this is
inherently complicated for some scripts), string processing in a
programming language should have a stable base with well-defined,
easy-to-remember semantics that does not depend on too many settable
preferences and version variations.

The more complex the rules a protocol demands (programming-language
identifiers compared case-insensitively, after normalization, after
bidi processing, with soft hyphens removed, etc.), the more tools will
implement them incorrectly. The errors are usually subtle and don't
manifest until someone tries to process an unusual name (e.g. a
documentation generation tool produces dangling hyperlinks because
the WWW server does not perform the same transformations on
addresses).

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
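To make the dangling-hyperlink failure mode above concrete, here is a
minimal sketch in Python (my choice of language for illustration, not
anything from the original thread). It shows two identifiers that look
identical on screen but differ at the code-point level: a comparison
"after normalization" only works if every tool in the chain actually
normalizes, and a tool that skips the step disagrees silently.

```python
import unicodedata

# "café" written two ways: precomposed U+00E9 vs. "e" + combining
# acute accent U+0301. They render identically.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# A naive code-point comparison says they are different names:
print(composed == decomposed)  # False

# A tool that follows the protocol and normalizes to NFC agrees
# they are the same name:
print(composed == unicodedata.normalize("NFC", decomposed))  # True

# If a documentation generator normalizes link targets but the WWW
# server resolving the address does not (or vice versa), the two
# sides disagree and the generated hyperlink dangles.
```

The same mismatch arises with any of the other transformations listed
(case folding, soft-hyphen stripping, bidi processing): each one is a
step that some implementation, somewhere, will forget or apply in a
different order.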