"Philippe Verdy" <[EMAIL PROTECTED]> writes: > It's hard to create a general model that will work for all scripts > encoded in Unicode. There are too many differences. So Unicode just > appears to standardize a higher level of processing with combining > sequences and normalization forms that are better approaching the > linguistic and semantic of the scripts. Consider this level as an > intermediate tool that will help simplify the identification of > processing units.
While rendering and user input may use evolving rules with complex
specifications and implementations that depend on the environment and
the user's configuration (there is really no other choice: this is
inherently complicated for some scripts), string processing in a
programming language should have a stable base with well-defined,
easy-to-remember semantics that does not depend on too many settable
preferences and version variations.

The more complex the rules a protocol demands (programming-language
identifiers compared case-insensitively, after normalization, after
bidi processing, with soft hyphens removed, etc.), the more tools will
implement them incorrectly. The errors are usually subtle and don't
manifest until someone tries to process an unusual name (e.g. a
documentation generation tool produces dangling hyperlinks because
the WWW server does not perform the same transformations on
addresses).

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
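To make the dangling-hyperlink failure mode above concrete, here is a
minimal sketch in Python (my choice of language for illustration, not
anything from the original thread). It shows two identifiers that look
identical on screen but differ at the code-point level: a comparison
"after normalization" only works if every tool in the chain actually
normalizes, and a tool that skips the step disagrees silently.

```python
import unicodedata

# "café" written two ways: precomposed U+00E9 vs. "e" + combining
# acute accent U+0301. They render identically.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# A naive code-point comparison says they are different names:
print(composed == decomposed)  # False

# A tool that follows the protocol and normalizes to NFC agrees
# they are the same name:
print(composed == unicodedata.normalize("NFC", decomposed))  # True

# If a documentation generator normalizes link targets but the WWW
# server resolving the address does not (or vice versa), the two
# sides disagree and the generated hyperlink dangles.
```

The same mismatch arises with any of the other transformations listed
(case folding, soft-hyphen stripping, bidi processing): each one is a
step that some implementation, somewhere, will forget or apply in a
different order.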