From: "Peter Kirk" <[EMAIL PROTECTED]> > I see the point, but I would think there was something seriously wrong > with a database setup which could change its ordering algorithm without > somehow declaring all existing indexes invalid.
Why would such an SQL engine do so, if what changed is an external utility library (for example one provided by the OS)? The designer had assumed, after reading the Unicode standard, that NF normalizations were stable across all backward and forward versions of Unicode, and that a previously tested full compliance could be preserved by using that external implementation instead of reimplementing it inside the engine itself. Such a change of policy in Unicode would mean reduced interoperability of existing compliant systems. What is worse, distributed systems that exchange data normalized through a common service at one time could later experience incoherence in the normalization.

When I was speaking about full-text search capabilities, for example, I meant that the main role of combining classes is not to create grapheme clusters, but to allow handling all canonically equivalent sequences with binary compares, instead of requiring constant renormalization to compare all occurrences of canonically equivalent strings (see the first sketch at the end of this message). As all Unicode algorithms are defined to handle canonically equivalent strings the same way, so that they return the same binary results from the same source, modifying the canonical equivalences by merging existing combining classes would in fact affect all standard (or proposed standard) Unicode algorithms, including the most complex ones like collation and the text break scanners.

We have no choice:
- either modify the bogus combining classes and break the stability pact for backwards compatibility of normalized strings (at least those containing only characters of the common assigned Unicode subset),
- or duplicate the existing characters with newer code points carrying the modified properties, and deprecate (not forbid) the old ones,
- or include in the standard a way to override the combining class order (with CGJ, or with a new specific and documented CCO control) if it is impossible to deprecate the existing characters; the second sketch below shows how CGJ already blocks reordering today.

I will approve the W3C requirement that really needs that strings normalized in any version of Unicode stay normalized in ALL its versions. For any reason, even if this order is illogical and does not work well with all linguistic usages; if it ever causes a problem in a particular language, one has to propose, standardize and use some other character to solve it, but not alter the existing ones.

After all, that is what has been done in Unicode for a long time: not all characters are unified, or given a canonical equivalence, even if those characters always use the same glyph. Look for example at the Greek characters borrowed into the Latin script or into the Mathematical block: they were kept separate, not unified and not canonically equivalent, to preserve the semantics of texts using them. Why would this "incoherent" status for individual characters not apply also to combining sequences, when there are legitimate reasons to deunify them?
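
First sketch: a minimal illustration, using only Python's standard unicodedata module, of the point about binary compares of normalized text; the function name index_key is my own, chosen just for this example, not something from any engine discussed above.

```python
# Sketch: normalize once at indexing time, then rely on plain binary equality.
# Canonically equivalent inputs collapse to one byte sequence at index time.
import unicodedata

def index_key(s: str) -> bytes:
    """Hypothetical index key: normalize to NFD, then compare as raw bytes."""
    return unicodedata.normalize("NFD", s).encode("utf-8")

# Two canonically equivalent spellings of the same text: the combining marks
# are stored in a different order in each source.
s1 = "a\u0323\u0301"   # a, COMBINING DOT BELOW (ccc=220), COMBINING ACUTE (ccc=230)
s2 = "a\u0301\u0323"   # a, COMBINING ACUTE (ccc=230), COMBINING DOT BELOW (ccc=220)

assert s1 != s2                        # raw code point sequences differ
assert index_key(s1) == index_key(s2)  # normalized keys are byte-identical

# If a later Unicode version merged or reordered the combining classes above,
# index_key() would start producing different bytes for data already stored,
# silently breaking every binary comparison made against the old keys.
```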
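Second sketch: how the CGJ mechanism mentioned in the third option already works. COMBINING GRAPHEME JOINER (U+034F) has combining class 0, so canonical reordering cannot move marks across it; the particular marks below are an arbitrary choice for illustration, again with the standard unicodedata module only.

```python
# Sketch: CGJ (ccc=0) blocks canonical reordering of the marks around it.
import unicodedata

without_cgj = "a\u0301\u0323"          # acute (ccc=230), dot below (ccc=220)
with_cgj    = "a\u0301\u034f\u0323"    # acute, CGJ (ccc=0), dot below

# Without CGJ, canonical ordering moves the dot below (ccc=220) in front of
# the acute (ccc=230)...
assert unicodedata.normalize("NFD", without_cgj) == "a\u0323\u0301"
# ...but with CGJ in between, the original mark order survives normalization
# unchanged, and the stability guarantees keep it that way in later versions.
assert unicodedata.normalize("NFD", with_cgj) == with_cgj
```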

