From: "Peter Kirk" <[EMAIL PROTECTED]>
I can see that there might be some problems in the changeover phase. But
these are basically the same problems as are present anyway, and at
least putting them into a changeover phase means that they go away
gradually instead of being standardised for ever, or however long
Unicode is planned to survive for.

I had already thought about it. But this may cause more trouble in the future for handling languages (like modern Hebrew) in which those combining classes are not a problem, ...

This needs some clarification. Most modern Hebrew is written without any combining marks, or sometimes with just a few scattered ones for specific disambiguation. In such cases the combining classes of Hebrew marks are irrelevant because the marks never appear in combination. But sometimes, especially in texts for children and language learners, modern Hebrew is written with vowel points, dagesh, and sin and shin dots, although not usually accents. In this case, and not just in biblical Hebrew, the combining classes ARE a problem, because they imply a canonical order which is illogical as well as hard to render.
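To make the canonical order concrete, here is a small Python sketch using the standard unicodedata module. It lists the canonical combining classes of a few Hebrew points and shows what NFD does to a shin typed in logical order (shin, shin dot, dagesh, patah). The helper names and the choice of sample marks are mine, for illustration only.

```python
import unicodedata

SHIN = "\u05E9"
# Sample marks (my selection): patah, dagesh, meteg, shin dot, sin dot.
SAMPLE_MARKS = ["\u05B7", "\u05BC", "\u05BD", "\u05C1", "\u05C2"]

def combining_classes(chars):
    """Map each character's Unicode name to its canonical combining class."""
    return {unicodedata.name(ch): unicodedata.combining(ch) for ch in chars}

def canonical_order(text):
    """Return the text in NFD, i.e. with marks in canonical order."""
    return unicodedata.normalize("NFD", text)

def codepoints(text):
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

classes = combining_classes(SAMPLE_MARKS)
for name, ccc in classes.items():
    print(f"{ccc:3d}  {name}")

# Logical (typing) order: consonant, its shin dot, dagesh, then the vowel.
logical = SHIN + "\u05C1" + "\u05BC" + "\u05B7"
canonical = canonical_order(logical)
print("logical:  ", codepoints(logical))
print("canonical:", codepoints(canonical))
```

In the canonical form the shin dot (class 24) ends up after both the vowel (class 17) and the dagesh (class 21), although logically it belongs to the consonant.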

... and where the ordering of combining characters is
a real bonus that would be lost if combining classes are merged, notably for
full text searches where the number of order combinations to search could
explode, as the effective order in occurrences could become unpredictable for
searches.

But there is no bonus from the ordering of combining classes; rather, there is a detrimental effect. Full text searches are already seriously complicated because what is logically one character is split in the canonical order. The relative ordering of sin and shin dot with vowel points leads to a situation equivalent to the French sequence <c-cedilla, a> being represented canonically as <c, a, cedilla>, except that a dagesh and a meteg may also be inserted between the equivalents of c and cedilla. That is not exactly a bonus if you want to search for the consonant c-cedilla.
Yes, the effective order of occurrences could become unpredictable if characters were not entered in the recommended order, i.e. words were misspelled. But that is true in any language: simple searches will not find misspelled words.
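The search problem can be sketched in Python as follows. This is only an illustration under my own assumptions: the helper names clusters and find_consonant are hypothetical, and the sample word is shabbat typed in logical order. A plain substring search for <shin, shin dot> fails on normalized text because the vowel is sorted in between, so the sketch instead compares each base letter together with the set of marks attached to it.

```python
import unicodedata

SHIN, SHIN_DOT = "\u05E9", "\u05C1"
PATAH, BET, DAGESH, QAMATS, TAV = "\u05B7", "\u05D1", "\u05BC", "\u05B8", "\u05EA"

def clusters(text):
    """Split text (after NFD) into (base, set-of-marks) pairs."""
    result = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0 or not result:
            result.append([ch, set()])   # start a new cluster at each base
        else:
            result[-1][1].add(ch)        # attach the mark to the current base
    return [(base, frozenset(marks)) for base, marks in result]

def find_consonant(text, base, required_marks):
    """Indices of clusters with this base that carry all the required marks."""
    required = set(required_marks)
    return [i for i, (b, marks) in enumerate(clusters(text))
            if b == base and required <= marks]

# "shabbat" typed in logical order, then normalized as stored text would be.
word = SHIN + SHIN_DOT + PATAH + BET + DAGESH + QAMATS + TAV
stored = unicodedata.normalize("NFD", word)

naive_hit = (SHIN + SHIN_DOT) in stored           # substring search
cluster_hits = find_consonant(stored, SHIN, [SHIN_DOT])

print("naive substring search finds shin+dot:", naive_hit)
print("cluster search finds it at cluster(s):", cluster_hits)
```

Comparing marks as a set also sidesteps the question of the order in which they were typed, at the cost of a more complicated search than simple string matching.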

Of course, if the combining class values were really bogus, a much simpler
way would be to deprecate some existing characters, allowing new
applications to use the new replacement characters, and slowly adapt the
existing documents with the replacement characters whose combining classes
would be more language-friendly.

This has already been suggested. The problem is the old one: this effectively deprecates all existing pointed Hebrew text, along with the implementations and fonts based on the current definitions.

...
As for requirements that lists
are normalised and sorted, I would consider that a process that makes
assumptions, without checking, about data received from another process
under separate control is a process badly implemented and asking for
trouble.

Here the problem is that we will not always have to manage the case of
separate processes, but also the case of utility libraries: if this library
is upgraded separately, the application using it may start experiencing
problems. E.g. I am thinking about the implied sort order in SQL databases
for table indices: what would happen if the SQL server is stopped just long
enough to upgrade a standard library implementing the normalization among many
other services, because another security bug such as a buffer overrun is
fixed in another API? When restarting the SQL server with the new library
implementing the new normalization, nothing would happen, apparently, but
the sort order would no longer be guaranteed, and stored sorted indices would
start being "corrupted", in a way that would invalidate binary searches
(meaning that some unique keys could become duplicated, or not found,
producing unpredictable results, critical if they are assumed for, say, user
authentication, or file existence).

I see the point, but I would think there was something seriously wrong with a database setup which could change its ordering algorithm without somehow declaring all existing indexes invalid.
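The stale-index scenario can be reproduced in a few lines of Python. This is a toy model, not a description of any real database: the "index" is just a sorted list searched with bisect, and hypothetical_new_normalize is an invented stand-in for some future normalization that would place the sin or shin dot directly after its consonant. Nothing like it is defined by Unicode today.

```python
import bisect
import unicodedata

DOTS = {"\u05C1", "\u05C2"}  # shin dot, sin dot

def old_normalize(text):
    """Current behaviour: plain NFD."""
    return unicodedata.normalize("NFD", text)

def hypothetical_new_normalize(text):
    """Invented for illustration: NFD, then move sin/shin dot next to its base."""
    groups = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0 or not groups:
            groups.append([ch])
        else:
            groups[-1].append(ch)
    out = []
    for base, *marks in groups:
        dots = [m for m in marks if m in DOTS]
        rest = [m for m in marks if m not in DOTS]
        out.append(base + "".join(dots + rest))
    return "".join(out)

def contains(index, key, normalize):
    """Binary search for the normalized key in a sorted index."""
    k = normalize(key)
    i = bisect.bisect_left(index, k)
    return i < len(index) and index[i] == k

key_a = "\u05E9\u05C1\u05B7"   # shin, shin dot, patah
key_b = "\u05E9\u05B8"         # shin, qamats

# Index built and sorted before the library upgrade.
index = sorted(old_normalize(k) for k in (key_a, key_b))

found_before = contains(index, key_a, old_normalize)
found_after = contains(index, key_a, hypothetical_new_normalize)

# The two normalizations also disagree about the sort order itself.
order_old = sorted([key_a, key_b], key=old_normalize)
order_new = sorted([key_a, key_b], key=hypothetical_new_normalize)

# A "unique" insert after the upgrade silently duplicates the key.
if not found_after:
    bisect.insort(index, hypothetical_new_normalize(key_a))

print("found before upgrade:", found_before)
print("found after upgrade: ", found_after)
print("index entries now:   ", len(index))
```

In this toy model the existing key is no longer found after the switch, the two keys sort in opposite orders under the two functions, and the index ends up holding two entries for what is logically one key. Rebuilding the index under the new function would avoid all three effects.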
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

