This is a known issue that has existed for years (and was discussed heavily at the time). Unfortunately this CANNOT be changed and never will be. Canonical combining classes are immutable and assigned forever, because they are character properties that MUST honor the stability policy; notably here, any change would break the stability of all normalization forms.
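To make the stability point concrete, here is a minimal Python sketch (using only the standard `unicodedata` module; the helper name `ccc_table` is mine, purely for illustration) that prints the canonical combining class of each character in a short Hebrew sequence and shows how normalization reorders the marks by class. Any change to a class value would change the normalized output for existing text, which is exactly what the stability guarantee forbids.

```python
import unicodedata


def ccc_table(text):
    """Return (code point, name, canonical combining class) for each character.

    Hypothetical helper, not part of any library.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


# BET, then DAGESH (U+05BC), then PATAH (U+05B7), in the order someone might type them.
typed = "\u05D1\u05BC\u05B7"

# Canonical ordering sorts adjacent non-zero-class marks by ascending class,
# so PATAH (lower class) is moved in front of DAGESH (higher class).
normalized = unicodedata.normalize("NFD", typed)

for row in ccc_table(typed):
    print(row)
print("typed:     ", [f"U+{ord(ch):04X}" for ch in typed])
print("normalized:", [f"U+{ord(ch):04X}" for ch in normalized])
```

The two strings are canonically equivalent, so a conformant process may not treat them differently; only the stored order changes.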
In cases where this could cause problems because two Hebrew combining marks with non-zero combining classes have a significant order, the only solution is to separate them with a combining grapheme joiner (CGJ, U+034F) to preserve their relative order. For modern Hebrew this is not a problem and no CGJ is needed: the existing canonical equivalence has no impact on the interpretation, even if the relative order of combining marks produced by normalization is not the most logical. All of this has been discussed. There will be no change.

In fact the situation is even more complex than you think when you look at Biblical Hebrew, where in some words multiple diacritics can occur on the same base letter and interact graphically in a complex way, and where the encoding order is highly significant. The problematic cases are effectively the dots occurring in the middle (dagesh), the sin/shin dots, and the historic cantillation marks (which ideally should have been assigned combining class 0, but you can still emulate that behavior by inserting a CGJ before these marks to make sure they won't be reordered).

There are also other specific issues for Yiddish. Some complex issues arise in the encoding of the Biblical name Yerushalayim, the name of God, and similar words with diphthongs represented by multiple diacritics in a significant order (because there are in fact some implied but unwritten base letters: the CGJ can be used where the implied letter is missing to solve the issue).

2014-08-05 23:05 GMT+02:00 Maxim Iorsh <[email protected]>:

> Hello,
>
> I propose to change combining classes for certain Hebrew accents.
>
> Presently, the Hebrew accents belong to one of the following classes: 220
> (below), 222 (below right), 228 (above left), 230 (above). Accordingly, the
> canonical ordering puts "below" accents before "below right" accents, for
> example.
>
> Unfortunately, the resulting order is wrong.
> As Hebrew is a right-to-left
> script, the accents which are located below the letter on the right should
> go *before* accents which reside below the letter in the middle. The same
> goes for accents above letters.
>
> My proposal is to modify the combining class property as follows:
>
> 059A HEBREW ACCENT YETIV: ccc=219 "Below_Right_RTL"
> 05AD HEBREW ACCENT DEHI: ccc=219 "Below_Right_RTL"
> 05AE HEBREW ACCENT ZINOR: ccc=231 "Above_Left_RTL"
>
> Alternatively, existing class 218 "Below_Left" could be assigned to 059A,
> 05AD and possibly renamed to "Below_Char_Start" or something similar, so
> that it means "left" for LTR scripts and "right" for RTL scripts. The class
> 232 "Above_Right" could be assigned to 05AE and renamed accordingly.
>
> Thank you,
> -- Maxim.
>
> P. S. In a related note, does anybody know why Hebrew marks (05B0-05C7)
> are assigned fixed combining classes? It looks like most of them would be
> perfectly ok with 220 "Below" class, or other appropriate non-fixed classes.
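The CGJ workaround described in the reply above can be sketched in Python as follows. This is only an illustration under my own assumptions: the helper `protect_order` is a hypothetical name, and the lamed + patah + hiriq sequence is used as a stand-in for the Yerushalayim case, where the typed order of the two vowel points matters. Because CGJ (U+034F) has combining class 0, canonical reordering cannot move marks across it.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def protect_order(base, marks):
    """Attach marks to base, with a CGJ between consecutive marks.

    Hypothetical helper: it guards every pair of marks, which is blunter
    than necessary; in practice a CGJ is only needed where the order is
    significant and normalization would otherwise change it.
    """
    return base + CGJ.join(marks)


LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

plain = LAMED + PATAH + HIRIQ                   # order as intended by the writer
guarded = protect_order(LAMED, [PATAH, HIRIQ])  # same marks, CGJ in between

print("CGJ class:", unicodedata.combining(CGJ))
print("plain   ->", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", plain)])
print("guarded ->", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", guarded)])
```

Note that the guarded string is no longer canonically equivalent to the plain one, so searching and collation code may need to ignore the CGJ explicitly.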
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode

