From: "Peter Kirk" <[EMAIL PROTECTED]>
> I know there was quite a lot of discussion of collation of Hebrew in
> August, confused partly because it was spread over three lists (unicode,
> bidi and hebrew). I don't think we found a good solution then except to
> define as contractions each of several hundred possible combinations
> following a shin.
>
> I wonder if it might work (either in DUCET or in a tailored collation)
> to make the Hebrew vowel distinctions a third level sort, with the
> consonant modifiers dagesh, rafe and sin and shin dot at the second
> level, and accents at the fourth level. Contractions could then be made
> for dagesh, rafe and sin/shin dot so that the latter, which follows in
> the canonical order, will be collated as if coming first; and there are
> not many combinations, although we do have to allow for intervening
> meteg, which has fourth level significance.

Exactly, this will work, provided that the consonant modifiers stay grouped together and that the canonical ordering applied by normalization does not interleave them with vowels or vowel modifiers (accents). I don't know all the details of Hebrew points, so it would help if you could list exhaustively which category each of these points belongs to:

1) consonant modifiers: dagesh, rafe, sin dot, shin dot

> 05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ
> 05BF ; [.0000.00C0.0002.05BF] # HEBREW POINT RAFE
> 05C1 ; [.0000.00C1.0002.05C1] # HEBREW POINT SHIN DOT
> 05C2 ; [.0000.00C2.0002.05C2] # HEBREW POINT SIN DOT

2) vowels?
3) vowel modifiers/accents?

> 05B0 ; [.0000.0000.00B2.05B0] # HEBREW POINT SHEVA
> 05B1 ; [.0000.0000.00B3.05B1] # HEBREW POINT HATAF SEGOL
> 05B2 ; [.0000.0000.00B4.05B2] # HEBREW POINT HATAF PATAH
> 05B3 ; [.0000.0000.00B5.05B3] # HEBREW POINT HATAF QAMATS
> 05B4 ; [.0000.0000.00B6.05B4] # HEBREW POINT HIRIQ
> 05B5 ; [.0000.0000.00B7.05B5] # HEBREW POINT TSERE
> 05B6 ; [.0000.0000.00B8.05B6] # HEBREW POINT SEGOL
> 05B7 ; [.0000.0000.00B9.05B7] # HEBREW POINT PATAH
> 05B8 ; [.0000.0000.00BA.05B8] # HEBREW POINT QAMATS
> 05B9 ; [.0000.0000.00BB.05B9] # HEBREW POINT HOLAM
> 05BB ; [.0000.0000.00BC.05BB] # HEBREW POINT QUBUTS

Here you assign vowels and vowel modifiers the same collation level. Isn't that a problem?
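
To make the level question concrete, here is a rough sketch in Python (not from the thread) of how sort keys would be formed from the entries quoted above. The weights for the points are copied from those entries; the weights for base letters, and the names POINTS, element and sort_key, are hypothetical placeholders chosen only for illustration. Contractions and meteg are not handled.

```python
import unicodedata

# (primary, secondary, tertiary, quaternary), copied from the quoted entries.
POINTS = {
    "\u05BC": (0x0000, 0x00BD, 0x0002, 0x05BC),  # DAGESH OR MAPIQ
    "\u05BF": (0x0000, 0x00C0, 0x0002, 0x05BF),  # RAFE
    "\u05C1": (0x0000, 0x00C1, 0x0002, 0x05C1),  # SHIN DOT
    "\u05C2": (0x0000, 0x00C2, 0x0002, 0x05C2),  # SIN DOT
    "\u05B0": (0x0000, 0x0000, 0x00B2, 0x05B0),  # SHEVA
    "\u05B1": (0x0000, 0x0000, 0x00B3, 0x05B1),  # HATAF SEGOL
    "\u05B2": (0x0000, 0x0000, 0x00B4, 0x05B2),  # HATAF PATAH
    "\u05B3": (0x0000, 0x0000, 0x00B5, 0x05B3),  # HATAF QAMATS
    "\u05B4": (0x0000, 0x0000, 0x00B6, 0x05B4),  # HIRIQ
    "\u05B5": (0x0000, 0x0000, 0x00B7, 0x05B5),  # TSERE
    "\u05B6": (0x0000, 0x0000, 0x00B8, 0x05B6),  # SEGOL
    "\u05B7": (0x0000, 0x0000, 0x00B9, 0x05B7),  # PATAH
    "\u05B8": (0x0000, 0x0000, 0x00BA, 0x05B8),  # QAMATS
    "\u05B9": (0x0000, 0x0000, 0x00BB, 0x05B9),  # HOLAM
    "\u05BB": (0x0000, 0x0000, 0x00BC, 0x05BB),  # QUBUTS
}


def element(ch):
    """Return the collation element for one character."""
    if ch in POINTS:
        return POINTS[ch]
    # ASSUMPTION: made-up weights for base letters, for illustration only.
    return (0x1000 + ord(ch) - 0x05D0, 0x0020, 0x0002, ord(ch))


def sort_key(text):
    """Build a four-level key: the non-zero weights of each level, in text
    order, followed by a 0 separator after each level."""
    text = unicodedata.normalize("NFD", text)  # apply canonical ordering first
    elements = [element(ch) for ch in text]
    key = []
    for level in range(4):
        key.extend(ce[level] for ce in elements if ce[level] != 0)
        key.append(0)  # level separator
    return tuple(key)


# Canonical combining classes decide where each point lands after NFD:
# the vowels have lower classes than dagesh (21), shin dot (24), sin dot (25).
shin_dot_patah = sort_key("\u05E9\u05C1\u05B7")
sin_dot_hiriq = sort_key("\u05E9\u05C2\u05B4")
print(shin_dot_patah < sin_dot_hiriq)  # the dot decides before the vowel does
```

With these weights the shin/sin dot difference is settled at the second level, so the vowels are consulted only when the consonant modifiers are identical. Note that in the quoted entries the consonant modifiers also carry a tertiary weight of 0002, so they still contribute to the same level as the vowels.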

