On 19/08/2003 07:24, Mark Davis wrote:
B. Dagesh
2) There is something strange in the combinations of Shin with Dagesh and
dots: for all other letters, the form without Dagesh sorts before the form
with Dagesh. But Shin with Sin/Shin dot sort after their corresponding
combinations with Dagesh. I cannot imagine a justification for that.
We have currently in UCA the following (from UCA 4.0.0d1 (beta)):

05B0 ; [.0000.00B2.0002.05B0] # HEBREW POINT SHEVA
05B1 ; [.0000.00B3.0002.05B1] # HEBREW POINT HATAF SEGOL
05B2 ; [.0000.00B4.0002.05B2] # HEBREW POINT HATAF PATAH
05B3 ; [.0000.00B5.0002.05B3] # HEBREW POINT HATAF QAMATS
05B4 ; [.0000.00B6.0002.05B4] # HEBREW POINT HIRIQ
05B5 ; [.0000.00B7.0002.05B5] # HEBREW POINT TSERE
05B6 ; [.0000.00B8.0002.05B6] # HEBREW POINT SEGOL
05B7 ; [.0000.00B9.0002.05B7] # HEBREW POINT PATAH
05B8 ; [.0000.00BA.0002.05B8] # HEBREW POINT QAMATS
05B9 ; [.0000.00BB.0002.05B9] # HEBREW POINT HOLAM
05BB ; [.0000.00BC.0002.05BB] # HEBREW POINT QUBUTS
05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ
05BF ; [.0000.00C0.0002.05BF] # HEBREW POINT RAFE
05C1 ; [.0000.00C1.0002.05C1] # HEBREW POINT SHIN DOT
05C2 ; [.0000.00C2.0002.05C2] # HEBREW POINT SIN DOT
FB1E ; [.0000.00C3.0002.FB1E] # HEBREW POINT JUDEO-SPANISH VARIKA
To make this change, we would move Dagesh to after SIN DOT. Question: should it also go after VARIKA or not?
Mark __________________________________ http://www.macchiato.com ► “Eppur si muove” ◄
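To make the effect of the quoted weights concrete, here is a rough Python sketch of the secondary-level comparison only. It is not a UCA implementation: the base letter's own weights are left out (they are identical on both sides), and the PROPOSED table is a hypothetical renumbering invented here purely to illustrate "dagesh after sin dot"; nobody has proposed those actual values.

```python
import unicodedata

SHIN, DAGESH, RAFE, SHIN_DOT, SIN_DOT = "\u05E9", "\u05BC", "\u05BF", "\u05C1", "\u05C2"

# Secondary weights as quoted from UCA 4.0.0d1 (beta) above.
CURRENT = {DAGESH: 0x00BD, RAFE: 0x00C0, SHIN_DOT: 0x00C1, SIN_DOT: 0x00C2}

# Hypothetical renumbering with dagesh moved to just after sin dot.
# These values are illustrative assumptions, not proposed UCA data.
PROPOSED = {RAFE: 0x00BD, SHIN_DOT: 0x00BE, SIN_DOT: 0x00BF, DAGESH: 0x00C0}

def secondary_key(text, weights):
    """Secondary weights of the listed marks, in NFD order (base letter omitted)."""
    nfd = unicodedata.normalize("NFD", text)
    return [weights[ch] for ch in nfd if ch in weights]

plain = SHIN + SHIN_DOT                 # shin with shin dot
with_dagesh = SHIN + SHIN_DOT + DAGESH  # shin with shin dot and dagesh

# Current weights: the dagesh form compares lower, so it sorts first.
print(secondary_key(with_dagesh, CURRENT) < secondary_key(plain, CURRENT))

# Hypothetical weights: the form without dagesh sorts first.
print(secondary_key(plain, PROPOSED) < secondary_key(with_dagesh, PROPOSED))
```

Note that NFD places dagesh before shin dot in the string, so the dagesh weight is the first one compared in both tables.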
Please, don't rush any changes to the UCA here. We need a proper review of what is required for biblical as well as modern Hebrew (hopefully the same but possibly not), not just a quick conclusion that we fix things by reordering dagesh.
A lot of the problem with dagesh etc. comes from the highly inappropriate canonical combining classes for U+05B0 to U+05C4. I was told not long ago that the ordering of these didn't matter, only the distinctions do, but the ordering sure does matter when it comes to collation. Shin with dagesh and patah is logically <shin, shin dot, dagesh, patah> and should probably be collated on the basis of that ordering, i.e. sort first by the sin/shin dot, then by whether there is dagesh or not, then by the vowel. But the canonically ordered NFD, which is the input to collation, is <shin, patah, dagesh, shin dot>. So somehow the collation algorithm has to be asked to undo the damage which normalisation did and collate these things in the right order.
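The reordering is easy to see with Python's standard unicodedata module. The sketch below normalises the logical sequence and prints each character's combining class; the to_logical_order helper and its RANK table are hypothetical, made up here only to illustrate the kind of pre-collation reordering I mean, and are not part of UCA or of any library.

```python
import unicodedata

SHIN, PATAH, DAGESH = "\u05E9", "\u05B7", "\u05BC"
SHIN_DOT, SIN_DOT = "\u05C1", "\u05C2"

logical = SHIN + SHIN_DOT + DAGESH + PATAH
nfd = unicodedata.normalize("NFD", logical)

# Show what canonical ordering did to the logical sequence.
for ch in nfd:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch):>2} {unicodedata.name(ch)}")

# Hypothetical ranking: sin/shin dot first, then dagesh, then everything else.
RANK = {SHIN_DOT: 0, SIN_DOT: 0, DAGESH: 1}

def to_logical_order(text):
    """Reorder each run of combining marks by RANK (stable for equal ranks)."""
    out, marks = [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            marks.append(ch)
        else:
            out.extend(sorted(marks, key=lambda m: RANK.get(m, 2)))
            marks = []
            out.append(ch)
    out.extend(sorted(marks, key=lambda m: RANK.get(m, 2)))
    return "".join(out)

print(to_logical_order(nfd) == logical)
```

A real solution would have to deal with many more marks (meteg, rafe, cantillation) than this three-level ranking does; it only shows that the logical order can be recovered from NFD for this one case.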
And please don't discuss Hebrew here in isolation from the discussion of the same subject on the Hebrew list - at least the discussion which I was raising there on the understanding that matters of Hebrew were supposed to be discussed there.
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/

