From: "Peter Kirk" <[EMAIL PROTECTED]>

I know there was quite a lot of discussion of collation of Hebrew in August, confused partly because it was spread over three lists (unicode, bidi and hebrew). I don't think we found a good solution then except to define as contractions each of several hundred possible combinations following a shin.
On 27/10/2003 12:28, Mark Davis wrote:
Collation is very different, and already has mechanisms for dealing with sequences. So no CGJ is needed there (except for case 2).
Mark

Mark, can you outline what these mechanisms are or point me to a
definition e.g. in a section of UTR #10? As I had understood it, the
only way to deal with sequences of the sort I have in mind is to list
each possible sequence individually as a contraction. The Logical_Order_Exception
property (see http://www.unicode.org/reports/tr10/ section 3.1.3) just
might be useful, but doesn't seem to have the necessary flexibility as
it causes a character to be swapped with ANY following character, not
just with any of a restricted list of such characters. The backwards
marking used for French accents (section 3.1.2) seems to apply over too
long a string.
The backwards marking is not restricted to French accents in collation level 2. You can use reverse ordering at any tailored level to fit other needs, and you can also insert an extra collation level.
So I think that Mark is right here, as it gives you full control over the length of the collating sequence at each level of the collation keys. Case 2 is effectively an exception.
The bad thing is that the current default UCA ordering table does not create such collation keys with intermediate levels for Hebrew vowels, and you need tailoring to create a base level with consonants, one level with vowels, a third level for sin/shin dots, a fourth for meteg, a fifth for accents... unless the text is encoded in logical order using the CCO-convention.
Philippe.
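To make the point about per-level direction concrete, here is a minimal Python sketch of how a sort key can be assembled level by level, with any chosen level read backwards. It is not an implementation of UTS #10: the function name sort_key, the two-level collation elements and all the weights are hypothetical, chosen only so that the familiar French example (cote, côte, coté, côté) shows the effect.

```python
def sort_key(elements, backwards_levels=frozenset()):
    """Build a sort key from collation elements (tuples of weights).

    Zero weights are ignored at their level; levels listed in
    backwards_levels (0-based) are appended in reverse order.
    """
    depth = max((len(ce) for ce in elements), default=0)
    key = []
    for level in range(depth):
        if level:
            key.append(0)  # level separator, lower than any real weight
        weights = [ce[level] for ce in elements if ce[level] != 0]
        if level in backwards_levels:
            weights.reverse()
        key.extend(weights)
    return tuple(key)


# Hypothetical weights: letters get a primary weight and secondary 0x20,
# an accent is a separate element with primary 0 and secondary 0x21.
C, O, T, E = (1, 0x20), (2, 0x20), (3, 0x20), (4, 0x20)
ACCENT = (0, 0x21)

WORDS = {
    "cote": [C, O, T, E],
    "côte": [C, O, ACCENT, T, E],
    "coté": [C, O, T, E, ACCENT],
    "côté": [C, O, ACCENT, T, E, ACCENT],
}

forward = sorted(WORDS, key=lambda w: sort_key(WORDS[w]))
backward = sorted(WORDS, key=lambda w: sort_key(WORDS[w], {1}))
print(forward)
print(backward)
```

With the second level forward, the first accent difference decides; with it backwards, the last one does. The same switch could be applied to any tailored level, which is the flexibility described above.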
I wonder if it might work (either in DUCET or in a tailored collation) to make the Hebrew vowel distinctions a third level sort, with the consonant modifiers dagesh, rafe and sin and shin dot at the second level, and accents at the fourth level. Contractions could then be made for dagesh, rafe and sin/shin dot so that the latter, which follows in the canonical order, will be collated as if coming first; and there are not many combinations, although we do have to allow for intervening meteg, which has fourth level significance.
Thus we might need something like the following data, with some of the values chosen arbitrarily (i.e. for what was least editing from my source!):
05B0 ; [.0000.0000.00B2.05B0] # HEBREW POINT SHEVA
05B1 ; [.0000.0000.00B3.05B1] # HEBREW POINT HATAF SEGOL
05B2 ; [.0000.0000.00B4.05B2] # HEBREW POINT HATAF PATAH
05B3 ; [.0000.0000.00B5.05B3] # HEBREW POINT HATAF QAMATS
05B4 ; [.0000.0000.00B6.05B4] # HEBREW POINT HIRIQ
05B5 ; [.0000.0000.00B7.05B5] # HEBREW POINT TSERE
05B6 ; [.0000.0000.00B8.05B6] # HEBREW POINT SEGOL
05B7 ; [.0000.0000.00B9.05B7] # HEBREW POINT PATAH
05B8 ; [.0000.0000.00BA.05B8] # HEBREW POINT QAMATS
05B9 ; [.0000.0000.00BB.05B9] # HEBREW POINT HOLAM
05BB ; [.0000.0000.00BC.05BB] # HEBREW POINT QUBUTS
05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ
05BC 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] # dagesh and shin dot
05BC 05C2 ; [.0000.00C2.0002.05C2] [.0000.00BD.0002.05BC] # dagesh and sin dot
05BC 05BD 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] [.0000.0000.0000.05BD] # dagesh, meteg and shin dot
05BC 05BD 05C2 ; [.0000.00C2.0002.05C2] [.0000.00BD.0002.05BC] [.0000.0000.0000.05BD] # dagesh, meteg and sin dot
05BF ; [.0000.00C0.0002.05BF] # HEBREW POINT RAFE
05BF 05C1 ; [.0000.00C1.0002.05C1] [.0000.00C0.0002.05BF] # rafe and shin dot
05BF 05C2 ; [.0000.00C2.0002.05C2] [.0000.00C0.0002.05BF] # rafe and sin dot
05C1 ; [.0000.00C1.0002.05C1] # HEBREW POINT SHIN DOT
05C2 ; [.0000.00C2.0002.05C2] # HEBREW POINT SIN DOT
plus in principle some extra contractions with both dagesh and rafe.
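As a rough check of how such entries would behave, here is a small Python sketch that parses a few of the rows above and maps a string to collation elements by longest contiguous match after NFD normalization. It is a simplification, not the full UCA matching rules (no discontiguous contractions, no implicit weights), and the names parse_table and collation_elements, as well as the fallback weights for characters missing from the table, are my own assumptions.

```python
import re
import unicodedata

# A few rows copied from the data above.
TABLE_TEXT = """\
05B7 ; [.0000.0000.00B9.05B7] # HEBREW POINT PATAH
05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ
05BC 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] # dagesh and shin dot
05BC 05BD 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] [.0000.0000.0000.05BD] # dagesh, meteg and shin dot
05C1 ; [.0000.00C1.0002.05C1] # HEBREW POINT SHIN DOT
"""

CE_PATTERN = re.compile(r"\[\.([0-9A-Fa-f.]+)\]")


def parse_table(text):
    """Map each character sequence to its list of 4-level weight tuples."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        chars, elements = line.split(";", 1)
        key = "".join(chr(int(cp, 16)) for cp in chars.split())
        table[key] = [tuple(int(w, 16) for w in body.split("."))
                      for body in CE_PATTERN.findall(elements)]
    return table


def collation_elements(text, table):
    """Longest-match lookup over the NFD form of text."""
    text = unicodedata.normalize("NFD", text)
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            found = table.get(text[i:i + n])
            if found is not None:
                out.extend(found)
                i += n
                break
        else:
            # Hypothetical fallback for characters not in this toy table.
            cp = ord(text[i])
            out.append((cp, 0x20, 0x02, cp))
            i += 1
    return out


table = parse_table(TABLE_TEXT)
# Shin typed with shin dot before dagesh; NFD puts the dagesh first,
# and the contraction then emits the shin dot weight first.
for ce in collation_elements("\u05E9\u05C1\u05BC", table):
    print(".".join(f"{w:04X}" for w in ce))
```

The shin dot element comes out ahead of the dagesh element at the second level even though the dagesh precedes it in canonical order, which is the effect the contractions are meant to produce.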
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

