On 27/10/2003 16:39, Philippe Verdy wrote:

From: "Peter Kirk" <[EMAIL PROTECTED]>



On 27/10/2003 12:28, Mark Davis wrote:



Collation is very different, and already has mechanisms for dealing with
sequences. So no CGJ is needed there (except for case 2).

Mark





Mark, can you outline what these mechanisms are or point me to a
definition e.g. in a section of UTR #10? As I had understood it, the
only way to deal with sequences of the sort I have in mind is to list
each possible sequence individually as a contraction. The Logical_Order_Exception
property (see http://www.unicode.org/reports/tr10/ section 3.1.3) just
might be useful, but doesn't seem to have the necessary flexibility as
it causes a character to be swapped with ANY following character, not
just with any of a restricted list of such characters. The backwards
marking used for French accents (section 3.1.2) seems to apply over too
long a string.



The backwards marking is not restricted to French accents in collation level 2. You can use reverse ordering at any tailored level to fit other needs, and you can also insert an extra collation level.

So I think that Mark is right here, as it gives you full control over the
length of the collating sequence at each level of the collation keys. Case 2
is effectively an exception.

The bad thing is that the current default UCA ordering table does not create
such collation keys with intermediate levels for Hebrew vowels, and you
need tailoring to create a base level with consonants, one level with
vowels, a third level for sin/shin dots, a fourth for meteg, a fifth for
accents... unless the text is encoded in logical order using the
CCO-convention.
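
As a rough illustration of what marking a level as backwards does to the key, here is a small Python sketch. The weights are made up for the classic French cote/côte/coté/côté example and are not taken from DUCET; level_key and its backwards_levels parameter are hypothetical names, not part of any library.

```python
# Hypothetical (primary, secondary) weights, one element per letter.
# These values are invented for illustration, not taken from DUCET.
WEIGHTS = {
    "c": (1, 0x20),
    "o": (2, 0x20),
    "t": (3, 0x20),
    "e": (4, 0x20),
    "\u00f4": (2, 0x22),  # o with circumflex: same primary as "o"
    "\u00e9": (4, 0x21),  # e with acute: same primary as "e"
}

def level_key(word, backwards_levels=()):
    """Build a per-level key; levels in backwards_levels are reversed.

    Levels are 0-based here, so index 1 is the second collation level.
    """
    elements = [WEIGHTS[ch] for ch in word]
    key = []
    for level in range(2):
        weights = [ce[level] for ce in elements]
        if level in backwards_levels:
            weights.reverse()
        key.append(tuple(weights))
    return tuple(key)

words = ["c\u00f4t\u00e9", "cot\u00e9", "c\u00f4te", "cote"]
forward = sorted(words, key=level_key)
backward = sorted(words, key=lambda w: level_key(w, backwards_levels={1}))
```

Comparing the second level forwards gives cote, coté, côte, côté; with the second level marked backwards the order becomes cote, côte, coté, côté. The same reversal could in principle be applied to any other level of a tailored key.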

Philippe.


I know there was quite a lot of discussion of collation of Hebrew in August, confused partly because it was spread over three lists (unicode, bidi and hebrew). I don't think we found a good solution then except to define as contractions each of several hundred possible combinations following a shin.

I wonder if it might work (either in DUCET or in a tailored collation) to make the Hebrew vowel distinctions a third level sort, with the consonant modifiers dagesh, rafe and sin and shin dot at the second level, and accents at the fourth level. Contractions could then be made for dagesh, rafe and sin/shin dot so that the latter, which follows in the canonical order, will be collated as if coming first; and there are not many combinations, although we do have to allow for intervening meteg, which has fourth level significance.

Thus we might need something like the following data, with some of the values chosen arbitrarily (i.e. for what was least editing from my source!):

05B0 ; [.0000.0000.00B2.05B0] # HEBREW POINT SHEVA
05B1 ; [.0000.0000.00B3.05B1] # HEBREW POINT HATAF SEGOL
05B2 ; [.0000.0000.00B4.05B2] # HEBREW POINT HATAF PATAH
05B3 ; [.0000.0000.00B5.05B3] # HEBREW POINT HATAF QAMATS
05B4 ; [.0000.0000.00B6.05B4] # HEBREW POINT HIRIQ
05B5 ; [.0000.0000.00B7.05B5] # HEBREW POINT TSERE
05B6 ; [.0000.0000.00B8.05B6] # HEBREW POINT SEGOL
05B7 ; [.0000.0000.00B9.05B7] # HEBREW POINT PATAH
05B8 ; [.0000.0000.00BA.05B8] # HEBREW POINT QAMATS
05B9 ; [.0000.0000.00BB.05B9] # HEBREW POINT HOLAM
05BB ; [.0000.0000.00BC.05BB] # HEBREW POINT QUBUTS
05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ
05BC 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] # dagesh and shin dot
05BC 05C2 ; [.0000.00C2.0002.05C2] [.0000.00BD.0002.05BC] # dagesh and sin dot
05BC 05BD 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] [.0000.0000.0000.05BD] # dagesh, meteg and shin dot
05BC 05BD 05C2 ; [.0000.00C2.0002.05C2] [.0000.00BD.0002.05BC] [.0000.0000.0000.05BD] # dagesh, meteg and sin dot
05BF ; [.0000.00C0.0002.05BF] # HEBREW POINT RAFE
05BF 05C1 ; [.0000.00C1.0002.05C1] [.0000.00C0.0002.05BF] # rafe and shin dot
05BF 05C2 ; [.0000.00C2.0002.05C2] [.0000.00C0.0002.05BF] # rafe and sin dot
05C1 ; [.0000.00C1.0002.05C1] # HEBREW POINT SHIN DOT
05C2 ; [.0000.00C2.0002.05C2] # HEBREW POINT SIN DOT


plus in principle some extra contractions with both dagesh and rafe.
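
To make the effect of these contractions concrete, here is a minimal Python sketch; it is not a real UCA implementation. It parses a few of the lines above, does a simple longest-match lookup, and builds a four-level key. The function names (parse_table, collation_elements, sort_key) are my own for illustration, and the sketch ignores consonants, normalization and everything else a real collator does.

```python
import re

# A few lines copied from the data above (allkeys-style syntax).
DATA = """
05B4 ; [.0000.0000.00B6.05B4] # HEBREW POINT HIRIQ
05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ
05BC 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] # dagesh and shin dot
05BC 05C2 ; [.0000.00C2.0002.05C2] [.0000.00BD.0002.05BC] # dagesh and sin dot
05BC 05BD 05C1 ; [.0000.00C1.0002.05C1] [.0000.00BD.0002.05BC] [.0000.0000.0000.05BD] # dagesh, meteg and shin dot
05C1 ; [.0000.00C1.0002.05C1] # HEBREW POINT SHIN DOT
05C2 ; [.0000.00C2.0002.05C2] # HEBREW POINT SIN DOT
"""

def parse_table(text):
    """Map tuples of code points to lists of 4-level collation elements."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        chars, elements = line.split(";")
        key = tuple(int(cp, 16) for cp in chars.split())
        table[key] = [
            tuple(int(w, 16) for w in ce.split("."))
            for ce in re.findall(r"\[\.([0-9A-Fa-f.]+)\]", elements)
        ]
    return table

def collation_elements(code_points, table):
    """Greedy longest-match lookup; code points not in the table are skipped."""
    longest = max(len(k) for k in table)
    out, i = [], 0
    while i < len(code_points):
        for n in range(min(longest, len(code_points) - i), 0, -1):
            key = tuple(code_points[i:i + n])
            if key in table:
                out.extend(table[key])
                i += n
                break
        else:
            i += 1  # not in this toy table
    return out

def sort_key(code_points, table):
    """Concatenate the non-zero weights level by level (levels 1 to 4)."""
    ces = collation_elements(code_points, table)
    return tuple(
        tuple(ce[level] for ce in ces if ce[level] != 0)
        for level in range(4)
    )

TABLE = parse_table(DATA)
```

With this, sort_key([0x05BC, 0x05C2], TABLE) (dagesh then sin dot) compares greater than sort_key([0x05C1], TABLE) (plain shin dot), because the contraction puts the sin/shin dot weight first. An intervening meteg, as in 05BC 05BD 05C1, changes only the fourth level of the key.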

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




