For the record, I repeat that I am not convinced that the CGJ is an
appropriate solution for the problems associated with the right Meteg. I
tend to think we need a separate character.

Jony

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
> Sent: Saturday, October 25, 2003 1:12 PM
> To: Peter Kirk
> Cc: [EMAIL PROTECTED]
> Subject: Re: New contribution N2676
> 
> 
> From: "Peter Kirk" <[EMAIL PROTECTED]>
> > Have combining classes actually been defined for these characters?
> >
> > This is of course exactly the same problem as with Hebrew vowel
> > points and accents, except that this time it applies to real living
> > languages. Perhaps it is time to do something about these combining
> > classes which conflict with the standard.
> 
> Do you mean officially documenting the correct (and strict) 
> use of CGJ as the only way to bypass the default order 
> required by the combining classes in normalized forms? It 
> would be a good idea to document officially which use of CGJ 
> is superfluous and should be avoided in NF forms, and which 
> use is required.
> 
> 1) This will affect only the input methods for those 
> languages that need to "swap" the standard order of combining 
> characters to keep their logical order (all this will require 
> is an additional input control that will try swapping 
> ambiguous orders).
> 
> 2) A complete documentation may need to specify which pairs 
> of combining characters are affected (this should list the 
> pairs of combining characters <c1, c2> where CC(c1) > CC(c2) 
> and that must be encoded as <c1, CGJ, c2> to be kept in 
> logical order, as the sequence <c1, c2> will be reordered 
> into <c2, c1> in normalized forms).
> 
> 3) The other issue would be that there may exist other 
> combining characters than those in this pair. Suppose I want 
> to represent <base, c1, c2, c3>, where CC(c1) > CC(c2), but 
> c3 does not have a conflicting pair in the previous list. 
> Should it be encoded as <base, c1, CGJ, c2, c3> or as <base, 
> c1, c3, CGJ, c2>? As the standard normalization algorithm 
> cannot be changed, both sequences will be possible with the 
> NF forms, even though they represent the same character.
> 
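
To make the reordering behind points 2 and 3 concrete, here is a minimal
Python sketch using the standard unicodedata module. The choice of lamed,
patah and hiriq as the example, and the helper name needs_cgj, are my own
assumptions for illustration; nothing here is taken from N2676.

```python
import unicodedata

LAMED = "\u05DC"  # base letter, combining class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0


def needs_cgj(c1: str, c2: str) -> bool:
    """Hypothetical helper: True if <c1, c2> would be reordered to
    <c2, c1> by canonical ordering, i.e. CC(c1) > CC(c2) > 0."""
    cc1 = unicodedata.combining(c1)
    cc2 = unicodedata.combining(c2)
    return cc2 != 0 and cc1 > cc2


# Logical order <lamed, patah, hiriq> is not preserved by normalization.
logical = LAMED + PATAH + HIRIQ
reordered = unicodedata.normalize("NFC", logical)

# With CGJ between the two points, the sequence is already normalized.
protected = LAMED + PATAH + CGJ + HIRIQ
stable = unicodedata.normalize("NFC", protected) == protected

print(needs_cgj(PATAH, HIRIQ))                        # True
print([f"U+{ord(ch):04X}" for ch in reordered])       # ['U+05DC', 'U+05B4', 'U+05B7']
print(stable)                                         # True
```

Running needs_cgj over all pairs of combining marks in a script would give
a first draft of the list of pairs asked for in point 2, though whether a
given pair ever occurs in real text is a separate question.
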
> One could design an extra normalization step to force one 
> interpretation (so that only combining characters with 
> conflicting combining classes that have been forced "swapped" 
> will appear after CGJ, all other diacritics being encoded 
> preferably in the first sequence before the CGJ).
> 
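
A rough Python sketch of what such an extra step might look like, under
my own assumptions: the function name prefer_marks_before_cgj and the rule
"a mark after CGJ conflicts only if its combining class is lower than the
highest class before CGJ" are hypothetical, it handles a single
<base, marks, CGJ, marks> sequence only, and it ignores every other use
of CGJ.

```python
import unicodedata

LAMED = "\u05DC"   # base letter, combining class 0
PATAH = "\u05B7"   # combining class 17
HIRIQ = "\u05B4"   # combining class 14
DAGESH = "\u05BC"  # combining class 21
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0


def prefer_marks_before_cgj(seq: str) -> str:
    """Hypothetical extra step (not part of any Unicode normalization form):
    move marks that do not conflict with the marks before CGJ in front of
    it, and drop a CGJ that no longer protects anything."""
    if seq.count(CGJ) != 1:
        return seq
    head, tail = seq.split(CGJ)
    before = head[1:]  # marks between the base and CGJ
    if not before or not tail:
        return seq
    if any(unicodedata.combining(ch) == 0 for ch in before + tail):
        return seq  # not a plain <base, marks, CGJ, marks> sequence
    highest = max(unicodedata.combining(ch) for ch in before)
    movable = "".join(ch for ch in tail if unicodedata.combining(ch) >= highest)
    staying = "".join(ch for ch in tail if unicodedata.combining(ch) < highest)
    return head + movable + (CGJ if staying else "") + staying


# The two candidate encodings from point 3, with c3 = dagesh.
a = LAMED + PATAH + CGJ + HIRIQ + DAGESH
b = LAMED + PATAH + DAGESH + CGJ + HIRIQ

# Both are already in NFC, yet they are different strings.
both_nfc = all(unicodedata.normalize("NFC", s) == s for s in (a, b))
print(both_nfc, a == b)
# The extra step maps both to the second encoding.
print(prefer_marks_before_cgj(a) == b, prefer_marks_before_cgj(b) == b)
```

Under the same rule, a CGJ in <lamed, hiriq, CGJ, patah> is treated as
superfluous and removed, since hiriq and patah are already in canonical
order without it.
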
> This extra step should not be part of the NF forms (because 
> Unicode states that normalized forms will be kept normalized 
> in all further versions of Unicode), but this could be named 
> differently, by describing a system in which extra 
> normalization steps may be applied that may change NF forms 
> into other "equivalent" sequences also in normalized form.