Sorry, Philippe, I had meant a separate character for a "right Meteg", not a separate control character. Does this mean we agree?
Jony

> -----Original Message-----
> From: Philippe Verdy [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, October 25, 2003 5:58 PM
> To: Jony Rosenne
> Cc: [EMAIL PROTECTED]
> Subject: Re: CGJ - Combining Class Override
>
>
> From: "Jony Rosenne" <[EMAIL PROTECTED]>
>
> > For the record, I repeat that I am not convinced that the CGJ is an
> > appropriate solution for the problems associated with the right Meteg.
> > I tend to think we need a separate character.
>
> Yes, it's possible to devise another character explicitly to
> override very precisely the ordering of combining classes.
> But this still does not change the problem, as all the
> existing NF* forms in existing documents using any past or
> present version of Unicode MUST remain in NF* form with
> further additions.
>
> If one votes for a separate control character, it should come
> with precise rules describing how such an override can/must be
> used, so that we won't break existing implementations. This
> character will necessarily have combining class 0, but will
> still have a preceding context. Strict conformance for the
> new NF* forms must still obey the precise ordering rules,
> and this character, whatever its form, shall not be used
> when it is not needed, i.e. when the existing
> NF* forms still produce the correct logical order (that's why
> its use should then be restricted to a list of known
> combining characters that may need this override).
>
> Call it <CCO> "Combining Class Override"? This does not
> change the problem: this character should be used only
> between pairs of combining characters, such that the encoded sequence:
>     {c1, CCO, c2}
> shall conform to the rules:
> (1) CC(c1) > CC(c2) > 0,
> (2) c1 is known (listed by Unicode?) to require this override
>     to keep the logical ordering needed for correct text semantics.
>
> The second requirement should be made to avoid abuses of this
> character. But it is not enforceable if CGJ is kept for this function.
>
> The CCO character should then be made "ignorable" for
> collation or text breaks, so that collation keys will become:
>     [ CK(c1), CK(c2) ] for {c1, CCO, c2}
>     [ CK(c2), CK(c1) ] for {c2, c1} and {c1, c2} if normalized
>
> Legacy applications will detect a separate combining sequence
> starting at CCO, but newer applications will still know that
> both sequences are describing a single grapheme cluster.
>
> This knowledge should not be necessary except in grapheme
> renderers, or in some input methods that will allow users to
> enter:
> (1) keys <c2><c1> producing the normalized text {c2, c1}
>     as before;
> (2) keys <c1><c2> producing the normalized text {c1, CCO, c2}
>     instead of {c2, c1} as before;
> (3) optionally support a keystroke or selection system to swap
>     combining characters.
>
> If this is too complex, the only way to manage the situation
> is to duplicate the existing combining characters that cause this
> problem, and I think this may be even worse, as this
> duplication may need to be combinatorial and require a lot of
> new codepoint assignments.
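
For illustration, the reordering problem described in the quoted message can be reproduced with Python's standard unicodedata module. This is only a minimal sketch under stated assumptions: the choice of BET and PATAH as the example letter and vowel is mine, the helper name needs_override is hypothetical, and only rule (1) is modelled (rule (2), the list of characters known to need the override, is not). It shows that normalization moves a Meteg typed before the vowel to after it, and that an intervening character of combining class 0 such as CGJ blocks that reordering.

```python
import unicodedata

BET = "\u05D1"    # HEBREW LETTER BET, base letter (assumed example)
METEG = "\u05BD"  # HEBREW POINT METEG, combining class 22
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17 (assumed example)
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0


def needs_override(c1: str, c2: str) -> bool:
    """Hypothetical check for rule (1) of the message: CC(c1) > CC(c2) > 0."""
    return unicodedata.combining(c1) > unicodedata.combining(c2) > 0


def nfd(text: str) -> str:
    """Canonical decomposition, which applies canonical reordering."""
    return unicodedata.normalize("NFD", text)


# "Right Meteg": the Meteg is meant to come before the vowel.
plain = BET + METEG + PATAH
with_cgj = BET + METEG + CGJ + PATAH

reordered = nfd(plain)     # the Meteg is moved after the vowel
preserved = nfd(with_cgj)  # the class-0 CGJ stops the reordering

print([f"U+{ord(ch):04X}" for ch in reordered])
print([f"U+{ord(ch):04X}" for ch in preserved])
```

Here needs_override(METEG, PATAH) holds while needs_override(PATAH, METEG) does not, which matches the point that an override would only be needed in one of the two orders.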

