On 14/07/2004 23:10, Kenneth Whistler wrote:
...
Thanks for all the clarification which I have snipped.
>> One such situation is Holam Male which never takes an additional combining mark*. So why can't we represent it as <VAV, HOLAM, variation selector>?
> Because the UTC has ruled out <CM, VAR> as interpretable sequences.
Is there a better reason than "because we say so"? You don't have to answer that one.
>> After all in practice there is no normalisation problem with this. (By the way, I am proposing as one option <VAV, variation selector, HOLAM>, but that has been opposed on the debatable grounds that what changes is not the VAV but the HOLAM - the best description is that the whole grapheme cluster changes.)
> I don't have a quarrel with describing things that way -- but you
> just can't get from here to there with variation selectors.
I don't quite understand you here. Are you saying that <VAV, variation selector, HOLAM> would be acceptable for representing a variation of the entire grapheme cluster, or that it would not?
The alternatives which we might consider include <VAV, ZW(N)J, HOLAM>. This corresponds closely to Peter Constable's recommendations for Indic languages in http://www.unicode.org/review/pr-37.pdf, which is to use <base, ZWJ, VIRAMA>, and indeed to the existing special-case rule for Bengali RA + ya-phalaa in Figure 12 of that document. Or would we do much better to stick to <ZW(N)J, VAV, HOLAM> or <HOLAM, ZW(N)J, VAV>, keeping ZW(N)J outside the combining sequence?
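For anyone who wants to check the normalisation side of this mechanically, here is a rough sketch in Python using the standard unicodedata module. It only tests whether each candidate sequence is left unchanged by the four normalisation forms. It says nothing about whether the UTC would accept the sequence. VARIATION SELECTOR-1 (U+FE00) is my assumption, standing in for whichever selector might be chosen, and the dictionary labels and the helper name are made up for illustration.

```python
import unicodedata

VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
VS1 = "\uFE00"    # VARIATION SELECTOR-1 (assumed; any selector could stand in)
ZWJ = "\u200D"    # ZERO WIDTH JOINER
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER

# Candidate sequences discussed in this thread (labels are illustrative only).
candidates = {
    "<VAV, HOLAM, VS>": VAV + HOLAM + VS1,
    "<VAV, VS, HOLAM>": VAV + VS1 + HOLAM,
    "<VAV, ZWJ, HOLAM>": VAV + ZWJ + HOLAM,
    "<VAV, ZWNJ, HOLAM>": VAV + ZWNJ + HOLAM,
    "<ZWJ, VAV, HOLAM>": ZWJ + VAV + HOLAM,
    "<HOLAM, ZWJ, VAV>": HOLAM + ZWJ + VAV,
}

def survives_normalization(s):
    """Return True if s is unchanged by NFC, NFD, NFKC and NFKD."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, s) == s for form in forms)

# Canonical combining classes: only a non-zero class can be reordered.
for ch in (VAV, HOLAM, VS1, ZWJ, ZWNJ):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")

for label, seq in candidates.items():
    print(label, "stable" if survives_normalization(seq) else "changed")
```

HOLAM is the only character here with a non-zero combining class, so there is nothing in any of these sequences for canonical reordering to swap it with.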
-- 
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

