On 15/07/2004 10:32, Asmus Freytag wrote:

Nobody doubts that some text exists with multiple accents on vowels. Where the vowels are not Latin a,o,u, there is no issue at all, in this case, since there are no differences in German sorting for them. ...


Well, yes, but http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2819.pdf does not make it clear that the <CGJ, DIAERESIS> sequence is to be used only with Latin a, o and u; rather it states "<CGJ, [DIAERESIS]> → tréma". Perhaps the proposal needs modification to make this point clear, if that is the intention.

... Where the vowels are a, o, u, as for the Livonian example you cited, it's a matter of the design of the collation table to get the correct sorting behavior.

If there is anything in UCA that would make it impossible to design correct collation tables for German university libraries, when CGJ is used with Trema, but not for umlaut, then you have an issue. At the moment, I see lots of speculation, and red herrings (Greek and Coptic, indeed!) but no smoking gun.


Greek and Coptic is not irrelevant. First, you did not restrict the set of base characters when you wrote:

Secondly, the dieresis is used to indicate that two vowels are pronounced separately. I haven't seen a case where the vowels would already be accented.

and of course the diaeresis and accent characters used in Greek are the same ones used in Latin script. Second, N2819 does not make it clear that the <CGJ, DIAERESIS> sequence is to be used only for Latin script data. I would expect (someone can check this of course, and without checking this is indeed speculation) that there is Greek text in German bibliographic databases in which the Greek diaeresis is represented in ISO 5426 as tréma rather than umlaut; that would be correct because the function of Greek diaeresis is separation rather than vowel modification. And I would expect an implementer reading N2819 to conclude that all ISO 5426 trémas should be converted to <CGJ, DIAERESIS> as no mention is made of a restriction to Latin script or to just a, o and u. So there is a real chance of a conversion program producing sequences which could confuse normalisation, e.g. <IOTA, CGJ, DIAERESIS, ACUTE>, although hopefully not <IOTA, ACUTE, CGJ, DIAERESIS> which might be a real problem.
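
To make the normalisation concern concrete, here is a minimal sketch using Python's standard unicodedata module. It compares the NFC forms of the two sequences mentioned above with the plain <IOTA, DIAERESIS, ACUTE> sequence. The variable and helper names are mine, chosen for illustration only; nothing here comes from N2819, and the results depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

IOTA = "\u03b9"       # GREEK SMALL LETTER IOTA
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER (combining class 0)
DIAERESIS = "\u0308"  # COMBINING DIAERESIS (combining class 230)
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT (combining class 230)


def nfc(text):
    """Return the NFC normalisation of text."""
    return unicodedata.normalize("NFC", text)


def names(text):
    """Return the Unicode character names of text, one per code point."""
    return [unicodedata.name(ch) for ch in text]


# Illustrative sequences only (hypothetical conversion output).
plain = IOTA + DIAERESIS + ACUTE               # no CGJ at all
trema_first = IOTA + CGJ + DIAERESIS + ACUTE   # <IOTA, CGJ, DIAERESIS, ACUTE>
accent_first = IOTA + ACUTE + CGJ + DIAERESIS  # <IOTA, ACUTE, CGJ, DIAERESIS>

for label, seq in (("plain", plain),
                   ("trema_first", trema_first),
                   ("accent_first", accent_first)):
    print(label, names(nfc(seq)))
```

Because CGJ has combining class 0, it blocks composition across it: the plain sequence composes to a single precomposed letter, the tréma-first sequence is left unchanged, and in the accent-first sequence the acute composes with the iota while the diaeresis stays stranded after the CGJ. The three are therefore not canonically equivalent to one another.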



And yes, the incidence of Livonian data (relative to trema, which is rather uncommon relative to umlaut) may be below a threshold where providing a support short of the theoretical optimum is a practical concern. That decision belongs to the German bibliographers.


Well, it seems that we are agreeing that there may be a problem in theory, and potentially in practice with small amounts of marginal data, but Unicode is choosing to leave the problem for the specific users of the sequence to deal with. That is indeed a reasonable approach. But it was not considered an acceptable one for use of variation selectors with combining marks, even in a case where there is no valid data which actually exhibits the normalisation problem.

My concern as always is with the apparent inconsistency of bending the normal rules or ignoring the normalisation concerns for German while refusing to do more or less the same for Hebrew. I appreciate that Germany is a larger and richer country than Israel and so, at least for commercial interests, its concerns deserve some priority. But that should not be a reason to reject as invalid or insignificant issues concerning Hebrew. And the issue of avoiding incompatible representation of the same data is a real one for Hebrew Holam Male vs. Vav Haluma just as it is for German umlaut vs. tréma.

I am not actually asking for variation selectors with combining marks because I realise that the UTC has already made a decision and is unlikely to reverse it. But I am asking for some flexibility on some of the principles, of the kind which has been demonstrated with umlaut and tréma, and also in the Indic scripts proposal under review, in order to find an acceptable solution to a real problem. That flexibility might include allowing either <VAV, variation selector, HOLAM> or <VAV, ZWJ, HOLAM> to represent Holam Male although technically the VAV glyph does not (usually) change (nor does the HOLAM glyph) and the HOLAM dot does not ligate with it, just moves relative to it.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



