On Saturday, August 09, 2003 11:14 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
> On 09/08/2003 13:41, John Cowan wrote:
>
> > Peter Kirk scripsit:
> >
> > > The gap may not be large, but Philippe, John H and I have
> > > identified a real gap. Why this antagonism against filling it?
> >
> > What you have identified is a set of implementation defects, not
> > problems with the Unicode Standard. The standard way to do what
> > you want is to precede the combining mark with SP or NBSP. If that
> > "doesn't work", then the implementation that makes it not work
> > needs to be fixed.
>
> Tell Microsoft! (See Noah Levitt's posting.)

And the W3C or SGML committees with the *ML character model!

> If this is indeed "The standard way to do what you want", then the
> standard needs to make it clear that the sequence of
> <space, combining mark> or <NBSP, combining mark> has the properties
> which I want, i.e. it has the width of the combining mark alone, and
> not the full width of a space, and does not expand for justification,
> is not a line breaking opportunity, does not in fact have any of the
> properties of a space. I expect to see such a clarification in the
> next edition of the Unicode Standard.

Don't forget the issues created by the fact that in many cases there is no alternative to "defective" sequences, which leaves you hoping that the implementation will render the diacritic alone (without the dotted circle) and space it correctly. For now, the workaround of placing some arbitrary (unspecified) control character before the diacritic is just a trick. It is not interoperable, and it complicates plain-text search, because there is no predictable or stable base character against which to match the diacritic. In addition, many input methods and keyboard drivers will not let you enter such a "defective" sequence at all. This means, for example, that the word "Yerushala(y)im" cannot be entered, or searched for exactly within a large text, because the implied invisible letter has no stable representation.
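To make the search problem concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It assumes three hypothetical encodings of the same isolated diaeresis embedded in an invented word `ab...cd`, then checks whether a naive substring search for the SP+U+0308 form finds each one under the four normalization forms. This is an illustration under those assumptions, not a proposed search algorithm.

```python
import unicodedata

# Hypothetical encodings a writer might choose for the same isolated
# diaeresis (U+0308 COMBINING DIAERESIS) inside a word. None of them is
# designated as the canonical choice, which is what makes search fragile.
VARIANTS = {
    "SP+U+0308": "\u0020\u0308",
    "NBSP+U+0308": "\u00A0\u0308",
    "U+00A8": "\u00A8",  # precomposed spacing DIAERESIS
}

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def found(query: str, text: str, form: str) -> bool:
    """Naive substring search after normalizing both sides to `form`."""
    return unicodedata.normalize(form, query) in unicodedata.normalize(form, text)


def match_table(query_name: str) -> dict[tuple[str, str], bool]:
    """For each variant embedded in a hypothetical word, record whether
    the query variant is found under each normalization form."""
    query = VARIANTS[query_name]
    table = {}
    for text_name, mark in VARIANTS.items():
        word = "ab" + mark + "cd"  # hypothetical surrounding letters
        for form in FORMS:
            table[(text_name, form)] = found(query, word, form)
    return table


if __name__ == "__main__":
    for (text_name, form), hit in match_table("SP+U+0308").items():
        print(f"query SP+U+0308 in {text_name:<12} under {form}: {hit}")
```

Under the canonical forms (NFC, NFD) only the identical encoding matches. Only the compatibility forms (NFKC, NFKD) fold NBSP+U+0308 and U+00A8 onto SP+U+0308, and that folding also discards distinctions (such as the no-break property of NBSP) that an application may need, so it is not a general fix.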
Note that the CGJ solution will not work when the isolated diacritic must begin a word or breakable token. In that case, the SPACE solution is really tricky because of the special treatment of SPACE, notably in HTML, SGML, XML and often SQL, which "normalize" whitespace. Fortunately, the existing spacing diacritics do not have these problems, since they are not canonically equivalent to the suggested SPACE+diacritic sequence (it is only their compatibility decomposition). However, they are only a partial solution: they exist for some diacritics (not ALL), and they only cover use as symbols, not as regular letters within a word alongside other letters.

So there are really two gaps: a small gap for missing spacing diacritics used as symbols, and a large gap for all isolated diacritics used within a word (which the CGJ solution fills only in the middle or at the end of a word, not at its beginning).

-- 
Philippe.
Spam not tolerated: any unsolicited message will be reported to your Internet service providers.

