Peter Kirk continued this...

> On 19/04/2004 13:03, Kenneth Whistler wrote:
>
> >... Those other middle dots give
> >people textual representation alternatives now, if they need to make
> >distinctions, and textual rendering alternatives, if they need to make
> >middle dots which display with slightly different heights, sizes, or
> >spacings, depending on the rendering requirements.
> >
> >
>
> Ken, does Unicode specify height, size and spacing distinctions between
> the various middle dots which you listed?

No.

> If I understand correctly, it
> certainly doesn't do so exhaustively.

Correct.

> So in effect what you are
> suggesting here is that people make and use their own private
> distinctions between characters which are not defined by Unicode.

Not at all.

I am suggesting that people who use Unicode characters *will* use them
according to their identity. However, that doesn't mean that
identification of a character neatly solves all issues of its
rendering, nor will it automatically make things neat and tidy when
people use characters in different contexts which may have different
rendering concerns.

The Unicode Standard is not prescriptive about rendering, beyond the
basics required to simply ensure correct mapping of textual content
into streams of characters. If one font vendor wants to have a raised
glyph for the MIDDLE DOT and another wants to have a lowered glyph for
the same character, it is not the Unicode Standard's business to put
the two vendors in a room until one gives up and admits the other one
is correct.

> This
> sounds very like advising people to ignore Unicode character identities
> and properties and do their own thing. Rather strange advice from
> someone in your position, surely?

I love the way you put positions in people's mouths.

By the way, I challenge you to point to the Unicode character
properties in the Unicode Character Database which define the relative
position for middle dots with respect to x-height of a font, or the
spacing of middle dots, for example.

>
> Surely, in the current situation and if further proliferation of middle
> dots is considered undesirable,

It is undesirable, yes.

> users should be advised to presume that
> distinctions between middle dots are not a plain text matter

No, they should not. Because the existence of multiple different middle
dots in the standard which are *not* canonical equivalents of each
other makes it a plain text matter.

> and so
> should be handled by markup, including language selection.

In some cases, yes -- it depends on the effect which is intended, and
the context and application it occurs in.

>
> And if (as I just suggested on the Hebrew list might be true of some
> variant Hebrew pointing systems) someone finds a well documented script
> in which a true middle dot and an x-height dot are used contrastively,
> the correct approach would be either to accept, reluctantly, that at
> least one new dot needs to be encoded; or else for Unicode to define
> clearly which existing character should be used for which dot in this
> script.

Or: None of the Above

The users of characters for particular domains bear their own
responsibility to define their usage. It is not up to the Unicode
Consortium to go around defining everyone's spelling rules and
orthographic conventions for them.

If there are things unclear in the standard which are making its use
difficult for people in certain cases, then that is certainly a concern
of the Unicode Technical Committee.

And if someone brings in convincing evidence of the existence of a
semantically significant plain text distinction between two dots that
cannot plausibly be handled by *any* combination of the multitudinous
dot characters already present in the standard, then the UTC might
consider that sufficient justification to encode yet another middle
dot.

Given, however, the fact that there already are so many dot characters,
and given that their rendering often varies by font, the chance of
getting some additional pair of dot distinctions by height on the line
canonized with yet another dot encoding seems unlikely to me.

It is a will-o'-the-wisp to expect any and all multilingual Unicode
text to display "correctly" to any arbitrary n-th degree of
typographical rectitude with any and all Unicode-conformant fonts. The
use of specific fonts with specific designs is *precisely* to enable
plain text (or marked-up text, for that matter) to be displayed as
desired for particular contexts.
The criterion for Unicode plain text is basically *legible* text.

> The worst thing that could happen would be for different text
> providers to make different and incompatible selections among the
> existing characters, leading to total confusion. But that seems to be
> the approach which you, Ken, are advocating.

I see. And thank you, Peter, for pointing that error out to me.

Text providers have their own responsibility to ensure that they are
using interoperable conventions for the representation of text.

The Unicode Standard does not tell providers of Latin text whether they
should interchange text using macrons over long vowels or without, or
using IPA length marks or middle dots or some other convention, nor in
all uppercase or in mixed case.

It *does* specify that the sequence <o, combining-macron> is
canonically equivalent to <o-macron>, so that text processes that deal
with Latin (or any other) text should treat the interpretation of those
two sequences as the same.

That's the difference.

--Ken
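
As an illustration of the canonical-equivalence point above, here is a
minimal sketch using Python's standard unicodedata module. The helper
name canonically_equivalent is hypothetical, and the code points are
assumptions chosen for illustration: U+006F, U+0304 and U+014D for the
o/macron case, and a small, non-exhaustive sample of dot characters that
is not taken from the message itself.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# <o, combining-macron> versus the precomposed <o-macron>.
decomposed = "\u006F\u0304"   # LATIN SMALL LETTER O + COMBINING MACRON
precomposed = "\u014D"        # LATIN SMALL LETTER O WITH MACRON
print("o + macron vs o-macron:", canonically_equivalent(decomposed, precomposed))

# Assumed sample of dot-like characters, compared against U+00B7 MIDDLE DOT.
MIDDLE_DOT = "\u00B7"
sample_dots = ["\u0387", "\u2219", "\u22C5", "\u30FB"]

for ch in sample_dots:
    # unicodedata.name() looks up the character name in the local database.
    label = f"U+{ord(ch):04X} {unicodedata.name(ch)}"
    print(label, "vs MIDDLE DOT:", canonically_equivalent(MIDDLE_DOT, ch))
```

In this sample, only U+0387 GREEK ANO TELEIA should compare as
canonically equivalent to MIDDLE DOT; the other three are distinct
characters under normalization, which is the sense in which the choice
among them is a plain text matter. Nothing in the comparison says
anything about glyph height, size, or spacing.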

