On Sun, 20 Oct 2013 11:47:23 +0300 "Jukka K. Korpela" <[email protected]> wrote:
> 2013-10-20 2:38, Richard Wordingham wrote:

> > Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain
> > text?

> The answer is that any string of
> characters may be considered as plain text and any string of
> characters may be treated as rich text according to some conventions.

The correct phrasing of the question is, I suppose, "Can the
combination of U+25CC DOTTED CIRCLE plus a combining mark be
represented as plain text?".  Fortunately, everyone has understood the
question.

> > If so, how many dotted circles should appear?

> Possibly none. An implementation need not support any particular
> collection of characters. But an implementation that supports U+25CC
> must treat it as a spacing character, and an implementation that
> supports e.g. U+0300 must treat it as a combining mark. So if the
> implementation is capable of visually rendering them, it shall render
> U+25CC U+0300 as a dotted circle with a grave accent above it. In
> this case, exactly one dotted circle should appear, then.

I fear the get-out clause is that an implementation doesn't support a
collection of characters, but rather a collection of strings.  Many
renderers supporting Thai don't support the Thai character sequence
<DO, II, II> in any useful fashion, instead allowing the second II to
overstrike the first, sometimes in such a way that the author does not
realise he has double struck the II character.  This accidental
non-standard sequence is surprisingly common on the Internet.  In the
OpenType world, this is probably a font issue.

> > If the sequence is not plain text, what mark-up notations are
> > available to control the number of dotted circles produced?  I
> > am particularly interested in notation for HTML, e.g. via a style
> > sheet.  Should the sequence instead be treated as a graphic?

> I don’t understand these questions. If the sequence is treated as
> other than plain text, then the results depend on the specific “rich
> text” or other conventions applied.

If it can be argued that the extra dotted circle is valid, it would be
convenient for web authors to have a mechanism to suppress it, rather
as numeric format controls often allow control over the presence of an
optional plus sign.  I was wondering if this issue had already been
addressed in mark-up language.

> What it means is a different issue. U+25CC is a symbol that can be
> used in a variety of meanings. I don’t think it means anything
> specific to most people, unless a definition is given. U+0E31 is a
> Thai vowel sign, and I don’t think any meaning in general has been
> assigned to it when applied to something else than a Thai letter.

In the context prompting the question, it is an explicit place holder
for a consonant.  The usual symbol used by Thais (or, at least, their
textbook writers) is a dash, though the dash characters I tried had the
same problems with Uniscribe - dotted circles sprouted.  At least the
hyphen-minus is available on Thai keyboard layouts.  When naming the
vowels, o ang is used, but, alas, this is not suitable in the said
context.

Richard.
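
For anyone wanting to check the character properties behind the points
above, here is a minimal sketch using Python's standard unicodedata
module.  The helper names describe and repeated_marks are hypothetical,
made up for this illustration, and the second helper only flags an
identical nonspacing mark typed twice in a row (as in <DO, II, II>); it
says nothing about how any renderer, Uniscribe included, will treat the
sequence.

```python
import unicodedata


def describe(text):
    """List (code point, name, general category, combining class) per character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
            unicodedata.combining(ch),
        )
        for ch in text
    ]


def repeated_marks(text):
    """Return the indices at which a nonspacing mark (Mn) repeats its predecessor."""
    return [
        i
        for i in range(1, len(text))
        if text[i] == text[i - 1] and unicodedata.category(text[i]) == "Mn"
    ]


if __name__ == "__main__":
    # Dotted circle plus combining mark, then Thai <DO, II, II>.
    for sample in ("\u25CC\u0300", "\u0E14\u0E35\u0E35"):
        for row in describe(sample):
            print(row)
        print("repeated marks at:", repeated_marks(sample))
```

A check of this kind could be run over text before publication to catch
the accidental double II; whether the rendered result shows one dotted
circle, two, or none still depends on the renderer and the font.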

