2013-10-20 2:38, Richard Wordingham wrote:

Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain
text?

Well, is <h1>hello</h1> plain text? The answer is that any string of characters may be considered as plain text, and any string of characters may be treated as rich text according to some conventions.

If so, how many dotted circles should appear?

Possibly none. An implementation need not support any particular collection of characters. But an implementation that supports U+25CC must treat it as a spacing character, and an implementation that supports e.g. U+0300 must treat it as a combining mark. So if the implementation is capable of visually rendering them, it shall render U+25CC U+0300 as a dotted circle with a grave accent above it. In that case, exactly one dotted circle should appear.
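
The character properties behind this can be checked with a short Python sketch. It uses only the standard unicodedata module; the helper name describe is made up for this illustration, and the output reflects whatever version of the Unicode database ships with the Python in use.

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category, canonical combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # "So" = other symbol, "Mn" = nonspacing mark
        unicodedata.combining(ch),  # 0 for base characters and for some marks
    )

# The dotted circle, a Latin combining accent, and the Thai vowel sign
for ch in "\u25CC\u0300\u0E31":
    print(describe(ch))
```

U+25CC is reported with general category So (a spacing symbol), whereas U+0300 and U+0E31 are both Mn (nonspacing marks), so each of them is expected to attach to the preceding base character.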

Implementations often have bugs in dealing with combining marks. The cause may be in the rendering software or in the font.

If the sequence is not plain text, what mark-up notations are
available to control the number of dotted circles produced?  I
am particularly interested in notation for HTML, e.g. via a style
sheet. Should the sequence instead be treated as a graphic?

I don’t understand these questions. If the sequence is treated as something other than plain text, then the results depend on the specific “rich text” or other conventions applied.

This question is prompted by a confused discussion of what the notation
<U+25CC, U+0E31 THAI CHARACTER MAI HAN-AKAT, U+25CC> on a web page
meant.

What it means is a different issue. U+25CC is a symbol that can be used with a variety of meanings. I don’t think it means anything specific to most people, unless a definition is given. U+0E31 is a Thai vowel sign, and I don’t think any general meaning has been assigned to it when it is applied to something other than a Thai letter.

The rendering of the sequence is a different matter. Not surprisingly, tests on IE 10 show varying results. Using my test page
http://www.cs.tut.fi/~jkorpela/listfonts1.html
that renders, on IE, a given string in all the fonts available in the system, I noticed that on my system, only SunExt-A and Unifont result in correct rendering. Using Arial Unicode MS, the rendering is correct except for the circles being dashed, and I think this is incorrect for U+25CC, as it violates the identity of the character as a dotted circle. A few other fonts contain the characters too, but the renderings have three similar dotted rings, with the Thai diacritic above the middle one or (in FreeSerif and Quivira) between the second and third. On Chrome, Safari, and Firefox, the results are similar, except that Chrome shows the string as broken even when Arial Unicode MS is declared.

The confusion was caused because some of us saw two dashed
circles and others saw three dashed circles (one for each character)
when viewing the web page.

The implementations that show three dotted circles are non-conforming. Showing three dashed circles would be even more non-conforming.
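
The expected count can be illustrated with a rough Python sketch. It is a simplification, not the full grapheme cluster rules of UAX #29: it merely treats every character whose general category starts with "M" as attached to the preceding base character. The function name count_display_units is made up for this illustration.

```python
import unicodedata

def count_display_units(text):
    """Count base characters, attaching each mark (category M*) to the
    preceding unit. A mark at the very start counts as a unit of its own."""
    count = 0
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if not is_mark or count == 0:
            count += 1
    return count

# <U+25CC, U+0E31, U+25CC>: circle with the Thai vowel sign, then a bare circle
print(count_display_units("\u25CC\u0E31\u25CC"))  # 2
# <U+25CC, U+0300>: a single circle carrying the accent
print(count_display_units("\u25CC\u0300"))        # 1
```

Under this simplified model, the three-character sequence amounts to two display units, each with one dotted circle, which matches what a conforming renderer should show.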

If the purpose is to display the combining diacritic the same way as in the code charts in the standard, i.e. with a dotted circle serving as a generic placeholder for the base character, then I’m afraid the approach does not work in general. It should work, in the sense that conforming implementations would render it the desired way if they support the characters in rendering, but web browsers just don’t conform.

What you could do in a web page is to put U+00A0 U+25CC in one element and U+0E31 in another and position the elements in the same place, set to have the same width and to be horizontally centered. But I’m afraid this would be off-topic here and could involve some nasty details.

Yucca





