2011/11/19 Ken Whistler <[email protected]>:
> On 11/18/2011 5:24 PM, Philippe Verdy wrote:
>>
>> This arc in the example is definitely NOT mathematics
>
> Nor did I say it was.
>
>> (even if you have read a version where it was attempted to represent
>> it using a Math TeX notation in this page, an obvious error because
>> it used an angular \widehat and not the appropriate sign).
>
> Irrelevant.
>
>> This arc is a true phonetic mark of a contextual elision (the
>> intermediate letter(s) are not to be pronounced, even though they are
>> still written to make explicit the phonetically elided word(s) and to
>> keep their usual orthography).
>
> The fact that the function of the mark is to indicate a contextual
> elision is also essentially irrelevant to the analysis of whether such
> marking consists of a mark (character) in text or a mark-up
> (non-character) of text.
>
> The issue to pay attention to is whether the scoping of the
> modification of text is cleanly delimited to a single character at a
> time, or is in principle extensible across n characters.
Unicode encodes many things whose scope of modification applies to more
than one character. To begin with, it already defines grapheme
clusters...

>> Exactly similar to other phonetic symbols like the elision tie (an
>> arc adjoining two words to elide their separating space), or the
>> apostrophe (which replaces completely the elided letters).
>>
>> And obviously a true candidate for plain text: it provides
>> simultaneously two readings of the text, one purely phonetic (and
>> accurate for poems that have an essential and very strong rhythmic
>> structure), the other semantic (through the orthography that is
>> kept). All letters have to be present in some way, even if some of
>> them are marked for the expected phonetics.
>
> And is obviously *not* a true candidate for plain text representation.
> This kind of markup for simultaneous alternative readings of text is
> precisely where representation by a richer mechanism makes sense.

And this is contradicted within the Hebrew, Arabic and Tibetan scripts,
where there are simultaneous alternative readings marked by combining
signs, including for cantillation and songs.

> And this is merely the veriest toe in the water for what I am
> referring to as "text scoring".
>
> For an example of the complexity of various approaches to these kinds
> of problems, see:
>
> http://www.ilc.cnr.it/EAGLES/spokentx/node31.html
>
> And here is an example of a well worked-out, systematic, multi-level
> scoring system for prosodic information, the ToBI annotation
> conventions:
>
> http://www.cs.columbia.edu/~agus/tobi/labelling_guide_v3.pdf

Thanks for these. But what I wanted to find is what has already been
encoded with the macron half marks, which can be elongated by U+FE26.
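As an aside, the idea that a "grapheme cluster" already groups several code points into one user-perceived unit can be sketched in a few lines of Python. This is a deliberately naive simplification (it only attaches nonspacing marks to the preceding base character, ignoring Hangul jamo, ZWJ sequences and the other rules of UAX #29); the function name is mine, not anything standard:

```python
import unicodedata

def naive_clusters(text):
    """Very simplified grapheme clustering: attach each nonspacing
    mark (general category Mn) to the preceding base character.
    The real segmentation rules are defined in UAX #29."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "e" + COMBINING ACUTE ACCENT, "a" + COMBINING MACRON:
# four code points, but only two clusters.
print(naive_clusters("e\u0301a\u0304"))
```

The point is only that the scope of a combining mark already extends a "unit" of text beyond one code point; real implementations should use a proper UAX #29 segmenter.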
Unicode also includes characters needed only for presentation purposes
and nothing else, such as the Arabic kashidas (whose glyphs, assigned
in fonts, are used internally by text renderers to position the joining
line, by juxtaposing one or more of them between letters in order to
justify text, even though those characters are not really encoded in
standard texts, but only in monospaced texts!)...

Do you argue here that U+FE26 is not plain text? Can't you see that
<x, U+FE24, y, U+FE25> also has an alternate encoding with a "double"-
wide macron encoded between x and y, and that the double diacritics
would really not have been needed if we already had the standard
convention of using half marks to mark the beginning and end of an
elongated diacritic applying to runs of more than one grapheme cluster
(assuming that the half marks are not reorderable under normalization,
i.e. that they have a zero combining class)? And that, finally, all
these macron variants are exactly the same sign, just applied to a
different number of grapheme clusters (either 1, or 2 with the "double"
marks, or more when using half marks)?

What is plain text, then? Well, only what the UTC and WG2 agree to
encode, and only because there is already a constant or historical use.
As of today, those elongated marks already have an established,
historic and constant use.

That's why I advocate a change of paradigm to avoid these
multiplications: in a message a few days ago, I proposed format
controls to mark the beginning and end of "extended clusters". That
would solve everything very simply, reusing the same encoded
diacritics without defining any more "double" or "half" marks...
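The two alternate spellings of the same long macron can be compared directly with the stdlib `unicodedata` module. This small check (the strings `x`/`y` are just illustrative placeholders) confirms that the half-mark sequence and the double-diacritic sequence are distinct under every normalization form; note, however, that in the actual UCD the half marks carry combining class 230 like other above marks, not the zero class assumed above:

```python
import unicodedata

# The same elongated macron over <x, y>, spelled two ways:
half_marks = "x\uFE24y\uFE25"   # COMBINING MACRON LEFT HALF on x, RIGHT HALF on y
double_mark = "x\u035Ey"        # COMBINING DOUBLE MACRON between x and y

# Normalization does not unify the two spellings: they remain
# distinct, non-equivalent sequences in all four forms.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, half_marks) != \
           unicodedata.normalize(form, double_mark)

# Actual combining classes in the UCD (not zero): the half marks
# are class 230, the double diacritic above is class 234.
print(unicodedata.combining("\uFE24"), unicodedata.combining("\u035E"))
```

So as things stand, the unification of the 1-cluster, 2-cluster and n-cluster macrons argued for here would have to be done by higher-level matching, not by normalization.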
And even if those "extended clusters" cannot have their layout rendered
exactly (due to limitations in renderers), these controls would be
representable by visible glyphs of their own (for example, right or
left dotted half circles in dotted squares), to which we could still
apply and represent the standard diacritics (without needing to
reposition and resize them in the complex way that only more advanced
renderers support), using all existing font technologies (pending later
improvements in renderers and fonts to support better layouts without
using pseudo-glyphs).

-- 
Philippe.

