2013/8/7 Richard Wordingham <[email protected]>

> On Wed, 7 Aug 2013 01:42:06 +0200
> Philippe Verdy <[email protected]> wrote:
>
> > 2013/8/6 Richard Wordingham <[email protected]>
> >
> > > For example, I think the proper
> > > upper-casing of <U+1FB3 GREEK SMALL LETTER ALPHA WITH
> > > YPOGEGRAMMENI, U+0359 COMBINING ASTERISK BELOW> is <U+0391 GREEK
> > > CAPITAL LETTER ALPHA, U+0359, U+0196 LATIN CAPITAL LETTER IOTA,
> > > U+0359>.
> > >
> >
> > Why do you use U+0196 LATIN CAPITAL LETTER IOTA instead of U+399 GREEK
> > CAPITAL IOTA ???
>
> That's a mistake. Sorry.
>
> > I'm also not convinced that duplicating the combining asterisk below
> > is correct here. My opinion is that it should be:
> > <U+0391 GREEK CAPITAL LETTER ALPHA, DOUBLE COMBINING ASTERISK BELOW,
> > U+0399 GREEK CAPITAL LETTER IOTA>
> > with a new "double" diacritic encoded between both letters (it will be
> > shown as a single asterisk, centered below the gap between the two
> > capital letters...
>
> The asterisk below indicates that someone once read the letter above,
> but it can no longer be verified, e.g. because of further deterioration
> of the manuscript. If one converts the text to capitals, the asterisk
> below would indicate that the letters cannot be vouched for by the
> publisher of the new text, and it makes sense for each unverified
> letter to have its own asterisk.
>

Or place the asterisk in the middle of the gap between the two letters
(this preserves the fact that the non-capitalized letter was read as a
single grapheme). Anyway, capitalization transforms the original text,
so you keep losing semantic information. Why would you still want two
asterisks, and not three, to mark the suppression of letter case?

If the combining small iota subscript were capitalized as a combining
small-capital iota, there would be no ambiguity: both would be
interpreted as YPOGEGRAMMENI. Using the standard CAPITAL IOTA is just a
loss of distinction, and a convenience inherited from legacy character
sets, which were much more limited.

> > There's no such "double combining asterisk" character in the UCS. But
> > if you replace the asterisk by a macron (below or above) there exists
> > such double diaritic. The problem is that collation with strength
> > ignoring case diferences will not compare these strings as equal.
>
> > Or it could also be:
> > <U+0391, WJ, U+359, U+0399>
> > using a zero-width word joiner to hold the simple combining asterisk
> > below (this will create three grapheme clusters, with the second one
> > kerned below the two surrounding letters).
>
> That's not what U+2060 WORD JOINER does. It tells the word breaking
> algorithm that is being applied (presumably to scripta continua here)
> that there is no word boundary between the two letters. I don't
> believe that there is a character that does what you want.
>

I used WORD JOINER because it has been suggested as a replacement for
the zero-width no-break space (U+FEFF ZWNBSP), which is now used almost
only as a leading BOM and is silently discarded from input streams when
it is not known whether it sits at the beginning of the stream (e.g.
when the stream has no defined beginning but is a live stream, which
could combine multiple source streams).
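
To make the BOM ambiguity concrete, here is a minimal sketch, assuming
Python 3 and its standard `unicodedata` module and `utf-8-sig` codec. The
two byte strings are hypothetical stand-ins for two source streams. It
only illustrates why an interior U+FEFF is hard to interpret; it says
nothing about how any particular pipeline actually behaves.

```python
import unicodedata

BOM = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark
WJ = "\u2060"   # WORD JOINER

# Both characters are invisible format controls (general category Cf).
for ch in (BOM, WJ):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Two hypothetical sources, each starting with its own BOM.
first = (BOM + "\u0391").encode("utf-8")    # BOM + GREEK CAPITAL LETTER ALPHA
second = (BOM + "\u0399").encode("utf-8")   # BOM + GREEK CAPITAL LETTER IOTA

# The "utf-8-sig" codec strips a single leading BOM only, so the BOM of
# the second source survives as an interior U+FEFF after concatenation.
joined = (first + second).decode("utf-8-sig")
print([f"U+{ord(ch):04X}" for ch in joined])
```

A consumer that sees the remaining U+FEFF cannot tell whether it was a
stray BOM or an intentional ZWNBSP, which is the reason WJ was suggested
for the joining function.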

And yes, what I want here is *also* (but not only) to avoid a word break
between alpha and iota. There was no break between alpha and its
subscripted iota, as they were in the SAME default grapheme cluster;
this is no longer the case after capitalization unless you use some
non-breaking control between these two vowels. (Greek allows a syllable
break here in many words; you would need a dictionary lookup to
determine that this was effectively an unbreakable ypogegrammeni.)

The discussion about kerning is appropriate because the intended
behavior of the word joiner is to remain invisible (not increasing the
inter-letter spacing except where there is no other way to avoid glyph
collisions). Clearly, your asterisk below does not take any space below
Greek capitals in usual (non-decorated) font styles.

If decorated font styles are used, for example in a "lettrine" (dropped
initial capital) at the start of the first paragraph of a chapter body,
the pair of capital letters ALPHA+IOTA should still use a single
decorated glyph, and the asterisk should find its place below the glyph
or within some blank space left by the decoration strokes of the
lettrine. The IOTA should not be left outside the lettrine, alone on
the first normal line, when ALPHA is in the initial lettrine spanning
two lines or more. It is in such a situation that you will see that the
two letters are linked and form an unbreakable grapheme cluster.

But maybe you think that this should use a control dedicated to
indicating an implicit ligature (even if it is not visible with Greek
capitals in common font styles, as capitals usually reproduce the
engraved monumental style, with basic strokes and minimal decoration).

What are the alternatives?

- CGJ after ALPHA? I'm not sure that there is no break after CGJ and
  before another non-combining letter like IOTA.
- ZWNBSP?
  Not very safe with many algorithms, which will drop it silently
  because they can't know whether a leading BOM was already removed from
  the stream or whether the stream comes from the concatenation of two
  streams.
- ZWJ? It is used to strongly suggest a ligature, but the ligature will
  only be appropriate for lettrines, not in most paragraphs.

> > I think this solution is preferable because collation with strength
> > ignoring case diferences (and treating WJ as ignorable) will compare
> > the uppercased string as equal to the original lowercase string.
>
> Alternatively, give U+0359 COMBINING ASTERISK BELOW only tertiary
> weight. It doesn't seem right to give it priority over accent
> differences.
>
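
A rough sketch of the comparison at stake, assuming Python 3 and its
standard `unicodedata` module. `rough_primary_key` is a hypothetical
helper invented for this illustration: it case-folds, then drops
nonspacing marks and format controls. That is only a crude stand-in for
a collation that ignores case, the asterisk and WJ; it is not the
Unicode Collation Algorithm, and it does not settle whether U+0359
should carry secondary or tertiary weight.

```python
import unicodedata

def rough_primary_key(text):
    """Crude approximation of a primary-strength key (NOT the UCA)."""
    folded = unicodedata.normalize("NFD", text.casefold())
    # Drop nonspacing marks (Mn, e.g. U+0359) and format controls
    # (Cf, e.g. U+2060 WORD JOINER) so they do not affect the key.
    return "".join(
        ch for ch in folded
        if unicodedata.category(ch) not in ("Mn", "Cf")
    )

original = "\u1FB3\u0359"                # alpha with ypogegrammeni + asterisk below
default_upper = original.upper()         # Python's built-in full case mapping
per_letter = "\u0391\u0359\u0399\u0359"  # one asterisk per capital letter
with_wj = "\u0391\u2060\u0359\u0399"     # <U+0391, WJ, U+0359, U+0399>

candidates = (original, default_upper, per_letter, with_wj)
for s in candidates:
    print([f"U+{ord(ch):04X}" for ch in s], "->", rough_primary_key(s))

# Plain case folding alone keeps the position and number of asterisks,
# so it still distinguishes the original from the per-letter variant.
print(original.casefold() == per_letter.casefold())
```

With this rough key all four spellings collapse to the same alpha-iota
string, while simple case folding still tells them apart. Which of the
two outcomes is wanted depends on the weight given to U+0359, which is
exactly the open question in the quoted text.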

