On Tue, 22 Oct 2013 01:40:39 +0200 Philippe Verdy <[email protected]> wrote:
> You still don't understand: I want the composite to behave as if it
> was a letter that is missing and it is supposed to replace (including
> in the middle of a word... There's no attempt to insert a line break
> (in fact I don't want it before or after, unless there are breaking
> characters around such as punctuation or spaces).

By almost everything in the Unicode standard, a placeholder base character plus a combining mark (two characters in total) should render as though the placeholder were a letter. No control character should be necessary; gluing them together with WJ (U+2060 WORD JOINER) would not improve things. The only exception is that TUS (The Unicode Standard) cautions that some combinations may not render well.

I tried to think of where WJ might make sense between a base character and a combining mark. I could think of only two cases, and in both it would apply even within ordinary words. The first is where a hyphenator, trying to improve line-breaking, decides to insert a visual line break between a base and a spacing combining mark (Mc). Such breaks do occur, and a WJ might override this behaviour. The second is where text is being split into words, e.g. Sanskrit text. There a WJ would not always help, because sometimes a vowel character must be split between two words.

I've just looked at Uniscribe's behaviour in detail. I gave it the sequence <U+0E01 THAI CHARACTER KO KAI, U+25CC DOTTED CIRCLE, U+0E31 THAI CHARACTER MAI HAN-AKAT, U+0E01, U+0E31> to render with the Angsana New font, a font designed for Thai. Uniscribe categorised the string as an unbroken run of Thai characters. Now, the font explicitly defines how U+25CC and U+0E31 are to be combined in a Thai script run, so I don't see how the font can be regarded as broken. Uniscribe nevertheless insists that the sequence is faulty, and converts the first U+0E31 into two glyphs, those for U+25CC and U+0E31. Uniscribe is clearly just being restrictive about which characters a Thai combining mark may be attached to in the backing store.
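For anyone who wants to reproduce the test string, here is a minimal Python sketch using only the standard `unicodedata` module. The helper names `describe` and `mark_carriers` are illustrative, not part of any rendering API, and the sketch does not model Uniscribe or font shaping at all. It only shows what the backing store contains: each mark sits directly after its base (the placeholder, category So, or a letter, category Lo), whereas inserting WJ (category Cf) makes the format character, not the placeholder, the thing immediately preceding the mark.

```python
import unicodedata

# Test string described above: KO KAI, then DOTTED CIRCLE as a placeholder
# base carrying MAI HAN-AKAT, then an ordinary KO KAI + MAI HAN-AKAT.
TEST = "\u0E01\u25CC\u0E31\u0E01\u0E31"
WJ = "\u2060"  # WORD JOINER

def describe(s):
    """Return (code point, General_Category, name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in s]

def mark_carriers(s):
    """Pair each combining mark (category M*) with the character stored
    immediately before it in the string (hypothetical helper)."""
    return [(s[i - 1], c) for i, c in enumerate(s)
            if i > 0 and unicodedata.category(c).startswith("M")]

for cp, cat, name in describe(TEST):
    print(cp, cat, name)

# Both marks directly follow a base: the placeholder and a Thai letter.
print([(f"U+{ord(b):04X}", f"U+{ord(m):04X}") for b, m in mark_carriers(TEST)])

# With WJ inserted, the mark's predecessor is a Cf format character instead.
glued = "\u25CC" + WJ + "\u0E31"
print([(f"U+{ord(b):04X}", f"U+{ord(m):04X}") for b, m in mark_carriers(glued)])
```

The comparison is deliberately limited to adjacency in the stored text; whether a given shaper accepts a placeholder as a base is exactly the font/engine question discussed above.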
Richard.

