On Sun, 3 Jun 2018 04:31:32 +0100 Richard Wordingham via Unicode <unicode@unicode.org> wrote:
> However, the text is actually in the Tham script, and without any
> line-breaking controls, the first and third examples read, marking the
> grapheme cluster boundaries with '|', as ᨾ᩠ᨿᩮ <U+1A3E TAI THAM LETTER
> MA, U+1A60 TAI THAM SIGN SAKOT | U+1A3F TAI THAM LETTER LOW YA, U+1A6E
> TAI THAM VOWEL SIGN E> and ᩉ᩠ᩅᩱ <U+1A4C TAI THAM LETTER LOW HA, U+1A60
> TAI THAM SIGN SAKOT | U+1A45 TAI THAM LETTER WA, U+1A71 TAI THAM VOWEL
> SIGN AI>.

What I have marked are the *extended* grapheme cluster boundaries.
There is a *legacy* grapheme cluster break before the vowel sign. This
may make line-breaking after Indic re-ordering a bit easier.

However, in the Lao language, we have sequences in Tham such as

<consonant | left matra, top matra, ...> ('|' = legacy grapheme break),

and I now fully expect there to be renderings such as:

<left matra>, break, <consonant, top matra, ...>

There seems to be an example about the string hole in the middle line
of BAD-13-1-0100 in Figure 5.4 on p. 222 of Bounleuth's dissertation
(http://ediss.sub.uni-hamburg.de/volltexte/2016/8039/pdf/Dissertation.pdf),
but I'm not confident of my reading of the split word as <U+1A2F TAI
THAM LETTER DA | U+1A6E TAI THAM VOWEL SIGN E, U+1A65 TAI THAM VOWEL
SIGN I, U+1A60 TAI THAM SIGN SAKOT | U+1A36 TAI THAM LETTER NA>.

Theppitak would be able to confirm or refute, but he doesn't often
participate in this forum.

Richard.
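
P.S. For anyone wanting to reproduce the legacy versus extended boundaries marked above, here is a rough Python sketch. It is not a UAX #29 implementation. It assumes that General_Category Mn/Me can stand in for Grapheme_Cluster_Break=Extend and Mc for SpacingMark, which holds for the Tai Tham characters cited here but not for every script. It applies only the "no break before Extend" rule, plus "no break before SpacingMark" for extended clusters. The function names are mine, purely for illustration.

```python
import unicodedata

def gcb_class(ch):
    """Approximate Grapheme_Cluster_Break from General_Category (an assumption)."""
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):
        return "Extend"
    if cat == "Mc":
        return "SpacingMark"
    return "Other"

def clusters(text, extended=True):
    """Split text into clusters; SpacingMark joins only when extended=True."""
    out = []
    for ch in text:
        cls = gcb_class(ch)
        joins = cls == "Extend" or (extended and cls == "SpacingMark")
        if out and joins:
            out[-1] += ch      # attach the mark to the current cluster
        else:
            out.append(ch)     # start a new cluster
    return out

def show(text, extended=True):
    """Render clusters as code points, with '|' at each boundary."""
    return " | ".join(
        " ".join(f"U+{ord(c):04X}" for c in cl)
        for cl in clusters(text, extended)
    )

first = "\u1A3E\u1A60\u1A3F\u1A6E"        # MA, SAKOT, LOW YA, VOWEL SIGN E
split = "\u1A2F\u1A6E\u1A65\u1A60\u1A36"  # DA, E, I, SAKOT, NA (logical order)

print(show(first, extended=True))   # U+1A3E U+1A60 | U+1A3F U+1A6E
print(show(first, extended=False))  # U+1A3E U+1A60 | U+1A3F | U+1A6E
print(show(split, extended=False))  # U+1A2F | U+1A6E U+1A65 U+1A60 | U+1A36
```

The last line shows the legacy break between the consonant and the left matra, which is where I suspect the scribe split the word.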