Re: Hyphenation Markup

Richard Wordingham via Unicode Sun, 03 Jun 2018 04:03:53 -0700

On Sun, 3 Jun 2018 04:31:32 +0100
Richard Wordingham via Unicode <[email protected]> wrote:


> However, the text is actually in the Tham script, and without any
> line-breaking controls, the first and third examples read, marking the
> grapheme cluster boundaries with '|', as ᨾ᩠ᨿᩮ <U+1A3E TAI THAM LETTER
> MA, U+1A60 TAI THAM SIGN SAKOT | U+1A3F TAI THAM LETTER LOW YA, U+1A6E
> TAI THAM VOWEL SIGN E> and ᩉ᩠ᩅᩱ <U+1A4C TAI THAM LETTER LOW HA, U+1A60
> TAI THAM SIGN SAKOT | U+1A45 TAI THAM LETTER WA, U+1A71 TAI THAM VOWEL
> SIGN AI>.

What I have marked are the *extended* grapheme cluster boundaries.
There is a *legacy* grapheme cluster break before the vowel sign.  This
may make line-breaking after Indic re-ordering a bit easier.  However,
in the Lao language, we have sequences in Tham such as <consonant | left
matra, top matra, ...> ('|' = legacy grapheme break), and I now fully
expect there to be renderings such as:

<left matra>, break, <consonant, top matra, ...>
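
As a rough illustration of the difference between the two kinds of
boundary, here is a minimal Python sketch.  It is an approximation, not
the UAX #29 algorithm: the function name `clusters` is made up for this
example, and it assumes that general category Mn/Me stands in for
Grapheme_Cluster_Break=Extend and Mc for SpacingMark, which holds for
the Tai Tham characters quoted above but not for every script.

```python
import unicodedata

def clusters(text, extended=True):
    """Split text into approximate grapheme clusters.

    Assumption: Mn/Me marks always attach to what precedes them (like
    rule GB9), while Mc marks attach only for extended clusters (like
    rule GB9a).  Controls, Hangul, prepended marks etc. are ignored.
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        attaches = cat in ("Mn", "Me") or (extended and cat == "Mc")
        if out and attaches:
            out[-1] += ch      # no break before this character
        else:
            out.append(ch)     # start a new cluster
    return out

def show(parts):
    """Render clusters as code points, with '|' between clusters."""
    return " | ".join(" ".join(f"U+{ord(c):04X}" for c in p) for p in parts)

word = "\u1A3E\u1A60\u1A3F\u1A6E"   # MA, SAKOT, LOW YA, VOWEL SIGN E
print(show(clusters(word)))                  # U+1A3E U+1A60 | U+1A3F U+1A6E
print(show(clusters(word, extended=False)))  # U+1A3E U+1A60 | U+1A3F | U+1A6E
```

The second line of output shows the extra legacy break before the
spacing vowel sign, which is where a renderer that has moved the left
matra to the front could plausibly split the line.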

There seems to be an example around the string hole in the middle line
of BAD-13-1-0100 in Figure 5.4 on p. 222 of Bounleuth's dissertation
(http://ediss.sub.uni-hamburg.de/volltexte/2016/8039/pdf/Dissertation.pdf),
but I'm not confident of my reading of the split word as <U+1A2F TAI
THAM LETTER DA | U+1A6E TAI THAM VOWEL SIGN E, U+1A65 TAI THAM VOWEL
SIGN I, U+1A60 TAI THAM SIGN SAKOT | U+1A36 TAI THAM LETTER NA>.

Theppitak would be able to confirm or refute, but he doesn't often
participate in this forum.

Richard.
