On Sat, 23 Feb 2019 14:46:27 +0800 梁海 Liang Hai via Unicode <email@example.com> wrote:
> >>> once the USE acknowledges that subjoined consonants may follow
> >>> vowels
> >>
> >> I expect to update the USE spec to address this soon.
> >
> > That seems welcome news.  I still don't know what the problem with
> > supporting them has been.
>
> USE wasn’t designed to allow such a syllable structure. Tai Tham’s
> being supported by USE is kind of an oversight. And although it’s
> appropriate to allow conjoined consonants to follow post-base-spacing
> vowel signs, it’s not really a trivial debate whether USE should
> allow conjoined consonants to non-post-base-spacing (ie, pre-base,
> above-base, and below-base) vowel signs—considering the ambiguity.

1. "The goal of the clustering logic is to enable what is graphically
consistent with a given script’s rules, rather than enforcing
particular orthographic or linguistic rules. Such considerations
should be applied at another layer, such as a spelling checker." - USE
Specification.  There are very few cases that cannot be resolved by a
spell-checker once word boundaries are resolved.  Pali and Tai
phonology (but Lao is TBC) conspire to keep the numbers down.

2. The UTC membership had this discussion when discussing the
proposals on the Unicore list.

3. Ambiguity is often font-dependent with above- and below-base
vowels, and with tone marks.  Marks above are frequently positioned
relative to the phonetically preceding spacing consonant element -
<SAKOT, BA/PA>, <SAKOT, LA>, <SAKOT, YA> and <SAKOT, SA> are common
coda ("sakot") consonants that are spacing.  In Northern Thai,
<SIGN U, SAKOT, YA> frequently is, and <SIGN UU, SAKOT, BA/PA> can be,
written with the vowel largely to the left of the subscript consonant.
Apart from <SIGN U, SAKOT, YA>, Northern Thai largely avoids <vowel
below, subscript consonant>, preferring the minor ambiguity of, for
example, <RA, SIGN UU, BA> being either /huːp/ or /luː paʔ/.  (These
two forms are a doublet.)

4. They're explicitly noted in the TUS for the Khmer script, and I
suspect they're important for Tai languages in the Khmer ('Khom')
script.

5. For visual proofing, one can use colour-coding - people are welcome
to copy the relevant logic from my Da Lekh Si font.  Word processor
support for colour distinctions is limited, but it is in place in
several browsers.  Most of each akshara is in the foreground colour,
so it works with syntax highlighting and similar existing uses of
colour-coding.

6. The Sanskrit clusters grv- and gvr- are ambiguous in several
Sanskrit-capable Indic scripts.  (I haven't yet had the chance to
study how Sanskrit is written in Tai Tham, though I do know of one
inscription.)

7. The ambiguity of <SAKOT, BA> and <SAKOT, PA> was called out when
<SAKOT, BA> was allowed as the usual subscript of U+1A37 TAI THAM
LETTER BA.

8. The biggest ambiguity issue is the use of <SAKOT, U+1A4B TAI THAM
LETTER A> for U+1A6C TAI THAM VOWEL SIGN OA BELOW.  The USE is
powerless to deal with this.  I wish someone would let me in on the
evidence that they are actually distinct.

9. There is actually a problem with CVC aksharas being wrongly
encoded, paradoxically because of the USE's poor support for Tai Tham.
HarfBuzz allows an OpenType font to shape Tai Tham text even if it
does not declare support for the script.  Such fonts have to do Indic
rearrangement themselves, and this is generally done by means of
ligatures for <preposed vowel, consonant>.  Consequently, a cluster
<HIGH HA, SAKOT, NA, SIGN E> gets encoded as <HIGH HA, SIGN E, SAKOT,
NA>, as there are scores of clusters and five preposed vowels.  I know
it is possible to do rearrangement properly given access to GSUB; I
have a Tai Tham via ASCII mode in my Da Lekh fonts, and I have to do
some rearrangement to clean up after the USE.  There was a brief,
happy period when HarfBuzz's SEA shaping engine was available for Tai
Tham, but this was deleted in favour of an implementation of the USE.
There are now two bunches of Tai Tham fonts which simply don't work on
Microsoft browsers - Graphite fonts and the DIY OpenType Indic
rearrangers.

Richard.
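
To make the two encodings in point 9 concrete, here is a minimal Python
sketch that flags a preposed vowel immediately followed by SAKOT.  It is
only an illustration: the function name is hypothetical, the set of
preposed vowels is assumed to be U+1A6E..U+1A72, and since
<HIGH HA, SIGN E, SAKOT, NA> can also be the intended spelling of a CVC
akshara, a hit is a candidate for human review, not proof of an error.

```python
# Sketch only: flags candidates for review; it cannot tell which reading
# the writer intended.
SAKOT = "\N{TAI THAM SIGN SAKOT}"

# Assumption: the five preposed vowels are U+1A6E..U+1A72
# (SIGN E, AE, OO, AI, THAM AI).
PREPOSED_VOWELS = {chr(cp) for cp in range(0x1A6E, 0x1A73)}


def vowel_before_sakot(text: str) -> list[int]:
    """Return the indices of preposed vowels directly followed by SAKOT."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] in PREPOSED_VOWELS and text[i + 1] == SAKOT
    ]


HIGH_HA = "\N{TAI THAM LETTER HIGH HA}"
NA = "\N{TAI THAM LETTER NA}"
SIGN_E = "\N{TAI THAM VOWEL SIGN E}"

cluster_first = HIGH_HA + SAKOT + NA + SIGN_E  # <HIGH HA, SAKOT, NA, SIGN E>
vowel_first = HIGH_HA + SIGN_E + SAKOT + NA    # <HIGH HA, SIGN E, SAKOT, NA>

print(vowel_before_sakot(cluster_first))  # []
print(vowel_before_sakot(vowel_first))    # [1]
```

A check like this could at most feed a word list or spell-checker pass;
deciding between the two readings needs lexical knowledge that no
shaping engine has.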