I'm not sure if this is the right forum for the question; if not, please advise me where I should take the problem for public discussion.
The immediate problem is that the Universal Shaping Engine (USE) uses a regular expression for Indic orthographic syllables that doesn't cover the common CVC orthographic syllables of the Tai Tham script, let alone the rarer CVCVC orthographic syllables.

In his paper earlier this year, 'Making fonts for the Universal Shaping Engine' (available at http://tiro.com/John/Universal_Shaping_Engine_TYPOLabs.pdf), John Hudson reported,

"It’s called the Universal Shaping Engine, then, not because it shapes all scripts, but because it uses a universal model. Of course, as soon as you declare that you have a universal model, someone comes along with an exception to that model. In this case, the exception is the Tai Tham or Lanna script of northern Thailand, which uses subjoined consonants in ways that may compress multiple syllables into a single cluster, causing recursion in cluster analysis. It remains to be seen whether Tai Tham can be accommodated with exception code in the Universal Shaping Engine, or will need to be passed to a script-specific engine."

Does anyone know what the problem is that caused the complaint that Tai Tham needs "recursion in cluster analysis"?

For syllables without a dangling stacking control code, the regular expression is similar (see https://www.microsoft.com/typography/OpenTypeDev/USE/intro.htm#clustervalidation for the precise form) to

    base subscript* vowel* final*

where

    subscript = medial | consonant_subjoined | subjoiner consonant
    subjoiner = virama | coeng
    final = final_consonant

I have omitted various modifiers for clarity.

Now, the obvious generalisation to cover the Tai Tham script (and, incidentally, the Khmer script) is

    base (subscript* vowel* final2*)*

where

    final2 = final | subjoiner consonant

Now, I see iteration here, but we had it before, so I don't know what the problematic 'recursion' is. I can make various guesses. Perhaps the regex needs to be 'unambiguous'. Perhaps it needs to be 'deterministic', i.e. each character can be matched to an element of the regex as soon as encountered. Perhaps the problem is just that the regex encourages backtracking.

These possible issues all seem soluble, so please, someone, what is the problem?

Richard.
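
P.S. To make the comparison concrete, here is a minimal Python sketch of the two simplified expressions above. It is only an illustration under assumptions: the one-letter class codes (B, M, S, H, V, F) are hypothetical names invented for the example, not identifiers from the USE specification; the omitted modifiers stay omitted; and a CVC syllable is assumed to be encoded as base, vowel, subjoiner, consonant. It is not the USE's actual cluster validation.

```python
import re

# Hypothetical one-letter codes for the character classes named above;
# they are for illustration only and are not the USE's own identifiers.
#   B = base (also used for the consonant that follows a subjoiner)
#   M = medial, S = consonant_subjoined, H = subjoiner (virama or coeng)
#   V = vowel,  F = final_consonant
SUBSCRIPT = r"(?:M|S|HB)"   # medial | consonant_subjoined | subjoiner consonant
FINAL2 = r"(?:F|HB)"        # final | subjoiner consonant

# base subscript* vowel* final*
USE_LIKE = re.compile(rf"B{SUBSCRIPT}*V*F*")

# base (subscript* vowel* final2*)*
GENERALISED = re.compile(rf"B(?:{SUBSCRIPT}*V*{FINAL2}*)*")


def accepts(pattern, classes):
    """Return True if the whole class string forms one cluster under pattern."""
    return pattern.fullmatch(classes) is not None


# Class strings for a few cluster shapes (assumed encodings, see above).
samples = {
    "BHBV": "stack, then vowel",
    "BVF": "vowel, then final consonant",
    "BVHB": "CVC: vowel, then subjoined consonant",
    "BVHBVHB": "CVCVC",
}

for classes, label in samples.items():
    print(f"{classes:8} {label:38} "
          f"use_like={accepts(USE_LIKE, classes)} "
          f"generalised={accepts(GENERALISED, classes)}")
```

Under these assumptions the first two shapes are accepted by both expressions, while the CVC and CVCVC shapes are accepted only by the generalised one, which is the gap I am describing.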

