The word-breaking algorithm defines an apparently innocuous interface for word breaking of 'complex context' scripts such as Thai, Lao and Myanmar. The complex context part, whose internals are deliberately and reasonably not defined by Unicode, assigns word break property values to the characters. Are there any implementations that work that way? Negative answers such as 'xxx does not work that way' would also be useful.
For example, ICU does not work this way. Instead, its complex context part delivers word boundaries, rather than character properties, to the part of the algorithm that works in accordance with a tailoring of the algorithm in UAX#29.

It seems that in general the assignments may be a little complicated. For example, in the usual case of interest, a run of Thai script word characters delimited by white space, it seems to me that the characters of alternate words should be assigned to 'ALetter' and 'Katakana' in turn, so that a boundary falls wherever the property changes. Have I missed a trick here? 'RI' is a new alternative to 'ALetter' and 'Katakana', but that seems even more bizarre, and I'd worry about its stability.

I'm finding some interesting constraints arising from the interface. For example, *within* xกy (that's a Thai letter flanked by two English letters), there are either no or two word boundaries. By contrast, there may be no, one or two line break opportunities *within* the string.

Richard.
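P.S. To make the alternation idea concrete, here is a minimal sketch in Python. It is not how any implementation I know of works. The function names are hypothetical, the segmentation is supplied by hand where a real dictionary-based segmenter would go, and only the pair rules WB5 (ALetter × ALetter), WB13 (Katakana × Katakana) and the default "break everywhere else" are modelled; Extend, Format and all the other rules of UAX#29 are ignored.

```python
from itertools import cycle

# Simplified subset of UAX#29: only WB5 and WB13 are modelled.
# Every other pair of properties falls through to "break" (WB999).
NO_BREAK_PAIRS = {("ALetter", "ALetter"), ("Katakana", "Katakana")}


def assign_properties(words):
    """Give the characters of alternate words ALetter and Katakana in turn."""
    props = []
    for word, prop in zip(words, cycle(["ALetter", "Katakana"])):
        props.extend([prop] * len(word))
    return props


def internal_boundaries(props):
    """Offsets strictly inside the string where the pair rules allow a break."""
    return [
        i
        for i in range(1, len(props))
        if (props[i - 1], props[i]) not in NO_BREAK_PAIRS
    ]


# Assumed output of a (hypothetical) complex context segmenter.
words = ["ภาษา", "ไทย"]
print(internal_boundaries(assign_properties(words)))  # [4]

# x and y are ALetter; the Thai letter between them gets one property or the other.
print(internal_boundaries(["ALetter", "ALetter", "ALetter"]))   # []
print(internal_boundaries(["ALetter", "Katakana", "ALetter"]))  # [1, 2]
```

Under these assumptions the xกy case shows the constraint directly: whatever single property the Thai letter receives, the two pairs (x, ก) and (ก, y) are treated alike, so the sketch yields either no internal boundary or two, never one.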

