On Thu, 12 Jan 2017 21:03:29 +0100 Mark Davis ☕️ <[email protected]> wrote:
> That was just an example off the top of my head of the format for
> using with regex; I don't pretend that it is vetted. Latin is not a
> complex script, so it was only an illustration. However, it was just
> brain freeze on my part to not also include Inherited or ZWJ. A more
> serious effort would look at some of the issues from
> http://unicode.org/reports/tr29/, for example. On the other hand, CGJ
> is not a problem: it is Mn
> <http://unicode.org/cldr/utility/character.jsp?a=034F>. And (say)
> U+064B ARABIC FATHATAN has scx=Arabic,Syriac, so wouldn't be included.

Ah, I had not appreciated that sc=Inherited does not imply scx=Inherited.

Using Script_Extensions to document the international combining
characters that are used, for example, with Thai bases could have all
sorts of undesirable knock-on effects.

Richard.
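A minimal Python sketch of the sc/scx distinction discussed above. Python's standard `unicodedata` module exposes General_Category but not Script or Script_Extensions, so this assumes a tiny hand-copied property table. `SCRIPT`, `SCRIPT_EXT` and the helper functions are hypothetical illustrations, not a real library API. The scx value for U+064B is the one quoted in the thread; code points not listed in `SCRIPT_EXT` fall back to scx = {sc}.

```python
import unicodedata

# Hypothetical hand-copied excerpt of Script (sc) values; not a complete table.
SCRIPT = {
    0x034F: "Inherited",  # COMBINING GRAPHEME JOINER
    0x064B: "Inherited",  # ARABIC FATHATAN
}

# Hypothetical excerpt of Script_Extensions (scx), value as quoted in the thread.
# Code points absent here default to scx = {sc}.
SCRIPT_EXT = {
    0x064B: frozenset({"Arabic", "Syriac"}),
}


def script_extensions(ch: str) -> frozenset:
    """Return scx for ch, falling back to its single sc value."""
    cp = ord(ch)
    return SCRIPT_EXT.get(cp, frozenset({SCRIPT[cp]}))


def allowed_by_sc(ch: str, allowed: set) -> bool:
    """Membership test using the single-valued Script property."""
    return SCRIPT[ord(ch)] in allowed


def allowed_by_scx(ch: str, allowed: set) -> bool:
    """Membership test using Script_Extensions (any overlap counts)."""
    return bool(script_extensions(ch) & allowed)


if __name__ == "__main__":
    allowed = {"Latin", "Common", "Inherited"}
    for ch in ("\u034F", "\u064B"):
        print(f"U+{ord(ch):04X} gc={unicodedata.category(ch)} "
              f"sc-match={allowed_by_sc(ch, allowed)} "
              f"scx-match={allowed_by_scx(ch, allowed)}")
```

Both characters are Mn and both have sc=Inherited, so an sc-based check admits them. Under scx, CGJ is still admitted, but FATHATAN is excluded because its extension set is {Arabic, Syriac}. That is why sc=Inherited does not carry over to scx=Inherited.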

