For Japanese, Korean and Chinese, ISO 15924 already assigns some "script" codes that you can use for mixed scripts (e.g. "Jpan" = "Hani" + "Hrkt", and "Hrkt" = "Hira" + "Kana"). These are standardized aliases. For some languages the situation is more complex (e.g. some Berber languages may use Latin + Tifinagh + Arabic: probably not in identifiers, but possibly in user names if those are also used as identifiers).

Confusables will still remain (such as between the Latin, Greek and Cyrillic variants of the letter A). They are unavoidable in some names that use mixed scripts, notably user names, geographic feature names or trademarks when these serve as identifiers for page names or similar on a community website, forum or wiki. Various websites and applications will need their own limitations on usable names (and must be aware that any limitation may cause orthographic problems, notably for user names).
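To make the alias idea concrete, here is a minimal Python sketch of checking an identifier against an ISO 15924 alias expanded to its component scripts. It rests on several assumptions: Python's standard library does not expose the Unicode Script property, so guess_script() approximates it from the first word of the character name, which is only a rough heuristic; the names SCRIPT_ALIASES, guess_script, scripts_of and allowed are hypothetical, chosen for illustration. A real implementation would read Scripts.txt from the Unicode Character Database instead.

```python
import unicodedata

# ISO 15924 aliases for mixed scripts, expanded to their component scripts.
SCRIPT_ALIASES = {
    "Hrkt": {"Hira", "Kana"},
    "Jpan": {"Hani", "Hira", "Kana"},
    "Kore": {"Hang", "Hani"},
}

# ASSUMPTION: crude stand-in for the real Script property, keyed on the
# first word of the Unicode character name.
NAME_PREFIX_TO_SCRIPT = {
    "HIRAGANA": "Hira",
    "KATAKANA": "Kana",
    "CJK": "Hani",
    "HANGUL": "Hang",
    "LATIN": "Latn",
    "GREEK": "Grek",
    "CYRILLIC": "Cyrl",
}


def guess_script(ch):
    """Guess a script code for one character; 'Zyyy' (Common) if unknown."""
    name = unicodedata.name(ch, "")
    prefix = name.split(" ")[0].split("-")[0] if name else ""
    return NAME_PREFIX_TO_SCRIPT.get(prefix, "Zyyy")


def scripts_of(identifier):
    """Return the set of non-Common scripts guessed for the identifier."""
    return {guess_script(ch) for ch in identifier} - {"Zyyy"}


def allowed(identifier, script_or_alias):
    """True if every guessed script is permitted by the script or alias."""
    permitted = SCRIPT_ALIASES.get(script_or_alias, {script_or_alias})
    return scripts_of(identifier) <= permitted
```

With this sketch, a name mixing Han, Hiragana and Katakana passes under "Jpan", while a name that mixes a Cyrillic "А" into otherwise Latin text is rejected under "Latn". The check catches mixed-script names only; it does nothing about confusables within a single permitted set of scripts.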
In more technical programming languages, however, you can usually be much more restrictive, as the identifiers used are generally abbreviated and simplified: you can eliminate lettercase differences, for example, as well as bidi controls, and probably some joiner/non-joiner controls and other invisible format controls (identifiers that would otherwise collide will need to be distinguished using some other characters), and forcing normalization to NFC is certainly helpful. If you need to embed user names in these languages, they will sometimes need to be "escaped", or included in string literals rather than plain identifiers.

2016-12-04 12:09 GMT+01:00 Reini Urban <[email protected]>:

> Of course there exist several languages which require more than one
> script, like
> Japanese = Hiragana and Katakana and maybe Han,
> Korean = Hangul + Han, …
> or african languages as some have other than Latin roots, e.g. Ethiopian
> from Semitic.
> Indian languages also sound problematic, and all the Old_<script>
>
> For these I just add aliases to allow multiple Scripts.
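As a rough illustration of the restrictive treatment suggested for programming-language identifiers (dropping invisible format controls, removing lettercase differences, forcing NFC), a Python sketch might look like the following. The function name normalize_identifier is hypothetical, and treating every character of general category Cf as removable is an assumption: that category covers the bidi controls and the zero-width joiner/non-joiner, but a real language would have to decide which of these it actually wants to drop.

```python
import unicodedata


def normalize_identifier(name):
    """Return a comparison key for an identifier (illustrative sketch)."""
    # Drop invisible format controls (general category Cf): bidi controls,
    # zero-width joiner/non-joiner, soft hyphen, and similar characters.
    stripped = "".join(ch for ch in name if unicodedata.category(ch) != "Cf")
    # Normalize to NFC, then remove lettercase differences.
    folded = unicodedata.normalize("NFC", stripped).casefold()
    # Case folding can produce non-NFC output, so normalize once more.
    return unicodedata.normalize("NFC", folded)
```

Two identifiers would then be treated as the same name when their keys compare equal, which is why identifiers that must stay distinct need to differ in some other, visible character.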

