On Sun, 4 Dec 2016 12:09:36 +0100 Reini Urban <[email protected]> wrote:
> * normalize identifiers (NFC) and only store normalized variants.
> this should catch bidi spoofs, combining characters and such.

That doesn't catch bidi spoofs.

> * check each unicode code point for its Script property and besides
> Latin, Common and Inherited only allow the first script, but error on
> any other mixed script. Additional scripts need to be declared.
> https://github.com/perl11/cperl/issues/229
>
> in perl like this:
> use utf8 'Greek', 'Cyrillic';

Your rule isn't clear. Would an identifier like ψ_S be automatically
allowed? I presume you're handling the spoofing of the SMALL PHI
characters by other means.

For multilingual support, you would want rules more like 'After script
X, allow script Y'.

> Of course there exist several languages which require more than one
> script,

<snip>

> or african languages as some have other than Latin roots, e.g.
> Ethiopian from Semitic.

I don't see your problem here. What problem do you see with Amharic?

> Indian languages also sound problematic,

Is this the ZWJ/ZWNJ issue? That surely is a problem within a script.

> and
> all the Old_<script>

Now I am confused. What problem do you see that you don't have in the
Latin script?

Richard.
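A minimal Python sketch of the technical points in this exchange. It is not cperl's implementation. It shows three things. First, NFC normalization composes combining sequences but leaves bidi controls such as U+202E (RIGHT-TO-LEFT OVERRIDE) untouched. Second, the proposed "first script only" rule accepts ψ_S, because '_' is Common and 'S' is Latin. Third, the two phi characters stay distinct under NFC and are folded only by NFKC. The names `SCRIPT_RANGES`, `script_of` and `check_identifier` are hypothetical. The range table is an assumption: a tiny hand-picked approximation of the Unicode Script property. A real check would take that property from the UCD. Perl already exposes it as `\p{Script=...}`.

```python
import unicodedata

# ASSUMPTION: toy approximation of the Unicode Script property covering only
# a few ranges; a real implementation should use the full UCD Scripts data.
SCRIPT_RANGES = [
    (0x0030, 0x0039, "Common"),     # ASCII digits
    (0x0041, 0x005A, "Latin"),      # A-Z
    (0x005F, 0x005F, "Common"),     # LOW LINE (underscore)
    (0x0061, 0x007A, "Latin"),      # a-z
    (0x0300, 0x036F, "Inherited"),  # combining diacritical marks
    (0x0391, 0x03C9, "Greek"),      # basic Greek capitals and small letters
    (0x03D5, 0x03D5, "Greek"),      # GREEK PHI SYMBOL
    (0x0410, 0x044F, "Cyrillic"),   # basic Cyrillic capitals and small letters
    (0x200C, 0x200D, "Inherited"),  # ZWNJ, ZWJ
    (0x202A, 0x202E, "Common"),     # bidi embedding/override controls
]

PERMISSIVE = {"Latin", "Common", "Inherited"}


def script_of(ch):
    """Look up a character's script in the toy table above."""
    cp = ord(ch)
    for lo, hi, script in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return script
    return "Unknown"  # outside the toy table


def check_identifier(ident, declared=()):
    """Hypothetical sketch of the proposed rule: besides Latin, Common and
    Inherited, allow only the first other script seen, plus any declared
    scripts. Returns the first offending script, or None if accepted."""
    ident = unicodedata.normalize("NFC", ident)
    first = None
    for ch in ident:
        script = script_of(ch)
        if script in PERMISSIVE or script in declared:
            continue
        if first is None:
            first = script  # the first non-permissive script is allowed
        elif script != first:
            return script   # a second undeclared script is an error
    return None


# NFC composes e + COMBINING ACUTE ACCENT into a single code point ...
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")          # True
# ... but a RIGHT-TO-LEFT OVERRIDE survives NFC unchanged.
print(unicodedata.normalize("NFC", "a\u202eb") == "a\u202eb")       # True
# The script rule does not reject it either: U+202E is Common.
print(check_identifier("a\u202eb"))                                 # None
# psi (Greek), '_' (Common), 'S' (Latin): accepted automatically.
print(check_identifier("\u03c8_S"))                                 # None
# Greek followed by Cyrillic is rejected unless Cyrillic is declared.
print(check_identifier("\u03c8_\u0414"))                            # Cyrillic
print(check_identifier("\u03c8_\u0414", declared={"Cyrillic"}))     # None
# PHI SYMBOL vs SMALL PHI: same script, distinct under NFC, folded by NFKC.
print(unicodedata.normalize("NFC", "\u03d5") == "\u03c6")           # False
print(unicodedata.normalize("NFKC", "\u03d5") == "\u03c6")          # True
```

Because ZWJ and ZWNJ are Inherited and NFC keeps them, a script-based rule also cannot tell apart two Indic identifiers that differ only by a joiner. That is consistent with the point that this problem lies within a script.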

