Re: Mixed-Script confusables in prog.languages

2016-12-15 Thread Richard Wordingham
On Wed, 14 Dec 2016 18:44:39 +0100 Reini Urban wrote:
> On Dec 5, 2016, at 3:31 PM, Richard Wordingham wrote:
> > The choice with PHI includes:
> >
> > U+0278 LATIN SMALL LETTER PHI
> > U+03C6 GREEK SMALL LETTER PHI
> >
> > a Greek (!)

Re: Mixed-Script confusables in prog.languages

2016-12-14 Thread Reini Urban
> On Dec 5, 2016, at 3:31 PM, Richard Wordingham > wrote: > > On Mon, 5 Dec 2016 09:31:11 +0100 > Reini Urban wrote: > >>> On Dec 4, 2016, at 11:45 PM, Richard Wordingham >>> wrote: >>> >>> On Sun, 4 Dec

Re: Mixed-Script confusables in prog.languages

2016-12-14 Thread Reini Urban
> On Dec 5, 2016, at 1:51 PM, gfb hjjhjh wrote: > > How about package names like ロシアМС21(Note the МС are Cyrillic), or πr²の秘密, or > エリ_хорошо_μ'sic_4⃣ever? Although they aren't really names that people would > usually use in package/var names, they are meaningful names… My

Re: Mixed-Script confusables in prog.languages

2016-12-05 Thread Richard Wordingham
On Mon, 5 Dec 2016 09:31:11 +0100 Reini Urban wrote: > > On Dec 4, 2016, at 11:45 PM, Richard Wordingham > > wrote: > > > > On Sun, 4 Dec 2016 12:09:36 +0100 > > Reini Urban wrote: > > > >> * normalize identifiers (NFC)

Re: Mixed-Script confusables in prog.languages

2016-12-05 Thread gfb hjjhjh
How about package names like ロシアМС21 (note the МС are Cyrillic), or πr²の秘密, or エリ_хорошо_μ'sic_4⃣ever? Although they aren't really names that people would usually use in package/var names, they are meaningful names... On 5 Dec 2016 at 16:39, "Reini Urban" wrote: > > > On Dec 4, 2016,

Re: Mixed-Script confusables in prog.languages

2016-12-05 Thread Martin J. Dürst
On 2016/12/05 04:07, Philippe Verdy wrote:
> In more technical programming languages however, you can usually be much more restrictive as the identifiers used are generally abbreviated and simplified: you can kill lettercase differences for example,

In some languages maybe. But languages such

Re: Mixed-Script confusables in prog.languages

2016-12-05 Thread Martin J. Dürst
On 2016/12/05 17:31, Reini Urban wrote: ψ_S contains Greek U+03C8, Common and Latin. Since Latin and Common are always allowed, the only new script is Greek. The first non-default script is automatically and silently allowed, only a mix with another non-default script, such as Cyrillic
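The rule quoted above can be sketched in Python. This is only an illustration, not cperl's implementation: `rough_script` and `check_identifier` are hypothetical names, the script lookup is a crude heuristic based on the first word of the character name (a real implementation would read the Script property from the UCD's Scripts.txt), and rejecting a second non-default script is an assumption, since the quoted text is cut off before it says what happens in that case.

```python
import unicodedata

# Assumption: approximate the Unicode Script property by the first word of
# the character name. Only three scripts are recognised; everything else is
# lumped into "Common", which is NOT how the real Script property works.
KNOWN_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC"}
DEFAULT_SCRIPTS = {"Latin", "Common"}  # always allowed, per the quoted rule


def rough_script(ch):
    """Return a rough script label for one character (heuristic only)."""
    name = unicodedata.name(ch, "")
    first = name.split(" ", 1)[0]
    return first.title() if first in KNOWN_SCRIPTS else "Common"


def check_identifier(ident, allowed=None):
    """Return the allowed-script set after accepting ident, or raise ValueError.

    The first non-default script seen is silently added to the allowed set.
    A second, different non-default script is rejected (assumed behaviour).
    """
    allowed = set(allowed) if allowed else set(DEFAULT_SCRIPTS)
    for ch in ident:
        script = rough_script(ch)
        if script in allowed:
            continue
        if allowed - DEFAULT_SCRIPTS:
            # A non-default script was already admitted earlier.
            raise ValueError(
                f"U+{ord(ch):04X} is {script}, mixed with {sorted(allowed)}"
            )
        allowed.add(script)
    return allowed
```

With this sketch, `check_identifier("ψ_S")` accepts the identifier and adds Greek to the allowed set, while an identifier that combines Greek and Cyrillic letters raises `ValueError`. Passing the returned set back in as `allowed` would carry the admitted script over to later identifiers in the same scope, if that is the intended policy.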

Re: Mixed-Script confusables in prog.languages

2016-12-05 Thread Reini Urban
> On Dec 4, 2016, at 11:45 PM, Richard Wordingham > wrote: > > On Sun, 4 Dec 2016 12:09:36 +0100 > Reini Urban wrote: > >> * normalize identifiers (NFC) and only store normalized variants. >> this should catch bidi spoofs, combining

Re: Mixed-Script confusables in prog.languages

2016-12-04 Thread Richard Wordingham
On Sun, 4 Dec 2016 12:09:36 +0100 Reini Urban wrote:
> * normalize identifiers (NFC) and only store normalized variants.
>   this should catch bidi spoofs, combining characters and such.

That doesn't catch bidi spoofs.

> * check each unicode code point for its Script property
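The objection about bidi spoofs can be checked with a short Python sketch. It uses only the standard `unicodedata` module; `BIDI_CONTROLS` and `has_bidi_control` are names invented for this illustration, and the example identifier is made up. The point it shows is that NFC composes combining sequences but leaves directional formatting characters untouched, so they need a separate check.

```python
import unicodedata

# Explicit bidirectional formatting characters: ALM, LRM, RLM,
# the embeddings/overrides U+202A..U+202E and the isolates U+2066..U+2069.
BIDI_CONTROLS = (
    {"\u061C", "\u200E", "\u200F"}
    | {chr(cp) for cp in range(0x202A, 0x202F)}
    | {chr(cp) for cp in range(0x2066, 0x206A)}
)


def has_bidi_control(s):
    """Return True if s contains any explicit bidi formatting character."""
    return any(ch in BIDI_CONTROLS for ch in s)


# NFC does handle combining characters: a + COMBINING ACUTE ACCENT composes.
composed = unicodedata.normalize("NFC", "a\u0301")

# NFC does not remove a RIGHT-TO-LEFT OVERRIDE inside a (made-up) identifier.
spoof = "user\u202Eadmin"
normalized = unicodedata.normalize("NFC", spoof)
still_spoofed = has_bidi_control(normalized)
```

Here `composed` is the single code point U+00E1, while `normalized` still contains U+202E, so `still_spoofed` is true. An identifier check would therefore have to reject or strip these characters explicitly rather than rely on normalization.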

Re: Mixed-Script confusables in prog.languages

2016-12-04 Thread Markus Scherer
On Sun, Dec 4, 2016 at 3:09 AM, Reini Urban wrote: > Is anybody aware of any other language implementation, which does > confusable or mixed-script protection? > I think R has something, because it has this header: > https://cran.r-project.org/bin/windows/extsoft/3.4/ >

Re: Mixed-Script confusables in prog.languages

2016-12-04 Thread Philippe Verdy
For Japanese, Korean and Chinese, ISO 15924 already assigns some "script" codes you can use for mixed scripts (e.g. "Jpan" = "Hani" + "Hrkt" and "Hrkt" = "Hira" + "Kana"). These are already standardized aliases you can use. For some languages this can be more complex (e.g. some Berber languages

Mixed-Script confusables in prog.languages

2016-12-04 Thread Reini Urban
I’m working on adding Mixed-Script confusable protection to the identifiers of a programming language (cperl, a perl5 fork) for security reasons, i.e. variable names, package names, function names, literals. This is a bit different from the typical use cases of libidna, in email or browsers.