On Wed, 14 Dec 2016 18:44:39 +0100 Reini Urban <[email protected]> wrote:
> On Dec 5, 2016, at 3:31 PM, Richard Wordingham
> <[email protected]> wrote:
> > The choice with PHI includes:
> >
> > U+0278 LATIN SMALL LETTER PHI
> > U+03C6 GREEK SMALL LETTER PHI
> >
> > a Greek (!) script character with compatibility decomposition to
> > U+03C6
> >
> > U+03D5 GREEK PHI SYMBOL
> >
> > and a whole host of common script characters with compatibility
> > decomposition to U+03C6:
> >
> > U+1D6D7 MATHEMATICAL BOLD SMALL PHI
> > U+1D6DF MATHEMATICAL BOLD PHI SYMBOL
> > U+1D711 MATHEMATICAL ITALIC SMALL PHI
> > U+1D719 MATHEMATICAL ITALIC PHI SYMBOL
> > U+1D74B MATHEMATICAL BOLD ITALIC SMALL PHI
> > U+1D753 MATHEMATICAL BOLD ITALIC PHI SYMBOL
> > U+1D785 MATHEMATICAL SANS-SERIF BOLD SMALL PHI
> > U+1D78D MATHEMATICAL SANS-SERIF BOLD PHI SYMBOL
> > U+1D7BF MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL PHI
> > U+1D7C7 MATHEMATICAL SANS-SERIF BOLD ITALIC PHI SYMBOL
> >
> > They are all ID_Start.
>
> Oh my. Dragons beware. So I need to add some trie tables to add
> warnings with those rules also. I don’t want to error on some obscure
> confusables rule only yet. perl doesn’t even ship the security
> tables, so people are not aware of it.

Another solution would be to treat two identifiers as the same if they
have the same NFKC normalisation.

> > You didn't mention the inherited script. Is that automatically
> > allowed, e.g. φ̈ᵣ <U+03C6, U+0308 COMBINING DIAERESIS, U+1D63 LATIN
> > SUBSCRIPT SMALL LETTER R> (scripts: Greek, inherited, Latin)? I
> > encountered that variable name in a radar specification last week.
>
> Inherited is allowed with ID_Continue, yes. Not in ID_Start position.
> Combiners are normalized to NFC.

<U+03C6, U+0308, U+1D63> is unchanged under normalisation to NFC and
NFD. NFKC and NFKD change only the final character, replacing U+1D63
with U+0072 LATIN SMALL LETTER R.

> > There might be issues - it's possible that क̐ <U+0915 DEVANAGARI
> > LETTER KA, U+0310 COMBINING CANDRABINDU> might spoof कँ <U+0915,
> > U+0901 DEVANAGARI SIGN CANDRABINDU>.
>
> \x{915}\x{310} is legal Devanagari normalized to one char.

I don't know what you mean by this statement. <U+0915, U+0310> is
also unchanged under the 4 normalisations.

> \x{915}\x{901} are two legal Devanagari characters.
> but they are confusables. This would need special confusable rules.

Additionally, U+0310 can be confused quite readily with the sequence
<U+0306 COMBINING BREVE, U+0307 COMBINING DOT ABOVE>.

Richard.
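
The NFKC suggestion above can be sketched in a few lines. This is only an illustration, assuming Python's standard unicodedata module; the helper names same_identifier and unchanged_forms are made up for this message and are not part of perl, cperl or any Unicode library.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# The phi characters listed above that have a compatibility
# decomposition to U+03C6 GREEK SMALL LETTER PHI.
PHI_VARIANTS = [
    0x03D5,
    0x1D6D7, 0x1D6DF, 0x1D711, 0x1D719, 0x1D74B, 0x1D753,
    0x1D785, 0x1D78D, 0x1D7BF, 0x1D7C7,
]


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical rule: identifiers are equal if their NFKC forms match."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


def unchanged_forms(s: str) -> list:
    """Return the normalisation forms that leave s unchanged."""
    return [form for form in FORMS if unicodedata.normalize(form, s) == s]


if __name__ == "__main__":
    # Every listed variant folds to U+03C6 under NFKC.
    for cp in PHI_VARIANTS:
        print(f"U+{cp:04X}", same_identifier(chr(cp), "\u03c6"))

    # U+0278 LATIN SMALL LETTER PHI has no decomposition, so NFKC
    # does not merge it with the Greek letter.
    print("U+0278", same_identifier("\u0278", "\u03c6"))

    # The two Devanagari spellings stay distinct under NFKC as well.
    print(unchanged_forms("\u0915\u0310"))
    print(same_identifier("\u0915\u0310", "\u0915\u0901"))
```

Comparing by NFKC merges the compatibility variants of phi, but it does nothing for U+0278 or for the two candrabindu spellings, so those would still need separate confusable rules. unchanged_forms can be pointed at any other sequence from this thread in the same way.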

