On Mon, 4 Jun 2018 12:49:20 -0700 Manish Goregaokar via Unicode <unicode@unicode.org> wrote:
> Hi,
>
> The Rust community is considering
> <https://github.com/rust-lang/rfcs/pull/2457> adding non-ascii
> identifiers, which follow UAX #31
> <http://www.unicode.org/reports/tr31/> (XID_Start XID_Continue*, with
> tweaks). The proposal also asks for identifiers to be treated as
> equivalent under NFKC.

> (In general, are there other problems folks see with this proposal?)

There's the usual lurking issue that the Thai word for water, น้ำ <U+0E19 THAI CHARACTER NO NU, U+0E49 THAI CHARACTER MAI THO, U+0E33 THAI CHARACTER SARA AM>, is unacceptable and often untypable and uncopiable when converted to NFKC น้ํา <U+0E19, U+0E49, U+0E4D THAI CHARACTER NIKHAHIT, U+0E32 THAI CHARACTER SARA AA>. The decomposed form that looks the same is นํ้า <U+0E19, U+0E4D, U+0E49, U+0E32>.

The problem is that for sane results, <tone mark, SARA AM> needs special handling. This sequence is also often untypable - part of the protection against Thai homographs.

Richard.
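
For anyone who wants to reproduce the NFKC behaviour described above, here is a minimal sketch using Python's standard unicodedata module. The constant names (WATER, NFKC_FORM, LOOKALIKE) and the code_points helper are made up for illustration, and the result depends on the Unicode data shipped with the interpreter. It only shows what NFKC does to the sequence; it is not a proposal for how Rust should handle it.

```python
import unicodedata

# The three spellings from the message, written as escapes so the
# code points are unambiguous regardless of font or editor.
WATER = "\u0e19\u0e49\u0e33"            # NO NU, MAI THO, SARA AM
NFKC_FORM = "\u0e19\u0e49\u0e4d\u0e32"  # NO NU, MAI THO, NIKHAHIT, SARA AA
LOOKALIKE = "\u0e19\u0e4d\u0e49\u0e32"  # NO NU, NIKHAHIT, MAI THO, SARA AA


def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# NFKC applies the compatibility decomposition of SARA AM.
nfkc = unicodedata.normalize("NFKC", WATER)

print(code_points(WATER), "->", code_points(nfkc))
print("equals the NFKC form quoted above:", nfkc == NFKC_FORM)
print("equals the look-alike decomposed form:", nfkc == LOOKALIKE)

# Canonical combining classes: the tone mark has a non-zero class but
# NIKHAHIT is a starter, so canonical reordering cannot move the tone
# mark past it into the look-alike order.
print("combining classes:", [unicodedata.combining(c) for c in nfkc])

# NFC, by contrast, has no canonical mapping for SARA AM.
print("unchanged by NFC:", unicodedata.normalize("NFC", WATER) == WATER)
```

The point of the comparison is that the NFKC result and the visually matching decomposed form are different code point sequences, which is why <tone mark, SARA AM> would need special handling if identifiers are to be compared under NFKC.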