Hi,

I came across some third-party discussion of my choice of ASCII-range identifiers (and limitation of non-ASCII-range Unicode to strings, chars and comments) that cited this as a major problem in the language. This prompted a little more research and reading on my part, and talking with people who had differing experiences with non-English identifier use in programming languages. I now believe that my earlier impression of "almost universal" adoption of ASCII-range identifiers in non-English programming shops was mistaken, and that there is actually substantial value to such programmers in having the non-ASCII range available.

Moreover, looking at the approach taken by PEP 3131 (delegating to the NFKC-normalization-closed sets defined in UAX 31, XID_Start/XID_Continue), I see the "proper solution" has a better-established consensus than I had previously understood to exist. So I've updated the Rust manual to delegate to these specifications as well, and filed a bug (issue 242, if anyone wants to jump on it) to get the lexer patched up to handle this change.
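
For anyone who wants a feel for what the XID_Start/XID_Continue rule accepts, here is a minimal sketch in Python, whose own identifier syntax follows PEP 3131. It is only an approximation of what a lexer would do, not the Rust implementation: the helper name is_xid_identifier is made up for illustration, and Python's str.isidentifier() additionally allows a leading underscore and a few compatibility characters, and does not reject keywords.

```python
import unicodedata


def is_xid_identifier(name: str) -> bool:
    """Rough PEP 3131-style check (hypothetical helper, not Rust's lexer).

    Normalizes to NFKC first, then asks Python whether the result is an
    identifier: one XID_Start-like character (or underscore) followed by
    XID_Continue characters.
    """
    normalized = unicodedata.normalize("NFKC", name)
    return normalized.isidentifier()


if __name__ == "__main__":
    for candidate in ["size", "größe", "変数", "2fast", "a-b"]:
        print(candidate, is_xid_identifier(candidate))
```

The NFKC step is why the sets being closed under normalization matters: a name that is a valid identifier before normalization is still one afterwards.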

Practical implications of this change are few for people (a) already comfortable with ASCII-range identifiers or (b) working outside the lexer. Hopefully it'll make things more welcoming for people who don't fit into case (a), though.

Apologies for the thrashing about on this issue; I misunderstood the current state of play (possibly due to a little too much time spent in despair while trying to upgrade ECMAScript 4 to "any Unicode spec after 1995", but that's a whole other story).

-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
