> The proposal also asks for identifiers to be treated as equivalent under NFKC.

The guidance in #31 may not be clear. It is not to replace identifiers as typed in by the user with their NFKC equivalent. It is rather to internally *identify* two identifiers (as typed in by the user) as being the same. For example, Pascal had case-insensitive identifiers. That means someone could type in

    myIdentifier = 3;
    MyIdentifier = 4;

and both of those would be references to the same internal entity. So cases like SARA AM don't necessarily play into this.

> IMO the major issue with non-ASCII identifiers is not a technical one, but rather that it runs the risk of fragmenting the developer community.

IMO, forcing everyone to stick to the limitations of ASCII for all identifiers is unnecessary and often counterproductive.

First, programmers tend to think of "identifiers" as being specifically "identifiers in programming languages" (and often "identifiers in programming languages that I think are important"). Identifiers may occur in much broader contexts, often much closer to end users (e.g. spreadsheet formulae), or in scripting languages, user identifiers, and so on.

Secondly, even with programming languages that are restricted to ASCII, people can choose identifiers in code like the following, which would not be obvious to many people.

    var Stellenwert = Verteidigungsministerium_Konto.verarbeite();
    // Asmus könnte realistischere Beispiele vorschlagen

For a given project, and for programming languages (as opposed to more user-facing languages), the language used for variables, functions, comments, &c. will often be English, to allow for broader participation. But that should be a choice of the people involved. There are clearly many cases where that restriction is not optimal for a given project: where not all of the developers (and prospective developers) are fluent in English, but do share another common language. Think of all the in-house development in countries and organizations around the world.

And finally, it's not like you hear of huge problems from Java or Swift or other programming languages because they support non-ASCII identifiers.

Mark

On Thu, Jun 7, 2018 at 9:36 AM, Richard Wordingham via Unicode <unicode@unicode.org> wrote:

> On Tue, 5 Jun 2018 01:37:47 +0100
> Richard Wordingham via Unicode <unicode@unicode.org> wrote:
>
> > The decomposed
> > form that looks the same is นํ้า <U+0E19, U+0E4D, U+0E49, U+0E32>.
> > The problem is that for sane results, <tone mark, SARA AM> needs
> > special handling. This sequence is also often untypable - part of the
> > protection against Thai homographs.
>
> I've been misquoted on the Rust discussion topic - or the behaviour is
> more diverse than I was aware of. On LibreOffice, with sequence
> checking not disabled, typing <U+0E19, U+0E4D> disables the input by
> typing of U+0E49 or U+0E32 immediately afterwards. Another mechanism
> is for typing another vowel to replace the U+0E4D. The problem here is
> that in standard Thai, U+0E4D may not be followed by another vowel or
> tone mark, so Wing Thuk Thi (WTT) rules cut in. (They're also quite
> good at preventing one from typing Northern Khmer.) In LibreOffice,
> typing the NFKC form <U+0E19, U+0E49, U+0E4D, U+0E32> is stopped at
> attempting to type U+0E4D, though one can get back to the original by
> typing U+0E33 instead. To the rule checker, that is mission
> accomplished!
>
> Richard.
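
To make the distinction between "replacing" and "identifying" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any particular compiler does it: the names same_identifier and SymbolTable are hypothetical, and the only real API used is unicodedata.normalize from the standard library. The table compares names by their NFKC form but keeps the spelling exactly as the user first typed it. The Thai strings are the sequences quoted above: SARA AM (U+0E33) and its NFKC form compare as the same identifier, while the look-alike decomposed sequence with U+0E4D before the tone mark does not.

```python
import unicodedata


def same_identifier(a, b):
    """Compare two identifiers by NFKC form; neither spelling is rewritten."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


class SymbolTable:
    """Hypothetical symbol table: keyed by NFKC form, spelling kept as first typed."""

    def __init__(self):
        self._entries = {}  # NFKC key -> (first spelling seen, current value)

    def assign(self, name, value):
        key = unicodedata.normalize("NFKC", name)
        first_spelling = self._entries.get(key, (name, None))[0]
        self._entries[key] = (first_spelling, value)

    def lookup(self, name):
        return self._entries[unicodedata.normalize("NFKC", name)][1]

    def spelling(self, name):
        return self._entries[unicodedata.normalize("NFKC", name)][0]


if __name__ == "__main__":
    table = SymbolTable()
    table.assign("\ufb01le_count", 3)  # typed with the U+FB01 "fi" ligature
    table.assign("file_count", 4)      # plain ASCII, same internal entity
    print(table.lookup("\ufb01le_count"))               # 4
    print(table.spelling("file_count") == "\ufb01le_count")  # True

    nam_sara_am = "\u0e19\u0e49\u0e33"        # <U+0E19, U+0E49, U+0E33>
    nam_nfkc = "\u0e19\u0e49\u0e4d\u0e32"     # <U+0E19, U+0E49, U+0E4D, U+0E32>
    lookalike = "\u0e19\u0e4d\u0e49\u0e32"    # <U+0E19, U+0E4D, U+0E49, U+0E32>
    print(same_identifier(nam_sara_am, nam_nfkc))   # True
    print(same_identifier(nam_sara_am, lookalike))  # False
```

Keying on the normalized form while storing the original spelling is one possible design; it means error messages and tooling can still show the identifier as typed, which is the point of identifying rather than replacing.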