There is a PR on swift-evolution to implement UAX#31 recommendations: https://github.com/apple/swift-evolution/pull/531
It was discussed on this list fairly recently, so a quick scroll through the archives should surface those threads. Briefly, UAX#31 recommends NFC for languages with case-sensitive identifiers and NFKC for languages with case-insensitive identifiers. Swift identifiers are case-sensitive, so the normalization proposed in this PR is NFC, not NFKC.

On Sat, Oct 1, 2016 at 11:02 Jonathan S. Shapiro via swift-evolution <[email protected]> wrote:
> New to the list, but old hand at PL design. Was looking over the lexical
> structure of Swift 2.2 and 3.0, and I have some questions. A number of
> considerations identified in UAX31 (Unicode Identifier and Pattern Syntax)
> and UAX36 (Unicode Security Considerations) aren't obviously addressed.
>
> Here are some items that jumped out from a casual glance at the spec:
>
> 1. The specification does not appear to state any particular rules for
> compatibility or normalization in identifiers. Other Unicode-aware
> programming languages have adopted NFKC almost universally, and for good
> reason. The current identifier-head and identifier-character grammar admit
> sequences that Unicode considers malformed.
>
> 2. The specification does not appear to address any notion of Unicode
> equivalent sequences.
>
> 3. The relationship between the identifiers admitted by Swift 3 and
> identifiers admitted by UAX31 isn't clear. As a matter of cross-platform
> compatibility it would be really good if identifiers permitted by the
> default rules of UAX31 were all legal in Swift. This seems important for
> cross-language interop.
>
> Has this relationship been discussed somewhere I can catch up on?
>
> 4. Valid operators include code points that are undefined in any current
> or historical Unicode standard. That seems problematic. Future revisions to
> Unicode will eventually place *some* of those code points in the XIDS/XIDC
> categories, at which point we will have to choose between backwards
> compatibility and interop.
> Others will be assigned to new combining marks,
> which will want to be used in identifiers. As new languages are added to
> Unicode, compatibility concerns will exclude some groups from using
> identifiers that are natural to them.
>
>
> In order of least-to-most difficulty, I'd like to suggest some changes to
> the specification. I'm willing to implement them if agreement can be
> reached:
>
> 1. Pick a Unicode version and exclude any code point that is undefined as
> of that standard from both operators and identifiers. It's relatively easy
> and backwards compatible to move the Unicode version number forward as the
> language specification evolves.
>
> 2. Ensure that no code point in the Unicode Pattern_Syntax and
> Pattern_White_Space categories is included in identifier-head or
> identifier-character.
>
> 3. Explicitly state that no code point in (XIDS u XIDC) or
> Pattern_White_Space is legal in an operator. Consider ensuring that
> everything in Pattern_Syntax *is* permitted in an operator.
>
> 4. I'd personally like to see an explicit statement of the extensions to
> XIDS/XIDC that are admitted by identifier-head and identifier-character.
> UAX31 refers to such extensions as a "profile", and explicitly allows them.
> I'm not interested in changing the identifier space unless there is
> something grossly and obviously problematic. What I'm after is enabling
> developers to be cognizant of potential interop challenges.
>
> 5. Adopt NFKC for identifiers. Specify and implement a combining algorithm
> version so that forward/backward compatibility is ensured.
>
>
> The first three are pretty trivial. The fourth would take some sleuthing,
> but it is straightforward. The fifth is real work. I'd be willing to sign
> up to any or all of these, but for a starting point I want to learn where
> things stand, what decisions have already been made, and where any current
> discussion may be happening.
>
> I very much doubt that NFKC would break existing code, if only because the
> use of malformed Unicode sequences is likely to be rare. To the extent that
> they exist in the field, they are almost certainly (a) unintentional, or
> (b) security concerns. It seems like a good thing to catch both of those
> early to the extent that we can, and to do so while the language definition
> remains somewhat fluid.
>
>
> Thanks!
>
>
> Jonathan Shapiro
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution
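The practical difference between NFC and NFKC, and what "breaking existing code" would mean under NFKC, can be seen from Swift itself. This is a minimal sketch assuming Foundation is available: its `String` properties `precomposedStringWithCanonicalMapping` (NFC) and `precomposedStringWithCompatibilityMapping` (NFKC) do the normalization. The helpers `scalarValues` and `nfkcCollisions` are hypothetical, written only for illustration; they are not part of the PR or the compiler.

```swift
import Foundation

// Code point values of a string, so the exact output of each
// normalization form is visible (String == would hide some differences).
func scalarValues(_ s: String) -> [UInt32] {
    return s.unicodeScalars.map { $0.value }
}

let decomposed = "e\u{0301}"   // "e" + COMBINING ACUTE ACCENT
let precomposed = "\u{00E9}"   // LATIN SMALL LETTER E WITH ACUTE
let ligature = "\u{FB01}le"    // LATIN SMALL LIGATURE FI + "le"

// NFC: canonical decomposition, then canonical composition.
let nfcAccent = decomposed.precomposedStringWithCanonicalMapping
let nfcLigature = ligature.precomposedStringWithCanonicalMapping
// NFKC: compatibility decomposition, then canonical composition.
let nfkcLigature = ligature.precomposedStringWithCompatibilityMapping

print(scalarValues(nfcAccent))     // [233]
print(scalarValues(nfcLigature))   // [64257, 108, 101]
print(scalarValues(nfkcLigature))  // [102, 105, 108, 101]

// Swift's String == compares by canonical equivalence only.
print(decomposed == precomposed)   // true
print(ligature == "file")          // false

// Hypothetical audit: group spellings that are distinct code point
// sequences but share an NFKC form. Under an NFKC rule, each group
// would become a single identifier.
func nfkcCollisions(_ identifiers: [String]) -> [[String]] {
    var groups: [[UInt32]: [String]] = [:]
    var order: [[UInt32]] = []        // NFKC keys in first-seen order
    var seen = Set<[UInt32]>()        // exact spellings already recorded
    for id in identifiers {
        guard seen.insert(scalarValues(id)).inserted else { continue }
        let key = scalarValues(id.precomposedStringWithCompatibilityMapping)
        if groups[key] == nil { order.append(key) }
        groups[key, default: []].append(id)
    }
    return order.compactMap { groups[$0] }.filter { $0.count > 1 }
}

print(nfkcCollisions(["file", "\u{FB01}le", "count", "count", "x\u{2082}", "x2"]))
```

NFC leaves the ﬁ ligature alone because U+FB01 has only a compatibility decomposition; NFKC folds it to plain "fi", and likewise folds SUBSCRIPT TWO to "2". Running an audit like `nfkcCollisions` over real identifiers is one way to measure how much code an NFKC rule would actually merge.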
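Suggestions 1 through 3 in the quoted message are mechanical checks over per-scalar Unicode properties. The sketch below shows one way to express them, assuming Swift 5.0 or later, where `Unicode.Scalar.Properties` (SE-0211, added well after this thread) exposes `age`, `isXIDStart`, `isXIDContinue` and `isPatternWhitespace`. The functions `isAssigned`, `isProfileIdentifier` and `isProfileOperator`, and the pinned version, are hypothetical; they are not the Swift compiler's actual lexer rules. The leading "_" allowance illustrates what UAX31 calls a profile extension.

```swift
// Hypothetical profile checks; not the Swift compiler's actual lexer rules.
// Requires Swift 5.0+ for Unicode.Scalar.Properties (SE-0211).

// Suggestion 1: true if `scalar` is assigned in `version` or earlier,
// according to the Unicode data bundled with the running Swift runtime.
func isAssigned(_ scalar: Unicode.Scalar, asOf version: Unicode.Version) -> Bool {
    guard let age = scalar.properties.age else { return false } // unassigned
    return (age.major, age.minor) <= (version.major, version.minor)
}

// UAX #31 default identifier syntax, <XID_Start> <XID_Continue>*, plus one
// illustrative profile extension: "_" may begin an identifier.
func isProfileIdentifier(_ s: String, asOf version: Unicode.Version) -> Bool {
    let scalars = Array(s.unicodeScalars)
    guard let head = scalars.first else { return false }
    guard head.properties.isXIDStart || head == "_" else { return false }
    let tailOK = scalars.dropFirst().allSatisfy { $0.properties.isXIDContinue }
    return tailOK && scalars.allSatisfy { isAssigned($0, asOf: version) }
}

// Suggestion 3: nothing in XID_Start, XID_Continue or Pattern_White_Space
// may appear in an operator; unassigned scalars are rejected as well.
func isProfileOperator(_ s: String, asOf version: Unicode.Version) -> Bool {
    guard !s.isEmpty else { return false }
    return s.unicodeScalars.allSatisfy { scalar in
        let p = scalar.properties
        return isAssigned(scalar, asOf: version)
            && !p.isXIDStart && !p.isXIDContinue && !p.isPatternWhitespace
    }
}

let pinned: Unicode.Version = (major: 8, minor: 0)
print(isProfileIdentifier("caf\u{E9}", asOf: pinned))               // true
print(isProfileIdentifier("_x", asOf: pinned))                      // true
print(isProfileIdentifier("1abc", asOf: pinned))                    // false
print(isProfileIdentifier("\u{1E9E}", asOf: (major: 5, minor: 0)))  // false
print(isProfileOperator("<+>", asOf: pinned))                       // true
print(isProfileOperator("+a", asOf: pinned))                        // false
```

U+1E9E LATIN CAPITAL LETTER SHARP S was added in Unicode 5.1, so it is rejected when the version is pinned to 5.0 and accepted from 5.1 on. Moving the pinned version forward only ever admits more scalars, which is the backwards-compatibility property suggestion 1 relies on.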
