+1. Even if it's too late for Swift 3, though, I'd argue that it's highly unlikely to be code-breaking in practice. Any existing code that would get tripped up by this normalization is arguably broken already.
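
To make that concrete, here is a rough sketch of the situation, not the proposal's implementation. It assumes Foundation's `precomposedStringWithCanonicalMapping` as the NFC step, and the helper names `scalars`, `nfcScalars`, and `containsZeroWidthScalar` are made up for illustration. The zero-width check is deliberately incomplete: it covers only three code points and ignores the UAX31 contexts in which ZWNJ and ZWJ would still be allowed.

```swift
import Foundation

// Scalar values of a string exactly as written, with no normalization applied.
func scalars(_ s: String) -> [UInt32] {
    return s.unicodeScalars.map { $0.value }
}

// Scalar values after NFC normalization (Foundation's canonical precomposition).
func nfcScalars(_ s: String) -> [UInt32] {
    return scalars(s.precomposedStringWithCanonicalMapping)
}

// Hypothetical, simplified check: flags only U+200B, U+200C and U+200D,
// and does not model the UAX31 exceptions for ZWNJ/ZWJ.
func containsZeroWidthScalar(_ s: String) -> Bool {
    let zeroWidth: Set<UInt32> = [0x200B, 0x200C, 0x200D]
    return s.unicodeScalars.contains { zeroWidth.contains($0.value) }
}

// The three spellings from the proposal's example:
// ANGSTROM SIGN, A WITH RING ABOVE, and A + COMBINING RING ABOVE.
let spellings = ["\u{212B}", "\u{00C5}", "A\u{030A}"]

let distinctBefore = Set(spellings.map(scalars)).count     // raw scalar sequences
let distinctAfter = Set(spellings.map(nfcScalars)).count   // after NFC

print(distinctBefore, distinctAfter)   // prints "3 1"
print(containsZeroWidthScalar("a\u{200B}b"), containsZeroWidthScalar("ab"))   // prints "true false"
```

Three different scalar sequences collapse to a single one under NFC, so any code that relies on them being separate identifiers depends on a difference no reader can see.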

On Tue, Jul 26, 2016 at 2:22 PM, João Pinheiro <[email protected]> wrote:

> This proposal [gist
> <https://gist.github.com/JoaoPinheiro/5f226f46c67d235a7039c775a4300800>]
> is the result of the discussions from the thread "Prohibit invisible
> characters in identifier names
> <http://thread.gmane.org/gmane.comp.lang.swift.evolution/21022>". I hope
> it's still on time for inclusion in Swift 3.
>
> Sincerely,
> João Pinheiro
>
>
> Normalize Unicode Identifiers
>
> - Proposal: SE-NNNN
>   <https://gist.github.com/JoaoPinheiro/NNNN-normalize-identifiers.md>
> - Author: João Pinheiro <https://github.com/joaopinheiro>
> - Status: Awaiting review
> - Review manager: TBD
>
>
> Introduction
>
> This proposal aims to introduce identifier normalization in order to
> prevent the unsafe and potentially abusive use of invisible or equivalent
> representations of Unicode characters in identifiers.
>
> Swift-evolution thread: Discussion thread
> <http://thread.gmane.org/gmane.comp.lang.swift.evolution/21022>
>
>
> Motivation
>
> Even though Swift supports the use of Unicode for identifiers, these
> aren't yet normalized. This allows for different Unicode representations of
> the same characters to be considered distinct identifiers.
>
> For example:
>
>     let Å = "Angstrom"
>     let Å = "Latin Capital Letter A With Ring Above"
>     let Å = "Latin Capital Letter A + Combining Ring Above"
>
> In addition to that, *default-ignorable* characters like the *Zero Width
> Space* and *Zero Width Non-Joiner* (exemplified below) are also currently
> accepted as valid parts of identifiers without any restrictions.
>
>     let ab = "ab"
>     let a​b = "a + Zero Width Space + b"
>
>     func xy() { print("xy") }
>     func x‌y() { print("x + <Zero Width Non-Joiner> + y") }
>
> The use of default-ignorable characters in identifiers is problematical,
> first because the effects they represent are stylistic or otherwise out of
> scope for identifiers, and second because the characters themselves often
> have no visible display. It is also possible to misapply these characters
> such that users can create strings that look the same but actually contain
> different characters, which can create security problems.
>
>
> Proposed solution
>
> Normalize Swift identifiers according to the normalization form NFC
> recommended for case-sensitive languages in the Unicode Standard Annexes
> 15 <https://gist.github.com/JoaoPinheiro/UAX15> and 31
> <https://gist.github.com/JoaoPinheiro/UAX31> and follow the Normalization
> Charts <https://gist.github.com/JoaoPinheiro/NormalizationCharts>.
>
> In addition to that, prohibit the use of *default-ignorable* characters
> in identifiers except in the special cases described in UAX31
> <https://gist.github.com/JoaoPinheiro/UAX31>, listed below:
>
> - Allow Zero Width Non-Joiner (U+200C) when breaking a cursive
>   connection
> - Allow Zero Width Non-Joiner (U+200C) in a conjunct context
> - Allow Zero Width Joiner (U+200D) in a conjunct context
>
>
> Impact on existing code
>
> This has potential to be a code-breaking change in cases where people may
> have used distinct, but identical looking, identifiers with different
> Unicode representations. The likelihood of that happening in actual code is
> very small and the problem can be solved by renaming identifiers that don't
> conform to the new normalized form into new non-colliding identifiers.
>
>
> Alternatives considered
>
> The option of ignoring *default-ignorable* characters in identifiers was
> also discussed, but it was considered to be more confusing and less secure
> than explicitly treating them as errors.
>
>
> Unaddressed Issues
>
> There was some discussion around the issue of Unicode confusable
> characters, but it was considered to be out of scope for this proposal.
> Unicode confusable characters are a complicated issue and any possible
> solutions also come with significant drawbacks that would require more time
> and consideration.
>
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution
