This proposal [gist
<https://gist.github.com/JoaoPinheiro/5f226f46c67d235a7039c775a4300800>] is the
result of the discussion in the thread "Prohibit invisible characters in
identifier names
<http://thread.gmane.org/gmane.comp.lang.swift.evolution/21022>". I hope it's
still in time for inclusion in Swift 3.

Sincerely,
João Pinheiro


Normalize Unicode Identifiers

Proposal: SE-NNNN 
<https://gist.github.com/JoaoPinheiro/NNNN-normalize-identifiers.md>
Author: João Pinheiro <https://github.com/joaopinheiro>
Status: Awaiting review
Review manager: TBD
 
Introduction

This proposal aims to introduce identifier normalization in order to prevent 
the unsafe and potentially abusive use of invisible or equivalent 
representations of Unicode characters in identifiers.

Swift-evolution thread: Discussion thread 
<http://thread.gmane.org/gmane.comp.lang.swift.evolution/21022>
 
Motivation

Although Swift supports Unicode in identifiers, identifiers are not currently
normalized. As a result, different Unicode representations of the same
characters are treated as distinct identifiers.

For example:

let Å = "Angstrom"
let Å = "Latin Capital Letter A With Ring Above"
let Å = "Latin Capital Letter A + Combining Ring Above"

In addition, default-ignorable characters such as the Zero Width Space (U+200B)
and the Zero Width Non-Joiner (U+200C), exemplified below, are currently
accepted in identifiers without any restriction.

let ab = "ab"
let a​b = "a + Zero Width Space + b"

func xy() { print("xy") }
func x‌y() { print("x + <Zero Width Non-Joiner> + y") }

The use of default-ignorable characters in identifiers is problematic for two
reasons: first, the effects they represent are stylistic or otherwise out of
scope for identifiers; second, the characters themselves often have no visible
display. These characters can also be misapplied to create strings that look
the same but contain different characters, which can lead to security problems.

 
Proposed solution

Normalize Swift identifiers using Normalization Form C (NFC), as recommended
for case-sensitive languages in Unicode Standard Annexes #15
<https://gist.github.com/JoaoPinheiro/UAX15> and #31
<https://gist.github.com/JoaoPinheiro/UAX31>, and follow the Normalization
Charts <https://gist.github.com/JoaoPinheiro/NormalizationCharts>.
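To illustrate what NFC does to the three spellings of Å from the motivation, here is a minimal Swift sketch. It assumes Foundation's `precomposedStringWithCanonicalMapping` (which returns the NFC form) and a recent Swift toolchain. `normalizedIdentifier` and `scalarValues` are hypothetical helpers, not compiler APIs. Because Swift's `String` equality and hashing already use canonical equivalence, the sketch compares raw Unicode scalar values to model an identifier comparison that does no normalization.

```swift
import Foundation

// Hypothetical helper: maps an identifier spelling to its NFC form using
// Foundation. It models the proposed rule; it is not the compiler's code.
func normalizedIdentifier(_ spelling: String) -> String {
    return spelling.precomposedStringWithCanonicalMapping
}

// Compare by raw scalar values, because String's own == and hashing already
// treat canonically equivalent strings as equal.
func scalarValues(_ s: String) -> [UInt32] {
    return s.unicodeScalars.map { $0.value }
}

let spellings = [
    "\u{212B}",   // ANGSTROM SIGN
    "\u{00C5}",   // LATIN CAPITAL LETTER A WITH RING ABOVE
    "A\u{030A}",  // LATIN CAPITAL LETTER A + COMBINING RING ABOVE
]

// Without normalization: three distinct scalar sequences.
let rawKeys = Set(spellings.map(scalarValues))
// With NFC: all three collapse to the single scalar U+00C5.
let nfcKeys = Set(spellings.map { scalarValues(normalizedIdentifier($0)) })

print(rawKeys.count)  // 3
print(nfcKeys.count)  // 1
```

Under NFC, the Angstrom sign's singleton decomposition and the combining sequence both end up as the precomposed U+00C5, so all three declarations would name the same identifier.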

In addition, prohibit default-ignorable characters in identifiers, except in
the special cases described in UAX #31
<https://gist.github.com/JoaoPinheiro/UAX31>:

- Allow Zero Width Non-Joiner (U+200C) when breaking a cursive connection
- Allow Zero Width Non-Joiner (U+200C) in a conjunct context
- Allow Zero Width Joiner (U+200D) in a conjunct context
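A simplified sketch of this check might look as follows. It assumes Swift 5 or later for `Unicode.Scalar.Properties.isDefaultIgnorableCodePoint`. `IgnorableCheck` and `checkDefaultIgnorables(in:)` are hypothetical names. The sketch rejects every default-ignorable code point except ZWNJ and ZWJ, which it only flags for a contextual check. The UAX #31 context rules themselves (cursive connection, conjunct context) are deliberately not implemented here.

```swift
// Hypothetical result type for the check (not a compiler API).
enum IgnorableCheck: Equatable {
    case ok
    case needsContextCheck(Unicode.Scalar)  // ZWNJ/ZWJ: allowed only in UAX #31 contexts
    case rejected(Unicode.Scalar)           // any other default-ignorable code point
}

// Simplified sketch: the UAX #31 context rules for U+200C/U+200D are not
// implemented; identifiers containing them are only flagged for a later check.
func checkDefaultIgnorables(in identifier: String) -> IgnorableCheck {
    var firstJoiner: Unicode.Scalar? = nil
    for scalar in identifier.unicodeScalars where scalar.properties.isDefaultIgnorableCodePoint {
        switch scalar.value {
        case 0x200C, 0x200D:
            // Remember the first ZWNJ/ZWJ, but keep scanning for hard errors.
            if firstJoiner == nil { firstJoiner = scalar }
        default:
            // Any other default-ignorable character is an error outright.
            return .rejected(scalar)
        }
    }
    if let joiner = firstJoiner {
        return .needsContextCheck(joiner)
    }
    return .ok
}

print(checkDefaultIgnorables(in: "ab") == .ok)                                     // true
print(checkDefaultIgnorables(in: "a\u{200B}b") == .rejected("\u{200B}"))           // true
print(checkDefaultIgnorables(in: "x\u{200C}y") == .needsContextCheck("\u{200C}"))  // true
```

A complete implementation would replace the `.needsContextCheck` case with the UAX #31 regular-expression rules, which rely on additional Unicode data such as joining types and virama characters.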
 
Impact on existing code

This is potentially a source-breaking change for code that declares distinct
but identical-looking identifiers with different Unicode representations, and
for code that uses default-ignorable characters outside the permitted contexts.
Such code is expected to be very rare, and it can be fixed by renaming any
identifiers that are not in normalized form to new, non-colliding identifiers.
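For the renaming step, a hypothetical audit helper along these lines could list the spellings in a scope that are not already in NFC, along with the groups that would collide after normalization. `auditIdentifiers(_:)` and `NormalizationReport` are illustrative names, not an existing tool. Like the earlier sketches, this one assumes Foundation's `precomposedStringWithCanonicalMapping` for NFC. It also assumes that each declared spelling appears only once in the input.

```swift
import Foundation

// Hypothetical report for one declaration scope (illustrative names only).
struct NormalizationReport {
    var nonNFC: [String] = []        // spellings that NFC would change
    var collisions: [[String]] = []  // distinct spellings sharing one NFC form
}

// Raw scalar values; String's == already applies canonical equivalence.
func scalarKey(_ s: String) -> [UInt32] {
    return s.unicodeScalars.map { $0.value }
}

// Assumes each declared spelling appears once in `spellings`.
func auditIdentifiers(_ spellings: [String]) -> NormalizationReport {
    var report = NormalizationReport()
    var groups: [[UInt32]: [String]] = [:]
    var firstSeen: [[UInt32]] = []  // keeps the output order stable

    for spelling in spellings {
        let nfcKey = scalarKey(spelling.precomposedStringWithCanonicalMapping)
        if nfcKey != scalarKey(spelling) {
            report.nonNFC.append(spelling)
        }
        if groups[nfcKey] == nil {
            firstSeen.append(nfcKey)
        }
        groups[nfcKey, default: []].append(spelling)
    }

    // Only NFC forms shared by more than one spelling are collisions.
    report.collisions = firstSeen.compactMap { key -> [String]? in
        guard let group = groups[key], group.count > 1 else { return nil }
        return group
    }
    return report
}

let report = auditIdentifiers(["\u{212B}", "\u{00C5}", "A\u{030A}", "ab"])
print(report.nonNFC.count)         // 2
print(report.collisions.count)     // 1
print(report.collisions[0].count)  // 3
```

Here the Angstrom sign and the combining sequence are reported as non-NFC, and all three Å spellings form a single collision group that would need renaming.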

 
Alternatives considered

Silently ignoring default-ignorable characters in identifiers was also
discussed, but it was considered more confusing and less secure than explicitly
treating them as errors.

 
Unaddressed Issues

Unicode confusable characters were also discussed, but they were considered out
of scope for this proposal. Confusables are a complicated issue, and any
possible solution comes with significant drawbacks that would require more time
and consideration.
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution