Some thoughts

> On Sep 18, 2016, at 3:33 PM, Jacob Bandes-Storch via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> TL;DR:
> 
> Swift 4 Stage 1 seeks to prioritize "Source stability features". Most 
> source-breaking changes were dealt with in Swift 3; however, the 
> categorization of Unicode characters into identifiers & operators was never 
> thoroughly discussed on swift-evolution. This seems like it might be our last 
> chance, and I think there are some big improvements to be had.
> 
> I've gathered some information+thoughts into an early-stage pitch / 
> pre-proposal. It doesn't really have a conclusion, so I'm hoping we can 
> discuss these issues and come up with good (pragmatic) solutions here. I 
> imagine this can morph into a proposal later.
> 
> You can read the following in nicer HTML form at 
> https://gist.github.com/jtbandes/c0b0c072181dcd22c3147802025d0b59 
> 
> I look forward to the discussion!
> 
> -Jacob
> 
> # Background and motivation
> 
> To ease lexing/parsing and avoid user confusion, the names of custom 
> identifiers (type names, variable names, etc.) and operators in Swift can be 
> composed of (mostly) separate sets of characters.
> 
> Using terminology from TSPL:
> 
> `identifier-head`/`operator-head` are characters which can begin an 
> identifier or operator.
> 
> `identifier-character`/`operator-character` are characters which can appear 
> anywhere in an identifier or operator (these are supersets of the `-head` 
> sets).
> 
> <https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html>
> 
> (Note also that some particular arrangements of characters are reserved; for 
> instance, `$` followed by digits for an implicit closure parameter, and "If 
> an operator doesn’t begin with a dot, it can’t contain a dot elsewhere." 
> There are also special characters in the language which are neither 
> identifiers nor operators, such as: `()[]{},:@#`)
> 
> 
> ## Prior discussion on swift-evolution
> 
> "Request to add middle dot (U+00B7) as operator character?"
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/003176.html>
> 
> "Free the '$' Symbol!"
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005133.html>
> 
> "Proposal: Allow Single Dollar Sign as Valid Identifier"
> <https://github.com/apple/swift-evolution/pull/354>
> 
> 
> Chris Lattner has said:
> 
> > "...our current operator space (particularly the unicode segments covered) 
> > is not super well considered.  It would be great for someone to take a more 
> > systematic pass over them to rationalize things."
> 
> > "We need a token to be unambiguously an operator or identifier - we can 
> > have different rules for the leading and subsequent characters though."
> 

I feel a bit bad about having implemented the patch that banned `$` in 
operators. It feels like the dollar sign was mistakenly left out of the 
operator character range, considering how well it had worked in operators up 
to then.  Disambiguating it from other language constructs (anonymous 
parameters in closures and LLDB variables) is trivial, and we already had 
diagnostics for it.

I definitely support having Swift’s operators use a wider range of Unicode, 
perhaps even with a policy where, instead of whitelisting ranges, we 
blacklist reserved characters or ranges.
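
As a rough sketch of the difference between the two policies (the ranges, 
the reserved set, and all helper names below are made up for illustration 
and are not taken from the lexer):

```swift
// Hypothetical sketch: two ways to decide whether a scalar may appear in an
// operator. The tables are a tiny illustrative subset, not Swift's real ones.

/// Whitelist policy: only scalars inside explicitly listed ranges qualify.
let allowedOperatorRanges: [ClosedRange<UInt32>] = [
    0x00A1...0x00A7,   // a few Latin-1 symbols
    0x2190...0x23FF,   // arrows, math operators, technical symbols
]

func isOperatorScalarByWhitelist(_ scalar: Unicode.Scalar) -> Bool {
    return allowedOperatorRanges.contains { $0.contains(scalar.value) }
}

/// Blacklist policy: everything qualifies unless it is reserved punctuation,
/// whitespace, or already claimed by the identifier grammar.
let reservedScalars: Set<Unicode.Scalar> = ["(", ")", "[", "]", "{", "}", ",", ":", "@", "#"]

func isOperatorScalarByBlacklist(
    _ scalar: Unicode.Scalar,
    isIdentifierScalar: (Unicode.Scalar) -> Bool
) -> Bool {
    if reservedScalars.contains(scalar) { return false }
    if scalar.properties.isWhitespace { return false }
    return !isIdentifierScalar(scalar)
}

/// Stand-in identifier predicate (ASCII only) so the sketch is self-contained.
func isASCIIIdentifierScalar(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar {
    case "a"..."z", "A"..."Z", "0"..."9", "_": return true
    default: return false
    }
}

print(isOperatorScalarByWhitelist("$"))
print(isOperatorScalarByBlacklist("$", isIdentifierScalar: isASCIIIdentifierScalar))
```

With the whitelist, `$` is rejected simply because nobody listed it; with 
the blacklist, it is accepted unless something else has claimed it.
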

> 
> # Current state of affairs
> 
> Swift's `identifier-head` and `identifier-character` mostly conform to the 
> recommendations in 
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3146.html>
> <https://github.com/apple/swift/blob/08e7963/lib/Parse/Lexer.cpp#L421-L489>
> 
> The allowed operator characters include "Unicode math, symbol, arrow, 
> dingbat, and line/box drawing chars", however I don't believe this aligns 
> with any particular spec:
> <https://github.com/apple/swift/blob/08e7963/include/swift/AST/Identifier.h#L87-L121>
> <https://github.com/apple/swift/commit/a2341a4>
> 
> 
> 
> ## Identifiers/operators elsewhere
> 
> There is a Unicode Standard Annex, "Identifier and Pattern Syntax" 
> <http://unicode.org/reports/tr31/>, which defines the categories 
> `ID_Start`/`ID_Continue`.
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AID_Continue%3A%5D>
> 
> ### ECMAScript 2015 "ES6"
> 
> Uses `ID_Start` and `ID_Continue`, as well as `Other_ID_Start` / 
> `Other_ID_Continue`.
> <http://www.ecma-international.org/ecma-262/6.0/#sec-names-and-keywords>
> 
> ### Haskell
> 
> Distinguishes identifiers/operators by their general category (such as "any 
> Unicode lowercase letter", "any Unicode symbol or punctuation", etc.).
> <http://www.fileformat.info/info/unicode/category/index.htm>
> 
> In particular, identifiers can start with any lowercase letter or _, and may 
> contain any letter/digit/'/_. This would seem to include letters like δ and 
> Я, and digits like ٢.
> 
> <https://www.haskell.org/onlinereport/syntax-iso.html>
> <https://github.com/ghc/ghc/blob/714bebff44076061d0a719c4eda2cfd213b7ac3d/compiler/parser/Lexer.x#L1949-L1973>
> 

At the extreme end, Coq and Agda allow the full range of Unicode in 
identifiers (or so their implementations and docs would seem to say).

> 
> 
> # Current problems
> 
> ## Weird identifier code points
> 
> The current `identifier-character` set contains many characters which 
> wouldn't make good identifiers:
> 
> - 11 entire planes of characters (U+20000–U+2FFFD, etc.) which are currently 
> unassigned.
> - The middle dot · which looks like an operator.
> - Many non-combining "modifiers" and accent marks, such as ´ and ¨ and ꓻ 
> which don't really make sense on their own.
> - "Tone marks" from various languages, including ˫ (similar to a box-drawing 
> character ├ which is an operator).
> - The "Greek question mark" ;
> - Symbols which are simply not linguistic, such as ۞ and ༒.
> 
> short url: <https://goo.gl/tyn0Cz>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE20-%5CuFE2F%5D>>
> 
> ## Weird operator code points
> 
> The current `operator-character` set has a lot of characters that are clearly 
> operator-esque (≈ ∈ ⊕ ⊅), but some things are not so obviously desirable:
> 
> - Box-drawing characters
> - Combining accents and other characters
> - Various symbols, e.g. ⚄ and ♄ (this category also overlaps with emoji)
> - Braille patterns such as ⠟ — should they not be treated as letter-like 
> (thus identifiers)?
> - A plethora of arrows
> 
> short url: <https://goo.gl/s136Nh>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D>>
> 
> 
> ## Code points which are both
> 
> A handful of characters are accepted both as `identifier-head` and 
> `operator-head` (which seems pointless and might have been unintentional):
> 
> U+3021–U+3029, Suzhou numerals  〡〢〣〤〥〦〧〨〩 
> <https://en.wikipedia.org/wiki/Suzhou_numerals>
> 
> U+302A–U+302F, ideographic & hangul tone marks   〪  〫  〬  〭  〮  〯
> 
>     let 〨 = 2
>     infix operator <〨>
> 
> (Note that `infix operator 〨` doesn't work because the lexer greedily treats 
> this as an identifier. Also, interestingly, the corresponding ideographic 
> zero 〇 is only an identifier char.)
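
A quick way to see this overlap, using only the ranges that appear in the 
two linked sets around this block (the excerpts and helper names are mine, 
and the full tables are much longer):

```swift
// Excerpts of the identifier-head and operator-head ranges near U+3000,
// copied from the range sets linked in the proposal.
let identifierHeadExcerpt: [ClosedRange<UInt32>] = [
    0x3004...0x3007, 0x3021...0x302F, 0x3031...0x303F,
]
let operatorHeadExcerpt: [ClosedRange<UInt32>] = [
    0x3001...0x3003, 0x3008...0x3030,
]

/// True if any range in the table contains the code point.
func anyRange(_ table: [ClosedRange<UInt32>], contains value: UInt32) -> Bool {
    return table.contains { $0.contains(value) }
}

// Scan the CJK Symbols and Punctuation block for code points in both tables.
let block: ClosedRange<UInt32> = 0x3000...0x303F
let overlap = block.filter {
    anyRange(identifierHeadExcerpt, contains: $0)
        && anyRange(operatorHeadExcerpt, contains: $0)
}

print(overlap.map { "U+" + String($0, radix: 16, uppercase: true) })
```

The ideographic zero (U+3007) drops out because the operator excerpt skips 
from U+3003 straight to U+3008.
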
> 
> short url: <https://goo.gl/lZcMqO>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d]>>
> 
> In addition to the numerals and tone marks above, many (all?) combining marks 
> are accepted as `identifier-character` and `operator-character`. These may be 
> necessary for natural-looking words in some languages, but they don't seem 
> necessary for operators.
> 
> Also present in both sets are the variation selectors 1 through 256 
> (U+FE00–U+FE0F, U+E0100–U+E01EF). It seems they are of limited use for the 
> operator characters, unless you count the emoji: 
> <http://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt>
> 
> short url: <https://goo.gl/VKrisf>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%0d%0a%5b0-9%0d%0a%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE20-%5cuFE2F%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d%0d%0a%5b%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE00-%5cuFE0F%0d%0a%5cuFE20-%5cuFE2F%0d%0a%5cU000E0100-%5cU000E01
EF%5d]>>
> 
> 
> ## Code points which should be illegal
> 
> There are several surprising non-printing characters, including:
> 
> - U+2064 INVISIBLE PLUS is currently an identifier
> - U+200B ZERO WIDTH SPACE is currently an identifier
> 
> No good will come of these. Invisible characters should probably be 
> disallowed (although some may be necessary for properly joining/splitting 
> characters in some other languages).
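
One possible check, sketched here on the assumption that the standard 
library's `Unicode.Scalar.Properties` (Swift 5 and later) is available; the 
function names are hypothetical:

```swift
/// Flags scalars whose general category is Cf (format). This covers
/// zero-width and other non-printing format characters.
func isInvisibleFormatScalar(_ scalar: Unicode.Scalar) -> Bool {
    return scalar.properties.generalCategory == .format
}

/// Returns every format scalar found in a candidate identifier.
func invisibleScalars(in identifier: String) -> [Unicode.Scalar] {
    return identifier.unicodeScalars.filter(isInvisibleFormatScalar)
}

// U+200B ZERO WIDTH SPACE hiding inside an otherwise ordinary name.
let suspicious = "foo\u{200B}bar"
for scalar in invisibleScalars(in: suspicious) {
    print("invisible scalar U+" + String(scalar.value, radix: 16, uppercase: true))
}
```

A blanket ban on Cf would also catch the zero-width joiner and non-joiner, 
which is exactly the caveat about other languages mentioned above, so a real 
rule would probably need exceptions.
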
> 
> 
> ## Categories which are split between identifiers and operators
> 
> - Emoji and symbols: most of the newer emoji are identifiers, but many 
> emoji/pictographs are operators, especially those from "Miscellaneous 
> Symbols". The results are hilariously illogical:
> 
>   - ☹️ is an operator, but 🙂 is an identifier.
>   - ✌️ is an operator, but 🤘 is an identifier.
>   - 🔼 is an operator, but ▶️ is an identifier.
>   - ✳️ is an operator, but 🔯 is an identifier.
>   - ✈️ is an operator, but 🛩 is an identifier.
>   - ♠️ is an operator, but 🂡 is an identifier. (Presumably, 🂡 = A ♠️ 🂠!)
> 
>   (But the counterintuitive examples extend outside the emoji too: + is an 
> operator, while ₊ and ⁺ are identifiers.)
> 
> - Currency symbols: ¢ £ ¤ ¥ are operators, but ₪ € ₱ ₹ ฿ and many others are 
> identifiers, and $ is allowed in an identifier.
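
To make the currency split concrete, here is a small sketch that compares 
the Unicode general category with the bucket each symbol lands in today. 
Only two of the current ranges are reproduced, and `Bucket` and 
`currentBucket` are names I made up:

```swift
enum Bucket { case identifier, operatorCharacter, unlisted }

/// Classifies a scalar using two ranges from the current tables:
/// U+00A1...U+00A7 (operators) and U+2070...U+20CF (identifiers).
func currentBucket(_ scalar: Unicode.Scalar) -> Bucket {
    switch scalar.value {
    case 0x00A1...0x00A7: return .operatorCharacter
    case 0x2070...0x20CF: return .identifier
    default: return .unlisted
    }
}

let samples: [Unicode.Scalar] = [
    "\u{00A2}",  // CENT SIGN
    "\u{00A3}",  // POUND SIGN
    "\u{00A5}",  // YEN SIGN
    "\u{20AC}",  // EURO SIGN
    "\u{20AA}",  // NEW SHEQEL SIGN
    "\u{20B9}",  // INDIAN RUPEE SIGN
]

// Every sample has general category Sc (currency symbol)...
let allCurrency = samples.allSatisfy {
    $0.properties.generalCategory == .currencySymbol
}
// ...yet the range tables split them into two different buckets.
let buckets = samples.map(currentBucket)
print(allCurrency, buckets)
```

A category-based rule would at least put all of these on the same side, 
whichever side that turns out to be.
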
> 
> 
> ## Missing characters
> 
> A handful of characters are neither operators nor identifiers. This list 
> mostly makes sense (reserved characters and whitespace), but I wonder about a 
> few which seem like they could easily be operators: ⑊ ⑀ ﹅ etc.
> 
> short url: <https://goo.gl/U0GVNn>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%5Cu0001-%5CU0010FFFF%5D-%5B%5B%2F%3D%5C-%2B!*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5C
u20FF%0D%0A%5CuFE20-%5CuFE2F%5D%5D%5D>>
> 
> 
> # Solutions
> 
> Still up for discussion — please reply to this thread!
> 
> Adopting (X)ID_Start/Continue for identifiers, or a simpler solution like 
> Haskell's use of "letter" categories, might work well.
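
For reference, a minimal sketch of what an XID-based rule could look like, 
assuming the `isXIDStart`/`isXIDContinue` properties from the Swift 5 
standard library. The function name and the special case for a leading 
underscore are my assumptions, not part of the pitch:

```swift
/// Returns true if `text` is an identifier under a plain
/// XID_Start / XID_Continue rule, with a leading "_" also allowed.
func isXIDStyleIdentifier(_ text: String) -> Bool {
    var scalars = text.unicodeScalars.makeIterator()
    guard let first = scalars.next() else { return false }

    // "_" is XID_Continue but not XID_Start, so it is special-cased here.
    guard first == "_" || first.properties.isXIDStart else { return false }

    while let next = scalars.next() {
        guard next.properties.isXIDContinue else { return false }
    }
    return true
}

for candidate in ["\u{03B4}elta", "_tmp", "2x", "a\u{200B}b"] {
    print(candidate.unicodeScalars.count, isXIDStyleIdentifier(candidate))
}
```

This rejects the zero-width space and emoji outright, but note that the 
middle dot (U+00B7) is still a valid continue character under this rule, so 
it would not settle every case listed above.
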
> 
> (I've given up hope of finding some kind of "perfect" solution — how can it 
> be possible, when ᛏ is a letter, yet ↑ is not?)
> 
> Making the choice of operator characters more logical/standards-based would 
> be nice (not just a set of ranges). However, Haskell's approach of using all 
> punctuation & symbols is probably not right for Swift:
> 
> short url: <https://goo.gl/Ud4KqY>
> 
> <http://unicode.org/cldr/utility/unicodeset.jsp?a=%5B%5B-%2F%3D%2B!*%25%3C%3E%5C%26%7C%5C%5E~?%5Cu00A1-%5Cu00A7%5Cu00A9%5Cu00AB%5Cu00AC%5Cu00AE%5Cu00B0-%5Cu00B1%5Cu00B6%5Cu00BB%5Cu00BF%5Cu00D7%5Cu00F7%5Cu2016-%5Cu2017%5Cu2020-%5Cu2027%5Cu2030-%5Cu203E%5Cu2041-%5Cu2053%5Cu2055-%5Cu205E%5Cu2190-%5Cu23FF%5Cu2500-%5Cu2775%5Cu2794-%5Cu2BFF%5Cu2E00-%5Cu2E7F%5Cu3001-%5Cu3003%5Cu3008-%5Cu3030%5Cu0300-%5Cu036F%5Cu1DC0-%5Cu1DFF%5Cu20D0-%5Cu20FF%5CuFE00-%5CuFE0F%5CuFE20-%5CuFE2F%5CU000E0100-%5CU000E01EF%5D%5D&b=%5B%5B:Currency_Symbol:%5D%5B:Modifier_Symbol:%5D%5B:Math_Symbol:%5D%5B:Other_Symbol:%5D%5B:Connector_Punctuation:%5D%5B:Dash_Punctuation:%5D%5B:Close_Punctuation:%5D%5B:Final_Punctuation:%5D%5B:Initial_Punctuation:%5D%5B:Other_Punctuation:%5D%5B:Open_Punctuation:%5D%5D>>
> 
> I'm not really sure what to do with emoji — they're a very cute novelty 
> feature, but I don't know what the motivation is for including these as valid 
> operators/identifiers.
> 
> At the least, we should try to gather them all into one of the two 
> categories. My inclination would be to keep them as identifiers, which would 
> mean moving the following out of the operator category:
> 
> short url: <https://goo.gl/CBJEKX>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AEmoji%3A%5D%26%5B%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5D%5D>>
> 
> 
> # Concurrently-discussable topics
> 
> There are a few relevant topics that came to mind, which I think are worth 
> discussing around the same time.
> 
> ## Dollar signs ($)
> 
> $ is currently allowed in identifiers, but it can't begin an identifier 
> except for the magic implicit closure params ($0, $1, ...) and 
> LLDB/REPL-related uses.
> 
> It's arguable, but I feel that $ would be more effective as an operator 
> character than an identifier character. There's precedent in Haskell for 
> operators like `<$>` and being able to replicate these in Swift would be nice.
> 
> 
> ## Diagnostics improvements
> 
> Regardless of what ends up being the ultimate solution, it would be great to 
> improve diagnostics for cases when the wrong types of characters are used.
> 
> `infix operator abc` produces `'abc' is considered to be an identifier, not 
> an operator`. That's not too bad.
> 
> `let +++ = 3` produces `expected pattern`.
> 
> `let $foo = 3` produces `expected numeric value following '$'`.
> 
> 
> ## Security and сοnfuѕаbIе characters
> 
> Confusable characters (e vs. е, o vs. ο, ; vs. ;) are an issue not taken 
> lightly in the world of web security (cf. domain names). I haven't found much 
> information about whether this has been considered a major security issue in 
> programming languages, but I would think so (one can imagine such characters 
> being introduced to a codebase subtly over time, hiding malicious 
> functionality).
> 
> It'd be pretty cool if Swift could detect whether two identifiers might be 
> confusable, and produce a warning.
> 
> <http://www.unicode.org/reports/tr36/#Recommendations_General>
> <http://unicode.org/reports/tr39/#Confusable_Detection>
> 

We have had a patch sitting in the queue for a long time now 
<https://github.com/apple/swift/pull/732> that adds diagnostics for 
confusables, if you want to take that up again.
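
The general idea, very much simplified, is to map each identifier to a 
"skeleton" and warn when two distinct identifiers share one. The sketch 
below is not the code from that patch. The mapping table is a handful of 
hand-picked entries, and all names are hypothetical; a real implementation 
would use the confusables data published with UTS #39:

```swift
// Illustrative mappings only, each from a look-alike to a Latin prototype.
let prototypeTable: [Unicode.Scalar: Unicode.Scalar] = [
    "\u{0435}": "e",   // CYRILLIC SMALL LETTER IE
    "\u{03BF}": "o",   // GREEK SMALL LETTER OMICRON
    "\u{0430}": "a",   // CYRILLIC SMALL LETTER A
    "\u{0455}": "s",   // CYRILLIC SMALL LETTER DZE
]

/// Replaces every scalar that has a known prototype and keeps the rest.
func skeleton(_ identifier: String) -> String {
    var result = String.UnicodeScalarView()
    for scalar in identifier.unicodeScalars {
        result.append(prototypeTable[scalar] ?? scalar)
    }
    return String(result)
}

/// Two identifiers are flagged when they differ but share a skeleton.
func areConfusable(_ lhs: String, _ rhs: String) -> Bool {
    return lhs != rhs && skeleton(lhs) == skeleton(rhs)
}

// "scope" spelled with a Greek omicron and a Cyrillic ie.
print(areConfusable("scope", "sc\u{03BF}p\u{0435}"))
```

A compiler diagnostic would run something like this over the identifiers in 
scope and emit a warning for each confusable pair.
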

> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
