Let me tl;dr'er this even more: ☹️ is an operator, but 🙂 is an identifier.

-- E, succinct, who thinks there's room for improvement


> On Sep 18, 2016, at 1:33 PM, Jacob Bandes-Storch via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> TL;DR:
> 
> Swift 4 Stage 1 seeks to prioritize "Source stability features". Most 
> source-breaking changes were made in Swift 3; however, the 
> categorization of Unicode characters into identifiers & operators was never 
> thoroughly discussed on swift-evolution. This seems like it might be our last 
> chance, and I think there are some big improvements to be had.
> 
> I've gathered some information+thoughts into an early-stage pitch / 
> pre-proposal. It doesn't really have a conclusion, so I'm hoping we can 
> discuss these issues and come up with good (pragmatic) solutions here. I 
> imagine this can morph into a proposal later.
> 
> You can read the following in nicer HTML form at 
> <https://gist.github.com/jtbandes/c0b0c072181dcd22c3147802025d0b59>
> 
> I look forward to the discussion!
> 
> -Jacob
> 
> # Background and motivation
> 
> To ease lexing/parsing and avoid user confusion, the names of custom 
> identifiers (type names, variable names, etc.) and operators in Swift can be 
> composed of (mostly) separate sets of characters.
> 
> Using terminology from TSPL:
> 
> `identifier-head`/`operator-head` are characters which can begin an 
> identifier or operator.
> 
> `identifier-character`/`operator-character` are characters which can appear 
> anywhere in an identifier or operator (these are supersets of the `-head` 
> sets).
> 
> <https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html>
> 
> (Note also that some particular arrangements of characters are reserved; for 
> instance, `$` followed by digits for an implicit closure parameter, and "If 
> an operator doesn’t begin with a dot, it can’t contain a dot elsewhere." 
> There are also special characters in the language which are neither 
> identifiers nor operators, such as: `()[]{},:@#`)
> 
> 
> ## Prior discussion on swift-evolution
> 
> "Request to add middle dot (U+00B7) as operator character?"
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/003176.html>
> 
> "Free the '$' Symbol!"
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005133.html>
> 
> "Proposal: Allow Single Dollar Sign as Valid Identifier"
> <https://github.com/apple/swift-evolution/pull/354>
> 
> 
> Chris Lattner has said:
> 
> > "...our current operator space (particularly the unicode segments covered) 
> > is not super well considered.  It would be great for someone to take a more 
> > systematic pass over them to rationalize things."
> 
> > "We need a token to be unambiguously an operator or identifier - we can 
> > have different rules for the leading and subsequent characters though."
> 
> 
> # Current state of affairs
> 
> Swift's `identifier-head` and `identifier-character` mostly conform to the 
> recommendations in 
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3146.html>  
> <https://github.com/apple/swift/blob/08e7963/lib/Parse/Lexer.cpp#L421-L489>
> 
> The allowed operator characters include "Unicode math, symbol, arrow, 
> dingbat, and line/box drawing chars"; however, I don't believe this aligns 
> with any particular spec:
> <https://github.com/apple/swift/blob/08e7963/include/swift/AST/Identifier.h#L87-L121>  
> <https://github.com/apple/swift/commit/a2341a4>
> 
> 
> 
> ## Identifiers/operators elsewhere
> 
> There is a Unicode Standard Annex, "identifier and pattern syntax" 
> <http://unicode.org/reports/tr31/>, which 
> defines the categories `ID_Start`/`ID_Continue`.
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AID_Continue%3A%5D>
> 
> ### ECMAScript 2015 "ES6"
> 
> Uses `ID_Start` and `ID_Continue`, as well as `Other_ID_Start` / 
> `Other_ID_Continue`.
> <http://www.ecma-international.org/ecma-262/6.0/#sec-names-and-keywords>
> 
> ### Haskell
> 
> Distinguishes identifiers/operators by their general category (such as "any 
> Unicode lowercase letter", "any Unicode symbol or punctuation", etc.).  
> <http://www.fileformat.info/info/unicode/category/index.htm>
> 
> In particular, identifiers can start with any lowercase letter or _, and may 
> contain any letter/digit/'/_. This would seem to include letters like δ and 
> Я, and digits like ٢.
> 
> <https://www.haskell.org/onlinereport/syntax-iso.html>  
> <https://github.com/ghc/ghc/blob/714bebff44076061d0a719c4eda2cfd213b7ac3d/compiler/parser/Lexer.x#L1949-L1973>
> 
> 
> 
> # Current problems
> 
> ## Weird identifier code points
> 
> The current `identifier-character` set contains many characters which 
> wouldn't make good identifiers:
> 
> - 11 entire planes of characters (U+20000–U+2FFFD, etc.) which are currently 
> unassigned.
> - The middle dot · which looks like an operator.
> - Many non-combining "modifiers" and accent marks, such as ´ and ¨ and ꓻ 
> which don't really make sense on their own.
> - "Tone marks" from various languages, including ˫ (similar to a box-drawing 
> character ├ which is an operator).
> - The "Greek question mark" ;
> - Symbols which are simply not linguistic, such as ۞ and ༒.
> 
> short url: <https://goo.gl/tyn0Cz>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE20-%5CuFE2F%5D
>  
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE20-%5CuFE2F%5D>>
> 
> ## Weird operator code points
> 
> The current `operator-character` set has a lot of characters that are clearly 
> operator-esque (≈ ∈ ⊕ ⊅), but some things are not so obviously desirable:
> 
> - Box-drawing characters
> - Combining accents and other characters
> - Various symbols, e.g. ⚄ and ♄ (this category also overlaps with emoji)
> - Braille patterns such as ⠟ — should they not be treated as letter-like 
> (thus identifiers)?
> - A plethora of arrows
> 
> short url: <https://goo.gl/s136Nh>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D
>  
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D>>
> 
> 
> ## Code points which are both
> 
> A handful of characters are accepted both as `identifier-head` and 
> `operator-head` (which seems pointless and might have been unintentional):
> 
> U+3021–U+3029, Suzhou numerals  〡〢〣〤〥〦〧〨〩 
> <https://en.wikipedia.org/wiki/Suzhou_numerals>
> 
> U+302A–U+302F, ideographic & hangul tone marks   〪  〫  〬  〭  〮  〯
> 
>     let 〨 = 2
>     infix operator <〨>
> 
> (Note that `infix operator 〨` doesn't work because the lexer greedily treats 
> this as an identifier. Also, interestingly, the corresponding ideographic 
> zero 〇 is only an identifier char.)
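
A rough sketch of how this overlap falls out of the range lists. It models only the two ranges relevant here (U+3021–U+302F from the identifier list and U+3008–U+3030 from the operator list linked in this post); the constant names are made up for illustration and are not the lexer's.

```swift
// Assumption: only the two ranges relevant to this example are modelled,
// copied from the code point lists linked in this post.
let identifierHeadRange: ClosedRange<UInt32> = 0x3021...0x302F
let operatorHeadRange: ClosedRange<UInt32> = 0x3008...0x3030

// Code points accepted by both grammars (the ranges are known to overlap).
let overlap = identifierHeadRange.clamped(to: operatorHeadRange)

for value in overlap {
    if let scalar = Unicode.Scalar(value) {
        print("U+" + String(value, radix: 16, uppercase: true), scalar)
    }
}
```

The ideographic zero U+3007 sits just below the operator range, which is consistent with it being identifier-only.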
> 
> short url: <https://goo.gl/lZcMqO>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d]
>  
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d]>>
> 
> In addition to the numerals and tone marks above, many (all?) combining marks 
> are accepted as `identifier-character` and `operator-character`. These may be 
> necessary for natural-looking words in some languages, but they don't seem 
> necessary for operators.
> 
> Also present in both sets are the variation selectors 1 through 256 
> (U+FE00–U+FE0F, U+E0100–U+E01EF). It seems they are of limited use for the 
> operator characters, unless you count the emoji: 
> <http://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt>
> 
> short url: <https://goo.gl/VKrisf>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%0d%0a%5b0-9%0d%0a%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE20-%5cuFE2F%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d%0d%0a%5b%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE00-%5cuFE0F%0d%0a%5cuFE20-%5cuFE2F%0d%0a%5cU000E0100-%5cU000E01
EF%5d]>
> 
> 
> ## Code points which should be illegal
> 
> There are several surprising non-printing characters, including:
> 
> - U+2064 INVISIBLE PLUS is currently an identifier
> - U+200B ZERO WIDTH SPACE is currently an identifier
> 
> No good will come of these. Invisible characters should probably be 
> disallowed (although some may be necessary for properly joining/splitting 
> characters in some other languages).
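
A minimal sketch of how such characters could be flagged, using the `Unicode.Scalar.Properties` API from later Swift releases (Swift 5). Treating "invisible" as general category Format or Default_Ignorable_Code_Point is an assumption made for this example, and `invisibleScalars(in:)` is a hypothetical helper, not anything in the compiler.

```swift
// Returns the scalars in `name` that render as nothing on their own.
// Assumption: "invisible" is approximated as general category Cf (Format)
// or the Default_Ignorable_Code_Point property.
func invisibleScalars(in name: String) -> [Unicode.Scalar] {
    return name.unicodeScalars.filter { scalar in
        scalar.properties.generalCategory == .format
            || scalar.properties.isDefaultIgnorableCodePoint
    }
}

// "foo", ZERO WIDTH SPACE, "bar", INVISIBLE PLUS
let suspicious = "foo\u{200B}bar\u{2064}"
for scalar in invisibleScalars(in: suspicious) {
    print("U+" + String(scalar.value, radix: 16, uppercase: true))
}
```

This check would also flag the zero-width joiner and non-joiner, so a real rule would need exceptions for the scripts that depend on them.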
> 
> 
> ## Categories which are split between identifiers and operators
> 
> - Emoji and symbols: most of the newer emoji are identifiers, but many 
> emoji/pictographs are operators, especially those from "Miscellaneous 
> Symbols". The results are hilariously illogical:
> 
>   - ☹️ is an operator, but 🙂 is an identifier.
>   - ✌️ is an operator, but 🤘 is an identifier.
>   - ▶️ is an operator, but 🔼 is an identifier.
>   - ✳️ is an operator, but 🔯 is an identifier.
>   - ✈️ is an operator, but 🛩 is an identifier.
>   - ♠️ is an operator, but 🂡 is an identifier. (Presumably, 🂡 = A ♠️ 🂠!)
>   
>   (But the counterintuitive examples extend outside the emoji too: + is an 
> operator, while ₊ and ⁺ are identifiers.)
> 
> - Currency symbols: ¢ £ ¤ ¥ are operators, but ₪ € ₱ ₹ ฿ and many others are 
> identifiers, and $ is allowed in an identifier.
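
To make the split concrete, here is a small sketch that classifies scalars against a handful of the ranges from the lists linked in this post. The tables are deliberately incomplete, and `TokenKind` and `classify` are names invented for the example.

```swift
enum TokenKind { case identifierCharacter, operatorCharacter, neither }

// Assumption: a small subset of the ranges from the linked lists;
// the real grammar has many more entries.
let operatorRanges: [ClosedRange<UInt32>] = [
    0x002B...0x002B,   // +
    0x00A1...0x00A7,   // includes the currency signs ¢ £ ¤ ¥
    0x2190...0x23FF,   // arrows, math operators, technical symbols
    0x2500...0x2775,   // box drawing through dingbats (☹ ✌ ✈ ♠)
]
let identifierRanges: [ClosedRange<UInt32>] = [
    0x0370...0x167F,   // many scripts; also contains ฿
    0x2070...0x20CF,   // super/subscripts (⁺ ₊) and currency (€ ₪ ₱ ₹)
    0x10000...0x1FFFD, // all of plane 1, including the newer emoji
]

func classify(_ scalar: Unicode.Scalar) -> TokenKind {
    if operatorRanges.contains(where: { $0.contains(scalar.value) }) {
        return .operatorCharacter
    }
    if identifierRanges.contains(where: { $0.contains(scalar.value) }) {
        return .identifierCharacter
    }
    return .neither
}

// ☹ 🙂 ✌ 🤘 + ₊ ¢ €
for scalar in "\u{2639}\u{1F642}\u{270C}\u{1F918}+\u{208A}\u{A2}\u{20AC}".unicodeScalars {
    print(scalar, classify(scalar))
}
```

Nothing in these tables knows what an emoji or a currency symbol is; the classification follows purely from which block a code point happens to live in.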
> 
> 
> ## Missing characters
> 
> A handful of characters are neither operators nor identifiers. This list 
> mostly makes sense (reserved characters and whitespace), but I wonder about a 
> few which seem like they could easily be operators: ⑊ ⑀ ﹅ etc.
> 
> short url: <https://goo.gl/U0GVNn>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%5Cu0001-%5CU0010FFFF%5D-%5B%5B%2F%3D%5C-%2B!*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5C
u20FF%0D%0A%5CuFE20-%5CuFE2F%5D%5D%5D>
> 
> 
> # Solutions
> 
> Still up for discussion — please reply to this thread!
> 
> Adopting (X)ID_Start/Continue for identifiers, or a simpler solution like 
> Haskell's use of "letter" categories, might work well.
> 
> (I've given up hope of finding some kind of "perfect" solution — how can it 
> be possible, when ᛏ is a letter, yet ↑ is not?)
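
For a feel of what a category-based rule would do, here is a sketch using `Unicode.Scalar.Properties.generalCategory` (available in later Swift releases, Swift 5). The rule itself (letters are identifier-like, symbols are operator-like) is a hypothetical simplification for illustration, not a concrete proposal; `Kind` and `kind(of:)` are invented names.

```swift
enum Kind { case identifierLike, operatorLike, other }

// Assumption: letters (L*) count as identifier-like and symbols (S*) as
// operator-like. Marks, numbers and punctuation are left as `.other`.
func kind(of scalar: Unicode.Scalar) -> Kind {
    switch scalar.properties.generalCategory {
    case .uppercaseLetter, .lowercaseLetter, .titlecaseLetter,
         .modifierLetter, .otherLetter:
        return .identifierLike
    case .mathSymbol, .currencySymbol, .modifierSymbol, .otherSymbol:
        return .operatorLike
    default:
        return .other
    }
}

print(kind(of: "\u{16CF}"))  // ᛏ RUNIC LETTER TIWAZ TIR TYR T
print(kind(of: "\u{2191}"))  // ↑ UPWARDS ARROW
```

Under this rule every currency symbol and every emoji in the Other_Symbol category would land on the operator side, which differs from the current ranges.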
> 
> Making the choice of operator characters more logical/standards-based would 
> be nice (not just a set of ranges). However, Haskell's approach of using all 
> punctuation & symbols is probably not right for Swift:
> 
> short url: <https://goo.gl/Ud4KqY>
> 
> <http://unicode.org/cldr/utility/unicodeset.jsp?a=%5B%5B-%2F%3D%2B!*%25%3C%3E%5C%26%7C%5C%5E~?%5Cu00A1-%5Cu00A7%5Cu00A9%5Cu00AB%5Cu00AC%5Cu00AE%5Cu00B0-%5Cu00B1%5Cu00B6%5Cu00BB%5Cu00BF%5Cu00D7%5Cu00F7%5Cu2016-%5Cu2017%5Cu2020-%5Cu2027%5Cu2030-%5Cu203E%5Cu2041-%5Cu2053%5Cu2055-%5Cu205E%5Cu2190-%5Cu23FF%5Cu2500-%5Cu2775%5Cu2794-%5Cu2BFF%5Cu2E00-%5Cu2E7F%5Cu3001-%5Cu3003%5Cu3008-%5Cu3030%5Cu0300-%5Cu036F%5Cu1DC0-%5Cu1DFF%5Cu20D0-%5Cu20FF%5CuFE00-%5CuFE0F%5CuFE20-%5CuFE2F%5CU000E0100-%5CU000E01EF%5D%5D&b=%5B%5B:Currency_Symbol:%5D%5B:Modifier_Symbol:%5D%5B:Math_Symbol:%5D%5B:Other_Symbol:%5D%5B:Connector_Punctuation:%5D%5B:Dash_Punctuation:%5D%5B:Close_Punctuation:%5D%5B:Final_Punctuation:%5D%5B:Initial_Punctuation:%5D%5B:Other_Punctuation:%5D%5B:Open_Punctuation:%5D%5D
>  
> <http://unicode.org/cldr/utility/unicodeset.jsp?a=%5B%5B-%2F%3D%2B!*%25%3C%3E%5C%26%7C%5C%5E~?%5Cu00A1-%5Cu00A7%5Cu00A9%5Cu00AB%5Cu00AC%5Cu00AE%5Cu00B0-%5Cu00B1%5Cu00B6%5Cu00BB%5Cu00BF%5Cu00D7%5Cu00F7%5Cu2016-%5Cu2017%5Cu2020-%5Cu2027%5Cu2030-%5Cu203E%5Cu2041-%5Cu2053%5Cu2055-%5Cu205E%5Cu2190-%5Cu23FF%5Cu2500-%5Cu2775%5Cu2794-%5Cu2BFF%5Cu2E00-%5Cu2E7F%5Cu3001-%5Cu3003%5Cu3008-%5Cu3030%5Cu0300-%5Cu036F%5Cu1DC0-%5Cu1DFF%5Cu20D0-%5Cu20FF%5CuFE00-%5CuFE0F%5CuFE20-%5CuFE2F%5CU000E0100-%5CU000E01EF%5D%5D&b=%5B%5B:Currency_Symbol:%5D%5B:Modifier_Symbol:%5D%5B:Math_Symbol:%5D%5B:Other_Symbol:%5D%5B:Connector_Punctuation:%5D%5B:Dash_Punctuation:%5D%5B:Close_Punctuation:%5D%5B:Final_Punctuation:%5D%5B:Initial_Punctuation:%5D%5B:Other_Punctuation:%5D%5B:Open_Punctuation:%5D%5D>>
> 
> I'm not really sure what to do with emoji — they're a very cute novelty 
> feature, but I don't know what the motivation is for including these as valid 
> operators/identifiers.
> 
> At the least, we should try to gather them all into one of the two 
> categories. My inclination would be to keep them as identifiers, which would 
> mean moving the following out of the operator category:
> 
> short url: <https://goo.gl/CBJEKX>
> 
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AEmoji%3A%5D%26%5B%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5D%5D
>  
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AEmoji%3A%5D%26%5B%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5D%5D>>
> 
> 
> # Concurrently-discussable topics
> 
> There are a few relevant topics that came to mind, which I think are worth 
> discussing around the same time.
> 
> ## Dollar signs ($)
> 
> $ is currently allowed in identifiers, but it can't begin an identifier 
> except for the magic implicit closure params ($0, $1, ...) and 
> LLDB/REPL-related uses.
> 
> It's arguable, but I feel that $ would be more effective as an operator 
> character than an identifier character. There's precedent in Haskell for 
> operators like `<$>`, and being able to replicate these in Swift would be nice.
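
As a sketch of the workaround available today: since `$` is not an operator character, `<$>` cannot be declared, so the example below uses `<^>` as a stand-in spelling. That spelling, the precedence group name, and the Optional-only overload are all assumptions made for illustration (Swift 3 syntax or later).

```swift
// `<$>` cannot be declared today because `$` is not an operator character,
// so `<^>` is used here as a stand-in spelling.
precedencegroup FunctorMapPrecedence {
    associativity: left
    higherThan: AssignmentPrecedence
}

infix operator <^>: FunctorMapPrecedence

// Applies `transform` to the wrapped value, like Haskell's `<$>` on Maybe.
func <^> <A, B>(transform: (A) -> B, value: A?) -> B? {
    return value.map(transform)
}

let double: (Int) -> Int = { $0 * 2 }
let doubled = double <^> Optional(21)
print(doubled as Any)
```

If `$` were moved to the operator set, the same declaration could be spelled `<$>` directly.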
> 
> 
> ## Diagnostics improvements
> 
> Regardless of what ends up being the ultimate solution, it would be great to 
> improve diagnostics for cases when the wrong types of characters are used.
> 
> `infix operator abc` produces `'abc' is considered to be an identifier, not 
> an operator`. That's not too bad.
> 
> `let +++ = 3` produces `expected pattern`.
> 
> `let $foo = 3` produces `expected numeric value following '$'`.
> 
> 
> ## Security and сοnfuѕаbIе characters
> 
> Confusable characters (e vs. е, o vs. ο, ; vs. ;) are an issue not taken 
> lightly in the world of web security (cf. domain names). I haven't found much 
> information about whether this has been considered a major security issue in 
> programming languages, but I would think so (one can imagine such characters 
> being introduced to a codebase subtly over time, hiding malicious 
> functionality).
> 
> It'd be pretty cool if Swift could detect whether two identifiers might be 
> confusable, and produce a warning.
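
A simplified sketch of how such a check could work, loosely modelled on the "skeleton" comparison described in UTS #39. The four-entry table is hand-written for this example and stands in for the real confusables data; a real implementation would also normalize the input. `prototypes`, `skeleton` and `mightBeConfusable` are hypothetical names.

```swift
// Assumption: a tiny hand-written table standing in for the UTS #39
// confusables data; a real check would load the full data file.
let prototypes: [Unicode.Scalar: Unicode.Scalar] = [
    "\u{0435}": "e",   // CYRILLIC SMALL LETTER IE
    "\u{03BF}": "o",   // GREEK SMALL LETTER OMICRON
    "\u{0455}": "s",   // CYRILLIC SMALL LETTER DZE
    "\u{0430}": "a",   // CYRILLIC SMALL LETTER A
]

// Maps each scalar to its prototype, giving a simplified "skeleton".
func skeleton(_ identifier: String) -> String {
    var result = String.UnicodeScalarView()
    for scalar in identifier.unicodeScalars {
        result.append(prototypes[scalar] ?? scalar)
    }
    return String(result)
}

// Two distinct identifiers are flagged when their skeletons match.
func mightBeConfusable(_ a: String, _ b: String) -> Bool {
    return a != b && skeleton(a) == skeleton(b)
}

// "score" in ASCII versus a look-alike mixing Cyrillic and Greek letters.
print(mightBeConfusable("score", "\u{0455}c\u{03BF}r\u{0435}"))
```

A compiler warning built on this idea would compare each new identifier's skeleton against those already in scope.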
> 
> <http://www.unicode.org/reports/tr36/#Recommendations_General>  
> <http://unicode.org/reports/tr39/#Confusable_Detection>
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
