Warning! This definition of allowed identifiers carries severe security risks: it does not support any kind of normalization or canonical equivalence, and normalization cannot be applied in the language lexer/parser in a way that is guaranteed to remain stable over the set of currently unassigned characters that may be assigned later.
This could cause unexpected bindings: identifiers that cannot collide today may collide later under new normalizations (notably if unassigned code points get assigned to combining characters with a non-zero combining class, or to base characters with combining class 0 that are excluded from recomposition, i.e. disallowed in the standard normalization forms).

No programming language should allow the use of unassigned characters; they should be checked and marked as invalid. Note that this check can work in a compiler for the language, but it will not work in a repository of source code, where the only possible check is to parse all source files in the repository to make sure that there is no unassigned code point anywhere in their text. The source repository should enforce this by clearly defining the UCS version it accepts for source files, but as far as I know, no commonly used source repository performs this check; it can only be done by extracting all sources with some bot script that detects unassigned code points in them.

The alternative of not allowing any normalization of identifiers is not safe either, because source code editors may easily renormalize the identifiers, and the same sources may be edited by different users using different input methods.

2014-06-05 17:27 GMT+02:00 "Martin v. Löwis" <mar...@v.loewis.de>:
> On 04.06.14 11:28, Andre Schappo wrote:
> > The restrictions seem a little like IDNA2008. Anyone have links to
> > info giving a detailed explanation/tabulation of allowed and non
> > allowed Unicode chars for Swift Variable and Constant names?
>
> The language reference is at
>
> https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html
>
> For reference, the definition of identifier-character is (read each
> line as an alternative)
>
> identifier-character → Digit 0 through 9
> identifier-character → U+0300–U+036F, U+1DC0–U+1DFF, U+20D0–U+20FF, or U+FE20–U+FE2F
> identifier-character → identifier-head
>
> where identifier-head is
>
> identifier-head → Upper- or lowercase letter A through Z
> identifier-head → U+00A8, U+00AA, U+00AD, U+00AF, U+00B2–U+00B5, or U+00B7–U+00BA
> identifier-head → U+00BC–U+00BE, U+00C0–U+00D6, U+00D8–U+00F6, or U+00F8–U+00FF
> identifier-head → U+0100–U+02FF, U+0370–U+167F, U+1681–U+180D, or U+180F–U+1DBF
> identifier-head → U+1E00–U+1FFF
> identifier-head → U+200B–U+200D, U+202A–U+202E, U+203F–U+2040, U+2054, or U+2060–U+206F
> identifier-head → U+2070–U+20CF, U+2100–U+218F, U+2460–U+24FF, or U+2776–U+2793
> identifier-head → U+2C00–U+2DFF or U+2E80–U+2FFF
> identifier-head → U+3004–U+3007, U+3021–U+302F, U+3031–U+303F, or U+3040–U+D7FF
> identifier-head → U+F900–U+FD3D, U+FD40–U+FDCF, U+FDF0–U+FE1F, or U+FE30–U+FE44
> identifier-head → U+FE47–U+FFFD
> identifier-head → U+10000–U+1FFFD, U+20000–U+2FFFD, U+30000–U+3FFFD, or U+40000–U+4FFFD
> identifier-head → U+50000–U+5FFFD, U+60000–U+6FFFD, U+70000–U+7FFFD, or U+80000–U+8FFFD
> identifier-head → U+90000–U+9FFFD, U+A0000–U+AFFFD, U+B0000–U+BFFFD, or U+C0000–U+CFFFD
> identifier-head → U+D0000–U+DFFFD or U+E0000–U+EFFFD
>
> As the construction principle for this list, they say
>
> "Identifiers begin with an upper case or lower case letter A through Z,
> an underscore (_), a noncombining alphanumeric Unicode character in the
> Basic Multilingual Plane, or a character outside the Basic Multilingual
> Plane that isn’t in a Private Use Area. After the first character, digits
> and combining Unicode characters are also allowed."
>
> Regards,
> Martin
> _______________________________________________
> Unicode mailing list
> Unicode@unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
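
To make the two risks discussed in this message concrete, here is a minimal sketch in Python (chosen only for illustration; it is not how any Swift compiler or source repository actually works). The function names `unassigned_code_points` and `identifiers_collide` are hypothetical. The results depend on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), which is exactly the stability problem: a code point reported as unassigned today may be assigned in a later version.

```python
import unicodedata


def unassigned_code_points(source):
    """Return (offset, "U+XXXX") for every code point in `source` whose
    General_Category is Cn in this interpreter's Unicode database.

    Cn covers unassigned code points and also noncharacters such as U+FFFE.
    """
    return [
        (offset, f"U+{ord(ch):04X}")
        for offset, ch in enumerate(source)
        if unicodedata.category(ch) == "Cn"
    ]


def identifiers_collide(a, b, form="NFC"):
    """True when two distinct code point sequences become identical
    after normalization to `form`, i.e. they are canonically equivalent
    but would be treated as different names by a lexer that compares
    code points without normalizing."""
    return a != b and unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)

    # U+00E9 falls in the identifier-head range U+00D8-U+00F6, and U+0301
    # falls in the combining range U+0300-U+036F of the quoted grammar, so
    # both spellings are accepted by it as (different) identifiers.
    print(identifiers_collide("caf\u00e9", "cafe\u0301"))

    # U+0378 lies inside the allowed range U+0370-U+167F; it was unassigned
    # in every Unicode version I am aware of at the time of writing.
    print(unassigned_code_points("let x\u0378 = 1"))
```

A repository-side bot of the kind described above could run a check like `unassigned_code_points` over every source file and reject any file for which the list is non-empty, provided the repository pins the Unicode version it checks against.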