Comment #11 on issue 1972 by [email protected]: Incorrect treatment of unicode escapes in keywords
https://code.google.com/p/v8/issues/detail?id=1972

Err, meant to say:

        
Project Member Reported by [email protected], Feb 24, 2012
The spec is not particularly clear on this, but I gather that unicode escapes in identifier names are supposed to be decoded _before_ distinguishing between keywords and identifiers. That is,

v\u0061r x = 0
eval("v\\u0061r y = 1")

should parse as valid declarations. That's what FF does, and it's also in line with other languages like Java. V8 rejects these examples with SyntaxError. JSC seems to be inconsistent and disallows the first but accepts the second.

Unfortunately, test262 has no tests for this.

Feb 24, 2012 Project Member #1 [email protected]
Conversely, the following should be syntax errors:

var v\u0061r = 9
eval("var v\\u0061r = 9")

FF rejects them, V8 accepts them, introducing a variable named "var".
Feb 24, 2012 Project Member #2 [email protected]
When looking more closely at the spec I totally agree with your interpretation. Nice catch!
Feb 24, 2012 #3 [email protected]
Keywords cannot contain unicode escapes, so
  v\u0061r x = 0
is a syntax error on two fronts.
The sequence "v\u0061r" can only be tokenized as an IdentifierName.
It's not the "var" keyword, since that can't contain escapes. It is equivalent to the IdentifierName "var". However, that is not an Identifier, since Identifier is "IdentifierName but not ReservedWord" and "var" is a reserved word.

So, the only place you can use "v\u0061r" is as an object literal property name or after a ".", where you can use a plain IdentifierName.
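
A minimal sketch of the two positions named above, where a plain IdentifierName is allowed. The names obj, viaDot and viaString are made up for illustration, and the result assumes an engine that accepts escapes in property names:

```javascript
// The escaped spelling of "var" used only where an IdentifierName is allowed.
const obj = { v\u0061r: 1 };      // property name in an object literal
const viaDot = obj.v\u0061r;      // member access after "."
const viaString = obj["var"];     // same property: \u0061 decodes to "a"

console.log(viaDot === viaString);
```

Both accesses reach the same property because the escape only changes how the name is spelled in the source, not which name it is.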

Browsers generally (AFAIR) used to treat "v\u0061r" as an Identifier, so "var v\u0061r = 9" would work. Some might have changed that.
Feb 24, 2012 Project Member #4 [email protected]
Well, at least as far as the language in the spec is concerned, I don't think this is clear at all. I filed a bug against the spec and test262, so that this can get clarified.

Feb 24, 2012 #5 lassern
I absolutely agree that it's not clear :)

I read the "var" in the production for variable declarations as a "terminal" (5.1.6), which says that it must occur in the source exactly as written (no escapes). But then, the lists of, e.g., keywords are also given as terminals, even though they are really lists/sets of strings to compare (post-escape-resolution) identifier names against.

(FWIW, the ES3 spec wasn't any better).
Oct 8, 2014 Project Member #7 [email protected]
From ES6 draft 11.6

Unicode escape sequences are permitted in an IdentifierName, where they contribute a single Unicode code point to the IdentifierName. The code point is expressed by the HexDigits of the UnicodeEscapeSequence (see 11.8.4). The \ preceding the UnicodeEscapeSequence and the u and { } code units, if they appear, do not contribute code points to the IdentifierName. A UnicodeEscapeSequence cannot be used to put a code point into an IdentifierName that would otherwise be illegal. In other words, if a \ UnicodeEscapeSequence sequence were replaced by the SourceCharacter it contributes, the result must still be a valid IdentifierName that has the exact same sequence of SourceCharacter elements as the original IdentifierName. All interpretations of IdentifierName within this specification are based upon their actual code points regardless of whether or not an escape sequence was used to contribute any particular code point.

So I guess this is pretty clear now.
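
The replacement rule in the quoted paragraph can be sketched roughly as follows. This is an illustration only, not how V8's scanner works: decodeIdentifierEscapes, decodesToReservedWord and SOME_RESERVED_WORDS are hypothetical names, the reserved-word set is a small subset, and the sketch does not check that the decoded result is a valid IdentifierName.

```javascript
// Hypothetical helper: replace each \uXXXX or \u{X...} escape with the
// code point it names. Throws RangeError for values above 0x10FFFF.
function decodeIdentifierEscapes(name) {
  return name.replace(
    /\\u\{([0-9a-fA-F]+)\}|\\u([0-9a-fA-F]{4})/g,
    (_match, braced, fourDigit) =>
      String.fromCodePoint(parseInt(braced || fourDigit, 16))
  );
}

// Illustrative subset of reserved words, not the full list from the spec.
const SOME_RESERVED_WORDS = new Set(["var", "if", "for", "function", "return"]);

// Compare the decoded code points, not the source spelling.
function decodesToReservedWord(name) {
  return SOME_RESERVED_WORDS.has(decodeIdentifierEscapes(name));
}

console.log(decodeIdentifierEscapes(String.raw`v\u0061r`));  // var
console.log(decodeIdentifierEscapes(String.raw`v\u{61}r`));  // var
console.log(decodesToReservedWord(String.raw`v\u0061r`));    // true
console.log(decodesToReservedWord(String.raw`v\u0061lue`));  // false
```

String.raw keeps the backslash in the string, so the helper sees the same characters a scanner would see in source text.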
Cc: [email protected]
Oct 8, 2014 Project Member #8 [email protected]
(No comment was entered for this change.)
Labels: Harmony
Oct 8, 2014 Project Member #9 [email protected]
Adding marja@ since she may have some ideas how to fix this in the parser?
Cc: [email protected]
Today (moments ago) Project Member #10 [email protected]
ATM the spec draft also says:

The ReservedWord definitions are specified as literal sequences of specific SourceCharacter elements. A code point in a ReservedWord cannot be expressed by a \ UnicodeEscapeSequence.

http://people.mozilla.org/~jorendorff/es6-draft.html#sec-reserved-words

So... based on this, I *don't* think

v\u0061r x = 0
eval("v\\u0061r y = 1")

should be *legal*
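
To see how a particular engine treats the snippets from this thread, something like the probe below could be used. It is only a sketch: parses and snippets are hypothetical names, and new Function is used purely to compile the text without running it. Which lines are accepted depends on the engine and version, which is what this issue is about, so no expected results are given for the first two.

```javascript
// Hypothetical helper: report whether the engine parses `source` as a
// function body. new Function() compiles the text without executing it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else (e.g. a policy forbidding new Function) is not a parse result
  }
}

// String.raw keeps the backslash, so the engine sees the escape itself.
const snippets = [
  String.raw`v\u0061r x = 0`,              // escaped keyword in keyword position
  String.raw`var v\u0061r = 9`,            // escaped reserved word as a variable name
  String.raw`({ v\u0061r: 1 }).v\u0061r`,  // escaped name as property name and after "."
];

for (const source of snippets) {
  console.log(source, "=>", parses(source) ? "accepted" : "SyntaxError");
}
```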
