Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mathias Bynens

 On 28 Jan 2015, at 11:36, Marja Hölttä ma...@chromium.org wrote:
 
 TL;DR: /foo.bar/u.test(“foo\uD83Dbar”) == ?
 
 The ES6 unicode regexp spec is not very clear regarding what should happen if 
 the regexp or the matched string contains lonely surrogates (a lead surrogate 
 without a trail, or a trail without a lead). For example, for the . operator, 
 the relevant parts of the spec speak about characters:
 
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-atom
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-canonicalize-abstract-operation
 
 E.g.,
 “Let A be the set of all *characters* except LineTerminator.”
 “Let ch be the *character* Input[e].”
 
 But is a lonely surrogate a character? According to the Unicode standard, 
 it’s not. If it's not, what will ch be if the input string contains a lonely 
 surrogate in the relevant position?
 
 Q1: Are lonely surrogates allowed in /u regexps?
 
 E.g., /foo\uD83D/u; (note lonely lead surrogate), should this be allowed? 
 Will it match a lead surrogate inside a surrogate pair?
 
 Suggestion: we shouldn't allow lonely surrogates in /u regexps.
 
 If users actually want to match lonely surrogates (e.g., to check for them or 
 remove them) then they can use non-/u regexps.

You’re proposing to define “characters” in terms of Unicode scalar values in 
the case `/u` is used. I could get behind that — it reinforces the idea that 
`/u` is like a strict mode for regular expressions.

Playing devil’s advocate, the problem is that regular expressions and strings 
go hand in hand, and there is no guarantee that JavaScript strings only consist 
of valid code points. Making `.` not match lone surrogates breaks the developer 
expectation that `(.)` matches every “part” of the string. Having to avoid `/u` 
to prevent this seems like a potentially bad thing.

 The regexp syntax treats a lonely surrogate as a normal unicode escape, and
 the rules say e.g., "The production RegExpUnicodeEscapeSequence :: u
 Hex4Digits evaluates as follows: Return the character whose code is the SV of
 Hex4Digits." It's also unclear what this means if no valid character has
 this code.
 
 Q2: If the string contains a lonely surrogate, what should it match? Should 
 it match .? Should it match [^a] ? (Or is it undefined behavior?)
 
 Test cases:
 /foo.bar/u.test("foo\uD83Dbar") == ?
 /foo.bar/u.test("foo\uDC00bar") == ?
 /foo[^a]bar/u.test("foo\uD83Dbar") == ?
 /foo[^a]bar/u.test("foo\uDC00bar") == ?
 /foo/u.test("bar\uD83Dbarfoo") == ?
 /foo/u.test("bar\uDC00barfoo") == ?
 /foo(.*)bar\1/u.test("foo\uD834bar\uD834\uDC00") == ? // Should the backreference be allowed to match the lead surrogate of a surrogate pair?
 /^(.+)\1$/u.test("\uDC00foobar\uD83D\uDC00foobar\uD83D") == ?? // Should we allow splitting the surrogate pair like this?
 
 Suggestion: a lonely surrogate should not be a character and it should not 
 match . or [^a] etc. However, a lonely surrogate in the input string 
 shouldn't prevent some other part of the string from matching.
 
 If a lonely surrogate is treated as a character, the matching rule for . gets 
 complicated and difficult / slow to implement: . should not match individual 
 surrogates inside a surrogate pair, but if it has to match a lonely 
 surrogate, we'll end up needing lookahead and lookbehind logic to implement 
 that behavior.
 
 For example, the current version of Mathias’s ES6 Unicode regular expression 
 transpiler ( https://mothereff.in/regexpu ) converts /a.b/u into 
 /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/
  and afaics it’s not yet fully consistent wrt lonely surrogates, so, a 
 consistent implementation is going to be more complex than this.

This is indeed an incomplete solution. The lack of lookbehind support in ES 
makes this hard to transpile correctly. Ideas welcome!

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull

On 1/28/2015 2:51 PM, André Bargull wrote:

For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk
1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works:

foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so,
generally, lonely surrogates match /./.

Backreferences are allowed to consume the leading surrogate of a valid
surrogate pair:

Ex1: foo\uD834bar\uD834\uDC00 matches foo(.+)bar\1

But surprisingly:

Ex2: \uDC00foobar\uD834\uDC00foobar\uD834 doesn't match ^(.+)\1$

... So Ex2 works as if the input string was converted to UTF-32 before
matching, but Ex1 works as if it was def not. Idk what's the correct mental
model where both Ex1 and Ex2 would make sense.


java.util.regex.Pattern matches back references by comparing (Java) chars [1], but reads patterns 
as a sequence of code points [2]. That should help to explain the differences between ex1 and ex2.


[1] 
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l4890
[2] 
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l1671


Err, the part about how patterns are read is not important here. What I should have written is that 
the input string is (also) read as a sequence of code points [3]. So in ex2 `\uD834\uDC00` is read 
as a single code point (and not split into \uD834 and \uDC00 during backtracking).


[3] 
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l3773
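To make the code-unit versus code-point distinction concrete, here is a minimal JavaScript sketch of two ways a backreference could be checked against a UTF-16 string. Both function names are hypothetical and are not taken from Pattern.java or any engine. The first compares code units only, as André describes Java's backreferences; the second also refuses to end between a lead surrogate and the trail that follows it.

```javascript
// Hypothetical helpers: does capture `group` match at index `pos` of `input`?

// Code-unit comparison only (how André describes Java's backreferences).
function backrefByCodeUnits(input, pos, group) {
  return input.startsWith(group, pos);
}

// Same comparison, but also reject a match that ends between a lead
// surrogate and the trail surrogate right after it.
function backrefByCodePoints(input, pos, group) {
  if (!input.startsWith(group, pos)) return false;
  const end = pos + group.length;
  const before = input.charCodeAt(end - 1); // NaN if out of range
  const after = input.charCodeAt(end);      // NaN if out of range
  const splitsPair =
    before >= 0xd800 && before <= 0xdbff && after >= 0xdc00 && after <= 0xdfff;
  return !splitsPair;
}

// Ex1: foo(.+)bar\1 on "foo\uD834bar\uD834\uDC00", where group 1 is "\uD834".
const ex1 = "foo\uD834bar\uD834\uDC00";
const pos = "foo\uD834bar".length; // index where \1 is attempted (7)

console.log(backrefByCodeUnits(ex1, pos, "\uD834"));  // true
console.log(backrefByCodePoints(ex1, pos, "\uD834")); // false
```

With the code-unit check, Ex1 matches, as observed in Java; with the extra boundary check it does not.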



Maximum String length

2015-01-28 Thread Jordan Harband
Typically, implementation-specific things aren't specified in the spec
(like Math precision, etc) - although usually when it's
implementation-specific, it's explicitly noted as such (
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-date.parse ,
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-math.hypot ,
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-ecmascript-language-types-number-type
, https://people.mozilla.org/~jorendorff/es6-draft.html#sec-object.keys ,
etc)

Strings are only defined in ES6 as a primitive value that is a finite
ordered sequence of zero or more 16-bit unsigned integer (
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-terms-and-definitions-string-value
) and are not noted as having any implementation-specific or
implementation-dependent qualities.

To me, finite here means `Number.MAX_VALUE` - ie, the highest number I
can get before I reach Infinity. An alternative reading is any number
greater than zero that's not Infinity - but at that point an
implementation conforms if its max length is 1, which obviously would be
silly.

However, Chrome 40 and Opera 26-27 have a limit of `0xFFFFFF0`
(`2**28 - 2**4`), Firefox 35 and IE 9-11 all have a limit of `0xFFFFFFF`
(`2**28 - 1`), and Safari 8 has `0x7FFFFFFF` (`2**31 - 1`). There are many
more browsers I haven't tested, of course, but it'd be interesting to know
how widely these numbers deviate.

1) Should an engine's max string length be exposed, like
`Number.MAX_VALUE`, as `String.MAX_LENGTH`? This will help, for example, my
`String#repeat` polyfill throw an earlier `RangeError` rather than having
to try to build a string of that length.
2) Should the spec require a minimum maximum string length, or at least be
more specific in how it defines finite?
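A sketch of the early-`RangeError` idea from point 1. `String.MAX_LENGTH` does not exist, so the constant `ASSUMED_MAX_STRING_LENGTH` below is a hypothetical stand-in set to `2**28 - 2**4`, the Chrome 40 figure quoted above; a real polyfill would have no portable value to use. `repeatChecked` is likewise an illustrative name, not a standard API.

```javascript
// Hypothetical stand-in for the proposed String.MAX_LENGTH (which does not
// exist); 2**28 - 2**4 is the Chrome 40 limit reported above.
const ASSUMED_MAX_STRING_LENGTH = 2 ** 28 - 2 ** 4;

// repeat() that rejects oversized results before building anything.
function repeatChecked(str, count) {
  const s = String(str);
  const n = Math.trunc(Number(count)) || 0; // NaN -> 0, like ToIntegerOrInfinity
  if (n < 0 || n === Infinity) {
    throw new RangeError("Invalid count value");
  }
  if (s.length === 0 || n === 0) return "";
  if (s.length * n > ASSUMED_MAX_STRING_LENGTH) {
    throw new RangeError("Result would exceed the assumed maximum string length");
  }
  // Doubling: O(log n) concatenations instead of n.
  let result = "";
  let chunk = s;
  for (let k = n; k > 0; k >>>= 1) {
    if (k & 1) result += chunk;
    if (k > 1) chunk += chunk;
  }
  return result;
}
```

Passing the length check does not guarantee the allocation succeeds; the engine can still run out of memory below any such limit.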


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk
1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works:

foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so,
generally, lonely surrogates match /./.

Backreferences are allowed to consume the leading surrogate of a valid
surrogate pair:

Ex1: foo\uD834bar\uD834\uDC00 matches foo(.+)bar\1

But surprisingly:

Ex2: \uDC00foobar\uD834\uDC00foobar\uD834 doesn't match ^(.+)\1$

... So Ex2 works as if the input string was converted to UTF-32 before
matching, but Ex1 works as if it was def not. Idk what's the correct mental
model where both Ex1 and Ex2 would make sense.


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
I think the cleanest mental model is where UTF-16 or UTF-8 strings are
interpreted as if they were transformed into UTF-32.

While that is generally feasible, it often represents a cost in performance
which is not acceptable in practice. So you see various approaches that
involve some deviation from that mental model.
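One way to see this mental model at work on Ex2 is to compare the same string as a list of UTF-16 code units and as a list of code points. The sketch assumes that in the code-point view each lone surrogate stays a code point of its own (which is what `String.prototype.codePointAt` returns for an unpaired surrogate); `toCodePoints` and `isSquare` are illustrative helpers, not engine internals. For a string without line terminators, `^(.+)\1$` succeeds exactly when the string is some non-empty X repeated twice, so checking that property in each view shows why Ex2 fails once `\uD834\uDC00` becomes a single element.

```javascript
// Code-point view: each surrogate pair becomes one element, and a lone
// surrogate stays as its own element (codePointAt returns it unchanged).
function toCodePoints(s) {
  const out = [];
  for (let i = 0; i < s.length; ) {
    const cp = s.codePointAt(i);
    out.push(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return out;
}

// True if `xs` is some non-empty X followed by X again, which is what
// ^(.+)\1$ requires when every element can be matched by `.`.
function isSquare(xs) {
  if (xs.length === 0 || xs.length % 2 !== 0) return false;
  const half = xs.length / 2;
  for (let i = 0; i < half; i++) {
    if (xs[i] !== xs[half + i]) return false;
  }
  return true;
}

const ex2 = "\uDC00foobar\uD834\uDC00foobar\uD834";
const codeUnits = Array.from({ length: ex2.length }, (_, i) => ex2.charCodeAt(i));

console.log(isSquare(codeUnits));         // true: 16 units, two equal halves
console.log(isSquare(toCodePoints(ex2))); // false: 15 code points
```

Applying the same view to Ex1 would reject it too, since the final `\uD834\uDC00` becomes U+1D000 and can no longer equal a captured `\uD834`; that is where Java's behavior departs from this model.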


Mark https://google.com/+MarkDavis

*— Il meglio è l’inimico del bene —*

On Wed, Jan 28, 2015 at 2:15 PM, Marja Hölttä ma...@chromium.org wrote:

 For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk
 1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works:

 foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so,
 generally, lonely surrogates match /./.

 Backreferences are allowed to consume the leading surrogate of a valid
 surrogate pair:

 Ex1: foo\uD834bar\uD834\uDC00 matches foo(.+)bar\1

 But surprisingly:

 Ex2: \uDC00foobar\uD834\uDC00foobar\uD834 doesn't match ^(.+)\1$

 ... So Ex2 works as if the input string was converted to UTF-32 before
 matching, but Ex1 works as if it was def not. Idk what's the correct mental
 model where both Ex1 and Ex2 would make sense.




Re: Maximum String length

2015-01-28 Thread Andreas Rossberg
On 28 January 2015 at 13:14, Claude Pache claude.pa...@gmail.com wrote:

 To me, finite is just to be taken in the common mathematical sense of
 the term; in particular you could have theoretically a string of length
 10^1. But yes, it would be reasonable to restrict oneself to strings of
 length at most 2^52, so that `string.length` could always return an exact
 answer.


To me it would be reasonable to restrict oneself to much shorter strings,
since no existing machine has the memory to represent a string of length
2^52, nor will any in the foreseeable future. ;)

VMs can always run into out-of-memory conditions. In general, there is no
way to predict those. Even strings shorter than the hard-coded length
limit might cause you to go OOM. So providing reflection on a constant like
that might do little but give a false sense of safety.

/Andreas


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull

For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk
1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works:

foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so,
generally, lonely surrogates match /./.

Backreferences are allowed to consume the leading surrogate of a valid
surrogate pair:

Ex1: foo\uD834bar\uD834\uDC00 matches foo(.+)bar\1

But surprisingly:

Ex2: \uDC00foobar\uD834\uDC00foobar\uD834 doesn't match ^(.+)\1$

... So Ex2 works as if the input string was converted to UTF-32 before
matching, but Ex1 works as if it was def not. Idk what's the correct mental
model where both Ex1 and Ex2 would make sense.


java.util.regex.Pattern matches back references by comparing (Java) chars [1], but reads patterns as 
a sequence of code points [2]. That should help to explain the differences between ex1 and ex2.


[1] 
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l4890
[2] 
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l1671



Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Based on Ex1, looks like the input string is not read as a sequence of code
points when we try to find a match for \1. So it's mostly read as a
sequence of code points except when it's not. :/

On Wed, Jan 28, 2015 at 3:11 PM, André Bargull andre.barg...@udo.edu
wrote:

 On 1/28/2015 2:51 PM, André Bargull wrote:

 For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk
 1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works:

 foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so,
 generally, lonely surrogates match /./.

 Backreferences are allowed to consume the leading surrogate of a valid
 surrogate pair:

 Ex1: foo\uD834bar\uD834\uDC00 matches foo(.+)bar\1

 But surprisingly:

 Ex2: \uDC00foobar\uD834\uDC00foobar\uD834 doesn't match ^(.+)\1$

 ... So Ex2 works as if the input string was converted to UTF-32 before
 matching, but Ex1 works as if it was def not. Idk what's the correct
 mental model where both Ex1 and Ex2 would make sense.


 java.util.regex.Pattern matches back references by comparing (Java) chars
 [1], but reads patterns as a sequence of code points [2]. That should help
 to explain the differences between ex1 and ex2.

 [1] http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l4890
 [2] http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l1671


 Err, the part about how patterns are read is not important here. What I
 should have written is that the input string is (also) read as a sequence
 of code points [3]. So in ex2 `\uD834\uDC00` is read as a single code point
 (and not split into \uD834 and \uDC00 during backtracking).

 [3] http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l3773



Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Hello es-discuss,

TL;DR: /foo.bar/u.test(“foo\uD83Dbar”) == ?

The ES6 unicode regexp spec is not very clear regarding what should happen
if the regexp or the matched string contains lonely surrogates (a lead
surrogate without a trail, or a trail without a lead). For example, for the
. operator, the relevant parts of the spec speak about characters:

https://people.mozilla.org/~jorendorff/es6-draft.html#sec-atom
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-canonicalize-abstract-operation

E.g.,
“Let A be the set of all *characters* except LineTerminator.”
“Let ch be the *character* Input[e].”

But is a lonely surrogate a character? According to the Unicode standard,
it’s not. If it's not, what will ch be if the input string contains a
lonely surrogate in the relevant position?
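At the code-unit level, "lonely" can be stated precisely: a lead (U+D800 to U+DBFF) not immediately followed by a trail, or a trail (U+DC00 to U+DFFF) not immediately preceded by a lead. A minimal sketch; `findLoneSurrogates` is a hypothetical helper name, not part of the spec.

```javascript
// Returns the UTF-16 indices of lone surrogates in `s`.
function findLoneSurrogates(s) {
  const isLead = (u) => u >= 0xd800 && u <= 0xdbff;
  const isTrail = (u) => u >= 0xdc00 && u <= 0xdfff;
  const lone = [];
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (isLead(u)) {
      // charCodeAt past the end returns NaN, which is not a trail.
      if (isTrail(s.charCodeAt(i + 1))) {
        i++; // skip the trail of a valid pair
      } else {
        lone.push(i);
      }
    } else if (isTrail(u)) {
      // Any trail reached here was not consumed as part of a pair.
      lone.push(i);
    }
  }
  return lone;
}

console.log(findLoneSurrogates("foo\uD83Dbar"));       // [3]
console.log(findLoneSurrogates("foo\uD83D\uDE00bar")); // []
console.log(findLoneSurrogates("\uDC00foo\uD834"));    // [0, 4]
```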

Q1: Are lonely surrogates allowed in /u regexps?

E.g., /foo\uD83D/u; (note lonely lead surrogate), should this be allowed?
Will it match a lead surrogate inside a surrogate pair?

Suggestion: we shouldn't allow lonely surrogates in /u regexps.

If users actually want to match lonely surrogates (e.g., to check for them
or remove them) then they can use non-/u regexps.

The regexp syntax treats a lonely surrogate as a normal unicode escape, and
the rules say e.g., "The production RegExpUnicodeEscapeSequence :: u
Hex4Digits evaluates as follows: Return the character whose code is the SV
of Hex4Digits." It's also unclear what this means if no valid character
has this code.

Q2: If the string contains a lonely surrogate, what should it match? Should
it match .? Should it match [^a] ? (Or is it undefined behavior?)

Test cases:
/foo.bar/u.test("foo\uD83Dbar") == ?
/foo.bar/u.test("foo\uDC00bar") == ?
/foo[^a]bar/u.test("foo\uD83Dbar") == ?
/foo[^a]bar/u.test("foo\uDC00bar") == ?
/foo/u.test("bar\uD83Dbarfoo") == ?
/foo/u.test("bar\uDC00barfoo") == ?
/foo(.*)bar\1/u.test("foo\uD834bar\uD834\uDC00") == ? // Should the backreference be allowed to match the lead surrogate of a surrogate pair?
/^(.+)\1$/u.test("\uDC00foobar\uD83D\uDC00foobar\uD83D") == ?? // Should we allow splitting the surrogate pair like this?

Suggestion: a lonely surrogate should not be a character and it should not
match . or [^a] etc. However, a lonely surrogate in the input string
shouldn't prevent some other part of the string from matching.

If a lonely surrogate is treated as a character, the matching rule for .
gets complicated and difficult / slow to implement: . should not match
individual surrogates inside a surrogate pair, but if it has to match a
lonely surrogate, we'll end up needing lookahead and lookbehind logic to
implement that behavior.
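To illustrate that cost, here is a hypothetical sketch of how an engine could match `.` at one position of a UTF-16 string under the "lonely surrogate is a character" reading. `matchDotAt` and its helpers are invented names, not V8 internals. It returns how many code units `.` would consume; the lead case peeks one unit ahead and the trail case one unit behind.

```javascript
// Hypothetical engine-level matcher for `.` when a lone surrogate counts as
// a character. Returns how many UTF-16 code units `.` consumes at index i:
// 2 for a surrogate pair, 1 for any other matchable unit, 0 for no match.
const isLead = (u) => u >= 0xd800 && u <= 0xdbff;
const isTrail = (u) => u >= 0xdc00 && u <= 0xdfff;
const isLineTerminator = (u) =>
  u === 0x0a || u === 0x0d || u === 0x2028 || u === 0x2029;

function matchDotAt(s, i) {
  if (i < 0 || i >= s.length) return 0;
  const u = s.charCodeAt(i);
  if (isLead(u)) {
    // Lookahead: lead + trail is one code point; a lone lead is one unit.
    return isTrail(s.charCodeAt(i + 1)) ? 2 : 1;
  }
  if (isTrail(u)) {
    // Lookbehind: a trail right after a lead is the second half of a pair,
    // so `.` must not start matching here.
    return i > 0 && isLead(s.charCodeAt(i - 1)) ? 0 : 1;
  }
  return isLineTerminator(u) ? 0 : 1;
}
```

Under the suggested alternative where lone surrogates match nothing, the trail branch would always return 0 and the lookbehind would disappear.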

For example, the current version of Mathias’s ES6 Unicode regular
expression transpiler ( https://mothereff.in/regexpu ) converts /a.b/u into
/a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/
and afaics it’s not yet fully consistent wrt lonely surrogates, so, a
consistent implementation is going to be more complex than this.
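The quoted translation can be run as an ordinary (non-`/u`) regex to probe one such inconsistency. This sketch assumes the BMP range in it ends at `\uFFFF`. In the last alternative, `(?:[^\uD800-\uDBFF]|^)` consumes a code unit of its own, so a lone trail that directly follows another matched unit (here, the `a`) cannot be matched by the translated `.`.

```javascript
// Translation of /a.b/u as quoted above; no `u` flag, so it runs on code units.
const transpiled = /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/;

console.log(transpiled.test("axb"));            // true
console.log(transpiled.test("a\uD83D\uDE00b")); // true: a full pair
console.log(transpiled.test("a\uD83Db"));       // true: a lone lead
console.log(transpiled.test("a\uDC00b"));       // false: a lone trail right after "a"
```

A plain `/a.b/` without `u` accepts that last input, since it treats each code unit as a character.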

If we convert the string into UTF-32 before matching, then the "lonely
surrogate is a character" behavior gets easier to implement, but we
wouldn't want to be forced to do that. The intention behind the ES6 spec
seems to be that strings can / should still be stored as UTF-16. Converting
strings to UTF-32 before matching with /u regexps would require an
additional pass over the string which we'd want to avoid, and converting
only when strictly needed for the "lonely surrogate is a character"
implementation adds complexity. E.g., with some regexps we don't need to
scan the whole input string to find a match, and also most input strings,
even for /u regexps, probably won't contain surrogates (to find that out
we'd also need to scan the whole string, or some logic to fall back to
UTF-32 matching when we see a surrogate).
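A sketch of the "convert only when needed" strategy, mainly to make the extra pass visible. `toUtf32IfNeeded` is a hypothetical helper: it first scans the whole string for any surrogate code unit and returns `null` if there is none (so matching on code units already sees one element per code point), and otherwise decodes to a `Uint32Array` of code points, keeping lone surrogates as their own values.

```javascript
function toUtf32IfNeeded(s) {
  // Pass 1: the extra whole-string scan, only to learn whether any
  // surrogate code unit appears at all.
  if (!/[\uD800-\uDFFF]/.test(s)) return null;
  // Pass 2: the actual decoding. codePointAt combines a valid pair and
  // returns a lone surrogate unchanged.
  const out = [];
  for (let i = 0; i < s.length; ) {
    const cp = s.codePointAt(i);
    out.push(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return Uint32Array.from(out);
}

console.log(toUtf32IfNeeded("foobar"));                  // null
console.log(Array.from(toUtf32IfNeeded("a\uD834\uDC00"))); // [97, 118784]
```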

BR,
Marja


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull

On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä marja at chromium.org wrote:

 The ES6 unicode regexp spec is not very clear regarding what should happen
 if the regexp or the matched string contains lonely surrogates (a lead
 surrogate without a trail, or a trail without a lead). For example, for the
 . operator, the relevant parts of the spec speak about characters:

Just a bit of terminology.

The term "character" is overloaded, so Unicode provides the unambiguous
term "code point". For example, U+0378 is not (currently) an encoded
character according to Unicode, but it would certainly be a terrible idea
to disregard it, or not match it. It is a reserved code point that may be
assigned as an encoded character in the future. So neither U+D83D nor
U+0378 is a character.

If an ES spec uses the term "character" instead of "code point", then at
some point in the text it needs to disambiguate what is meant.


"Character" is defined in 21.2.2 Pattern Semantics [1]:

In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit 
Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern 
“character” means a UTF-16 encoded code point.



[1] https://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics
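A minimal sketch of those two definitions as seen from JavaScript; the helper names are hypothetical. Splitting on `""` yields one element per 16-bit code unit, matching the BMP-pattern meaning. `Array.from` uses the ES6 string iterator, which yields one element per code point and yields an unpaired surrogate as an element of its own, which is one possible reading of "a UTF-16 encoded code point" when the input contains lone surrogates.

```javascript
// BMP-pattern view: one element per UTF-16 code unit.
function bmpCharacters(s) {
  return s.split("");
}

// Unicode-pattern view: one element per code point (string iterator);
// a lone surrogate comes out as a one-unit element.
function unicodeCharacters(s) {
  return Array.from(s);
}

const s = "a\uD83D\uDE00\uD83Db"; // "a", a surrogate pair, a lone lead, "b"
console.log(bmpCharacters(s).length);     // 5
console.log(unicodeCharacters(s).length); // 4
```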


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
Good, that sounds right.


Mark https://google.com/+MarkDavis

*— Il meglio è l’inimico del bene —*

On Wed, Jan 28, 2015 at 12:57 PM, André Bargull andre.barg...@udo.edu
wrote:

 On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä marja at chromium.org wrote:

 The ES6 unicode regexp spec is not very clear regarding what should happen
 if the regexp or the matched string contains lonely surrogates (a lead
 surrogate without a trail, or a trail without a lead). For example, for the
 . operator, the relevant parts of the spec speak about characters:

 Just a bit of terminology.

 The term "character" is overloaded, so Unicode provides the unambiguous
 term "code point". For example, U+0378 is not (currently) an encoded
 character according to Unicode, but it would certainly be a terrible idea
 to disregard it, or not match it. It is a reserved code point that may be
 assigned as an encoded character in the future. So neither U+D83D nor
 U+0378 is a character.

 If an ES spec uses the term "character" instead of "code point", then at
 some point in the text it needs to disambiguate what is meant.


 "Character" is defined in 21.2.2 Pattern Semantics [1]:

 In the context of describing the behaviour of a BMP pattern “character”
 means a single 16-bit Unicode BMP code point. In the context of describing
 the behaviour of a Unicode pattern “character” means a UTF-16 encoded code
 point.



 [1]
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics



Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Erik Corry
On Wed, Jan 28, 2015 at 11:45 AM, Mathias Bynens math...@qiwi.be wrote:


  On 28 Jan 2015, at 11:36, Marja Hölttä ma...@chromium.org wrote:
 
  For example, the current version of Mathias’s ES6 Unicode regular
 expression transpiler ( https://mothereff.in/regexpu ) converts /a.b/u
 into
 /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/
 and afaics it’s not yet fully consistent wrt lonely surrogates, so, a
 consistent implementation is going to be more complex than this.

 This is indeed an incomplete solution. The lack of lookbehind support in
 ES makes this hard to transpile correctly. Ideas welcome!


I don't think your transpiler can work without lookbehind.  If you could
guarantee that none of your transpiled regexp matches a substring that ends
in the middle of a pair, then I think you could get it right without
lookbehind, but consider:

/(...)-\1./.test("TxL-TxLT");

Where L stands for a lead surrogate, and T stands for a trailing
surrogate.  There's no way to stop the backreference from swallowing the
last L, and without lookbehind there is no way to stop the . from matching
the final T.  A second issue is having a match that starts in the middle of
a pair. You could test for this after the matching if JS gave you the index
of the match in the string, but I don't think it does.

Ignoring the start-of-match-in-the-middle-of-a-pair issue, and the
backreferences case, I think you can do without lookbehind.
Assuming the lonely-surrogates-are-a-character scenario, the period (.)
transpiles to (ignore spaces added for readability):

(?:  \L(?!\T)  | \L\T  |  \T  |  [^\L\T\N])

where \L means leading surrogates, \T means trailing surrogates, \N means
all newlines.  Whatever comes before the . is not allowed to match a half

As an optimization, .x can transpile to (?: \L\T | . )x where the x stands
in for any literal characters.
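The shorthand above can be spelled out as a runnable code-unit regex to check the `TxL-TxLT` example. The concrete ranges are an assumption: `\L` as `[\uD800-\uDBFF]`, `\T` as `[\uDC00-\uDFFF]`, and `\N` as the four ECMAScript line terminators. Under a strict code-point reading the input should not match, because `\1` would have to end between a lead and its trail; the lookbehind-free `.` nevertheless lets the final `.` take the orphaned trail.

```javascript
// Assumed concrete ranges for the shorthand:
// \L = lead surrogates, \T = trail surrogates, \N = ECMAScript line terminators.
const L = "[\\uD800-\\uDBFF]";
const T = "[\\uDC00-\\uDFFF]";
const NOT_LTN = "[^\\uD800-\\uDFFF\\n\\r\\u2028\\u2029]"; // [^\L\T\N]

// "Lonely surrogates are characters", no lookbehind:
// (?: \L(?!\T) | \L\T | \T | [^\L\T\N] )
const DOT = `(?:${L}(?!${T})|${L}${T}|${T}|${NOT_LTN})`;

// /(...)-\1./ with every . replaced by DOT; no `u` flag, so it runs on code units.
const re = new RegExp(`(${DOT}${DOT}${DOT})-\\1${DOT}`);
const input = "\uDC00x\uD800-\uDC00x\uD800\uDC00"; // "TxL-TxLT"

console.log(re.test(input)); // true: \1 ends mid-pair and DOT takes the trail
```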

For a JS engine implementor, like Marja, it is of course possible to add
1-character negative lookbehind (\b already has elements of this).  Then
your in-engine transpiler turns . into

(?:  \L(?!\T)  | \L\T  |  (?<!\L)\T  |  [^\L\T\N])

Which is going to be truly horrible in terms of code size and performance.
It's not like the period operator is a rare thing in a regexp, and other
common things like [^a-z] and [^\d] will expand into similar horrors.
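JavaScript regexes have supported `(?<!...)` lookbehind since ES2018, so the in-engine version can now be tried directly, reading the trail alternative as a one-character negative lookbehind. Same assumed ranges as before. In this sketch the lookbehind stops the trailing `.` from taking the orphaned trail, but the backreference on its own can still end between a lead and its trail, which lookbehind inside `.` cannot prevent.

```javascript
// Assumed ranges: \L lead, \T trail, \N ECMAScript line terminators.
const L = "[\\uD800-\\uDBFF]";
const T = "[\\uDC00-\\uDFFF]";
const NOT_LTN = "[^\\uD800-\\uDFFF\\n\\r\\u2028\\u2029]";

// (?: \L(?!\T) | \L\T | (?<!\L)\T | [^\L\T\N] ); needs ES2018 lookbehind.
const DOT = `(?:${L}(?!${T})|${L}${T}|(?<!${L})${T}|${NOT_LTN})`;

const input = "\uDC00x\uD800-\uDC00x\uD800\uDC00"; // "TxL-TxLT"
const withDot = new RegExp(`(${DOT}${DOT}${DOT})-\\1${DOT}`);
const withoutDot = new RegExp(`(${DOT}${DOT}${DOT})-\\1`);

console.log(withDot.test(input));    // false: the last DOT can't take the trail
console.log(withoutDot.test(input)); // true: \1 still ends mid-pair
console.log(new RegExp(`^${DOT}$`).test("\uDC00")); // true: a lone trail is one DOT
```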

On the other hand, in the lonely-surrogates-match-nothing scenario, the .
transpiles to

(?: \L\T  |  [^\L\T\N] )

which is quite a lot nicer and faster.  In this scenario, .x expands to
(?: \L\T | [^\L\T\N] )x, which still has no lookaheads or lookbehinds.
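Under the same assumed ranges, the match-nothing `.` needs no lookaround at all. A small sketch applying it to a few of the thread's `foo.bar` test cases, translated by hand; the results describe this hypothetical translation only.

```javascript
// (?: \L\T | [^\L\T\N] ): a surrogate pair, or any non-surrogate,
// non-line-terminator code unit. Lone surrogates match nothing.
const DOT = "(?:[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[^\\uD800-\\uDFFF\\n\\r\\u2028\\u2029])";
const fooDotBar = new RegExp(`foo${DOT}bar`);

console.log(fooDotBar.test("foo\uD83Dbar"));       // false: lone lead
console.log(fooDotBar.test("foo\uDC00bar"));       // false: lone trail
console.log(fooDotBar.test("foo\uD83D\uDE00bar")); // true: one pair is one .
console.log(fooDotBar.test("\uD83Dfoo-bar"));      // true: a lone surrogate elsewhere doesn't block a match
```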


Re: Maximum String length

2015-01-28 Thread Claude Pache

 Le 28 janv. 2015 à 09:58, Jordan Harband ljh...@gmail.com a écrit :
 
 Typically, implementation-specific things aren't specified in the spec (like
 Math precision, etc) - although usually when it's implementation-specific,
 it's explicitly noted as such (
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-date.parse ,
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-math.hypot ,
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-ecmascript-language-types-number-type ,
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-object.keys , etc)
 
 Strings are only defined in ES6 as a primitive value that is a finite
 ordered sequence of zero or more 16-bit unsigned integer (
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-terms-and-definitions-string-value
 ) and are not noted as having any implementation-specific or
 implementation-dependent qualities.
 
 To me, finite here means `Number.MAX_VALUE` - ie, the highest number I can 
 get before I reach Infinity. An alternative reading is any number greater 
 than zero that's not Infinity - but at that point an implementation conforms 
 if its max length is 1, which obviously would be silly.
 

To me, finite is just to be taken in the common mathematical sense of the 
term; in particular you could have theoretically a string of length 10^1. 
But yes, it would be reasonable to restrict oneself to strings of length at 
most 2^52, so that `string.length` could always return an exact answer.
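The reason a bound like this keeps `length` exact is that Number values are IEEE 754 doubles, which represent every integer exactly only up to 2^53; `Number.MAX_SAFE_INTEGER` is 2^53 - 1. A quick check:

```javascript
// Number values are IEEE 754 doubles: integers are exact up to 2**53.
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
console.log(Number.isSafeInteger(2 ** 52));           // true
console.log(Number.isSafeInteger(2 ** 53));           // false
console.log(2 ** 53 + 1 === 2 ** 53);                 // true: 2**53 + 1 is not representable
```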

—Claude

 However, Chrome 40 and Opera 26-27 have a limit of `0xFFFFFF0`
 (`2**28 - 2**4`), Firefox 35 and IE 9-11 all have a limit of `0xFFFFFFF`
 (`2**28 - 1`), and Safari 8 has `0x7FFFFFFF` (`2**31 - 1`). There are many
 more browsers I haven't tested, of course, but it'd be interesting to know
 how widely these numbers deviate.
 
 1) Should an engine's max string length be exposed, like `Number.MAX_VALUE`, 
 as `String.MAX_LENGTH`? This will help, for example, my `String#repeat` 
 polyfill throw an earlier `RangeError` rather than having to try to build a 
 string of that length.
 2) Should the spec require a minimum maximum string length, or at least be 
 more specific in how it defines finite?



Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä ma...@chromium.org wrote:

 The ES6 unicode regexp spec is not very clear regarding what should happen
 if the regexp or the matched string contains lonely surrogates (a lead
 surrogate without a trail, or a trail without a lead). For example, for the
 . operator, the relevant parts of the spec speak about characters:


Just a bit of terminology.

The term "character" is overloaded, so Unicode provides the unambiguous
term "code point". For example, U+0378 is not (currently) an encoded
character according to Unicode, but it would certainly be a terrible idea
to disregard it, or not match it. It is a reserved code point that may be
assigned as an encoded character in the future. So neither U+D83D nor
U+0378 is a character.

If an ES spec uses the term "character" instead of "code point", then at
some point in the text it needs to disambiguate what is meant.

As to how this should be handled in regular expressions, I'd suggest looking
at Java's approach.

Mark https://google.com/+MarkDavis

*— Il meglio è l’inimico del bene —*


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Wes Garland
Some interesting questions here.

1 - What is a character? Is it a Unicode Code Point?
2 - Should we be able to match all possible JS Strings?
3 - Should we be able to match all possible Unicode Strings?
4 - What do we do if there is a character in a String we cannot match?
5 - Do unmatchable characters match . ?
6 - Are subsections of unmatchable strings matchable if they contain only
matchable characters?

It is important to remember in these discussions that the Unicode
specification allows strings which contain unmatched surrogates. Do we want
regular expressions that can't match some Unicode strings? Do we extend the
regexp syntax to have a symbol which matches an unmatched surrogate?  How
about reserved code points?  What happens when they become assigned?


On 28 January 2015 at 05:36, Marja Hölttä ma...@chromium.org wrote:

 Hello es-discuss,

 TL;DR: /foo.bar/u.test(“foo\uD83Dbar”) == ?

 The ES6 unicode regexp spec is not very clear regarding what should happen
 if the regexp or the matched string contains lonely surrogates (a lead
 surrogate without a trail, or a trail without a lead). For example, for the
 . operator, the relevant parts of the spec speak about characters:

 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-atom

 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation

 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-canonicalize-abstract-operation

 E.g.,
 “Let A be the set of all *characters* except LineTerminator.”
 “Let ch be the *character* Input[e].”

 But is a lonely surrogate a character? According to the Unicode standard,
 it’s not. If it's not, what will ch be if the input string contains a
 lonely surrogate in the relevant position?

 Q1: Are lonely surrogates allowed in /u regexps?

 E.g., /foo\uD83D/u; (note lonely lead surrogate), should this be allowed?
 Will it match a lead surrogate inside a surrogate pair?

 Suggestion: we shouldn't allow lonely surrogates in /u regexps.

 If users actually want to match lonely surrogates (e.g., to check for them
 or remove them) then they can use non-/u regexps.

 The regexp syntax treats a lonely surrogate as a normal unicode escape,
 and the rules say e.g., The production RegExpUnicodeEscapeSequence :: u
 Hex4Digits evaluates as follows: Return the character whose code is the SV
 of Hex4Digits. - it's also unclear what this means if no valid character
 has this code.

 Q2: If the string contains a lonely surrogate, what should it match?
 Should it match .? Should it match [^a] ? (Or is it undefined behavior?)

 Test cases:
 /foo.bar/u.test(foo\uD83Dbar) == ?
 /foo.bar/u.test(foo\uDC00bar) == ?
 /foo[^a]bar/u.test(foo\uD83Dbar) == ?
 /foo[^a]bar/u.test(foo\uDC00bar) == ?
 /foo/u.test(bar\uD83Dbarfoo) == ?
 /foo/u.test(bar\uDC00barfoo) == ?
 /foo(.*)bar\1/u.test(foo\uD834bar\uD834\uDC00) == ? // Should the
 backreference be allowed to match the lead surrogate of a surrogate pair?
 /^(.+)\1$/u.test(\uDC00foobar\uD83D\uDC00foobar\uD83D) == ?? // Should
 we allow splitting the surrogate pair like this?

 Suggestion: a lonely surrogate should not be a character and it should not
 match . or [^a] etc. However, a lonely surrogate in the input string
 shouldn't prevent some other part of the string from matching.
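 For comparison, here is how the six non-backreference cases come out under
 the semantics Allen spells out later in this thread, where each lonely
 surrogate in a /u input counts as one character. This is a sketch that
 assumes an engine implementing those ES2015 semantics; `cases` is an
 illustrative name, not something from the question.

```javascript
// Under /u, each lonely surrogate in the input is treated as a single
// character, so it matches `.` and `[^a]` like any other code point, and it
// does not stop /foo/u from matching elsewhere in the string.
const cases = [
  [/foo.bar/u, "foo\uD83Dbar"],    // lonely lead surrogate
  [/foo.bar/u, "foo\uDC00bar"],    // lonely trail surrogate
  [/foo[^a]bar/u, "foo\uD83Dbar"],
  [/foo[^a]bar/u, "foo\uDC00bar"],
  [/foo/u, "bar\uD83Dbarfoo"],     // lonely surrogate elsewhere in the input
  [/foo/u, "bar\uDC00barfoo"],
];

for (const [re, input] of cases) {
  console.log(String(re), JSON.stringify(input), re.test(input));
}
// Each line ends in `true`.
```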

 If a lonely surrogate is treated as a character, the matching rule for .
 gets complicated and difficult / slow to implement: . should not match
 individual surrogates inside a surrogate pair, but if it has to match a
 lonely surrogate, we'll end up needing lookahead and lookbehind logic to
 implement that behavior.

 For example, the current version of Mathias’s ES6 Unicode regular
 expression transpiler ( https://mothereff.in/regexpu ) converts /a.b/u
 into
 /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/
 and afaics it’s not yet fully consistent wrt lonely surrogates, so, a
 consistent implementation is going to be more complex than this.
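 One concrete way the inconsistency can show up: the last alternative,
 `(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]`, is meant to act like a lookbehind
 but actually consumes the code unit before the trail surrogate. The sketch
 below assumes the upper bound of the first character class is `\uFFFF`
 (that part of the quoted output is damaged) and compares the transpiled
 pattern with a native /u regexp; `transpiled`, `nativeRe` and `inputs` are
 illustrative names.

```javascript
// Transpiled form of /a.b/u as quoted above (first class assumed to end at \uFFFF).
const transpiled =
  /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/;
const nativeRe = /a.b/u;

const inputs = {
  bmp: "a\u00E9b",          // ordinary BMP character
  pair: "a\uD83D\uDC00b",   // valid surrogate pair (U+1F400)
  loneLead: "a\uD83Db",     // lead surrogate with no trail
  loneTrail: "a\uDC00b",    // trail surrogate with no lead
};

for (const [name, s] of Object.entries(inputs)) {
  console.log(name, transpiled.test(s), nativeRe.test(s));
}
// For loneTrail the transpiled pattern fails: [^\uD800-\uDBFF] eats the
// \uDC00 itself, and [\uDC00-\uDFFF] is then left facing "b".
```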

 If we convert the string into UTF-32 before matching, then the “lonely
 surrogate is a character” behavior gets easier to implement, but we
 wouldn't want to be forced to do that. The intention behind the ES6 spec
 seems to be that strings can / should still be stored as UTF-16. Converting
 strings to UTF-32 before matching with /u regexps would require an
 additional pass over the string which we'd want to avoid, and converting
 only when strictly needed for the “lonely surrogate is a character”
 implementation adds complexity. E.g., with some regexps we don't need to
 scan the whole input string to find a match, and also most input strings,
 even for /u regexps, probably won't contain surrogates (to find that out
 we'd also need to scan the whole string, or some logic to fall back to
 UTF-32 matching when we see a surrogate).
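 Even the check behind “fall back to UTF-32 matching when we see a
 surrogate” is a linear scan in the worst case. A minimal sketch of that
 scan; `firstSurrogateIndex` is a hypothetical helper, not an engine API.

```javascript
// Returns the index of the first UTF-16 surrogate code unit in `str`, or -1.
// If it returns -1, the string's code units and its code points are the same
// sequence, so no UTF-32 conversion is needed for it.
function firstSurrogateIndex(str) {
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDFFF) return i;
  }
  return -1;
}

console.log(firstSurrogateIndex("plain ASCII text"));  // -1
console.log(firstSurrogateIndex("ab\uD83D\uDC00cd")); // 2
```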

 BR,
 Marja


 

RE: Maximum String length

2015-01-28 Thread Domenic Denicola
From: es-discuss [mailto:es-discuss-boun...@mozilla.org] On Behalf Of Jordan 
Harband

 Strings can't possibly have a length larger than Number.MAX_SAFE_INTEGER - 
 otherwise, you'd be able to have a string whose length is not a number 
 representable in JavaScript.

So? That's a bit inconvenient, but no reason to argue that such a string can't 
exist.

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread Allen Wirfs-Brock

On Jan 28, 2015, at 4:40 PM, John-David Dalton john.david.dal...@gmail.com 
wrote:

 Kind of a bummer. The isTypedArray example from  
 https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59
  is incorrect. Is there an updated reference somewhere?
 The toStringTag result is handy because it allows checking against several 
 tags at once without having to invoke multiple functions each with their own 
 try-catch and all that perf baggage.
How is it incorrect?  Are you referring to the fact that both typed arrays and 
DataView objects have a [[ViewedArrayBuffer]] internal slot?  If so, I think 
this is a specification bug that I should fix.

Allen



Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread John-David Dalton
Primary issue is in  isTypedArray(a):
Uin32Array.prototype.buffer.call(a);


Besides the typos, accessing .buffer throws in at least Chrome & Firefox.
Then .buffer is an object so if it doesn't throw there's no .call to
execute.

-JDD


On Wed, Jan 28, 2015 at 4:55 PM, Allen Wirfs-Brock al...@wirfs-brock.com
wrote:


 On Jan 28, 2015, at 4:40 PM, John-David Dalton 
 john.david.dal...@gmail.com wrote:

 Kind of a bummer. The isTypedArray example from
 https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59
  is
 incorrect. Is there an updated reference somewhere?
 The toStringTag result is handy because it allows checking against several
 tags at once without having to invoke multiple functions each with their
 own try-catch and all that perf baggage.

 How is it incorrect?  Are you referring to the fact that both typed arrays
 and DataView objects have a [[ViewedArrayBuffer]] internal slot?  If so,
 I think this is a specification bug that I should fix.

 Allen




Re: Maximum String length

2015-01-28 Thread Jordan Harband
Strings can't possibly have a length larger than Number.MAX_SAFE_INTEGER -
otherwise, you'd be able to have a string whose length is not a number
representable in JavaScript. So, at the least, I think it would make sense
to define a maximum string length as Number.MAX_SAFE_INTEGER, even if that
provides no guarantees that strings of that length will work (ie, OOM
errors etc are fine), whether it's exposed on String or not.

It might also be nice if the spec included a non-normative note that
suggested a lower bound for a maximum string length (where strings are
guaranteed to work), so that at least there's a guideline.
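The arithmetic behind this argument, checked in plain JavaScript: above
Number.MAX_SAFE_INTEGER (2^53 - 1), consecutive integers are no longer all
distinct Number values, so `length` could not always report an exact count.
`atLimit` is just an illustrative name.

```javascript
// Number.MAX_SAFE_INTEGER is 2^53 - 1: the largest integer n such that both
// n and n + 1 are exactly representable as Numbers.
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true

// One step past it, adjacent integers collapse onto the same Number value.
const atLimit = 2 ** 53;
console.log(atLimit + 1 === atLimit); // true

// Engines cap string length far below this in practice; the exact cap is
// implementation-specific.
```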

Thoughts?

On Wed, Jan 28, 2015 at 6:53 AM, Mark S. Miller erig...@google.com wrote:

 On Wed, Jan 28, 2015 at 5:44 AM, Andreas Rossberg rossb...@google.com
 wrote:

 On 28 January 2015 at 13:14, Claude Pache claude.pa...@gmail.com wrote:

 To me, finite is just to be taken in the common mathematical sense of
 the term; in particular you could have theoretically a string of length
 10^1. But yes, it would be reasonable to restrict oneself to strings of
 length at most 2^52, so that `string.length` could always return an exact
 answer.


 To me it would be reasonable to restrict oneself to much shorter strings,
 since no existing machine has the memory to represent a string of length
 2^52, nor will any in the foreseeable future. ;)


 That's just four petabytes. If present trends...



 VMs can always run into out-of-memory conditions. In general, there is no
 way to predict those. Even strings shorter than the hard-coded length
 limit might cause you to go OOM. So providing reflection on a constant like
 that might do little but give a false sense of safety.


 Yes, OOM is always possible earlier and we can't set any limits on that. I
 agree that we shouldn't provide anything like String.MAX_LENGTH. But I also
 don't see how we could pleasantly support strings above 2^53. Array indexes
 are limited to 2^31 or so, and many integer operations truncate to that
 (the shift operators, |), and strings support [] indexing, so it may make sense to agree on
 one of those as an *upper bound* -- you may not support strings longer than
 that.


 --
 Cheers,
 --MarkM



Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread Jordan Harband
To summarize the discussion at today's TC39 meeting:

Given that the style of checks that Allen proposed (
https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59
) (using non-side-effecty non-generic methods that rely on internal slots,
in a try/catch) is indeed reliable in ES3, and will continue to be reliable
in ES6, any security-conscious code should update itself to use these kinds
of checks rather than an Object.prototype.toString.call check. v8 (and any
other implementations that are working on @@toStringTag) will leave
Symbol.toStringTag behind a flag for a full two months, to give the
relevant code time to release updates.
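A minimal sketch of such a check, assuming nothing beyond standard
built-ins: Date.prototype.getDay needs the [[DateValue]] internal slot and
throws a TypeError for any other receiver, and a spoofed @@toStringTag does
not create that slot. `isDateObject` is an illustrative name; this is an
independent sketch, not the code of the npm packages listed in this message.

```javascript
// Brand check via an internal slot: getDay reads [[DateValue]] and throws a
// TypeError when the receiver has no such slot. It has no side effects, and
// an invalid Date (NaN time value) still passes, returning NaN instead of throwing.
function isDateObject(value) {
  if (typeof value !== "object" || value === null) return false;
  try {
    Date.prototype.getDay.call(value);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isDateObject(new Date()));                        // true
console.log(isDateObject({ [Symbol.toStringTag]: "Date" }));  // false
console.log(isDateObject(Object.create(Date.prototype)));     // false
```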

In addition, anybody who modifies a builtin so that, say, a Boolean reports
itself as a Number, surely intends the effects of this change, and so there
is no concern about them. In accordance with this, step 17b of
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-object.prototype.tostring
will be removed - if a developer wants to make a non-builtin value
masquerade as a builtin, they similarly are intending those effects.
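For concreteness, here is what Object.prototype.toString reports once
nothing checks the tag against the object's internal slots, assuming an
engine with ES2015 @@toStringTag support. Both a plain object and a real
Boolean object can choose their tag; `fakeDate` and `boolAsNumber` are
illustrative names.

```javascript
// A plain object that claims to be a Date.
const fakeDate = { [Symbol.toStringTag]: "Date" };
console.log(Object.prototype.toString.call(fakeDate)); // [object Date]

// A real Boolean object that claims to be a Number (own property only, so
// Boolean.prototype is left untouched).
const boolAsNumber = new Boolean(true);
Object.defineProperty(boolAsNumber, Symbol.toStringTag, { value: "Number" });
console.log(Object.prototype.toString.call(boolAsNumber)); // [object Number]
```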

I've updated and/or released the following npm packages to remain resilient
with respect to this change in case anyone wants some specific examples of
how to implement this:
 - https://www.npmjs.com/package/is-equal
 - https://www.npmjs.com/package/is-date-object
 - https://www.npmjs.com/package/is-number-object
 - https://www.npmjs.com/package/is-regex
 - https://www.npmjs.com/package/is-symbol

In addition, I've closed and added similar comments to the spec bug I
originally filed: https://bugs.ecmascript.org/show_bug.cgi?id=3506

Thanks, everyone, for your thoughts and time!

- Jordan

On Sat, Jan 24, 2015 at 2:59 PM, Mark Miller erig...@gmail.com wrote:

 Put better, the spec requires that Object.freeze(Object.prototype) works.


 On Sat, Jan 24, 2015 at 2:57 PM, Mark Miller erig...@gmail.com wrote:



 On Sat, Jan 24, 2015 at 2:42 PM, Isiah Meadows impinb...@gmail.com
 wrote:

  From: Mark S. Miller erig...@google.com
  To: Gary Guo nbdd0...@hotmail.com
  Cc: es-discuss@mozilla.org es-discuss@mozilla.org
  Date: Sat, 24 Jan 2015 07:11:35 -0800
  Subject: Re: @@toStringTag spoofing for null and undefined
  Of course it can, by tamper proofing (essentially, freezing)
 Object.prototype. None of these protections are relevant anyway in an
 environment in which the primordials are not locked down.

 Yeah, pretty much. That proverbial inch was given a long time ago. And
 the proverbial mile taken. And I highly doubt the spec is going to require
 `Object.freeze(Object.prototype)`,

 Of course not. The key is the spec allows it. SES makes use of that.





 since that prohibits future polyfills and prolyfills of the Object
 prototype. Also, you could always straight up overwrite it, but that's even
 harder to protect against. (And how many cases do you know of literally
 overwriting built-in prototypes?)

 Or, to throw out an analog to Java, it is perfectly possible to call or
 even override a private method through reflection. JavaScript simply has
 more accessible reflection, more often useful since it's a more dynamic
 prototype-based OO language as opposed to a stricter class-based language.

 
  On Sat, Jan 24, 2015 at 6:11 AM, Gary Guo nbdd0...@hotmail.com
 wrote:
 
  Now I have a tendency to support the suggestion that cuts the
 anti-spoofing part. If coder wants to make an object and pretend it's a
 built-in, let it be. The anti-spoofing algorithm could not prevent this
 case:
  ```
  Object.prototype.toString = function(){
    return '[object I_Can_Be_Anything]';
  }
  ```
 

 Or this:
 ```js
 function handler() {
   throw new Error("No prototype for you!");
 }

 Object.defineProperty(
   Object,
   'prototype',
   {
     get: handler,
     set: handler,
     enumerable: true
   });
 ```

 Me thinks this isn't going to get fixed.

 
 
 
 
  --
  Cheers,
  --MarkM
 
 





 --
 Text by me above is hereby placed in the public domain

   Cheers,
   --MarkM




 --
 Text by me above is hereby placed in the public domain

   Cheers,
   --MarkM



Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread John-David Dalton
Kind of a bummer. The isTypedArray example from
https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59
is
incorrect. Is there an updated reference somewhere?
The toStringTag result is handy because it allows checking against several
tags at once without having to invoke multiple functions each with their
own try-catch and all that perf baggage.

- JDD



On Wed, Jan 28, 2015 at 4:29 PM, Jordan Harband ljh...@gmail.com wrote:

 To summarize the discussion at today's TC39 meeting:

 Given that the style of checks that Allen proposed (
 https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59
 ) (using non-side-effecty non-generic methods that rely on internal slots,
 in a try/catch) is indeed reliable in ES3, and will continue to be reliable
 in ES6, any security-conscious code should update itself to use these kinds
 of checks rather than an Object.prototype.toString.call check. v8 (and any
 other implementations that are working on @@toStringTag) will leave
 Symbol.toStringTag behind a flag for a full two months, to give the
 relevant code time to release updates.

 In addition, anybody who modifies a builtin so that, say, a Boolean
 reports itself as a Number, surely intends the effects of this change, and
 so there is no concern about them. In accordance with this, step 17b of
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-object.prototype.tostring
 will be removed - if a developer wants to make a non-builtin value
 masquerade as a builtin, they similarly are intending those effects.

 I've updated and/or released the following npm packages to remain
 resilient with respect to this change in case anyone wants some specific
 examples of how to implement this:
  - https://www.npmjs.com/package/is-equal
  - https://www.npmjs.com/package/is-date-object
  - https://www.npmjs.com/package/is-number-object
  - https://www.npmjs.com/package/is-regex
  - https://www.npmjs.com/package/is-symbol

 In addition, I've closed and added similar comments to the spec bug I
 originally filed: https://bugs.ecmascript.org/show_bug.cgi?id=3506

 Thanks, everyone, for your thoughts and time!

 - Jordan

 On Sat, Jan 24, 2015 at 2:59 PM, Mark Miller erig...@gmail.com wrote:

 Put better, the spec requires that Object.freeze(Object.prototype) works.


 On Sat, Jan 24, 2015 at 2:57 PM, Mark Miller erig...@gmail.com wrote:



 On Sat, Jan 24, 2015 at 2:42 PM, Isiah Meadows impinb...@gmail.com
 wrote:

  From: Mark S. Miller erig...@google.com
  To: Gary Guo nbdd0...@hotmail.com
  Cc: es-discuss@mozilla.org es-discuss@mozilla.org
  Date: Sat, 24 Jan 2015 07:11:35 -0800
  Subject: Re: @@toStringTag spoofing for null and undefined
  Of course it can, by tamper proofing (essentially, freezing)
 Object.prototype. None of these protections are relevant anyway in an
 environment in which the primordials are not locked down.

 Yeah, pretty much. That proverbial inch was given a long time ago. And
 the proverbial mile taken. And I highly doubt the spec is going to require
 `Object.freeze(Object.prototype)`,

 Of course not. The key is the spec allows it. SES makes use of that.





 since that prohibits future polyfills and prolyfills of the Object
 prototype. Also, you could always straight up overwrite it, but that's even
 harder to protect against. (And how many cases do you know of literally
 overwriting built-in prototypes?)

 Or, to throw out an analog to Java, it is perfectly possible to call or
 even override a private method through reflection. JavaScript simply has
 more accessible reflection, more often useful since it's a more dynamic
 prototype-based OO language as opposed to a stricter class-based language.

 
  On Sat, Jan 24, 2015 at 6:11 AM, Gary Guo nbdd0...@hotmail.com
 wrote:
 
  Now I have a tendency to support the suggestion that cuts the
 anti-spoofing part. If coder wants to make an object and pretend it's a
 built-in, let it be. The anti-spoofing algorithm could not prevent this
 case:
  ```
  Object.prototype.toString = function(){
    return '[object I_Can_Be_Anything]';
  }
  ```
 

 Or this:
 ```js
 function handler() {
   throw new Error("No prototype for you!");
 }

 Object.defineProperty(
   Object,
   'prototype',
   {
     get: handler,
     set: handler,
     enumerable: true
   });
 ```

 Me thinks this isn't going to get fixed.

 
 
 
 
  --
  Cheers,
  --MarkM
 
 





 --
 Text by me above is hereby placed in the public domain

   Cheers,
   --MarkM




 --
 Text by me above is hereby placed in the public domain

   Cheers,
   --MarkM


Re: Maximum String length

2015-01-28 Thread Jordan Harband
I suppose we could change the spec, but
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-ecmascript-language-types-string-type
requires that “The length of a String is the number of elements (i.e.,
16-bit values) within it.” - if the number can't be represented, then it
seems that requirement can't be satisfied. I'm sure one can come up with a
counterintuitive reading of the spec, but is that a realistic
interpretation of it?

On Wed, Jan 28, 2015 at 4:37 PM, Domenic Denicola d...@domenic.me wrote:

 From: es-discuss [mailto:es-discuss-boun...@mozilla.org] On Behalf Of
 Jordan Harband

  Strings can't possibly have a length larger than Number.MAX_SAFE_INTEGER
 - otherwise, you'd be able to have a string whose length is not a number
 representable in JavaScript.

 So? That's a bit inconvenient, but no reason to argue that such a string
 can't exist.




Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread Allen Wirfs-Brock

On Jan 28, 2015, at 5:03 PM, John-David Dalton john.david.dal...@gmail.com 
wrote:

 Primary issue is in  isTypedArray(a):
 Uin32Array.prototype.buffer.call(a);
 
 
 Besides the typos, accessing .buffer throws in at least Chrome & Firefox.
 Then .buffer is an object so if it doesn't throw there's no .call to execute.

 the ES6 definition of %TypedArray%.prototype.buffer:

%TypedArray%.prototype.buffer is an accessor property whose set accessor 
function is undefined. Its get accessor function performs the following steps:

1. Let O be the this value.
2. If Type(O) is not Object, throw a TypeError exception.
3. If O does not have a [[ViewedArrayBuffer]] internal slot throw a 
TypeError exception.
4. Let buffer be the value of O’s [[ViewedArrayBuffer]] internal slot.
5. Return buffer.

ES6 expects buffer to be implemented as an accessor property.  That means that 
the probe in my test should be:
Object.getOwnPropertyDescriptor(Uint32Array.prototype.__proto__, 'buffer').get.call(a);
Allen
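
A sketch of the whole check built on this probe, assuming an engine that
exposes `buffer` as an accessor on %TypedArray%.prototype as ES6 specifies.
It reaches that prototype with Object.getPrototypeOf and reads the getter
with Object.getOwnPropertyDescriptor. Since DataView objects also have a
[[ViewedArrayBuffer]] slot, whether a DataView passes depends on the exact
check an engine performs in step 3. `isTypedArray` and `getBuffer` are
illustrative names.

```javascript
// %TypedArray%.prototype is the shared prototype of Uint8Array.prototype,
// Uint32Array.prototype, etc.; its `buffer` getter throws a TypeError for
// receivers that lack the internal slot it reads.
const TypedArrayPrototype = Object.getPrototypeOf(Uint8Array.prototype);
const getBuffer = Object.getOwnPropertyDescriptor(TypedArrayPrototype, "buffer").get;

function isTypedArray(value) {
  try {
    getBuffer.call(value); // side-effect free; throws if the slot is missing
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isTypedArray(new Uint32Array(4)));              // true
console.log(isTypedArray(new ArrayBuffer(8)));              // false
console.log(isTypedArray({ buffer: new ArrayBuffer(8) }));  // false
```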






 
 -JDD
 
 
 On Wed, Jan 28, 2015 at 4:55 PM, Allen Wirfs-Brock al...@wirfs-brock.com 
 wrote:
 
 On Jan 28, 2015, at 4:40 PM, John-David Dalton john.david.dal...@gmail.com 
 wrote:
 
 Kind of a bummer. The isTypedArray example from  
 https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59
  is incorrect. Is there an updated reference somewhere?
 The toStringTag result is handy because it allows checking against several 
 tags at once without having to invoke multiple functions each with their own 
 try-catch and all that perf baggage.
 How is it incorrect?  Are you referring to the fact that both typed arrays 
 and DataView objects have a [[ViewedArrayBuffer]] internal slot?  If so, I 
 think this is a specification bug that I should fix.
 
 Allen
 
 



Re: Maximum String length

2015-01-28 Thread Mark S. Miller
On Wed, Jan 28, 2015 at 5:44 AM, Andreas Rossberg rossb...@google.com
wrote:

 On 28 January 2015 at 13:14, Claude Pache claude.pa...@gmail.com wrote:

 To me, finite is just to be taken in the common mathematical sense of
 the term; in particular you could have theoretically a string of length
 10^1. But yes, it would be reasonable to restrict oneself to strings of
 length at most 2^52, so that `string.length` could always return an exact
 answer.


 To me it would be reasonable to restrict oneself to much shorter strings,
 since no existing machine has the memory to represent a string of length
 2^52, nor will any in the foreseeable future. ;)


That's just four petabytes. If present trends...



 VMs can always run into out-of-memory conditions. In general, there is no
 way to predict those. Even strings shorter than the hard-coded length
 limit might cause you to go OOM. So providing reflection on a constant like
 that might do little but give a false sense of safety.


Yes, OOM is always possible earlier and we can't set any limits on that. I
agree that we shouldn't provide anything like String.MAX_LENGTH. But I also
don't see how we could pleasantly support strings above 2^53. Array indexes
are limited to 2^31 or so, and many integer operations truncate to that
(the shift operators, |), and strings support [] indexing, so it may make sense to agree on
one of those as an *upper bound* -- you may not support strings longer than
that.
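
To make “2^31 or so” concrete: array lengths are capped at 2^32 - 1, and
the bitwise operators convert their operands through ToInt32 or ToUint32,
so larger values silently wrap. A quick check in plain JavaScript; `threw`
is an illustrative name.

```javascript
// The largest array length is 2^32 - 1; one more is a RangeError.
console.log(new Array(2 ** 32 - 1).length); // 4294967295 (sparse, no allocation)
let threw = false;
try {
  new Array(2 ** 32);
} catch (e) {
  threw = e instanceof RangeError;
}
console.log(threw); // true

// Bitwise operators wrap modulo 2^32.
console.log((2 ** 32 + 5) | 0);   // 5
console.log((2 ** 31) | 0);       // -2147483648
console.log((2 ** 32 + 5) >>> 0); // 5
```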


-- 
Cheers,
--MarkM


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull

On 1/28/2015 3:36 PM, Marja Hölttä wrote:

Based on Ex1, looks like the input string is not read as a sequence of code 
points when we try to
find a match for \1. So it's mostly read as a sequence of code points except 
when it's not. :/


Yep, back references are matched as a sequence of code units. The first link I've posted points to 
the relevant method in java.util.regex.Pattern. I've got no idea why it's implemented that way; for 
example, when you enable case-insensitive matching, back references are no longer matched as a 
sequence of code units:


---
import java.util.Arrays;
import java.util.regex.Pattern;

int[] flags = { 0, Pattern.CASE_INSENSITIVE, Pattern.UNICODE_CASE,
    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE };

// Prints true, false, true, false
Arrays.stream(flags).mapToObj(f -> Pattern.compile("foo(.+)bar\\1", f))
    .map(p -> p.matcher("foo\uD834bar\uD834\uDC00").find())
    .forEach(System.out::println);
---





On Wed, Jan 28, 2015 at 3:11 PM, André Bargull andre.barg...@udo.edu
mailto:andre.barg...@udo.edu wrote:

On 1/28/2015 2:51 PM, André Bargull wrote:

For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and 
openjdk
1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works:

"foo\uD834bar" and "foo\uDC00bar" match ^foo[^a]bar$ and ^foo.bar$, so,
generally, lonely surrogates match /./.

Backreferences are allowed to consume the leading surrogate of a 
valid
surrogate pair:

Ex1: "foo\uD834bar\uD834\uDC00" matches foo(.+)bar\1

But surprisingly:

Ex2: "\uDC00foobar\uD834\uDC00foobar\uD834" doesn't match ^(.+)\1$

... So Ex2 works as if the input string was converted to UTF-32 
before
matching, but Ex1 works as if it was def not. Idk what's the 
correct mental
model where both Ex1 and Ex2 would make sense.


java.util.regex.Pattern matches back references by comparing (Java) 
chars [1], but reads
patterns as a sequence of code points [2]. That should help to explain 
the differences
between ex1 and ex2.

[1]

http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l4890
[2]

http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l1671


Err, the part about how patterns are read is not important here. What I 
should have written is
that the input string is (also) read as a sequence of code points [3]. So 
in ex2 `\uD834\uDC00`
is read as a single code point (and not split into \uD834 and \uDC00 during 
backtracking).

[3]

http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/c46daef6edb5/src/share/classes/java/util/regex/Pattern.java#l3773





Re: Maximum String length

2015-01-28 Thread Claude Pache



 Le 29 janv. 2015 à 01:49, Jordan Harband ljh...@gmail.com a écrit :
 
 I suppose we could change the spec, but 
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-ecmascript-language-types-string-type
  requires that “The length of a String is the number of elements (i.e., 
 16-bit values) within it.” - if the number can't be represented, then it 
 seems that requirement can't be satisfied. I'm sure one can come up with a 
 counterintuitive reading of the spec, but is that a realistic interpretation 
 of it?

It's not a requirement, it's a definition. But more to the point, the length of 
a String is simply a nonnegative integer, not a Number value representing such 
an integer. Not to be confused with the value of the length property of that 
String, which is necessarily a Number value.

—Claude


Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread John-David Dalton
At the moment that throws too. Anyways it's something to hammer on a bit.
Maybe Jordan can kick it around too.

Thanks,
-JDD

On Wed, Jan 28, 2015 at 5:16 PM, Allen Wirfs-Brock al...@wirfs-brock.com
wrote:


 On Jan 28, 2015, at 5:03 PM, John-David Dalton 
 john.david.dal...@gmail.com wrote:

 Primary issue is in  isTypedArray(a):
 Uin32Array.prototype.buffer.call(a);


 Besides the typos, accessing .buffer throws in at least Chrome & Firefox.
 Then .buffer is an object so if it doesn't throw there's no .call to
 execute.


  the ES6 definition of %TypedArray%.prototype.buffer:

 %TypedArray%.prototype.buffer is an *accessor property* whose set
 accessor function is undefined. Its get accessor function performs the
 following steps:

 1. Let O be the this value.
 2. If Type(O) is not Object, throw a TypeError exception.
 3. If O does not have a [[ViewedArrayBuffer]] internal slot throw
 a TypeError exception.
 4. Let buffer be the value of O’s [[ViewedArrayBuffer]] internal slot.
 5. Return buffer.

 ES6 expects buffer to be implemented as an accessor property.  That means
 that the probe in my test should be:

 Object.getOwnPropertyDescriptor(Uint32Array.prototype.__proto__, 'buffer').get.call(a);

 Allen








 -JDD


 On Wed, Jan 28, 2015 at 4:55 PM, Allen Wirfs-Brock al...@wirfs-brock.com
 wrote:


 On Jan 28, 2015, at 4:40 PM, John-David Dalton 
 john.david.dal...@gmail.com wrote:

 Kind of a bummer. The isTypedArray example from
 https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59
  is
 incorrect. Is there an updated reference somewhere?
 The toStringTag result is handy because it allows checking against
 several tags at once without having to invoke multiple functions each with
 their own try-catch and all that perf baggage.

 How is it incorrect?  Are you referring to the fact that both typed
 arrays and DataView objects have a [[ViewedArrayBuffer]] internal slot?
 If so, I think this is a specification bug that I should fix.

 Allen






Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Allen Wirfs-Brock

On Jan 28, 2015, at 5:26 AM, Mark Davis ☕️ m...@macchiato.com wrote:

 I think the cleanest mental model is where UTF-16 or UTF-8 strings are 
 interpreted as if they were transformed into UTF-32.

This is exactly the approach used in the ES6 spec (except that it doesn’t deal 
with UTF-8).

 
 While that is generally feasible, it often represents a cost in performance 
 which is not acceptable in practice. So you see various approaches that 
 involve some deviation from that mental model.

While ES6 uses this approach in its specification, implementations are free to 
use any implementation technique that produces the same result.

Allen




Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Cool, thanks for clarifications!

To make sure, as per the intended semantics, we never allow splitting a
valid surrogate pair (= matching only one of the surrogates but not the
other), and thus we'll differ from the Java implementation here:

/foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say false, Java says
true.

(In addition, /^(.+)\1$/u.test("\uDC00foobar\uD834\uDC00foobar\uD834") ==
false.)
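
The two backreference cases side by side, with and without the u flag, as a
sketch assuming an ES2015-conformant engine. Without /u the input is a
sequence of code units, so a capture may end between the halves of a
surrogate pair; with /u it is a sequence of code points and a pair is never
split. `s1` and `s2` are illustrative names.

```javascript
const s1 = "foo\uD834bar\uD834\uDC00";
const s2 = "\uDC00foobar\uD834\uDC00foobar\uD834";

// Code units: \1 can match the lead half of the final pair.
console.log(/foo(.+)bar\1/.test(s1));  // true
// Code points: the final pair is one character (U+1D000), so \1 = "\uD834" fails.
console.log(/foo(.+)bar\1/u.test(s1)); // false

// Code units: 16 units split into two equal halves.
console.log(/^(.+)\1$/.test(s2));      // true
// Code points: 15 characters, which cannot be two equal halves.
console.log(/^(.+)\1$/u.test(s2));     // false
```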


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Allen Wirfs-Brock

On Jan 28, 2015, at 2:36 AM, Marja Hölttä ma...@chromium.org wrote:

 Hello es-discuss,
 
 TL;DR: /foo.bar/u.test(“foo\uD83Dbar”) == ?
 
 The ES6 unicode regexp spec is not very clear regarding what should happen if 
 the regexp or the matched string contains lonely surrogates (a lead surrogate 
 without a trail, or a trail without a lead). For example, for the . operator, 
 the relevant parts of the spec speak about characters:

TL;DR: in a unicode regexp lonely surrogates are considered to be a single 
“character”. 

As André has already covered, “character” has a very specific meaning within the 
context of the ES6 RegExp specification in the second paragraph of 
http://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics . 
The specification uses the same set of algorithms to describe both BMP (i.e., 
16-bit elements) and unicode (i.e., 32-bit elements) patterns and matching 
semantics.  “Character” is used in those algorithm to refer to a single element 
of the mode that is currently operating within.
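
A minimal illustration of “character” meaning a code unit in one mode and a
code point in the other, using an astral character stored as a surrogate
pair; `astral` is an illustrative name.

```javascript
const astral = "\uD83D\uDE00"; // U+1F600 encoded as a surrogate pair

// Without /u, a "character" is one 16-bit code unit.
console.log(/^.$/.test(astral));   // false: two code units
console.log(/^..$/.test(astral));  // true

// With /u, a "character" is one code point.
console.log(/^.$/u.test(astral));  // true
console.log(/^..$/u.test(astral)); // false
```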

I think the ambiguity you find is in step 2.1 of 
http://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern :

2.  Return an internal closure that takes two arguments, a String str and an 
integer index, and performs the following:
1. If Unicode is true, let Input be a List consisting of the sequence of code 
points of str interpreted as a UTF-16 encoded Unicode string. Otherwise, let 
Input be a List consisting of the sequence of code units that are the elements 
of str. Input will be used throughout the algorithms in 21.2.2. Each element of 
Input is considered to be a character. 

Apparently I don’t have an adequate definition of “interpreted as a UTF-16 
encoded Unicode string”. If you submit a bug (to bugs.ecmascript.org) I will 
provide one in the next spec revision.  The intended semantics are that:
   In ascending string index order:
Each valid UTF-16 surrogate pair is interpreted as a single code point 
that is the UTF-16 encoded value
Each “lonely” surrogate is interpreted as a single code point that is 
the surrogate value
Every other 16-bit code unit is interpreted as a single code point.
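
A sketch of these three rules as a standalone function; `toCodePoints` is a
hypothetical name, not spec machinery. For comparison, the ES2015 string
iterator (used by Array.from) splits strings the same way, yielding each
lonely surrogate as its own element.

```javascript
// Walk `str` in ascending index order and return the list of code points
// that a /u regexp would treat as its characters.
function toCodePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Valid surrogate pair: decode to one supplementary code point.
        out.push((cu - 0xD800) * 0x400 + (next - 0xDC00) + 0x10000);
        i++; // skip the trail surrogate just consumed
        continue;
      }
    }
    // Lonely lead or trail surrogate, or any other code unit: one code point.
    out.push(cu);
  }
  return out;
}

console.log(toCodePoints("a\uD834\uDC00b").map((c) => c.toString(16)));
// [ '61', '1d000', '62' ]
console.log(toCodePoints("\uDC00x\uD83D").map((c) => c.toString(16)));
// [ 'dc00', '78', 'd83d' ]
```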

Allen






 
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-atom
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-canonicalize-abstract-operation
 
 E.g.,
 “Let A be the set of all *characters* except LineTerminator.”
 “Let ch be the *character* Input[e].”
 
 But is a lonely surrogate a character? According to the Unicode standard, 
 it’s not. If it's not, what will ch be if the input string contains a lonely 
 surrogate in the relevant position?
 
 Q1: Are lonely surrogates allowed in /u regexps?
 
 E.g., /foo\uD83D/u; (note lonely lead surrogate), should this be allowed? 
 Will it match a lead surrogate inside a surrogate pair?
 
 Suggestion: we shouldn't allow lonely surrogates in /u regexps.
 
 If users actually want to match lonely surrogates (e.g., to check for them or 
 remove them) then they can use non-/u regexps.
 
 The regexp syntax treats a lonely surrogate as a normal unicode escape, and 
 the rules say e.g., “The production RegExpUnicodeEscapeSequence :: u 
 Hex4Digits evaluates as follows: Return the character whose code is the SV of 
 Hex4Digits.” - it's also unclear what this means if no valid character has 
 this code.
 
 Q2: If the string contains a lonely surrogate, what should it match? Should 
 it match .? Should it match [^a] ? (Or is it undefined behavior?)
 
 Test cases:
 /foo.bar/u.test("foo\uD83Dbar") == ?
 /foo.bar/u.test("foo\uDC00bar") == ?
 /foo[^a]bar/u.test("foo\uD83Dbar") == ?
 /foo[^a]bar/u.test("foo\uDC00bar") == ?
 /foo/u.test("bar\uD83Dbarfoo") == ?
 /foo/u.test("bar\uDC00barfoo") == ?
 // Should the backreference be allowed to match the lead surrogate of a surrogate pair?
 /foo(.*)bar\1/u.test("foo\uD834bar\uD834\uDC00") == ?
 // Should we allow splitting the surrogate pair like this?
 /^(.+)\1$/u.test("\uDC00foobar\uD83D\uDC00foobar\uD83D") == ?
 
 Suggestion: a lonely surrogate should not be a character and it should not 
 match . or [^a] etc. However, a lonely surrogate in the input string 
 shouldn't prevent some other part of the string from matching.
 
 If a lonely surrogate is treated as a character, the matching rule for . gets 
 complicated and difficult / slow to implement: . should not match individual 
 surrogates inside a surrogate pair, but if it has to match a lonely 
 surrogate, we'll end up needing lookahead and lookbehind logic to implement 
 that behavior.
 
 For example, the current version of Mathias’s ES6 Unicode regular expression 
 transpiler ( https://mothereff.in/regexpu ) converts /a.b/u into 
 

RE: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Domenic Denicola
From: Mark S. Miller [mailto:erig...@google.com] 

 On Tue, Jan 27, 2015 at 5:53 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 I'd like to understand better the suggestion here, because I'm not sure I'm 
 entirely following it.  Specifically, I'd like to understand it in terms of 
 the internal methods defined by 
 https://github.com/domenic/window-proxy-spec.

 Presumably you're proposing that we keep all of that as-is except for 
 [[DefineOwnProperty]], right?

 For [[DefineOwnProperty]], are we basically talking about changing step 1 to:

 1)  If the [[Configurable]] field of Desc is present and 
 Desc.[[Configurable]] is false, then throw a TypeError exception.

 while keeping everything else as-is,

 Exactly correct. I didn't realize until reading your reply is that this is 
 all that's necessary -- that it successfully covers all the cases I was 
 thinking about without any further case division.

I'm having a bit of trouble understanding how this maps to the solution 
described in your previous message, Mark. Your “I didn't realize until reading 
your reply is that this is all that's necessary” indicates I'm probably just 
missing something, so help appreciated.

My question is, what happens if Desc.[[Configurable]] is not present, and P 
does not already exist on W? By my reading, we then fall through to calling the 
[[DefineOwnProperty]] internal method of W with arguments P and Desc.

Assuming W's [[DefineOwnProperty]] is that of an ordinary object, I believe 
that takes us through OrdinaryDefineOwnProperty(W, P, Desc). Since P does not 
exist on W, and W is extensible, that takes us to 
ValidateAndApplyPropertyDescriptor(W, P, true, Desc, undefined). Then according 
to step 2.c, “If the value of an attribute field of Desc is absent, the 
attribute of the newly created property is set to its default value.” The 
default value is false, right? So won't this try to define a non-configurable 
property on W?

I would have thought the modification needed to be more like:

[[DefineOwnProperty]] (P, Desc)

1. If Desc.[[Configurable]] is not present, set Desc.[[Configurable]] to true.
2. If Desc.[[Configurable]] is false, then throw a TypeError exception.
3. Return the result of calling the [[DefineOwnProperty]] internal method of W 
with arguments P and Desc.

(Here I have inserted step 1; steps 2 and 3 are unchanged from the previous 
incarnation.)
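The three steps above can be modeled as a plain JavaScript function. This is a hypothetical sketch: `defineOnWindowProxy` is an illustrative name, an ordinary object `W` stands in for the Window, and `Reflect.defineProperty` stands in for W's [[DefineOwnProperty]]. With step 1 in place, a descriptor that omits `configurable` creates a configurable property on W.

```js
// Hypothetical model of the modified [[DefineOwnProperty]] (P, Desc) above.
function defineOnWindowProxy(W, P, Desc) {
  // Work on a copy so the caller's descriptor object is left unchanged.
  const desc = { ...Desc };
  // Step 1: an absent [[Configurable]] defaults to true.
  if (!("configurable" in desc)) desc.configurable = true;
  // Step 2: reject descriptors whose [[Configurable]] is false.
  if (!desc.configurable) {
    throw new TypeError(`cannot define non-configurable property ${String(P)}`);
  }
  // Step 3: forward to W's [[DefineOwnProperty]]; true on success, else false.
  return Reflect.defineProperty(W, P, desc);
}

const W = {};
defineOnWindowProxy(W, "prop", { value: 1 });
console.log(Object.getOwnPropertyDescriptor(W, "prop").configurable); // true
```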
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Allen Wirfs-Brock

On Jan 28, 2015, at 4:54 AM, Wes Garland w...@page.ca wrote:

 Some interesting questions here.

These aren't discussion points.  These are all things that must have answers 
that are directly derivable from the ES6 spec.  If, after developing an 
adequate understanding of that part of the specification, you can’t find the 
answer to these questions, then there is probably something that needs to be 
clarified in the spec.
 
 1 - What is a character? Is it a Unicode Code Point?
defined in: paragraph 2 
http://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics 
 2 - Should we be able to match all possible JS Strings?
yes, there is nothing in the algorithms that restrict JS String values
 3 - Should we be able to match all possible Unicode Strings?
yes, subject to what you mean by “Unicode Strings”, as within JS Strings 
supplementary code points must be UTF-16 encoded.
 4 - What do we do if there is a character in a String we cannot match?
RegExp.exec returns null if a string cannot be matched by a pattern
 5 - Do unmatchable characters match . ?
there is no concept in the specification of an “unmatchable” character
 6 - Are subsections of unmatchable strings matchable if they contain only 
 matchable characters?
there is no concept in the specification of an “unmatchable” character
 
 It is important to remember in these discussions that the Unicode 
 specification allows strings which contain unmatched surrogates.
and ES6 //u patterns can match them
 Do we want regular expressions that can't match some Unicode strings?
No, the ES6 specification can match all possible strings
 Do we extend the regexp syntax to have a symbol which matches an unmatched 
 surrogate?
we already have it: \u{D83D}
   How about reserved code points?  What happens when they become assigned?
Other than the initial decoding of valid surrogate pairs into 32-bit code 
points, the ES6 //u RegExp spec. applies no semantics to any code points in the 
string that is being matched.
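The `\u{D83D}` answer can be illustrated with a small check, assuming an engine that implements the ES2015 `/u` semantics (for example, a current Node.js). Because the input is decoded into code points before matching, the lone-surrogate pattern matches a lonely surrogate but not the lead half of a valid pair; without `/u`, matching works on code units and the pair is split.

```js
// In /u mode "\uD83D\uDC00" is one code point (U+1F400), so it contains
// no standalone "\uD83D" for the pattern to match.
const loneLead = /\u{D83D}/u;

console.log(loneLead.test("x\uD83Dy"));       // true: lonely lead surrogate
console.log(loneLead.test("x\uD83D\uDC00y")); // false: lead half of a valid pair

// Without /u, the pattern sees code units and matches inside the pair.
console.log(/\uD83D/.test("x\uD83D\uDC00y")); // true
```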

Allen



Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull

Cool, thanks for clarifications!

To make sure, as per the intended semantics, we never allow splitting a
valid surrogate pair (= matching only one of the surrogates but not the
other), and thus we'll differ from the Java implementation here:

/foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say false, Java says
true.


Correct, the captures List entry is [\uD834], so when performing 21.2.2.9 AtomEscape, \uD834 is 
matched against \uD834\uDC00 in step 8 which results in a failure state.





(In addition, /^(.+)\1$/u.test("\uDC00foobar\uD834\uDC00foobar\uD834") ==
false.)


Yes, this expression also returns false.
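The test cases from earlier in the thread can be checked against an engine that implements these semantics (the ones shipped in ES2015; a current Node.js is assumed here). The two backreference results are the ones confirmed above; the others follow from decoding a lonely surrogate as a character of its own. The last two lines show that non-`/u` regexps, which match code units, do split the pair, as the Java implementation does.

```js
// Each entry: [regexp, input, expected result with ES2015 /u semantics].
const cases = [
  // A lonely surrogate is decoded as a character of its own,
  // so `.` and `[^a]` match it.
  [/foo.bar/u, "foo\uD83Dbar", true],
  [/foo.bar/u, "foo\uDC00bar", true],
  [/foo[^a]bar/u, "foo\uD83Dbar", true],
  [/foo[^a]bar/u, "foo\uDC00bar", true],
  // A lonely surrogate elsewhere in the input does not block a match.
  [/foo/u, "bar\uD83Dbarfoo", true],
  [/foo/u, "bar\uDC00barfoo", true],
  // Backreferences compare code points, so a valid pair is never split.
  [/foo(.+)bar\1/u, "foo\uD834bar\uD834\uDC00", false],
  [/^(.+)\1$/u, "\uDC00foobar\uD834\uDC00foobar\uD834", false],
];

for (const [re, input, expected] of cases) {
  console.log(String(re), re.test(input) === expected ? "as expected" : "MISMATCH");
}

// Without /u, matching is per code unit, and both backreference cases
// succeed by splitting the pair (the Java-like behavior).
console.log(/foo(.+)bar\1/.test("foo\uD834bar\uD834\uDC00")); // true
console.log(/^(.+)\1$/.test("\uDC00foobar\uD834\uDC00foobar\uD834")); // true
```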


Re: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Mark S. Miller
On Wed, Jan 28, 2015 at 8:51 AM, Domenic Denicola d...@domenic.me wrote:

 From: Mark S. Miller [mailto:erig...@google.com]

  On Tue, Jan 27, 2015 at 5:53 PM, Boris Zbarsky bzbar...@mit.edu wrote:
  I'd like to understand better the suggestion here, because I'm not sure
 I'm entirely following it.  Specifically, I'd like to understand it in
 terms of the internal methods defined by 
 https://github.com/domenic/window-proxy-spec.
 
  Presumably you're proposing that we keep all of that as-is except for
 [[DefineOwnProperty]], right?
 
  For [[DefineOwnProperty]], are we basically talking about changing step
 1 to:
 
  1)  If the [[Configurable]] field of Desc is present and
 Desc.[[Configurable]] is false, then throw a TypeError exception.
 
  while keeping everything else as-is,
 
  Exactly correct. I didn't realize until reading your reply is that this
 is all that's necessary -- that it successfully covers all the cases I was
 thinking about without any further case division.

 I'm having a bit of trouble understanding how this maps to the solution
 described in your previous message, Mark. Your “I didn't realize until
 reading your reply is that this is all that's necessary” indicates I'm
 probably just missing something, so help appreciated.

 My question is, what happens if Desc.[[Configurable]] is not present, and
 P does not already exist on W? By my reading, we then fall through to
 calling the [[DefineOwnProperty]] internal method of W with arguments P and
 Desc.

 Assuming W's [[DefineOwnProperty]] is that of an ordinary object, I
 believe that takes us through OrdinaryDefineOwnProperty(W, P, Desc). Since
 P does not exist on W, and W is extensible, that takes us to
 ValidateAndApplyPropertyDescriptor(W, P, true, Desc, undefined). Then
 according to step 2.c, “If the value of an attribute field of Desc is
 absent, the attribute of the newly created property is set to its default
 value.” The default value is false, right? So won't this try to define a
 non-configurable property on W?


In this situation, it will try and succeed. This more closely obeys the
intent in the original code (e.g., the comment in the jQuery code), since
it creates a non-configurable property on the *Window* W. It does not
violate any invariant, since all that's observable on the *WindowProxy*
(given the rest of your draft spec, which remains unchanged) is a
configurable property of the same name.
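This description can be checked with a hypothetical model in which an ES6 Proxy plays the role of the WindowProxy. Assumptions, none of which come from the draft spec itself: `makeWindowProxy` and the plain object `W` (standing in for the Window) are illustrative names; the `getOwnPropertyDescriptor` trap always reports properties as configurable, which is what the rest of the draft is assumed to do; and the proxy target is an empty "shadow" object, so the engine's Proxy invariant checks never see W's non-configurable properties.

```js
// Hypothetical model: a Proxy standing in for the WindowProxy, forwarding to
// an ordinary object `W` that stands in for the current Window.
function makeWindowProxy(W) {
  const shadow = {}; // empty, extensible proxy target; never gets properties
  return new Proxy(shadow, {
    defineProperty(_target, P, desc) {
      // Proposed step 1: throw only if [[Configurable]] is present and false.
      if ("configurable" in desc && !desc.configurable) {
        throw new TypeError(`cannot define non-configurable property ${String(P)}`);
      }
      // Otherwise forward to W, where an absent configurable defaults to false.
      return Reflect.defineProperty(W, P, desc);
    },
    getOwnPropertyDescriptor(_target, P) {
      const desc = Reflect.getOwnPropertyDescriptor(W, P);
      // Assumed draft behavior: the proxy always reports "configurable".
      if (desc !== undefined) desc.configurable = true;
      return desc;
    },
    get(_target, P) {
      return Reflect.get(W, P);
    },
  });
}

const W = {};
const win = makeWindowProxy(W);
Object.defineProperty(win, "prop", { value: "foo" }); // succeeds
console.log(Object.getOwnPropertyDescriptor(W, "prop").configurable);   // false on W
console.log(Object.getOwnPropertyDescriptor(win, "prop").configurable); // true via proxy
```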





 I would have thought the modification needed to be more like:

 [[DefineOwnProperty]] (P, Desc)

 1. If desc.[[Configurable]] is not present, set desc.[[Configurable]] to
 true.
 2. If desc.[[Configurable]] is false, then throw a TypeError exception.
 3. Return the result of calling the [[DefineOwnProperty]] internal method
 of W with arguments P and Desc.

 (here I have inserted step 1, but step 2 and 3 are unchanged from the
 previous incarnation).




-- 
Cheers,
--MarkM


Re: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Mark S. Miller
On Wed, Jan 28, 2015 at 11:08 AM, Domenic Denicola d...@domenic.me wrote:

 From: Mark S. Miller [mailto:erig...@google.com]

  In this situation, it will try and succeed. This more closely obeys the
 intent in the original code (e.g., the comment in the jQuery code), since
 it creates a non-configurable property on the *Window* W. It does not
 violate any invariant, since all that's observable on the *WindowProxy*
 (given the rest of your draft spec, which remains unchanged) is a
 configurable property of the same name.

 Ah, I see! So then another non-intuitive (but invariant-preserving)
 consequence would be:

 ```js
 Object.defineProperty(window, prop, { value: foo });

 var propDesc = Object.getOwnPropertyDescriptor(window, prop);

 if (propDesc.configurable) {
   Object.defineProperty(window, prop, { value: bar });

   // this will fail, even though the property is supposedly configurable,
   // since when it forwards from the WindowProxy `window` to the underlying
   // Window object, the Window's [[DefineOwnProperty]] fails.
 }
 ```

 Am I getting this right?


Exactly, yes. And again, if window is an ES6 proxy rather than a
WindowProxy, it could also cause this behavior, so it doesn't create any
situation which is not otherwise possible.

The key points are:

1) The throw does (arguably) better obey the code's intent, since the
property mostly acts like a non-configurable property until the window is
navigated.

2) If a window navigation happens between your first step and your second,
the second step may well succeed, which is what we (arguably) want, but
which would have been prohibited if propDesc.configurable evaluated to true.
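Both points, together with Domenic's example, can be traced in a hypothetical sketch that again uses an ES6 Proxy in place of a real WindowProxy. `makeNavigableWindowProxy` and its `navigate()` function are illustrative assumptions (navigation is modeled as swapping in a fresh ordinary object for the Window), and the always-configurable `getOwnPropertyDescriptor` behavior is assumed from the draft spec.

```js
// Hypothetical model: a WindowProxy whose underlying Window is replaced on
// navigation. `navigate()` is illustrative, not a browser API.
function makeNavigableWindowProxy() {
  let current = {}; // ordinary object standing in for the current Window
  const proxy = new Proxy({}, {
    defineProperty(_target, P, desc) {
      if ("configurable" in desc && !desc.configurable) {
        throw new TypeError(`cannot define non-configurable property ${String(P)}`);
      }
      return Reflect.defineProperty(current, P, desc);
    },
    getOwnPropertyDescriptor(_target, P) {
      const desc = Reflect.getOwnPropertyDescriptor(current, P);
      if (desc !== undefined) desc.configurable = true; // assumed draft behavior
      return desc;
    },
  });
  return { proxy, navigate() { current = {}; } };
}

const { proxy: win, navigate } = makeNavigableWindowProxy();
Object.defineProperty(win, "prop", { value: "foo" });
console.log(Object.getOwnPropertyDescriptor(win, "prop").configurable); // true

// Reported as configurable, yet redefinition fails: on the Window the
// property is non-configurable and non-writable (point 1).
console.log(Reflect.defineProperty(win, "prop", { value: "bar" })); // false

// After a navigation the same step succeeds on the new Window (point 2).
navigate();
console.log(Reflect.defineProperty(win, "prop", { value: "bar" })); // true
```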


-- 
Cheers,
--MarkM


RE: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Domenic Denicola
From: Mark S. Miller [mailto:erig...@google.com] 

 In this situation, it will try and succeed. This more closely obeys the 
 intent in the original code (e.g., the comment in the jQuery code), since it 
 creates a non-configurable property on the *Window* W. It does not violate 
 any invariant, since all that's observable on the *WindowProxy* (given the 
 rest of your draft spec, which remains unchanged) is a configurable property 
 of the same name.

Ah, I see! So then another non-intuitive (but invariant-preserving) consequence 
would be:

```js
Object.defineProperty(window, prop, { value: foo });

var propDesc = Object.getOwnPropertyDescriptor(window, prop);

if (propDesc.configurable) {
  Object.defineProperty(window, prop, { value: bar });

  // this will fail, even though the property is supposedly configurable,
  // since when it forwards from the WindowProxy `window` to the underlying
  // Window object, the Window's [[DefineOwnProperty]] fails.
}
```

Am I getting this right?


Re: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Brendan Eich

Mark S. Miller wrote:


Exactly correct. I didn't realize until reading your reply is that
this is all that's necessary -- that it successfully covers all
the cases I was thinking about without any further case division.


Here's another option, not clearly better or worse:


  [[DefineOwnProperty]] (P, Desc)

 1. Let R be the result of calling the [[DefineOwnProperty]] internal
    method of /W/ with arguments /P/ and /Desc/.
 2. If /desc/.[[Configurable]] is present and *false*, then throw
    a *TypeError* exception.
 3. Return R.

This is exactly like your solution, but with the order of the two 
steps switched. Perhaps the next breakage we see will tell us which of 
these to choose. If both are web compatible, then we need only pick 
which one we like better.


I like the shorter one. Filling in from the cited text below, here it is in 
full:



 [[DefineOwnProperty]] (P, Desc)

1. If /desc/.[[Configurable]] is present and /desc/.[[Configurable]] is
   *false*, then throw a *TypeError* exception.
2. Return the result of calling the [[DefineOwnProperty]] internal
   method of /W/ with arguments /P/ and /Desc/.


Besides being shorter, this doesn't call through to [[DOP]], which could 
have effects, and only then maybe-throw.
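The side-effect point can be made concrete with a hypothetical sketch comparing the two orderings. An ordinary object `W` is assumed to stand in for the Window and `Reflect.defineProperty` for W's [[DefineOwnProperty]]; the function names are illustrative. With the forward-first ordering, the non-configurable property already exists on `W` by the time the TypeError is thrown; the check-first ordering leaves `W` untouched.

```js
// Hypothetical models of the two orderings. An ordinary object `W` stands in
// for the Window, and Reflect.defineProperty for W's [[DefineOwnProperty]].
function isExplicitlyNonConfigurable(Desc) {
  return "configurable" in Desc && !Desc.configurable;
}

// Steps swapped: forward to W first, then maybe throw.
function defineForwardThenCheck(W, P, Desc) {
  const R = Reflect.defineProperty(W, P, Desc);
  if (isExplicitlyNonConfigurable(Desc)) {
    throw new TypeError(`cannot define non-configurable property ${String(P)}`);
  }
  return R;
}

// The shorter version: maybe throw first, and only then forward to W.
function defineCheckThenForward(W, P, Desc) {
  if (isExplicitlyNonConfigurable(Desc)) {
    throw new TypeError(`cannot define non-configurable property ${String(P)}`);
  }
  return Reflect.defineProperty(W, P, Desc);
}

// Reports whether a rejected definition still left a property on W.
function leavesPropertyBehind(defineFn) {
  const W = {};
  try {
    defineFn(W, "prop", { value: 1, configurable: false });
  } catch (e) {
    if (!(e instanceof TypeError)) throw e; // both variants throw a TypeError here
  }
  return Object.prototype.hasOwnProperty.call(W, "prop");
}

console.log(leavesPropertyBehind(defineForwardThenCheck)); // true
console.log(leavesPropertyBehind(defineCheckThenForward)); // false
```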


/be



as opposed to the behavior I'd understood we were aiming for,
which was:

1)  If the [[Configurable]] field of Desc is not present or
Desc.[[Configurable]] is false, then throw a TypeError exception. 



?  If so, that's certainly a change that is much more likely
to be web-compatible...


Good! It certainly takes care of the one concrete breakage we know
about so far.

