Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mathias Bynens
On 28 Jan 2015, at 11:36, Marja Hölttä ma...@chromium.org wrote: TL;DR: /foo.bar/u.test("foo\uD83Dbar") == ? The ES6 unicode regexp spec is not very clear regarding what should happen if the regexp or the matched string contains lonely surrogates (a lead surrogate without a trail, or a

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull
On 1/28/2015 2:51 PM, André Bargull wrote: For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk 1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works: foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so, generally, lonely surrogates match /./. Backreferences are

Maximum String length

2015-01-28 Thread Jordan Harband
Typically, implementation-specific things aren't specified in the spec (like Math precision, etc) - although usually when it's implementation-specific, it's explicitly noted as such ( https://people.mozilla.org/~jorendorff/es6-draft.html#sec-date.parse ,

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk 1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works: foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so, generally, lonely surrogates match /./. Backreferences are allowed to consume the leading surrogate of a

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
I think the cleanest mental model is where UTF-16 or UTF-8 strings are interpreted as if they were transformed into UTF-32. While that is generally feasible, it often represents a cost in performance which is not acceptable in practice. So you see various approaches that involve some deviation

Re: Maximum String length

2015-01-28 Thread Andreas Rossberg
On 28 January 2015 at 13:14, Claude Pache claude.pa...@gmail.com wrote: To me, finite is just to be taken in the common mathematical sense of the term; in particular you could have theoretically a string of length 10^1. But yes, it would be reasonable to restrict oneself to strings of

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull
For a reference, here's how Java (tried w/ Oracle 1.8.0_31 and openjdk 1.7.0_65) Pattern.UNICODE_CHARACTER_CLASS works: foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so, generally, lonely surrogates match /./. Backreferences are allowed to consume the leading surrogate of a

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Based on Ex1, looks like the input string is not read as a sequence of code points when we try to find a match for \1. So it's mostly read as a sequence of code points except when it's not. :/ On Wed, Jan 28, 2015 at 3:11 PM, André Bargull andre.barg...@udo.edu wrote: On 1/28/2015 2:51 PM,

Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Hello es-discuss, TL;DR: /foo.bar/u.test("foo\uD83Dbar") == ? The ES6 unicode regexp spec is not very clear regarding what should happen if the regexp or the matched string contains lonely surrogates (a lead surrogate without a trail, or a trail without a lead). For example, for the . operator,
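
The question in the TL;DR can be tried directly. The following is a minimal sketch, assuming an engine that implements the final ES2015 semantics for the u flag; the variable names are illustrative and do not come from the thread.

```javascript
// A lone lead surrogate: U+D83D with no trail surrogate after it.
const lonely = "foo\uD83Dbar";

// With the u flag, "." consumes one code point, and a lone surrogate
// is treated as a code point of its own.
const matchesLonely = /foo.bar/u.test(lonely);

// A well-formed pair (U+1F600) is two code units but one code point.
const pair = "\uD83D\uDE00";
const unicodeDot = /^.$/u.test(pair); // "." sees a single code point
const legacyDot = /^.$/.test(pair);   // "." sees one of two code units

console.log(matchesLonely, unicodeDot, legacyDot); // true true false
```

The third result shows why the u flag matters: without it the pattern operates on 16-bit code units, so a single "." cannot span a surrogate pair.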

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull
On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä marja at chromium.org wrote: The ES6 unicode regexp spec is not very clear regarding what should happen if the regexp or the matched string contains lonely surrogates (a lead surrogate

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
Good, that sounds right. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Wed, Jan 28, 2015 at 12:57 PM, André Bargull andre.barg...@udo.edu wrote: On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä marja at chromium.org https://mail.mozilla.org/listinfo/es-discuss

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Erik Corry
On Wed, Jan 28, 2015 at 11:45 AM, Mathias Bynens math...@qiwi.be wrote: On 28 Jan 2015, at 11:36, Marja Hölttä ma...@chromium.org wrote: For example, the current version of Mathias’s ES6 Unicode regular expression transpiler ( https://mothereff.in/regexpu ) converts /a.b/u into

Re: Maximum String length

2015-01-28 Thread Claude Pache
Le 28 janv. 2015 à 09:58, Jordan Harband ljh...@gmail.com a écrit : Typically, implementation-specific things aren't specified in the spec (like Math precision, etc) - although usually when it's implementation-specific, it's explicitly noted as such (

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä ma...@chromium.org wrote: The ES6 unicode regexp spec is not very clear regarding what should happen if the regexp or the matched string contains lonely surrogates (a lead surrogate without a trail, or a trail without a lead). For example, for the

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Wes Garland
Some interesting questions here. 1 - What is a character? Is it a Unicode Code Point? 2 - Should we be able to match all possible JS Strings? 3 - Should we be able to match all possible Unicode Strings? 4 - What do we do if there is a character in a String we cannot match? 5 - Do unmatchable

RE: Maximum String length

2015-01-28 Thread Domenic Denicola
From: es-discuss [mailto:es-discuss-boun...@mozilla.org] On Behalf Of Jordan Harband Strings can't possibly have a length larger than Number.MAX_SAFE_INTEGER - otherwise, you'd be able to have a string whose length is not a number representable in JavaScript. So? That's a bit inconvenient,

Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread Allen Wirfs-Brock
On Jan 28, 2015, at 4:40 PM, John-David Dalton john.david.dal...@gmail.com wrote: Kind of a bummer. The isTypedArray example from https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59 is incorrect. Is there an updated reference somewhere? The toStringTag

Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread John-David Dalton
Primary issue is in isTypedArray(a): Uin32Array.prototype.buffer.call(a); Besides the typos, accessing .buffer throws in at least Chrome and Firefox. Then .buffer is an object so if it doesn't throw there's no .call to execute. -JDD On Wed, Jan 28, 2015 at 4:55 PM, Allen Wirfs-Brock

Re: Maximum String length

2015-01-28 Thread Jordan Harband
Strings can't possibly have a length larger than Number.MAX_SAFE_INTEGER - otherwise, you'd be able to have a string whose length is not a number representable in JavaScript. So, at the least, I think it would make sense to define a maximum string length as Number.MAX_SAFE_INTEGER, even if that

Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread Jordan Harband
To summarize the discussion at today's TC39 meeting: Given that the style of checks that Allen proposed ( https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59 ) (using non-side-effecty non-generic methods that rely on internal slots, in a try/catch) is indeed
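
The style of check described here (a non-generic accessor that relies on an internal slot, wrapped in try/catch) can be sketched as follows. This is an illustrative reconstruction, not the code from the linked message: the names TypedArrayProto and getBuffer are assumptions, and the sketch only relies on the buffer getter living on the shared typed-array prototype and throwing for receivers that are not typed arrays.

```javascript
// The shared typed-array prototype is not exposed by name,
// so reach it through a concrete subclass.
const TypedArrayProto = Object.getPrototypeOf(Uint8Array.prototype);
const getBuffer = Object.getOwnPropertyDescriptor(TypedArrayProto, "buffer").get;

function isTypedArray(value) {
  try {
    // Throws a TypeError unless value carries the typed-array internal slots.
    getBuffer.call(value);
    return true;
  } catch (e) {
    return false;
  }
}

// Reading .buffer straight off a prototype object invokes the getter
// with the prototype as receiver, which throws.
let directAccessThrows = false;
try {
  Uint32Array.prototype.buffer;
} catch (e) {
  directAccessThrows = true;
}

console.log(isTypedArray(new Uint32Array(2)));                       // true
console.log(isTypedArray({ [Symbol.toStringTag]: "Uint32Array" })); // false
console.log(directAccessThrows);                                     // true
```

Extracting the getter once and calling it on the candidate value avoids both problems raised in the thread: no property is read off the prototype object itself, and the spoofed @@toStringTag has no effect on the result.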

Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread John-David Dalton
Kind of a bummer. The isTypedArray example from https://esdiscuss.org/topic/tostringtag-spoofing-for-null-and-undefined#content-59 is incorrect. Is there an updated reference somewhere? The toStringTag result is handy because it allows checking against several tags at once without having to invoke

Re: Maximum String length

2015-01-28 Thread Jordan Harband
I suppose we could change the spec, but https://people.mozilla.org/~jorendorff/es6-draft.html#sec-ecmascript-language-types-string-type requires that "The length of a String is the number of elements (i.e., 16-bit values) within it." - if the number can't be represented, then it seems that
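
The two facts this argument rests on can be checked in a few lines. This is a small sketch with illustrative variable names: length counts 16-bit elements rather than code points, and integers beyond Number.MAX_SAFE_INTEGER stop being distinguishable as Numbers.

```javascript
// length counts 16-bit code units, not code points.
const pair = "\uD83D\uDE00";                // U+1F600 as a surrogate pair
const codeUnits = pair.length;              // 2
const codePoints = Array.from(pair).length; // 1 (string iteration is by code point)

// Number.MAX_SAFE_INTEGER is 2^53 - 1; above it, neighbouring integers collide.
const maxSafe = Number.MAX_SAFE_INTEGER;
const collides = maxSafe + 1 === maxSafe + 2;

console.log(codeUnits, codePoints, maxSafe, collides);
// 2 1 9007199254740991 true
```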

Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread Allen Wirfs-Brock
On Jan 28, 2015, at 5:03 PM, John-David Dalton john.david.dal...@gmail.com wrote: Primary issue is in isTypedArray(a): Uin32Array.prototype.buffer.call(a); Besides the typos, accessing .buffer throws in at least Chrome and Firefox. Then .buffer is an object so if it doesn't throw there's

Re: Maximum String length

2015-01-28 Thread Mark S. Miller
On Wed, Jan 28, 2015 at 5:44 AM, Andreas Rossberg rossb...@google.com wrote: On 28 January 2015 at 13:14, Claude Pache claude.pa...@gmail.com wrote: To me, finite is just to be taken in the common mathematical sense of the term; in particular you could have theoretically a string of length

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull
On 1/28/2015 3:36 PM, Marja Hölttä wrote: Based on Ex1, looks like the input string is not read as a sequence of code points when we try to find a match for \1. So it's mostly read as a sequence of code points except when it's not. :/ Yep, back references are matched as a sequence of code

Re: Maximum String length

2015-01-28 Thread Claude Pache
Le 29 janv. 2015 à 01:49, Jordan Harband ljh...@gmail.com a écrit : I suppose we could change the spec, but https://people.mozilla.org/~jorendorff/es6-draft.html#sec-ecmascript-language-types-string-type requires that The length of a String is the number of elements (i.e., 16-bit

Re: @@toStringTag spoofing for null and undefined

2015-01-28 Thread John-David Dalton
At the moment that throws too. Anyways it's something to hammer on a bit. Maybe Jordan can kick it around too. Thanks, -JDD On Wed, Jan 28, 2015 at 5:16 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: On Jan 28, 2015, at 5:03 PM, John-David Dalton john.david.dal...@gmail.com wrote:

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Allen Wirfs-Brock
On Jan 28, 2015, at 5:26 AM, Mark Davis ☕️ m...@macchiato.com wrote: I think the cleanest mental model is where UTF-16 or UTF-8 strings are interpreted as if they were transformed into UTF-32. This is exactly the approach used in the ES6 spec (except that it doesn’t deal with UTF-8)

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Cool, thanks for clarifications! To make sure, as per the intended semantics, we never allow splitting a valid surrogate pair (= matching only one of the surrogates but not the other), and thus we'll differ from the Java implementation here: /foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say
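
The backreference case can be tried alongside its non-unicode counterpart. This is a hedged sketch, assuming an engine that follows the rule stated above (a valid surrogate pair is never split in unicode mode); the variable names are illustrative.

```javascript
// Lone lead surrogate U+D834, then "bar", then the pair U+D834 U+DC00 (U+1D000).
const input = "foo\uD834bar\uD834\uDC00";

// Unicode mode: group 1 captures the lone U+D834, and \1 is not allowed
// to consume only the lead half of the surrogate pair that follows "bar".
const unicodeMode = /foo(.+)bar\1/u.test(input);

// Legacy mode compares 16-bit code units, so \1 matches the first
// code unit of the pair.
const legacyMode = /foo(.+)bar\1/.test(input);

console.log(unicodeMode, legacyMode); // false true
```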

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Allen Wirfs-Brock
On Jan 28, 2015, at 2:36 AM, Marja Hölttä ma...@chromium.org wrote: Hello es-discuss, TL;DR: /foo.bar/u.test("foo\uD83Dbar") == ? The ES6 unicode regexp spec is not very clear regarding what should happen if the regexp or the matched string contains lonely surrogates (a lead surrogate

RE: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Domenic Denicola
From: Mark S. Miller [mailto:erig...@google.com] On Tue, Jan 27, 2015 at 5:53 PM, Boris Zbarsky bzbar...@mit.edu wrote: I'd like to understand better the suggestion here, because I'm not sure I'm entirely following it.  Specifically, I'd like to understand it in terms of the internal

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Allen Wirfs-Brock
On Jan 28, 2015, at 4:54 AM, Wes Garland w...@page.ca wrote: Some interesting questions here. These aren't discussion points. These are all things that must have answers that are directly derivable from the ES6 spec. If, after developing an adequate understanding of that part of the

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread André Bargull
Cool, thanks for clarifications! To make sure, as per the intended semantics, we never allow splitting a valid surrogate pair (= matching only one of the surrogates but not the other), and thus we'll differ from the Java implementation here: /foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we

Re: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Mark S. Miller
On Wed, Jan 28, 2015 at 8:51 AM, Domenic Denicola d...@domenic.me wrote: From: Mark S. Miller [mailto:erig...@google.com] On Tue, Jan 27, 2015 at 5:53 PM, Boris Zbarsky bzbar...@mit.edu wrote: I'd like to understand better the suggestion here, because I'm not sure I'm entirely following

Re: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Mark S. Miller
On Wed, Jan 28, 2015 at 11:08 AM, Domenic Denicola d...@domenic.me wrote: From: Mark S. Miller [mailto:erig...@google.com] In this situation, it will try and succeed. This more closely obeys the intent in the original code (e.g., the comment in the jQuery code), since it creates a

RE: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Domenic Denicola
From: Mark S. Miller [mailto:erig...@google.com] In this situation, it will try and succeed. This more closely obeys the intent in the original code (e.g., the comment in the jQuery code), since it creates a non-configurable property on the *Window* W. It does not violate any invariant,

Re: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-28 Thread Brendan Eich
Mark S. Miller wrote: Exactly correct. I didn't realize until reading your reply is that this is all that's necessary -- that it successfully covers all the cases I was thinking about without any further case division. Here's another option, not clearly better or worse: