Re: why is regexp /\-/u a syntax-error?

2019-09-20 Thread kai zhu
fyi, googling "tc39 regexp unicode" led to web-compat reasoning (learned something new) @ https://github.com/tc39/proposal-regexp-unicode-property-escapes#what-about-backwards-compatibility What about backwards compatibility? In regular expressions without the u flag, the

Re: why is regexp /\-/u a syntax-error?

2019-09-20 Thread Mathias Bynens
Think of the `u` flag as a strict mode for regular expressions. `/\a/u` throws, because there is no reason to escape `a` as `\a` -- therefore, if such an escape sequence is present, it's likely a user error. The same goes for `/\-/u`. `-` only has special meaning within character classes
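
A minimal sketch of the behaviour described in this reply, assuming an ES2015-or-later engine. The `compiles` helper is hypothetical, introduced only for illustration; it reports whether a pattern source is accepted with the given flags.

```javascript
// Hypothetical helper: true if `source` is a valid pattern under `flags`.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so let it surface
  }
}

const results = {
  // Without `u`, the web-compat grammar tolerates the needless escape.
  escapedA: compiles("\\a", ""),
  // With `u`, unknown identity escapes are rejected.
  escapedAUnicode: compiles("\\a", "u"),
  // `-` has no special meaning outside a character class, so `\-` is rejected.
  escapedDash: compiles("\\-", "u"),
  // Inside a character class, escaping `-` is meaningful and accepted.
  escapedDashInClass: compiles("[\\-]", "u"),
  // An unescaped `-` outside a class is always fine.
  plainDash: compiles("-", "u"),
};

console.log(results);
```

If the last two entries hold in your engine, the practical fix for the lint warning is either to leave `-` unescaped outside a class or to keep the escape only inside `[...]`.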

why is regexp /\-/u a syntax-error?

2019-09-20 Thread kai zhu
jslint previously warned against unescaped literal "-" in regexp. however, escaping "-" together with unicode flag "u", causes syntax error in chrome/firefox/edge (and jslint has since removed warning): ```javascript let rgx = /\-/u VM21:1 Uncaught SyntaxErr

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-29 Thread Carsten Bormann
On Oct 29, 2018, at 21:55, J Decker wrote: > > https://esdiscuss.org/topic/expectations-around-line-ending-behavior-for-u-2028-and-u-2029#content-10 Your message was non-surprising to me: Most editors indeed do not heed the Unicode lore on 2028 and 2029, as nobody uses these char

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-29 Thread J Decker
On Mon, Oct 29, 2018 at 1:50 PM Carsten Bormann wrote: > On Oct 26, 2018, at 10:48, Claude Pache wrote: > > > > I have just tried to open a file containing U+2028 and U+2029 in four > different text editors / integrated environments on my Mac. All of them > recognise both c

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-29 Thread Carsten Bormann
On Oct 26, 2018, at 10:48, Claude Pache wrote: > > I have just tried to open a file containing U+2028 and U+2029 in four > different text editors / integrated environments on my Mac. All of them > recognise both characters as newlines (and increment the line number for > tho

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-29 Thread Isiah Meadows
spec, is that all existing parsers and tooling for all languages > > would also be updated to have line numbering that includes U+2028/29 > > There is also the somewhat widespread opinion that Unicode goofed by > adding those characters and that the best thing to do with them is to >

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-29 Thread Boris Zbarsky
On 10/29/18 2:04 PM, Logan Smyth wrote: This means that the expectation, from the standpoint of Unicode spec, is that all existing parsers and tooling for all languages would also be updated to have line numbering that includes U+2028/29 There is also the somewhat widespread opinion

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-29 Thread Logan Smyth
Sounds good. This means that the expectation, from the standpoint of Unicode spec, is that all existing parsers and tooling for all languages would also be updated to have line numbering that includes U+2028/29, or else that the line numbers would indefinitely be out of sync with the line numbers

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-26 Thread Isiah Meadows
:49 PM Logan Smyth wrote: > > Great, thank you for that resource Allen, it's helpful to have something > concrete to consider. > > What you'd prefer is that other languages should also be rendered with > U+2028/29 as creating new lines, even though their specifications do

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-26 Thread Logan Smyth
Great, thank you for that resource Allen, it's helpful to have something concrete to consider. What you'd prefer is that other languages should also be rendered with U+2028/29 as creating new lines, even though their specifications do not define them as lines? That means that any parser

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-26 Thread Claude Pache
> > Would it be worth exploring a definition of U+2028/29 in the spec such that > they behave as line terminators for ASI, but otherwise do not increment > things line number counts and behave as whitespace characters? Diverging the definition of line terminator for the pur

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-26 Thread Claude Pache
> Le 24 oct. 2018 à 21:58, Logan Smyth a écrit : > > On the other hand, it seems like every editor that I've looked at so far will > not render these characters as newlines, I have just tried to open a file containing U+2028 and U+2029 in four different text editors

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-26 Thread Carsten Bormann
On Oct 26, 2018, at 02:17, Allen Wirfs-Brock wrote: > > see https://www.unicode.org/versions/Unicode11.0.0/ch05.pdf#G10213 Please explain how this is even remotely relevant for a programming language. (Clearly, this was written by people who were trying to encode word processing text. The

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-25 Thread Allen Wirfs-Brock
> On Oct 25, 2018, at 4:49 PM, Logan Smyth wrote: > > > Tools that do not consider U+2028/29 to be line breaks are not behaving as > > they should according to the latest Unicode standard. > > That's part of what I'm attempting to understand. What specifically

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-25 Thread Logan Smyth
> Tools that do not consider U+2028/29 to be line breaks are not behaving as they should according to the latest Unicode standard. That's part of what I'm attempting to understand. What specifically does Unicode require for these code points? What are the expectations for languages that h

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-25 Thread Waldemar Horwat
the original source. As currently specified, a line number in a stack trace takes U+2028/29 into account, and thus requires that any consumer of this source code and line number value have a special case for JS code. It seems unrealistic to expect that every piece of tooling that works with source code

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-25 Thread Carsten Bormann
On Oct 25, 2018, at 18:24, Logan Smyth wrote: > > 3. Diverge the definition of current source-code line from the current > LineTerminatorSequence lexical grammar such that source line number is always > /\r?\n/, which is what the user is realistically going to see in their edit

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-25 Thread Logan Smyth
in a stack trace takes U+2028/29 into account, and thus requires that any consumer of this source code and line number value have a special case for JS code. It seems unrealistic to expect that every piece of tooling that works with source code would have a special case for JS code to take these 2

Re: Expectations around line ending behavior for U+2028 and U+2029

2018-10-24 Thread Richard Gibson
this sense. Editors and HTML are free to do what they want, but in my opinion ECMAScript tooling at least should not pretend that these input elements don't terminate lines. On Wed, Oct 24, 2018 at 3:58 PM Logan Smyth wrote: > Something I've recently realized just how much U+2028 and U+2029 bei

Expectations around line ending behavior for U+2028 and U+2029

2018-10-24 Thread Logan Smyth
Something I've recently realized is just how much U+2028 and U+2029 being newlines introduces a mismatch between different parts of a dev environment, and I'm curious for thoughts. Engines understandably take these two characters into account when defining their line number offsets in stack traces
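
The mismatch raised here can be sketched as follows. Both function names are hypothetical. The first counts lines using the four ECMAScript line terminators (LF, CR, U+2028, U+2029, with CR LF treated as one); the second assumes a tool that only breaks on `/\r?\n/`, the pattern quoted elsewhere in this thread. Real editors and engines may differ from both.

```javascript
// Line count as an ECMAScript-style tokenizer would see it:
// LF, CR, LS (U+2028) and PS (U+2029) all end a line; CR LF counts once.
function countLinesEcmaScript(source) {
  return source.split(/\r\n|[\n\r\u2028\u2029]/).length;
}

// Line count as assumed for a tool that only recognises LF and CR LF.
function countLinesLfOnly(source) {
  return source.split(/\r?\n/).length;
}

// One U+2028 and one LF: the two counts disagree by one line.
const sample = "first\u2028second\nthird";

const ecmaCount = countLinesEcmaScript(sample);
const lfOnlyCount = countLinesLfOnly(sample);

console.log({ ecmaCount, lfOnlyCount });
```

Any line number reported after the U+2028 would be off by one between the two views, which is the stack-trace discrepancy the thread goes on to discuss.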

Re: escaping - in /u RegExp

2015-01-20 Thread Allen Wirfs-Brock
actually it looks to be like a better place to put it is: ClassEscape[U] :: [+U] - allen On Jan 19, 2015, at 9:45 PM, Norbert Lindenberg wrote: I think the change proposed by Allen is fine. The main point of the new definition of IdentityEscape is to reserve \p, \X, and other escape

Re: escaping - in /u RegExp

2015-01-19 Thread Norbert Lindenberg
, what is the fix? This construction for Identity Escape goes back to Norbert's original proposal http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html Perhaps we need to add a: ClassAtom[U] :: [+U] \- production or some such to the pattern grammar

Re: escaping - in /u RegExp

2015-01-14 Thread Mathias Bynens
to Norbert's original proposal http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html Perhaps we need to add a: ClassAtom[U] :: [+U] \- production or some such to the pattern grammar. I think it’s a bug; see https://codereview.chromium.org/788043005

RE: escaping - in /u RegExp

2015-01-13 Thread Gary Guo
I think it s a bug, and I think your proposal is appropriate. From: al...@wirfs-brock.com Subject: escaping - in /u RegExp Date: Tue, 13 Jan 2015 13:23:54 -0800 To: es-discuss@mozilla.org Would those of you who consider yourselves RegExp experts take a look at https://bugs.ecmascript.org

escaping - in /u RegExp

2015-01-13 Thread Allen Wirfs-Brock
-supplementary-characters/index.html Perhaps we need to add a: ClassAtom[U] :: [+U] \- production or some such to the pattern grammar. Allen

Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-23 Thread Allen Wirfs-Brock
On Nov 22, 2013, at 11:02 PM, Mathias Bynens wrote: It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to (1) and (2) because of the following: RegExpUnicodeEscapeSequence[U] :: [+U] LeadSurrogate \u TrailSurrogate …but I was looking for confirmation

Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-22 Thread Mathias Bynens
One more related question: are these three regular expression literals equivalent? 1. `/[💩-💫]/u`: raw astral symbols 2. `/[\u{1F4A9}-\u{1F4AB}]/u`: astral symbols represented using Unicode code point escape sequences 3. `/[\uD83D\uDCA9-\uD83D\uDCAB]/u`: astral symbols represented as a surrogate
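
A small check of the three spellings, offered as a sketch rather than a proof: agreement on a handful of sample strings does not establish equivalence for all inputs. The raw form is built with `String.fromCodePoint` so that the astral symbols survive any transport that strips them; `samples` and `allAgree` are names chosen for this illustration.

```javascript
// 1. Raw astral symbols in the pattern source (U+1F4A9 through U+1F4AB).
const rawForm = new RegExp(
  "[" + String.fromCodePoint(0x1f4a9) + "-" + String.fromCodePoint(0x1f4ab) + "]",
  "u"
);
// 2. Code point escape sequences.
const codePointForm = /[\u{1F4A9}-\u{1F4AB}]/u;
// 3. Surrogate pair escapes, which the `u` flag reads as single code points.
const surrogateForm = /[\uD83D\uDCA9-\uD83D\uDCAB]/u;

// Range endpoints, one value inside, one just outside, and an unrelated character.
const samples = ["\u{1F4A9}", "\u{1F4AA}", "\u{1F4AB}", "\u{1F4AC}", "a"];

const verdicts = samples.map((s) => [
  rawForm.test(s),
  codePointForm.test(s),
  surrogateForm.test(s),
]);

// True when all three forms give the same answer for every sample.
const allAgree = verdicts.every(([a, b, c]) => a === b && b === c);

console.log({ verdicts, allAgree });
```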

Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-22 Thread Allen Wirfs-Brock
Bynens wrote: One more related question: are these three regular expression literals equivalent? 1. `/[💩-💫]/u`: raw astral symbols 2. `/[\u{1F4A9}-\u{1F4AB}]/u`: astral symbols represented using Unicode code point escape sequences 3. `/[\uD83D\uDCA9-\uD83D\uDCAB]/u`: astral symbols

Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-22 Thread Mathias Bynens
/~jorendorff/es6-draft.html#sec-patterns It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to (1) and (2) because of the following: RegExpUnicodeEscapeSequence[U] :: [+U] LeadSurrogate \u TrailSurrogate …but I was looking for confirmation

Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-21 Thread Mathias Bynens
If I’m reading the latest draft correctly, `RegExpUnicodeEscapeSequence`s aren’t allowed in regular expressions without the `u` flag. Why is that? AFAICT, the only situations that require looking at code points rather than UCS-2/UTF-16 code units in order to support full Unicode
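
For context, this is how the question plays out in engines that implement the published language together with the web-compatibility regular expression grammar, as far as I am aware. It is a sketch of observed behaviour, not a statement about the draft under discussion in 2013. Without `u`, `\u` not followed by four hex digits falls back to matching a literal `u`, and `{20}` is read as a quantifier.

```javascript
// Without `u`: `\u` is an identity escape for "u", `{20}` repeats it 20 times.
const legacy = /\u{20}/;
// With `u`: `\u{20}` is a code point escape for U+0020 SPACE.
const unicode = /\u{20}/u;

const observed = {
  legacyMatchesSpace: legacy.test(" "),
  legacyMatchesTwentyUs: legacy.test("u".repeat(20)),
  unicodeMatchesSpace: unicode.test(" "),
};

console.log(observed);
```

This existing meaning of `/\u{20}/` is one plausible reason the escape could not simply be enabled for patterns without the flag.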

Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-21 Thread Erik Arvidsson
On Thu, Nov 21, 2013 at 2:41 PM, Mathias Bynens math...@qiwi.be wrote: I’d suggest allowing `\u{xx}`-style escape sequences everywhere, and simply changing the behavior of the resulting regular expression depending on the `u` flag. There’s no good reason to disallow e.g. `/\u{20}/` or even

Re: Backwards compatibility and U+2E2F in `Identifier`s

2013-09-18 Thread Anne van Kesteren
On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens math...@qiwi.be wrote: After comparing the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode

Re: Backwards compatibility and U+2E2F in `Identifier`s

2013-09-18 Thread Mathias Bynens
On 18 Sep 2013, at 21:05, Anne van Kesteren ann...@annevk.nl wrote: On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens math...@qiwi.be wrote: After comparing the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE

Re: Backwards compatibility and U+2E2F in `Identifier`s

2013-08-24 Thread Norbert Lindenberg
I had no intentions specific to U+2E2F when I proposed relying on UTR 31 - the change is simply the effect of the character properties that the Unicode Technical Committee assigned to this character. I don't think there's a real problem. U+2E2F was added in Unicode version 5.1. ECMAScript 5.1

Re: Backwards compatibility and U+2E2F in `Identifier`s

2013-08-24 Thread Mathias Bynens
I had no intentions specific to U+2E2F when I proposed relying on UTR 31 - the change is simply the effect of the character properties that the Unicode Technical Committee assigned to this character. I don't think there's a real problem. U+2E2F was added in Unicode version 5.1

Re: Backwards compatibility and U+2E2F in `Identifier`s

2013-08-23 Thread Brendan Eich
the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode TR31 doesn’t. Was this potentially breaking change intentional? I’m fine with disallowing U+2E2F

Backwards compatibility and U+2E2F in `Identifier`s

2013-08-19 Thread Mathias Bynens
to Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax (http://www.unicode.org/reports/tr31/). After comparing the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE in `IdentifierStart` and `IdentifierPart

Re: u

2012-03-16 Thread Herby Vojčík
Brendan Eich wrote: Herby Vojčík wrote: I am probably writing densely and you had little time. I have written at the beginning of 1.: 'class ...}' as a sugar for 'function ...}.prototype' (I put similar texts describing the idea to the header of 2. and 3. as well) I get it, but it is not

Re: u

2012-03-16 Thread Brendan Eich
Herby Vojčík wrote: what incoherency is there? It behaves consistently all over. You are mixing coherent and consistent here. I explicitly distinguished them. Making a declarative form that looks like a function declaration have an expression form that evaluates differently is IMHO

u

2012-03-15 Thread Herby Vojčík
Brendan Eich wrote: Definitely, but classes have bigger issues than private syntax, and have for a while. Class-side inheritance, body syntax, whether there should be any declarative public syntax, what nested classes mean, static or 'class' members -- that's a partial list from memory. Minimal

Re: u

2012-03-15 Thread Herby Vojčík
Sorry for the strange subject, I had written "object literal based class too minimal? (was: Re: @name)" but Postbox Express somehow ate it. Herby

Re: u

2012-03-15 Thread Brendan Eich
Herby Vojčík wrote: Brendan Eich wrote: Definitely, but classes have bigger issues than private syntax, and have for a while. Class-side inheritance, body syntax, whether there should be any declarative public syntax, what nested classes mean, static or 'class' members -- that's a partial list

Re: u

2012-03-15 Thread Herby Vojčík
Brendan Eich wrote: Herby Vojčík wrote: Brendan Eich wrote: Definitely, but classes have bigger issues than private syntax, and have for a while. Class-side inheritance, body syntax, whether there should be any declarative public syntax, what nested classes mean, static or 'class' members --

Re: u

2012-03-15 Thread Brendan Eich
Herby Vojčík wrote: class List (n) { this.@arr = n === +n ? new Array(n) : []; }.{ at (i) { i = +i; if (i>=0 && i<this.@arr.length) { return this.@arr[i]; } else throw "Out of bounds: "+i; } size () { return this.@arr.length; } } [snip...] List.{ from (array) { var r =

Re: u

2012-03-15 Thread Herby Vojčík
Brendan Eich wrote: That is coherent with new Foo - 'Foo is the class' means 'new Foo returns new instance'. Yes, but your first example, class List(n) {...} cited above at the very top, uses .{ to add what looks like prototype methods at and size. If class List(n){...} evaluates to the

Re: u

2012-03-15 Thread Brendan Eich
Herby Vojčík wrote: I am probably writing densely and you had little time. I have written at the beginning of 1.: 'class ...}' as a sugar for 'function ...}.prototype' (I put similar texts describing the idea to the header of 2. and 3. as well) I get it, but it is not coherent. A function

Re: Should Decode accept U+FFFE or U+FFFF (and other Unicode non-characters)?

2011-07-15 Thread Allen Wirfs-Brock
this requirement as: reject overlong UTF-8 sequences, and otherwise reject only unpaired or mispaired surrogate code points. Is this exactly what ES5 requires? And if it is, should it be? Firefox has also treated otherwise-valid-looking encodings of U+FFFE and U+FFFF as specifying

Re: Should Decode accept U+FFFE or U+FFFF (and other Unicode non-characters)?

2011-07-14 Thread Jeff Walden
, and otherwise reject only unpaired or mispaired surrogate code points. Is this exactly what ES5 requires? And if it is, should it be? Firefox has also treated otherwise-valid-looking encodings of U+FFFE and U+FFFF as specifying that the replacement character U+FFFD be used. And the rationale

Should Decode accept U+FFFE or U+FFFF (and other Unicode non-characters)?

2009-10-08 Thread Jeff Walden
). After SpiderMonkey made that change I noticed some non-standard extra behavior: U+FFFE and U+FFFF decode to the replacement character. ES5 doesn't say to do this -- the decode table categorizes only [0xD800, 0xDFFF] as invalid (when not in a surrogate pair) and resulting in a URIError
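
A way to probe the behaviour in question on a given engine. `tryDecode` is a hypothetical helper for this sketch. The non-character cases are expected to decode to themselves on an engine that follows the specification as described above, while the overlong and lone-surrogate encodings are expected to raise `URIError`; an engine with the extra behaviour mentioned here would show U+FFFD for the first two.

```javascript
// Hypothetical helper: decode and report either the value or the URIError.
function tryDecode(encoded) {
  try {
    return { ok: true, value: decodeURIComponent(encoded) };
  } catch (e) {
    if (e instanceof URIError) return { ok: false, value: null };
    throw e; // not the failure mode under test
  }
}

const cases = {
  // UTF-8 encodings of the non-characters U+FFFE and U+FFFF.
  nonCharFFFE: tryDecode("%EF%BF%BE"),
  nonCharFFFF: tryDecode("%EF%BF%BF"),
  // Overlong two-byte encoding of U+0000.
  overlongNul: tryDecode("%C0%80"),
  // UTF-8-style encoding of the lone surrogate U+D800.
  loneSurrogate: tryDecode("%ED%A0%80"),
};

for (const [name, result] of Object.entries(cases)) {
  const shown = result.ok
    ? "U+" + result.value.codePointAt(0).toString(16).toUpperCase()
    : "URIError";
  console.log(name, shown);
}
```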