fyi, googling "tc39 regexp unicode" led to web-compat reasoning (learned
something new) @
https://github.com/tc39/proposal-regexp-unicode-property-escapes#what-about-backwards-compatibility
What about backwards compatibility?
In regular expressions without the u flag, the
Think of the `u` flag as a strict mode for regular expressions.
`/\a/u` throws, because there is no reason to escape `a` as `\a` --
therefore, if such an escape sequence is present, it's likely a user error.
The same goes for `/\-/u`. `-` only has special meaning within character
classes
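A minimal sketch of the "strict mode" behavior described above. It uses the `RegExp` constructor so that an invalid pattern surfaces as a catchable `SyntaxError` instead of stopping the whole script from parsing. The helper name `compiles` is made up for illustration.

```javascript
// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(source, flags) {
    try {
        new RegExp(source, flags);
        return true;
    } catch (err) {
        if (err instanceof SyntaxError) {
            return false;
        }
        throw err;
    }
}

const results = {
    escapedA: compiles("\\a", ""),              // legacy mode tolerates the needless escape
    escapedAUnicode: compiles("\\a", "u"),      // rejected under the u flag
    escapedDash: compiles("\\-", ""),           // legacy mode tolerates \- outside a class
    escapedDashUnicode: compiles("\\-", "u"),   // rejected under the u flag
    dashInClassUnicode: compiles("[\\-]", "u")  // inside a character class, \- is allowed
};
console.log(results);
```

Under the `u` flag, then, an escaped `-` is only accepted inside a character class, where `-` can otherwise denote a range.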
jslint previously warned against unescaped literal "-" in regexp.
however, escaping "-" together with unicode flag "u", causes syntax error
in chrome/firefox/edge (and jslint has since removed warning):
```javascript
let rgx = /\-/u
VM21:1 Uncaught SyntaxErr
```
On Oct 29, 2018, at 21:55, J Decker wrote:
>
> https://esdiscuss.org/topic/expectations-around-line-ending-behavior-for-u-2028-and-u-2029#content-10
Your message was non-surprising to me: Most editors indeed do not heed the
Unicode lore on 2028 and 2029, as nobody uses these char
On Mon, Oct 29, 2018 at 1:50 PM Carsten Bormann wrote:
> On Oct 26, 2018, at 10:48, Claude Pache wrote:
> >
> > I have just tried to open a file containing U+2028 and U+2029 in four
> different text editors / integrated environments on my Mac. All of them
> recognise both c
On Oct 26, 2018, at 10:48, Claude Pache wrote:
>
> I have just tried to open a file containing U+2028 and U+2029 in four
> different text editors / integrated environments on my Mac. All of them
> recognise both characters as newlines (and increment the line number for
> tho
spec, is that all existing parsers and tooling for all languages
> > would also be updated to have line numbering that includes U+2028/29
>
> There is also the somewhat widespread opinion that Unicode goofed by
> adding those characters and that the best thing to do with them is to
>
On 10/29/18 2:04 PM, Logan Smyth wrote:
This means that the expectation, from the standpoint of
Unicode spec, is that all existing parsers and tooling for all languages
would also be updated to have line numbering that includes U+2028/29
There is also the somewhat widespread opinion
Sounds good. This means that the expectation, from the standpoint of
Unicode spec, is that all existing parsers and tooling for all languages
would also be updated to have line numbering that includes U+2028/29, or
else that the line numbers would indefinitely be out of sync with the line
numbers
:49 PM Logan Smyth wrote:
>
> Great, thank you for that resource Allen, it's helpful to have something
> concrete to consider.
>
> What you'd prefer is that other languages should also be rendered with
> U+2028/29 as creating new lines, even though their specifications do
Great, thank you for that resource Allen, it's helpful to have something
concrete to consider.
What you'd prefer is that other languages should also be rendered with
U+2028/29 as creating new lines, even though their specifications do not
define them as lines? That means that any parser
>
> Would it be worth exploring a definition of U+2028/29 in the spec such that
> they behave as line terminators for ASI, but otherwise do not increment
> things line number counts and behave as whitespace characters?
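As background for the question above, here is a small sketch of the behavior it starts from: U+2028 and U+2029 count as line terminators for automatic semicolon insertion (ASI). `new Function` is used only so the source text can be built from a string, and `runs` is a made-up helper name.

```javascript
// Hypothetical helper: true if the text parses and runs as a function body.
function runs(body) {
    try {
        new Function(body)();
        return true;
    } catch (err) {
        if (err instanceof SyntaxError) {
            return false;
        }
        throw err;
    }
}

// Two statements with no semicolon between them.
const withSpace = "var a = 1 var b = 2";
const withLineSeparator = "var a = 1\u2028var b = 2";
const withParagraphSeparator = "var a = 1\u2029var b = 2";

console.log(runs(withSpace));              // false: no line terminator, so no ASI
console.log(runs(withLineSeparator));      // true: U+2028 is a LineTerminator
console.log(runs(withParagraphSeparator)); // true: U+2029 is a LineTerminator
```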
Diverging the definition of line terminator for the pur
> Le 24 oct. 2018 à 21:58, Logan Smyth a écrit :
>
> On the other hand, it seems like every editor that I've looked at so far will
> not render these characters as newlines,
I have just tried to open a file containing U+2028 and U+2029 in four different
text editors
On Oct 26, 2018, at 02:17, Allen Wirfs-Brock wrote:
>
> see https://www.unicode.org/versions/Unicode11.0.0/ch05.pdf#G10213
Please explain how this is even remotely relevant for a programming language.
(Clearly, this was written by people who were trying to encode word processing
text.
The
> On Oct 25, 2018, at 4:49 PM, Logan Smyth wrote:
>
> > Tools that do not consider U+2028/29 to be line breaks are not behaving as
> > they should according to the latest Unicode standard.
>
> That's part of what I'm attempting to understand. What specifically
> Tools that do not consider U+2028/29 to be line breaks are not behaving
as they should according to the latest Unicode standard.
That's part of what I'm attempting to understand. What specifically does
Unicode require for these code points? What are the expectations for
languages that h
the original source. As currently
specified, a line number in a stack trace takes U+2028/29 into account, and
thus any consumer of this source code and line number value needs to
have a special case for JS code. It seems unrealistic to expect every piece of
tooling that works with source code
On Oct 25, 2018, at 18:24, Logan Smyth wrote:
>
> 3. Diverge the definition of current source-code line from the current
> LineTerminatorSequence lexical grammar such that source line number is always
> /\r?\n/, which is what the user is realistically going to see in their edit
in a stack trace takes U+2028/29 into
account, and thus any consumer of this source code and line number
value needs to have a special case for JS code. It seems unrealistic to
expect every piece of tooling that works with source code would have a
special case for JS code to take these 2
this sense. Editors and HTML are free to do what they want, but in
my opinion ECMAScript tooling at least should not pretend that these input
elements don't terminate lines.
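The line-number mismatch discussed in this thread can be sketched as follows. The two regular expressions are assumptions standing in for "what the ECMAScript grammar counts" and "what a typical editor counts". `lineNumberAt` is a hypothetical helper, not an engine or editor API.

```javascript
// ECMAScript line terminator sequences: LF, CR (CRLF counted once), U+2028, U+2029.
const ECMASCRIPT_LINE_BREAK = /\r\n|[\n\r\u2028\u2029]/g;
// What many editors and line-oriented tools count instead.
const EDITOR_LINE_BREAK = /\r?\n/g;

// Hypothetical helper: 1-based line number of `offset` within `source`.
function lineNumberAt(source, offset, lineBreak) {
    const before = source.slice(0, offset);
    const breaks = before.match(lineBreak);
    return (breaks ? breaks.length : 0) + 1;
}

const source = "const label = 'a';\u2028const other = 'b';\nthrow new Error('boom');";
const offset = source.indexOf("throw");

console.log(lineNumberAt(source, offset, ECMASCRIPT_LINE_BREAK)); // 3
console.log(lineNumberAt(source, offset, EDITOR_LINE_BREAK));     // 2
```

The same `throw` statement is on line 3 by one count and line 2 by the other. That gap is what a tool would have to special-case when mapping a reported line number back to what the user sees.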
On Wed, Oct 24, 2018 at 3:58 PM Logan Smyth wrote:
> Something I've recently realized is just how much U+2028 and U+2029 bei
Something I've recently realized is just how much U+2028 and U+2029 being
newlines introduces a mismatch between different parts of a dev
environment, and I'm curious for thoughts.
Engines understandably take these two characters into account when defining
their line number offsets in stack traces
actually it looks to me like a better place to put it is:
ClassEscape[U] :: [+U] -
allen
On Jan 19, 2015, at 9:45 PM, Norbert Lindenberg wrote:
I think the change proposed by Allen is fine. The main point of the new
definition of IdentityEscape is to reserve \p, \X, and other escape
, what
is the fix?
This construction for Identity Escape goes back to Norbert's original
proposal
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
Perhaps we need to add a:
ClassAtom[U] :: [+U] \-
production or some such to the pattern grammar
to Norbert's original
proposal
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
Perhaps we need to add a:
ClassAtom[U] :: [+U] \-
production or some such to the pattern grammar.
I think it’s a bug — see
https://codereview.chromium.org/788043005
I think it's a bug, and I think your proposal is appropriate.
From: al...@wirfs-brock.com
Subject: escaping - in /u RegExp
Date: Tue, 13 Jan 2015 13:23:54 -0800
To: es-discuss@mozilla.org
Would those of you who consider yourselves RegExp experts take a look at
https://bugs.ecmascript.org
-supplementary-characters/index.html
Perhaps we need to add a:
ClassAtom[U] :: [+U] \-
production or some such to the pattern grammar.
Allen
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
On Nov 22, 2013, at 11:02 PM, Mathias Bynens wrote:
It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to
(1) and (2) because of the following:
RegExpUnicodeEscapeSequence[U] ::
[+U] LeadSurrogate \u TrailSurrogate
…but I was looking for confirmation
One more related question: are these three regular expression literals
equivalent?
1. `/[💩-💫]/u`: raw astral symbols
2. `/[\u{1F4A9}-\u{1F4AB}]/u`: astral symbols represented using Unicode code
point escape sequences
3. `/[\uD83D\uDCA9-\uD83D\uDCAB]/u`: astral symbols represented as a surrogate
Bynens wrote:
One more related question: are these three regular expression literals
equivalent?
1. `/[💩-💫]/u`: raw astral symbols
2. `/[\u{1F4A9}-\u{1F4AB}]/u`: astral symbols represented using Unicode code
point escape sequences
3. `/[\uD83D\uDCA9-\uD83D\uDCAB]/u`: astral symbols
/~jorendorff/es6-draft.html#sec-patterns
It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to
(1) and (2) because of the following:
RegExpUnicodeEscapeSequence[U] ::
[+U] LeadSurrogate \u TrailSurrogate
…but I was looking for confirmation
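One way to check the equivalence empirically is sketched below. It is a spot check on a handful of code points, not a proof. The first pattern is built with the `RegExp` constructor from a string that contains the actual astral characters U+1F4A9 and U+1F4AB, so it corresponds to form (1). The variable names are made up.

```javascript
// (1) raw astral symbols: the string escapes below produce the real characters,
//     so the pattern source is a class ranging from U+1F4A9 to U+1F4AB.
const raw = new RegExp("[\u{1F4A9}-\u{1F4AB}]", "u");
// (2) code point escapes inside the pattern itself
const codePointEscapes = /[\u{1F4A9}-\u{1F4AB}]/u;
// (3) surrogate pair escapes inside the pattern itself
const surrogateEscapes = /[\uD83D\uDCA9-\uD83D\uDCAB]/u;

const patterns = [raw, codePointEscapes, surrogateEscapes];
const samples = ["\u{1F4A8}", "\u{1F4A9}", "\u{1F4AA}", "\u{1F4AB}", "\u{1F4AC}"];

// One row per sample, one column per pattern.
const table = samples.map((sample) => patterns.map((re) => re.test(sample)));
console.log(table);
```

Each row comes out uniform: the three patterns reject U+1F4A8 and U+1F4AC and accept the three code points in between.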
If I’m reading the latest draft correctly, `RegExpUnicodeEscapeSequence`s
aren’t allowed in regular expressions without the `u` flag. Why is that?
AFAICT, the only situations that require looking at code points rather than
UCS-2/UTF-16 code units in order to support full Unicode
On Thu, Nov 21, 2013 at 2:41 PM, Mathias Bynens math...@qiwi.be wrote:
I’d suggest allowing `\u{xx}`-style escape sequences everywhere, and
simply changing the behavior of the resulting regular expression depending
on the `u` flag. There’s no good reason to disallow e.g. `/\u{20}/` or even
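For context, here is a sketch of what the pattern `\u{20}` means with and without the `u` flag in engines that implement the web-compatibility grammar. It only shows current behavior and takes no position on the suggestion.

```javascript
// Without u: "\u" falls back to an identity escape for the letter "u",
// and "{20}" is an ordinary quantifier.
const legacy = /\u{20}/;
// With u: "\u{20}" is a code point escape for U+0020 SPACE.
const unicode = /\u{20}/u;

console.log(legacy.test(" "));             // false
console.log(legacy.test("u".repeat(20)));  // true
console.log(unicode.test(" "));            // true
console.log(unicode.test("u".repeat(20))); // false
```

Because the pattern already has a meaning without the flag, changing that meaning could in principle affect existing code.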
On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens math...@qiwi.be wrote:
After comparing the output, I noticed that both regular expressions are
identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE
in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode
On 18 Sep 2013, at 21:05, Anne van Kesteren ann...@annevk.nl wrote:
On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens math...@qiwi.be wrote:
After comparing the output, I noticed that both regular expressions are
identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL
TILDE
I had no intentions specific to U+2E2F when I proposed relying on UTR 31 - the
change is simply the effect of the character properties that the Unicode
Technical Committee assigned to this character.
I don't think there's a real problem. U+2E2F was added in Unicode version 5.1.
ECMAScript 5.1
I had no intentions specific to U+2E2F when I proposed relying on UTR 31 -
the change is simply the effect of the character properties that the Unicode
Technical Committee assigned to this character.
I don't think there's a real problem. U+2E2F was added in Unicode version
5.1
the output, I noticed that both regular expressions are
identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE
in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode TR31
doesn’t.
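The difference can be inspected with Unicode property escapes. This is a sketch that assumes an engine with `\p{…}` support, and the object keys are made up. ES5 admitted identifier characters by general category, including Lm, whereas UTR 31's `ID_Start` and `ID_Continue` exclude `Pattern_Syntax` characters.

```javascript
const verticalTilde = "\u2E2F"; // U+2E2F VERTICAL TILDE

const properties = {
    isModifierLetter: /^\p{Lm}$/u.test(verticalTilde),             // general category Lm
    isPatternSyntax: /^\p{Pattern_Syntax}$/u.test(verticalTilde),  // reserved for pattern syntax
    isIdStart: /^\p{ID_Start}$/u.test(verticalTilde),
    isIdContinue: /^\p{ID_Continue}$/u.test(verticalTilde)
};
console.log(properties);
```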
Was this potentially breaking change intentional? I’m fine with disallowing
U+2E2F
to Unicode Standard Annex #31: Unicode Identifier and Pattern
Syntax (http://www.unicode.org/reports/tr31/).
After comparing the output, I noticed that both regular expressions are
identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE
in `IdentifierStart` and `IdentifierPart
Brendan Eich wrote:
Herby Vojčík wrote:
I am probably writing densely and you had little time. I have written
at the beginning of 1.:
'class ...}' as a sugar for 'function ...}.prototype'
(I put similar texts describing the idea to the header of 2. and 3. as
well)
I get it, but it is not
Herby Vojčík wrote:
what incoherency is there? It behaves consistently all over.
You are mixing coherent and consistent here. I explicitly distinguished
them. Making a declarative form that looks like a function declaration
have an expression form that evaluates differently is IMHO
Brendan Eich wrote:
Definitely, but classes have bigger issues than private syntax, and have
for a while. Class-side inheritance, body syntax, whether there should
be any declarative public syntax, what nested classes mean, static or
'class' members -- that's a partial list from memory.
Minimal
Sorry for the strange subject, I have written
object literal based class too minimal? (was: Re: @name)
but Postbox Express somehow ate it.
Herby
Herby Vojčík wrote:
Brendan Eich wrote:
Definitely, but classes have bigger issues than private syntax, and have
for a while. Class-side inheritance, body syntax, whether there should
be any declarative public syntax, what nested classes mean, static or
'class' members -- that's a partial list
Brendan Eich wrote:
Herby Vojčík wrote:
Brendan Eich wrote:
Definitely, but classes have bigger issues than private syntax, and have
for a while. Class-side inheritance, body syntax, whether there should
be any declarative public syntax, what nested classes mean, static or
'class' members --
Herby Vojčík wrote:
class List (n) {
this.@arr = n === +n ? new Array(n) : [];
}.{
at (i) {
i = +i;
if (i>=0 && i<this.@arr.length) { return this.@arr[i]; }
else throw "Out of bounds: "+i;
}
size () { return this.@arr.length; }
}
[snip...]
List.{
from (array) {
var r =
Brendan Eich wrote:
That is coherent with new Foo - 'Foo is the class' means 'new Foo
returns new instance'.
Yes, but your first example, class List(n) {...} cited above at the very
top, uses .{ to add what looks like prototype methods at and size. If
class List(n){...} evaluates to the
Herby Vojčík wrote:
I am probably writing densely and you had little time. I have written
at the beginning of 1.:
'class ...}' as a sugar for 'function ...}.prototype'
(I put similar texts describing the idea to the header of 2. and 3. as
well)
I get it, but it is not coherent.
A function
this requirement as:
reject overlong UTF-8 sequences, and otherwise reject only unpaired or
mispaired surrogate code points. Is this exactly what ES5 requires? And
if it is, should it be? Firefox has also treated otherwise-valid-looking
encodings of U+FFFE and U+ as specifying
, and otherwise reject only unpaired or mispaired
surrogate code points. Is this exactly what ES5 requires? And if it is, should it be? Firefox
has also treated otherwise-valid-looking encodings of U+FFFE and U+ as specifying that the replacement
character U+FFFD be used. And the rationale
).
After SpiderMonkey made that change I noticed some non-standard extra behavior: U+FFFE and U+
decode to the replacement character. ES5 doesn't say to do this -- the decode table categorizes
only [0xD800, 0xDFFF] as invalid (when not in a surrogate pair) and resulting in a URIError
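A short sketch of the rejection cases mentioned here, using `decodeURIComponent`. `tryDecode` is a hypothetical helper that returns the error name instead of throwing. The noncharacter case is left out on purpose because, as the messages above note, engines have differed on it.

```javascript
// Hypothetical helper: returns the decoded string, or the error name if decoding throws.
function tryDecode(encoded) {
    try {
        return decodeURIComponent(encoded);
    } catch (err) {
        return err.name;
    }
}

console.log(tryDecode("%E2%82%AC")); // "€": well-formed three-byte sequence for U+20AC
console.log(tryDecode("%C0%80"));    // "URIError": overlong encoding of U+0000
console.log(tryDecode("%ED%A0%80")); // "URIError": encodes the lone surrogate U+D800
```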