For reference, here's how Java's Pattern.UNICODE_CHARACTER_CLASS works
(tried with Oracle 1.8.0_31 and OpenJDK 1.7.0_65):
foo\uD834bar and foo\uDC00bar match ^foo[^a]bar$ and ^foo.bar$, so,
generally, lonely surrogates match /./.
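A minimal sketch for reproducing the observation above, assuming a stock JDK. The class name LoneSurrogateDemo and the helper fullMatch are made up for illustration and are not from the original message:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class LoneSurrogateDemo {
    // True if the whole input matches the regex under UNICODE_CHARACTER_CLASS.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        String loneLead = "foo\uD834bar";   // lead surrogate with no trail
        String loneTrail = "foo\uDC00bar";  // trail surrogate with no lead
        for (String input : new String[] { loneLead, loneTrail }) {
            // Both patterns from the message are tried against each input.
            System.out.println(fullMatch("^foo[^a]bar$", input));
            System.out.println(fullMatch("^foo.bar$", input));
        }
    }
}
```

All four checks should report a match if lonely surrogates are treated as single characters by the dot and by negated classes, as described.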
Backreferences are allowed to consume the leading surrogate of a
Based on Ex1, looks like the input string is not read as a sequence of code
points when we try to find a match for \1. So it's mostly read as a
sequence of code points except when it's not. :/
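A rough sketch for probing the backreference behaviour, assuming a stock JDK. It uses the pattern and input string that appear later in this thread; the class name BackrefSurrogateDemo and the helper firstMatchEnd are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class BackrefSurrogateDemo {
    // Returns the end index (in UTF-16 chars) of the first match, or -1 if none.
    static int firstMatchEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS)
                           .matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // A lone lead surrogate, then "bar", then a valid surrogate pair.
        String input = "foo\uD834bar\uD834\uDC00";
        System.out.println("input length in chars: " + input.length());
        System.out.println("match end: " + firstMatchEnd("foo(.+)bar\\1", input));
    }
}
```

If the backreference is compared char by char rather than code point by code point, the match should end one char short of the input length, leaving the trail surrogate of the final pair unconsumed.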
On Wed, Jan 28, 2015 at 3:11 PM, André Bargull andre.barg...@udo.edu
wrote:
On 1/28/2015 2:51 PM,
Hello es-discuss,
TL;DR: /foo.bar/u.test("foo\uD83Dbar") == ?
The ES6 unicode regexp spec is not very clear regarding what should happen
if the regexp or the matched string contains lonely surrogates (a lead
surrogate without a trail, or a trail without a lead). For example, for the
. operator,
Cool, thanks for clarifications!
To make sure, as per the intended semantics, we never allow splitting a
valid surrogate pair (= matching only one of the surrogates but not the
other), and thus we'll differ from the Java implementation here:
/foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say