Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
For reference, here's how Java's Pattern.UNICODE_CHARACTER_CLASS works (tried with Oracle 1.8.0_31 and OpenJDK 1.7.0_65): foo\uD834bar and foo\uDC00bar both match ^foo[^a]bar$ and ^foo.bar$, so, in general, lonely surrogates match /./. Backreferences are allowed to consume the leading surrogate of a

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Based on Ex1, it looks like the input string is not read as a sequence of code points when we try to find a match for \1. So it's mostly read as a sequence of code points except when it's not. :/ On Wed, Jan 28, 2015 at 3:11 PM, André Bargull andre.barg...@udo.edu wrote: On 1/28/2015 2:51 PM,

Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Hello es-discuss, TL;DR: /foo.bar/u.test("foo\uD83Dbar") == ? The ES6 unicode regexp spec is not very clear regarding what should happen if the regexp or the matched string contains lonely surrogates (a lead surrogate without a trail, or a trail without a lead). For example, for the . operator,
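
To make the term concrete, here is a minimal sketch of what a "lonely surrogate" looks like when a string is walked code point by code point. The helper name lonelySurrogates is hypothetical and not from the thread; the last line just prints whatever the running engine answers for the TL;DR expression.

```javascript
// Hypothetical helper (not from the thread): lists the lonely surrogates in a
// string. for...of iterates by code point, so a well-formed lead+trail pair
// arrives as a single two-unit string whose code point is above U+FFFF, while
// an unpaired surrogate arrives on its own with a value in U+D800..U+DFFF.
function lonelySurrogates(str) {
  const found = [];
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      found.push(cp.toString(16).toUpperCase());
    }
  }
  return found;
}

const lonelyInput = "foo\uD83Dbar";        // lead surrogate without a trail
const pairedInput = "foo\uD83D\uDE00bar";  // well-formed surrogate pair

console.log(lonelySurrogates(lonelyInput));  // [ 'D83D' ]
console.log(lonelySurrogates(pairedInput));  // []

// The expression from the TL;DR, evaluated by whatever engine runs this.
console.log(/foo.bar/u.test(lonelyInput));
```
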

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Marja Hölttä
Cool, thanks for the clarifications! To make sure, as per the intended semantics, we never allow splitting a valid surrogate pair (= matching only one of the surrogates but not the other), and thus we'll differ from the Java implementation here: /foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say
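
The "never split a valid surrogate pair" rule can be sketched with a few checks, assuming an engine that implements the u flag as it was eventually standardized. The variable names are illustrative only. The thread's own backreference example is printed without a claimed result, since the snippet above is cut off before stating it.

```javascript
// A valid surrogate pair encoding the single code point U+1D000.
const pair = "\uD834\uDC00";

// With /u the dot consumes the whole pair as one code point.
const dotWithU = /^.$/u.test(pair);         // true
// Without /u the string is two code units, so a single dot cannot span it.
const dotWithoutU = /^.$/.test(pair);       // false

// With /u a lead surrogate in the pattern must not match half of a pair.
const leadWithU = /\uD834/u.test(pair);     // false
// Without /u the lead surrogate matches the first code unit of the pair.
const leadWithoutU = /\uD834/.test(pair);   // true

console.log({ dotWithU, dotWithoutU, leadWithU, leadWithoutU });

// The example from the message, evaluated by the running engine.
console.log(/foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"));
```
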