Re: Full Unicode based on UTF-16 proposal

2012-03-19 Thread Steven L.
Steven Levithan wrote: \w with Unicode should match [\p{L}\p{Nd}_]. The best way to go for [[:alnum:]], for compatibility reasons, would probably be [\p{Ll}\p{Lu}\p{Lt}\p{Nd}]. This difference could be argued as a positive (if you like that exact set) or a negative (many users will think it's

Re: Full Unicode based on UTF-16 proposal

2012-03-19 Thread Steven L.
a lowercase (?u) flag for Unicode-aware case folding. -- Steven Levithan -----Original Message----- From: Steven L. Sent: Monday, March 19, 2012 12:21 PM To: Erik Corry Cc: es-discuss@mozilla.org Subject: Re: Full Unicode based on UTF-16 proposal Steven Levithan wrote: \w with Unicode should match

Re: Full Unicode based on UTF-16 proposal

2012-03-18 Thread Steven L.
Steven Levithan wrote: * \s == [\x09-\x0D] -- Java, PCRE, Ruby, Python (default). * \s == [\x09-\x0D\p{Z}] -- ES-current, .NET, Perl, Python (with (?u)). Oops. My ASCII-only version of \s is obviously missing space \x20 and no-break space \xA0 (which are included in Unicode's \p{Z}). Erik

Re: RegExp lookbehind

2012-03-18 Thread Steven L.
Erik Corry wrote: Steven Levithan wrote: In practice, at least for [Java-style finite-length] lookbehind, this attempt to avoid far-back searches is kind of silly--e.g., Java lets you use a quantifier like {0,10} within lookbehind. At least this provides something to point the user at

Re: Default non-capturing regex flag [WAS: how to create strawman proposals?]

2012-03-18 Thread Steven L.
-----Original Message----- From: Steven L. Sent: Sunday, March 18, 2012 12:31 AM To: mikesam...@gmail.com ; EcmaScript Subject: Re: Default non-capturing regex flag [WAS: how to create strawman proposals?] I'm seeing this for the first time now. Sorry for reviving old news. On 2011-06-03, Brendan Eich

Re: RegExp lookbehind

2012-03-18 Thread Steven L.
Note: I've posted the tests from my previous email at http://stevenlevithan.com/regex/tests/lookbehind.html -- Steven Levithan

Re: Full Unicode based on UTF-16 proposal

2012-03-18 Thread Steven L.
Erik Corry wrote: Steven Levithan wrote: Anyway, this is probably all moot, unless someone wants to officially propose POSIX character classes for ES RegExp. ...In which case I'll be happy to state about a half-dozen reasons to not do so. :) Please do, they seem quite sensible to me. My

Re: RegExp lookbehind

2012-03-17 Thread Steven L.
Brendan Eich wrote: We'll get this strawman: http://wiki.ecmascript.org/doku.php?id=strawman:steve_levithan_regexp_api_improvements onto the agenda for the next TC39 meeting. Neat-o. I mentioned in the blog post referenced there [1] that I'd be doing a separate write-up with my opinionated

Re: Full Unicode based on UTF-16 proposal

2012-03-17 Thread Steven L.
Erik Corry wrote: However I think we probably do want the /u modifier on regexps to control the new backward-incompatible behaviour. There may be some way to relax this for regexp literals in opted in Harmony code, but for new RegExp(...) and for other string literals I think there are rather

Re: RegExp lookbehind

2012-03-17 Thread Steven L.
://msdn.microsoft.com/en-us/library/yd1hzczs.aspx#RightToLeft On Sat, Mar 17, 2012 at 1:07 PM, Lasse Reichstein reichsteinatw...@gmail.com wrote: On Sat, Mar 17, 2012 at 9:00 AM, Steven L. steves_l...@hotmail.com wrote: I thought so, too. See [2] from Waldemar (May 24-26 rough meeting notes

Re: Full Unicode based on UTF-16 proposal

2012-03-17 Thread Steven L.
Erik Corry wrote: Disagree with adding /u for this purpose and disagree with breaking backward compatibility to let `/./.exec(s)[0].length == 2`. Care to enlighten us with any thinking behind this disagreeing? Sorry for the rushed and overly ebullient message. I disagreed with /u for

Re: Full Unicode based on UTF-16 proposal

2012-03-17 Thread Steven L.
Erik Corry wrote: I further objected because I think the /u flag would be better used as an ASCII/Unicode mode switcher for \d\w\b. My proposal for this is based on Python's re.UNICODE or (?u) flag, which does the same thing except that it also covers \s (which is already Unicode-based in ES).
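
[Editor's illustration, not part of the original message.] A minimal sketch of the ASCII/Unicode split this snippet describes, assuming a JavaScript runtime that supports Unicode property escapes (\p{...} under the u flag). The names asciiWord and unicodeWord are hypothetical, and the u flag is used here only to enable \p{...}; it is not the mode switcher proposed in the thread.

```javascript
// Sketch only: asciiWord and unicodeWord are hypothetical names, not from the thread.
// Assumes a runtime with Unicode property escapes (\p{...} with the u flag).

// Built-in \w is ASCII-only: [A-Za-z0-9_].
const asciiWord = /^\w+$/;

// The set suggested in the thread for a Unicode-aware \w:
// any letter, any decimal digit, or underscore.
const unicodeWord = /^[\p{L}\p{Nd}_]+$/u;

// "abc_123", "naïve", "日本語", and the Arabic-Indic digit three.
const samples = ["abc_123", "na\u00EFve", "\u65E5\u672C\u8A9E", "\u0663"];

const results = samples.map((text) => ({
  text,
  ascii: asciiWord.test(text),
  unicode: unicodeWord.test(text),
}));
console.log(results);

// \s is already Unicode-based in ES, while \d is not.
const nbspIsSpace = /\s/.test("\u00A0");
const arabicDigitIsDigit = /\d/.test("\u0663");
console.log({ nbspIsSpace, arabicDigitIsDigit });
```

Only the pure-ASCII sample matches both patterns; the other three match the Unicode set alone, which is the gap a mode switch for \d\w\b would close.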

Re: RegExp lookbehind

2012-03-17 Thread Steven L.
Lasse Reichstein wrote: I would simply apply same logic we have already for the look ahead ... or you think that would cause problems? I'm not sure it even makes sense. ES RegExps are backtracking based, and it makes a difference in which order alternatives are tried. Greedy matching is

Re: Default non-capturing regex flag [WAS: how to create strawman proposals?]

2012-03-17 Thread Steven L.
I'm seeing this for the first time now. Sorry for reviving old news. On 2011-06-03, Brendan Eich wrote: Kyle Simpson wrote: I propose a /n flag for regular expressions, which would swap the default capturing/non-capturing behavior between ( ) and (?: ) operators (that is, ( ) would not

RE: RegExp pet peeves (was: should calling RegExp constructor as function without arguments throw?)

2009-01-14 Thread Steven L.
Lasse R.H. Nielsen wrote: The only difference between an Atom and an Assertion is that the former can have a quantifier attached. There is absolutely no reason to put a quantifier on a look-ahead, and look-aheads are zero-width matches just like all assertions, so they would fit much better