Re: Suggested RegExp Improvements

2010-11-17 Thread Marc Harter


On Tue, 2010-11-16 at 13:09 +0100, Erik Corry wrote:
 2010/11/15 Marc Harter wav...@gmail.com:
  On Mon, 2010-11-15 at 14:06 +0100, Erik Corry wrote:
 
  Your proposal seems to allow variable length lookbehind.  This isn't
  allowed in perl as far as I know.  I just tried the following:
 
   perl -e 'foobarbaz =~ /a(?<=(ob|bab))/;'
 
  which gives an error on perl5.  I think if we are going to allow
  variable length lookbehind we should first find out why they don't
  have it in perl.  I think the implementation is a little tricky if you
  want to support the full regexp language in lookbehinds.
 
  This was not my intention.  I am proposing zero-width lookbehind, which
  would not allow for the case you specified above.  I will update the
 
 The issue is not with the number of characters consumed by the
 assertion.  This is indeed zero.  The issue is with the width of the
 text matched by the disjunction inside the brackets.  This is not any
 disjunction, but rather a restricted part of the regexp language that
 can only match a particular number of characters.
 

Sorry about that.  I understand you now.

 It seems the .Net regexp library is able to handle arbitrary content
 in a lookbehind.  It is almost the only one.

Yes, it appears that way.  I wonder how beneficial that really is?  I
believe keeping the same disjunction we have for lookahead in ECMA-262
would make sense at this point in time, but I am open to pushback.

 
 See http://www.regular-expressions.info/lookaround.html#lookbehind for
 more details.
 
 We could add this feature to JS.  As far as I can work out it
 presupposes the ability to reverse an arbitrary regexp and run it
 backwards (stepping back and backtracking forwards).  I don't think we
 should add it accidentally though, and perhaps the proposer should be
 the first to implement it.

I can take a stab at writing a more detailed description of how to
evaluate the Disjunction, as Lasse Reichstein has pointed out
(http://www.mail-archive.com/es-discuss@mozilla.org/msg05218.html), but
I wouldn't mind help if anyone else is interested, or pointers to
implementation specs for Perl lookbehind.  I haven't found any yet, just
documentation.

 
  proposal.  It is my understanding that lookahead as implemented in
  ECMAScript also is zero-width and not variable.  This is also how Perl has
  implemented lookbehind.
 
  http://perldoc.perl.org/perlre.html#Extended-Patterns
 
  Updated Proposal:
  https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM
 
 The issue is not that the regexp doesn't match in perl. The issue is
 that it is not compiled at all.
 
 
  Is there an example of a language that supports the full regexp power
  in lookbehinds so we can look at their experiences with implementing
  it?
 
  As far as I know Perl is the de facto standard.
 
 
 
  2010/11/15 Marc Harter wav...@gmail.com:
  Brendan et al.,
 
  I have created a proposal for look-behind provided at this link:
 
 
  https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM
 
  I hope it is a format that will be helpful for discussion with TC39.
  Admittedly, I have never written one of these before so am completely open
  to any feedback or ways to improve the document from yourself or anyone
  else
  on this list.
 
  Marc
 
  On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:
 
  I would be game to write up a proposal for this.  When would you need
  this by to discuss w/ TC39?
 
  Thanks for your consideration,
  Marc
 
  On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:
 
  On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:
 
  After considering all the breadth this discussion could take maybe it
  would be wise to just focus on one issue at a time.  For me, the biggest
  missing feature is lookbehind.  It's common to most languages
  implementing the Perl-RegExp-syntax, it is very useful when looking for
  patterns that follow or don't follow a particular pattern.  I guess I'm
  confused why lookahead made it in but not lookbehind.
 
  This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but
  we
  proposed to ECMA TC39 TG1 (the JS group -- things were different then,
  including capitalization) something based on Perl 5. We didn't get
  everything, and we had to rationalize some obvious quirks.
 
  I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
  being left out on purpose. Waldemar may recall more, I'd handed him the
  JS
  keys inside netscape.com to go do mozilla.org.
 
  If you are game to write a proposal or mini-spec (in the style of ES5
  even), let me know. I'll chat with other TC39'ers next week about this.
 
  /be
 
 
  What do people
  think about including this feature?
 
  Marc
 
  On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
  I will start out with a disclaimer.  I have not read both ECMAScript
  specifications for 3 and now 5, so I admit that I am not an expert in
  the spec itself but 

Re: Suggested RegExp Improvements

2010-11-16 Thread Lasse Reichstein

On Mon, 15 Nov 2010 16:23:13 +0100, Marc Harter wav...@gmail.com wrote:

[look-behind allowing variable length body]


This was not my intention.  I am proposing zero-width lookbehind, which
would not allow for the case you specified above.


The grammar allows it. In ECMAScript it would be:
 "foobarbaz".match(/a(?<=(ob|bab)?)/)
which would match the first a.
Had it been written
 "foobarbaz".match(/a(?<=(ob|bab)?.)/)



I will update the
proposal.  It is my understanding that lookahead as implemented in
ECMAScript also is zero-width and not variable.  This is also how Perl
has implemented lookbehind.


The look-ahead in ECMAScript has a Disjunction as content, which basically
means that it can contain *any* RegExp (including quantified statements
and other lookaheads). This works fine because the semantics of the
disjunction is the same as any other disjunction in a RegExp: it's matched
forwards from a position in the input.


Your proposal also uses a Disjunction as body, but it's not specified how
to evaluate that body so that it *ends* at the position of the assertion.
Executing a RegExp backwards isn't trivial. Well, mostly it is, by
symmetry, but it's not part of the spec.

The positive look-behind should probably be allowed to contain captures
that are still participating after the assertion succeeds (mirroring the
semantics of the positive look-ahead).

I believe PCRE allows variable length (but structurally simple)
look-behinds, where the structure ensures that it doesn't have to do
backtracking while checking them, even though Perl itself does not [1].
Whether that's a desired property or not is a different question (I would
actually prefer a full backwards-executed regexp to an artificial
restriction, but that's mainly ideology :).

/L
[1] http://www.regular-expressions.info/lookaround.html



http://perldoc.perl.org/perlre.html#Extended-Patterns

Updated Proposal:
https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM



Is there an example of a language that supports the full regexp power
in lookbehinds so we can look at their experiences with implementing
it?



As far as I know Perl is the de facto standard.





2010/11/15 Marc Harter wav...@gmail.com:
 Brendan et al.,

 I have created a proposal for look-behind provided at this link:

  
https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM


 I hope it is a format that will be helpful for discussion with TC39.
 Admittedly, I have never written one of these before so am completely open
 to any feedback or ways to improve the document from yourself or anyone else
 on this list.

 Marc

 On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:

 I would be game to write up a proposal for this.  When would you need
 this by to discuss w/ TC39?

 Thanks for your consideration,
 Marc

 On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:

 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl-RegExp-syntax, it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

 This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get
 everything, and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
 being left out on purpose. Waldemar may recall more, I'd handed him the JS
 keys inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5
 even), let me know. I'll chat with other TC39'ers next week about this.


 /be


 What do people
 think about including this feature?

 Marc

 On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications for 3 and now 5, so I admit that I am not an expert in
 the spec itself but as a user of JavaScript, I would like to get some
 expert discussion over this topic as proposed enhancements to the
 RegExp engine for Harmony.

 I will start with a list of lacking features in JS as compared to Perl
 provided by (http://www.regular-expressions.info/javascript.html):

 * No \A or \Z anchors to match the start or end of the string.
   Use a caret or dollar instead.
 * Lookbehind is not supported at all. Lookahead is fully
   supported.
 * No atomic grouping or possessive quantifiers
 * No Unicode support, except for matching single characters with
   \u
 * No named capturing groups. Use 

Re: Suggested RegExp Improvements

2010-11-16 Thread Lasse Reichstein

[Unterminated statement detected, fixing ...]

On Tue, 16 Nov 2010 13:12:36 +0100, Lasse Reichstein reichsteinatw...@gmail.com wrote:



On Mon, 15 Nov 2010 16:23:13 +0100, Marc Harter wav...@gmail.com wrote:

[look-behind allowing variable length body]


This was not my intention.  I am proposing zero-width lookbehind, which
would not allow for the case you specified above.


The grammar allows it. In ECMAScript it would be:
  "foobarbaz".match(/a(?<=(ob|bab)?)/)
which would match the first a.
Had it been written
  "foobarbaz".match(/a(?<=(ob|bab)?.)/)


... then it would match a and capture ob, assuming semantics symmetric
to look-ahead.
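
For readers who want to try the two patterns: look-behind was later added to
ECMAScript (in ES2018), so the example can be run in an engine that supports
the (?<= ... ) syntax. The following is only a sketch under that assumption.
The variable names are illustrative, and the results shown in the comments
are what a backwards-matching look-behind with look-ahead-like capture
semantics produces.

```javascript
// The two patterns from the message, written with (?<= ... ) look-behind.
const input = "foobarbaz";

// Optional body: the look-behind succeeds by matching the empty string,
// because neither "ob" nor "bab" ends right after the first "a".
const m1 = input.match(/a(?<=(ob|bab)?)/);

// Body ends with ".", which consumes the "a" itself; the text before it
// is "ob", so the group participates and captures "ob".
const m2 = input.match(/a(?<=(ob|bab)?.)/);

console.log(m1[0], m1.index, m1[1]); // a 4 undefined
console.log(m2[0], m2.index, m2[1]); // a 4 ob
```

Both calls match the first "a" (index 4); only the second leaves capture
group 1 set after the assertion succeeds.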




I will update the
proposal.  It is my understanding that lookahead as implemented in
ECMAScript also is zero-width and not variable.  This is also how Perl
has implemented lookbehind.


The look-ahead in ECMAScript has a Disjunction as content, which basically
means that it can contain *any* RegExp (including quantified statements
and other lookaheads). This works fine because the semantics of the
disjunction is the same as any other disjunction in a RegExp: it's matched
forwards from a position in the input.


Your proposal also uses a Disjunction as body, but it's not specified how
to evaluate that body so that it *ends* at the position of the assertion.
Executing a RegExp backwards isn't trivial. Well, mostly it is, by
symmetry, but it's not part of the spec.

The positive look-behind should probably be allowed to contain captures
that are still participating after the assertion succeeds (mirroring the
semantics of the positive look-ahead).
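
The look-ahead behaviour being mirrored here can be seen in any ECMAScript
engine. A small sketch (the pattern and input are made up for illustration):
the look-ahead body is a full Disjunction containing a quantifier and a
nested negative look-ahead, and its capture group is still set after the
assertion succeeds, even though the assertion consumes no input.

```javascript
// Look-ahead body: an alternation, a quantifier (a+), and a nested (?!x).
const re = /foo(?=(ba+r|baz)(?!x))/;
const m = re.exec("foobaaar!");

console.log(m[0]);                  // "foo"   (the look-ahead consumed nothing)
console.log(m[1]);                  // "baaar" (capture survives the assertion)
console.log(m.index + m[0].length); // 3       (match ends right after "foo")
```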

I believe PCRE allows variable length (but structurally simple)
look-behinds, where the structure ensures that it doesn't have to do
backtracking while checking them, even though Perl itself does not [1].
Whether that's a desired property or not is a different question (I would
actually prefer a full backwards-executed regexp to an artificial
restriction, but that's mainly ideology :).

/L
[1] http://www.regular-expressions.info/lookaround.html



http://perldoc.perl.org/perlre.html#Extended-Patterns

Updated Proposal:
https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM



Is there an example of a language that supports the full regexp power
in lookbehinds so we can look at their experiences with implementing
it?



As far as I know Perl is the de facto standard.





2010/11/15 Marc Harter wav...@gmail.com:
 Brendan et al.,

 I have created a proposal for look-behind provided at this link:

  
https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM


 I hope it is a format that will be helpful for discussion with TC39.
 Admittedly, I have never written one of these before so am completely open
 to any feedback or ways to improve the document from yourself or anyone else
 on this list.

 Marc

 On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:

 I would be game to write up a proposal for this.  When would you need
 this by to discuss w/ TC39?

 Thanks for your consideration,
 Marc

 On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:


 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl-RegExp-syntax, it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

 This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get
 everything, and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
 being left out on purpose. Waldemar may recall more, I'd handed him the JS
 keys inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5
 even), let me know. I'll chat with other TC39'ers next week about this.


 /be


 What do people
 think about including this feature?

 Marc

 On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications for 3 and now 5, so I admit that I am not an expert in
 the spec itself but as a user of JavaScript, I would like to get some
 expert discussion over this topic as proposed enhancements to the
 RegExp engine for Harmony.

 I will start with a list of lacking features in JS as compared to Perl
 provided by (http://www.regular-expressions.info/javascript.html):

 * No \A or \Z anchors to match the start or end of the string.
   Use a caret or dollar instead.
 * Lookbehind is not 

Re: Suggested RegExp Improvements

2010-11-16 Thread Mike Samuel
2010/11/16 Erik Corry erik.co...@gmail.com:
 2010/11/15 Marc Harter wav...@gmail.com:
 On Mon, 2010-11-15 at 14:06 +0100, Erik Corry wrote:

 Your proposal seems to allow variable length lookbehind.  This isn't
 allowed in perl as far as I know.  I just tried the following:

  perl -e 'foobarbaz =~ /a(?<=(ob|bab))/;'

 which gives an error on perl5.  I think if we are going to allow
 variable length lookbehind we should first find out why they don't
 have it in perl.  I think the implementation is a little tricky if you
 want to support the full regexp language in lookbehinds.

 This was not my intention.  I am proposing zero-width lookbehind, which
 would not allow for the case you specified above.  I will update the

 The issue is not with the number of characters consumed by the
 assertion.  This is indeed zero.  The issue is with the width of the
 text matched by the disjunction inside the brackets.  This is not any
 disjunction, but rather a restricted part of the regexp language that
 can only match a particular number of characters.

 It seems the .Net regexp library is able to handle arbitrary content
 in a lookbehind.  It is almost the only one.

 See http://www.regular-expressions.info/lookaround.html#lookbehind for
 more details.

 We could add this feature to JS.  As far as I can work out it
 presupposes the ability to reverse an arbitrary regexp and run it
 backwards (stepping back and backtracking forwards).  I don't think we
 should add it accidentally though, and perhaps the proposer should be
 the first to implement it.

Don't you already have to do that to efficiently handle a regexp that
ends at the end of the input (in JS, a non multiline $, or \z in
java.util.regex parlance)?
If you have the whole input string available in memory, and are trying
to figure out whether a lookbehind (?<=x) matches at position p, can't
you just test /(?:x)$/ against the prefix of the input of length p?
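
A minimal sketch of that prefix test, assuming the whole input is in memory.
The helper name lookbehindMatchesAt and its parameters are hypothetical, not
anything from the proposal. Note that this only answers yes or no: it scans
the prefix from the left on every call, and because it finds the leftmost
match rather than matching backwards, captures inside the body may not agree
with a true backwards match.

```javascript
// Hypothetical helper (not from the thread): would a look-behind with body
// `bodySource` succeed at position `p` of `input`?
function lookbehindMatchesAt(input, p, bodySource) {
  // Anchor the body at the end of the prefix. No "m" flag is used,
  // so "$" means end of input rather than end of line.
  const anchored = new RegExp("(?:" + bodySource + ")$");
  return anchored.test(input.slice(0, p));
}

console.log(lookbehindMatchesAt("foobarbaz", 4, "ob|bab"));     // true  ("foob" ends with "ob")
console.log(lookbehindMatchesAt("foobarbaz", 5, "ob|bab"));     // false ("fooba" ends with neither)
console.log(lookbehindMatchesAt("foobarbaz", 5, "(ob|bab)?.")); // true  ("." takes the "a")
```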


 proposal.  It is my understanding that lookahead as implemented in
 ECMAScript also is zero-width and not variable.  This is also how Perl has
 implemented lookbehind.

 http://perldoc.perl.org/perlre.html#Extended-Patterns

 Updated Proposal:
 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 The issue is not that the regexp doesn't match in perl. The issue is
 that it is not compiled at all.


 Is there an example of a language that supports the full regexp power
 in lookbehinds so we can look at their experiences with implementing
 it?

 As far as I know Perl is the de facto standard.



 2010/11/15 Marc Harter wav...@gmail.com:
 Brendan et al.,

 I have created a proposal for look-behind provided at this link:


 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 I hope it is a format that will be helpful for discussion with TC39.
 Admittedly, I have never written one of these before so am completely open
 to any feedback or ways to improve the document from yourself or anyone
 else
 on this list.

 Marc

 On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:

 I would be game to write up a proposal for this.  When would you need
 this by to discuss w/ TC39?

 Thanks for your consideration,
 Marc

 On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:

 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl-RegExp-syntax, it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

 This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but
 we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get
 everything, and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
 being left out on purpose. Waldemar may recall more, I'd handed him the
 JS
 keys inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5
 even), let me know. I'll chat with other TC39'ers next week about this.

 /be


 What do people
 think about including this feature?

 Marc

 On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications for 3 and now 5, so I admit that I am not an expert in
 the spec itself but as a user of JavaScript, I would like to get some
 expert discussion over this topic as proposed enhancements to the
 RegExp engine for Harmony.

 I will start with a list of lacking features in JS as compared to Perl
 provided by (http://www.regular-expressions.info/javascript.html):

     * No \A or \Z anchors to match the start 

Re: Suggested RegExp Improvements

2010-11-16 Thread Erik Corry
2010/11/16 Mike Samuel mikesam...@gmail.com:
 2010/11/16 Erik Corry erik.co...@gmail.com:
 2010/11/15 Marc Harter wav...@gmail.com:
 On Mon, 2010-11-15 at 14:06 +0100, Erik Corry wrote:

 Your proposal seems to allow variable length lookbehind.  This isn't
 allowed in perl as far as I know.  I just tried the following:

  perl -e 'foobarbaz =~ /a(?<=(ob|bab))/;'

 which gives an error on perl5.  I think if we are going to allow
 variable length lookbehind we should first find out why they don't
 have it in perl.  I think the implementation is a little tricky if you
 want to support the full regexp language in lookbehinds.

 This was not my intention.  I am proposing zero-width lookbehind, which
 would not allow for the case you specified above.  I will update the

 The issue is not with the number of characters consumed by the
 assertion.  This is indeed zero.  The issue is with the width of the
 text matched by the disjunction inside the brackets.  This is not any
 disjunction, but rather a restricted part of the regexp language that
 can only match a particular number of characters.

 It seems the .Net regexp library is able to handle arbitrary content
 in a lookbehind.  It is almost the only one.

 See http://www.regular-expressions.info/lookaround.html#lookbehind for
 more details.

 We could add this feature to JS.  As far as I can work out it
 presupposes the ability to reverse an arbitrary regexp and run it
 backwards (stepping back and backtracking forwards).  I don't think we
 should add it accidentally though, and perhaps the proposer should be
 the first to implement it.

 Don't you already have to do that to efficiently handle a regexp that
 ends at the end of the input (in JS, a non multiline $, or \z in
 java.util.regex parlance)?

V8 doesn't have a general form of that optimization.  Do the others?

 If you have the whole input string available in memory, and are trying
 to figure out whether a lookbehind (?<=x) matches at position p, can't
 you just test /(?:x)$/ against the prefix of the input of length p?


 proposal.  It is my understanding that lookahead as implemented in
 ECMAScript also is zero-width and not variable.  This is also how Perl has
 implemented lookbehind.

 http://perldoc.perl.org/perlre.html#Extended-Patterns

 Updated Proposal:
 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 The issue is not that the regexp doesn't match in perl. The issue is
 that it is not compiled at all.


 Is there an example of a language that supports the full regexp power
 in lookbehinds so we can look at their experiences with implementing
 it?

 As far as I know Perl is the de facto standard.



 2010/11/15 Marc Harter wav...@gmail.com:
 Brendan et al.,

 I have created a proposal for look-behind provided at this link:


 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 I hope it is a format that will be helpful for discussion with TC39.
 Admittedly, I have never written one of these before so am completely open
 to any feedback or ways to improve the document from yourself or anyone
 else
 on this list.

 Marc

 On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:

 I would be game to write up a proposal for this.  When would you need
 this by to discuss w/ TC39?

 Thanks for your consideration,
 Marc

 On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:

 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl-RegExp-syntax, it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

 This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but
 we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get
 everything, and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
 being left out on purpose. Waldemar may recall more, I'd handed him the
 JS
 keys inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5
 even), let me know. I'll chat with other TC39'ers next week about this.

 /be


 What do people
 think about including this feature?

 Marc

 On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications for 3 and now 5, so I admit that I am not an expert in
 the spec itself but as a user of JavaScript, I would like to get some
 expert discussion over this topic as proposed enhancements to the
 RegExp engine for Harmony.

 I will start with a list of lacking features in JS as compared 

Re: Suggested RegExp Improvements

2010-11-15 Thread Erik Corry
Your proposal seems to allow variable length lookbehind.  This isn't
allowed in perl as far as I know.  I just tried the following:

 perl -e 'foobarbaz =~ /a(?<=(ob|bab))/;'

which gives an error on perl5.  I think if we are going to allow
variable length lookbehind we should first find out why they don't
have it in perl.  I think the implementation is a little tricky if you
want to support the full regexp language in lookbehinds.

Is there an example of a language that supports the full regexp power
in lookbehinds so we can look at their experiences with implementing
it?




2010/11/15 Marc Harter wav...@gmail.com:
 Brendan et al.,

 I have created a proposal for look-behind provided at this link:

 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 I hope it is a format that will be helpful for discussion with TC39.
 Admittedly, I have never written one of these before so am completely open
 to any feedback or ways to improve the document from yourself or anyone else
 on this list.

 Marc

 On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:

 I would be game to write up a proposal for this.  When would you need
 this by to discuss w/ TC39?

 Thanks for your consideration,
 Marc

 On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:

 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl-RegExp-syntax, it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

 This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get
 everything, and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
 being left out on purpose. Waldemar may recall more, I'd handed him the JS
 keys inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5
 even), let me know. I'll chat with other TC39'ers next week about this.

 /be


 What do people
 think about including this feature?

 Marc

 On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications for 3 and now 5, so I admit that I am not an expert in
 the spec itself but as a user of JavaScript, I would like to get some
 expert discussion over this topic as proposed enhancements to the
 RegExp engine for Harmony.

 I will start with a list of lacking features in JS as compared to Perl
 provided by (http://www.regular-expressions.info/javascript.html):

 * No \A or \Z anchors to match the start or end of the string.
   Use a caret or dollar instead.
 * Lookbehind is not supported at all. Lookahead is fully
   supported.
 * No atomic grouping or possessive quantifiers
 * No Unicode support, except for matching single characters with
   \u
 * No named capturing groups. Use numbered capturing groups
   instead.
 * No mode modifiers to set matching options within the regular
   expression.
 * No conditionals.
 * No regular expression comments. Describe your regular
   expression with JavaScript // comments instead, outside the
   regular expression string.

 I don't know if all of these need to be in the language but there
 have been some that I have personally wanted to use:

 * Lookbehind!  ECMAScript fully supports lookahead, why not
   lookbehind?  Seems like a big hole to me.
 * Named capturing groups and comments (e.g.
   http://xregexp.com/syntax/).  Mostly I argue for this because
   it makes RegExp matches more self-documenting.  Regular
   Expressions are already cryptic as it is.

 I do like some of the new flags proposed in
 (http://xregexp.com/flags/) but personally haven't used them but maybe
 that is something also for discussion.

 Marc Harter

 ___
 es-discuss mailing list
 es-discuss@mozilla.org
 https://mail.mozilla.org/listinfo/es-discuss




Re: Suggested RegExp Improvements

2010-11-15 Thread Marc Harter
On Mon, 2010-11-15 at 14:06 +0100, Erik Corry wrote:

 Your proposal seems to allow variable length lookbehind.  This isn't
 allowed in perl as far as I know.  I just tried the following:
 
  perl -e 'foobarbaz =~ /a(?<=(ob|bab))/;'
 
 which gives an error on perl5.  I think if we are going to allow
 variable length lookbehind we should first find out why they don't
 have it in perl.  I think the implementation is a little tricky if you
 want to support the full regexp language in lookbehinds.


This was not my intention.  I am proposing zero-width lookbehind, which
would not allow for the case you specified above.  I will update the
proposal.  It is my understanding that lookahead as implemented in
ECMAScript also is zero-width and not variable.  This is also how Perl
has implemented lookbehind.

http://perldoc.perl.org/perlre.html#Extended-Patterns

Updated Proposal:
https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM


 Is there an example of a language that supports the full regexp power
 in lookbehinds so we can look at their experiences with implementing
 it?


As far as I know Perl is the de facto standard.


 
 

Re: Suggested RegExp Improvements

2010-11-14 Thread Marc Harter
Brendan et al.,

I have created a proposal for look-behind provided at this link:

https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

I hope it is a format that will be helpful for discussion with TC39.
Admittedly, I have never written one of these before so am completely
open to any feedback or ways to improve the document from yourself or
anyone else on this list.

Marc

 
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: Suggested RegExp Improvements

2010-11-13 Thread Marc Harter
I would be game to write up a proposal for this.  When would you need
this by to discuss w/ TC39?

Thanks for your consideration,
Marc




Re: Suggested RegExp Improvements

2010-11-12 Thread Marc Harter
After considering all the breadth this discussion could take, maybe it
would be wise to just focus on one issue at a time.  For me, the biggest
missing feature is lookbehind.  It's common to most languages
implementing the Perl RegExp syntax, and it is very useful when looking for
patterns that follow or don't follow a particular pattern.  I guess I'm
confused why lookahead made it in but not lookbehind.  What do people
think about including this feature?

Marc

On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications (3 and now 5), so I admit that I am not an expert in
 the spec itself, but as a user of JavaScript I would like to get some
 expert discussion of this topic: proposed enhancements to the
 RegExp engine for Harmony.
 
 I will start with a list of features lacking in JS as compared to Perl,
 provided by http://www.regular-expressions.info/javascript.html:
 
   * No \A or \Z anchors to match the start or end of the string.
     Use a caret or dollar instead.
   * Lookbehind is not supported at all. Lookahead is fully
     supported.
   * No atomic grouping or possessive quantifiers.
   * No Unicode support, except for matching single characters with
     \u.
   * No named capturing groups. Use numbered capturing groups
     instead.
   * No mode modifiers to set matching options within the regular
     expression.
   * No conditionals.
   * No regular expression comments. Describe your regular
     expression with JavaScript // comments instead, outside the
     regular expression string.
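
The workarounds named in the list above can be sketched roughly as follows. This is only an illustration: the date pattern and the variable names are assumptions chosen for the example, and `^`/`$` are only an approximate stand-in for Perl's `\A`/`\Z`.

```javascript
// Without the m flag, ^ and $ anchor at the start and end of the whole
// input, which covers the most common uses of \A and \Z.
const wholeString = /^\d{4}-\d{2}-\d{2}$/;
const isDate = wholeString.test("2010-11-12");
const isDateWithNoise = wholeString.test("on 2010-11-12");

// No named groups and no in-pattern comments: capture by position and
// document the groups in ordinary // comments outside the pattern.
// Group 1 = year, group 2 = month, group 3 = day.
const parts = /^(\d{4})-(\d{2})-(\d{2})$/.exec("2010-11-12");
const date = { year: parts[1], month: parts[2], day: parts[3] };

console.log(isDate, isDateWithNoise, date.year, date.month, date.day);
```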
 
 I don't know if all of these need to be in the language but there
 have been some that I have personally wanted to use:
 
   * Lookbehind!  ECMAScript fully supports lookahead, why not
     lookbehind?  Seems like a big hole to me.
   * Named capturing groups and comments (e.g.
     http://xregexp.com/syntax/).  Mostly I argue for this because
     it makes RegExp matches more self-documenting.  Regular
     Expressions are already cryptic as it is.
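
For context on why the missing lookbehind is felt as a gap, here is a sketch of one common workaround. It is not something proposed in this thread: the prefix is matched as ordinary text and the wanted part is read from a capturing group. The sample string and variable names are hypothetical.

```javascript
// Goal: find the digits that follow a "$" without including the "$".
// With lookbehind this would be a single assertion; without it, match the
// prefix as ordinary text and read the wanted part from a capturing group.
const price = /\$(\d+)/.exec("total: $42");
const digits = price[1]; // text after the prefix

// The match index points at the prefix, so the start of the captured
// text has to be computed by hand.
const digitsIndex = price.index + price[0].length - price[1].length;

console.log(digits, digitsIndex);
```

The extra bookkeeping for the index is the kind of thing a zero-width lookbehind assertion would make unnecessary.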
 
 I do like some of the new flags proposed at
 http://xregexp.com/flags/.  I haven't used them personally, but maybe
 that is also something for discussion.
 
 Marc Harter



Re: Suggested RegExp Improvements

2010-11-12 Thread Brendan Eich
On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take, maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl RegExp syntax, and it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

This was 1998.  Netscape 4 work I did in '97 was based on Perl 4(!), but we
proposed to ECMA TC39 TG1 (the JS group -- things were different then,
including capitalization) something based on Perl 5. We didn't get everything,
and we had to rationalize some obvious quirks.

I don't remember lookbehind (which emerged in Perl 5.005 in July '98) being
left out on purpose. Waldemar may recall more; I'd handed him the JS keys
inside netscape.com to go do mozilla.org.

If you are game to write a proposal or mini-spec (in the style of ES5 even), 
let me know. I'll chat with other TC39'ers next week about this.

/be


 


Re: Suggested RegExp Improvements

2010-11-12 Thread Waldemar Horwat

On 11/12/10 15:04, Brendan Eich wrote:

 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

  After considering all the breadth this discussion could take, maybe it
  would be wise to just focus on one issue at a time.  For me, the biggest
  missing feature is lookbehind.  It's common to most languages
  implementing the Perl RegExp syntax, and it is very useful when looking for
  patterns that follow or don't follow a particular pattern.  I guess I'm
  confused why lookahead made it in but not lookbehind.

 This was 1998.  Netscape 4 work I did in '97 was based on Perl 4(!), but we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get everything,
 and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98) being
 left out on purpose. Waldemar may recall more; I'd handed him the JS keys
 inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5 even),
 let me know. I'll chat with other TC39'ers next week about this.


The ES3 spec was based on what was stable at the time.  Perl had been
experimenting with other constructs in regexps, but there was some churn
there, and I didn't want to go for features that were still in flux.

Waldemar