[issue19055] Regular expressions: * does not match as many repetitions as possible.

2014-08-04 Thread Ezio Melotti

Ezio Melotti added the comment:

I agree.

--
resolution:  -> works for me
stage: needs patch -> resolved
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19055
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19055] Regular expressions: * does not match as many repetitions as possible.

2014-08-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I think this issue can be closed. The behavior and the documentation are 
correct and match each other. There is nothing more to do here.

--




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-10-05 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
assignee:  -> docs@python
components: +Documentation
keywords: +easy
nosy: +docs@python
stage:  -> needs patch
type: behavior -> enhancement




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-20 Thread Jason Stumpf

Jason Stumpf added the comment:

I like that clearer description.  "As produce matches" is more correct than 
"as possible".

--




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Jason Stumpf

New submission from Jason Stumpf:

>>> re.match('(a|ab)*',('aba')).group(0)
'a'

According to the documentation, the * should match as many repetitions as 
possible.  Two are possible, but it matches only one.

Reversing the order of the operands of | changes the behaviour.

>>> re.match('(ab|a)*',('aba')).group(0)
'aba'

--
messages: 198116
nosy: Jason.Stumpf
priority: normal
severity: normal
status: open
title: Regular expressions: * does not match as many repetitions as possible.
type: behavior
versions: Python 2.7




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Jason Stumpf

Changes by Jason Stumpf jstu...@google.com:


--
components: +Regular Expressions
nosy: +ezio.melotti, mrabarnett




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread David Benbennick

Changes by David Benbennick dbenb...@gmail.com:


--
nosy: +dbenbenn




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +serhiy.storchaka




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
versions: +Python 3.3, Python 3.4




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread janzert

janzert added the comment:

The documentation on the | operator in the re module pretty explicitly covers 
this. http://docs.python.org/2/library/re.html

"A|B, where A and B can be arbitrary REs, creates a regular expression that 
will match either A or B. An arbitrary number of REs can be separated by the 
'|' in this way. This can be used inside groups (see below) as well. As the 
target string is scanned, REs separated by '|' are tried from left to right. 
When one pattern completely matches, that branch is accepted. This means that 
once A matches, B will not be tested further, even if it would produce a longer 
overall match. In other words, the '|' operator is never greedy."
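
As a small illustration of the quoted paragraph (a sketch added for 
clarity, not part of the original comment; the variable names are 
made up), the order of the alternatives decides which one is taken when 
nothing follows the alternation:

```python
import re

# Alternatives are tried left to right. The first alternative that lets the
# overall match succeed is taken, even if a later one would match more text.
first_wins = re.match(r'a|ab', 'ab').group(0)
longer_first = re.match(r'ab|a', 'ab').group(0)

print(first_wins)    # a
print(longer_first)  # ab
```

Putting the longer alternative first is the usual way to get the longer 
match out of an alternation.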

--
nosy: +janzert




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Jason Stumpf

Jason Stumpf added the comment:

Even given the documentation for |, the documentation for * is wrong.

>>> re.match('(a|ab)*c',('abac')).group(0)
'abac'

From the doc: "In general, if a string p matches A and another string q matches 
B, the string pq will match AB."

Since '(a|ab)*c' matches 'abac', and 'c' matches 'c', that means '(a|ab)*' 
matches 'aba'.  It does so with 2 repetitions.  Thus, in the example from my 
initial post, it was not matching with as many repetitions as possible.

I think what you mean is that * attempts to match again after each match of the 
preceding regular expression.

--




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Jason Stumpf

Jason Stumpf added the comment:

Sorry, that implication was backwards.  I don't think I can prove from just the 
documentation that '(a|ab)*' can match 'aba' in certain contexts.

If the docs said "* attempts to match again after each match of the preceding 
regular expression", I think that would describe the observed behaviour.

--




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Matthew Barnett

Matthew Barnett added the comment:

The behaviour is correct.

Here's a summary of what's happening when '(a|ab)*c' is matched against 
'abac':


First iteration of the repeated group:

Try the first branch. Can match 'a'.

Second iteration of the repeated group:

Try the first branch. Can't match 'a'.
Try the second branch. Can't match 'ab'.

Continue with the remainder of the pattern.

Can't match 'c', therefore backtrack to the first iteration of the repeated 
group:

Try the second branch. Can match 'ab'.

Second iteration of the repeated group:

Try the first branch. Can match 'a'.

Third iteration of the repeated group:

Try the first branch. Can't match 'a'.
Try the second branch. Can't match 'ab'.

Continue with the remainder of the pattern.

Can match 'c'.

Reached the end of the pattern. It has matched 'abac'.
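
The steps above can be reproduced with a toy backtracking matcher. This 
is only a sketch added for illustration: the function match_star_alt and 
its parameters are hypothetical, it handles just the shape (a|ab)*c 
anchored at the start of the string, and it is not how the re module is 
actually implemented.

```python
import re

def match_star_alt(text, branches=('a', 'ab'), tail='c'):
    """Toy backtracking matcher for (branch1|branch2|...)*tail at position 0.

    Returns (end, log), where end is the end offset of the match or None.
    """
    log = []

    def repeat(pos, depth):
        # Greedy star: first try one more iteration of the group,
        # testing the branches from left to right.
        for branch in branches:
            if text.startswith(branch, pos):
                log.append(f"iteration {depth}: matched {branch!r} at {pos}")
                end = repeat(pos + len(branch), depth + 1)
                if end is not None:
                    return end
                # The rest of the pattern failed, so undo this branch.
                log.append(f"iteration {depth}: backtrack from {branch!r}")
        # No further iteration worked: continue with the rest of the pattern.
        if text.startswith(tail, pos):
            log.append(f"matched {tail!r} at {pos}")
            return pos + len(tail)
        return None

    return repeat(0, 1), log

end, steps = match_star_alt('abac')
for step in steps:
    print(step)
print('abac'[:end])  # abac, the same as re.match('(a|ab)*c', 'abac').group(0)
```

Calling match_star_alt('aba', tail='') models the pattern from the 
original report: with nothing after the star, the first branch is accepted 
straight away and the match ends after the single 'a'.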

--




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Jason Stumpf

Jason Stumpf added the comment:

I understand what's happening, but that is not what the documentation 
describes.  If the behaviour is correct, the documentation is incorrect.

--




[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread R. David Murray

R. David Murray added the comment:

The documentation is correct and unambiguous.  Regular expressions just aren't 
very intuitive.

The documentation says "Causes the resulting RE to match 0 or more repetitions 
of the preceding RE, as many repetitions as are possible."  "As many 
repetitions of the preceding RE" means that:

  (a|ab)*

is equivalent to

  (a|ab)(a|ab)(a|ab)...

where ... represents "add copies of the RE until it doesn't match".

Then you have to look at the documentation of '|' to see what it matches, and 
that documentation (already quoted) explains why the expanded pattern only 
matches 'a' in 'aba'.

Perhaps it would be clearer if it read "Causes the RE to be evaluated as if 
there were zero or more repetitions of the preceding RE, as many repetitions as 
produce matches"?
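
One way to make the "add copies until it doesn't match" reading concrete 
is to unroll the star into nested optional copies, so that each copy is 
attempted only after the previous one has matched. This is a sketch added 
for illustration: the helper expand and the choice of three copies are 
assumptions, and a plain concatenation of mandatory copies would behave 
differently.

```python
import re

def expand(group, copies):
    """Unroll `group*` into nested optional copies: (?:G(?:G(?:G)?)?)?"""
    pattern = ''
    for _ in range(copies):
        pattern = f'(?:{group}{pattern})?'
    return pattern

# Three copies are enough for the three-character subject 'aba'.
unrolled_a_first = re.match(expand('(?:a|ab)', 3), 'aba').group(0)
unrolled_ab_first = re.match(expand('(?:ab|a)', 3), 'aba').group(0)

print(unrolled_a_first)   # a
print(unrolled_ab_first)  # aba
```

Both results agree with what the starred patterns from the original 
report return for the same input.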

--
nosy: +r.david.murray
