[issue27800] Regular expressions with multiple repeat codes

2016-10-14 Thread Martin Panter

Martin Panter added the comment:

I committed my patch as it was. I understand Silent Ghost’s objection was 
mainly that they thought the new paragraph or its positioning wouldn’t be very 
useful, but hopefully it is better than nothing. Perhaps in the future, the 
documentation could be restructured with subsections for repetition qualifiers 
and other kinds of special codes, which may help.

--
resolution:  -> fixed
stage: commit review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27800] Regular expressions with multiple repeat codes

2016-10-14 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 5f7d7e079e39 by Martin Panter in branch '3.5':
Issue #27800: Document limitation and workaround for multiple RE repetitions
https://hg.python.org/cpython/rev/5f7d7e079e39

New changeset 1f2ca7e4b64e by Martin Panter in branch '3.6':
Issue #27800: Merge RE repetition doc from 3.5 into 3.6
https://hg.python.org/cpython/rev/1f2ca7e4b64e

New changeset 98456ab88ab0 by Martin Panter in branch 'default':
Issue #27800: Merge RE repetition doc from 3.6
https://hg.python.org/cpython/rev/98456ab88ab0

New changeset 94f02193f00f by Martin Panter in branch '2.7':
Issue #27800: Document limitation and workaround for multiple RE repetitions
https://hg.python.org/cpython/rev/94f02193f00f

--
nosy: +python-dev




[issue27800] Regular expressions with multiple repeat codes

2016-10-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
assignee:  -> martin.panter
stage: patch review -> commit review
versions: +Python 3.7




[issue27800] Regular expressions with multiple repeat codes

2016-09-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

LGTM. Thanks Martin.

--
nosy: +serhiy.storchaka




[issue27800] Regular expressions with multiple repeat codes

2016-09-03 Thread Martin Panter

Martin Panter added the comment:

Here is a patch for the documentation.

--
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file44356/multiple-repeat.patch




[issue27800] Regular expressions with multiple repeat codes

2016-08-19 Thread Martin Panter

Martin Panter added the comment:

Okay, so it sounds like my usage is valid if I add the brackets. I will try to
come up with a documentation patch at some stage. The reason it is not
supported without brackets is to maintain a bit of consistency with the
question mark (?), which modifies the preceding quantifier, and with the plus
sign (+), which is also a modifier in other implementations.

For the record, GNU grep does seem to accept my expression (although POSIX says
this is undefined, and neither supports lazy or possessive quantifiers):

$ grep -E -o 'a{2}*' <<< "a"


However pcregrep, which supports lazy (?) and possessive (+) quantifiers, 
doesn’t like my expression:

$ pcregrep -o 'a{2}*' <<< "a"
pcregrep: Error in command-line regex at offset 4: nothing to repeat
[Exit 2]
$ pcregrep -o '(?:a{2})*' <<< "a"

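For comparison, Python's own re module can be checked the same way. This is a minimal sketch assuming Python 3.6 or later (for re.error.msg and f-strings); the names error_message and pairs are illustrative only.

```python
import re

# Like pcregrep, re rejects a repetition code that directly follows another
# one, so "a{2}*" fails at compile time.
try:
    re.compile("a{2}*")
except re.error as exc:
    error_message = exc.msg
else:
    error_message = None

print(error_message)  # multiple repeat

# Wrapping the inner repetition in a non-capturing group is accepted.
pairs = re.compile("(?:a{2})*")

# The input "a" contains no complete pair, so only the empty string matches,
# mirroring the empty output from grep and pcregrep above.
print(repr(pairs.match("a").group()))      # ''
print(repr(pairs.match("aaaaa").group()))  # 'aaaa'
```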

--




[issue27800] Regular expressions with multiple repeat codes

2016-08-19 Thread Terry J. Reedy

Terry J. Reedy added the comment:

This appears to be a doc issue to clarify that * cannot directly follow a 
repetition code.  I believe there have been other (non)bug reports like this 
before.
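A small sketch of the limitation described above, assuming Python 3.5 or later (where re.error exposes a msg attribute); the helper compile_error is hypothetical, written only for this illustration.

```python
import re

def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return exc.msg
    return None

# Each pattern applies "*" directly to an item that already carries a
# repetition code, so each is expected to fail with "multiple repeat".
patterns = ["a?*", "a+*", "a**", "a{4}*"]
results = {p: compile_error(p) for p in patterns}
for pattern, message in results.items():
    print(f"{pattern!r}: {message}")

# Grouping the inner repetition first avoids the error.
print(compile_error("(?:a{4})*"))  # None
```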

--
nosy: +terry.reedy




[issue27800] Regular expressions with multiple repeat codes

2016-08-19 Thread Matthew Barnett

Matthew Barnett added the comment:

"*" and the other quantifiers ("+", "?" and "{...}") operate on the preceding 
_item_, not the entire preceding expression. For example, "ab*" means "a" 
followed by zero or more repeats of "b".
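The difference between repeating a single item and repeating a group can be sketched as follows; item_star and group_star are illustrative names, and the sketch assumes Python 3.4 or later for fullmatch.

```python
import re

item_star = re.compile("ab*")       # "*" applies only to "b"
group_star = re.compile("(?:ab)*")  # "*" applies to the whole group "ab"

# Print whether each pattern matches the entire text.
for text in ["a", "abbb", "abab"]:
    print(text, bool(item_star.fullmatch(text)), bool(group_star.fullmatch(text)))
```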

You're not allowed to use multiple quantifiers together. The proper way is to 
use the non-capturing "(?:...)".

It's too late to change that because some of them already have a special 
meaning when used after another quantifier: "a*?" is a lazy quantifier, as are 
"a+?", "a??" and "a{1,4}?".

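A minimal sketch of the greedy versus lazy distinction mentioned above, applying re.match to the string "aaa"; the patterns list and the leftmost dictionary are just for illustration.

```python
import re

text = "aaa"
patterns = ["a*", "a*?", "a+", "a+?", "a??", "a{1,4}", "a{1,4}?"]

# Greedy quantifiers consume as much as they can; a trailing "?" makes them
# lazy, so they consume as little as possible while still allowing a match.
leftmost = {p: re.match(p, text).group() for p in patterns}
for pattern, matched in leftmost.items():
    print(f"{pattern!r:10} -> {matched!r}")
```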
Many other regex implementations, including the "regex" module, use an 
additional "+" to signify a possessive quantifier: "a*+", "a++", "a?+" and 
"a{1,4}+".

That just leaves the additional "*", which is treated as an error in all the 
other regex implementations that I'm aware of.

--




[issue27800] Regular expressions with multiple repeat codes

2016-08-19 Thread R. David Murray

R. David Murray added the comment:

It seems perfectly logical and consistent to me.  {4} is a repeat count, as is
*.  You get the same error if you do 'a?*', and the same bypass if you do
'(a?)*' (though I haven't tested if that does anything useful :).  As far as I
can tell, you don't need the ?:; you just need to have the * modifying a
group, making the group the "preceding regular expression".
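A sketch comparing a plain group with the (?:...) form; the names are illustrative. Both are accepted and match the same strings, but a plain group also captures, and a repeated group keeps only its last repetition.

```python
import re

capturing = re.compile("(a{2})*")        # plain group
non_capturing = re.compile("(?:a{2})*")  # non-capturing group

# Both forms accept the same input.
print(bool(capturing.fullmatch("aaaa")), bool(non_capturing.fullmatch("aaaa")))  # True True

# The plain group adds a numbered group (shifting any later group numbers),
# and it records only the final iteration of the repetition.
print(capturing.groups, non_capturing.groups)  # 1 0
print(capturing.fullmatch("aaaa").group(1))    # aa

# '(a?)*', mentioned above, also compiles without a "multiple repeat" error.
optional_group = re.compile("(a?)*")
```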

--
nosy: +r.david.murray




[issue27800] Regular expressions with multiple repeat codes

2016-08-19 Thread Martin Panter

New submission from Martin Panter:

In the documentation for the “re” module, it says repetition codes like {4} and 
“*” operate on the preceding regular expression. But even though “a{4}” is a 
valid expression, the obvious way to apply a “*” repetition to it fails:

>>> re.compile("a{4}*")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/re.py", line 223, in compile
    return _compile(pattern, flags)
  File "/home/proj/python/cpython/Lib/re.py", line 292, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/proj/python/cpython/Lib/sre_compile.py", line 555, in compile
    p = sre_parse.parse(p, flags)
  File "/home/proj/python/cpython/Lib/sre_parse.py", line 792, in parse
    p = _parse_sub(source, pattern, 0)
  File "/home/proj/python/cpython/Lib/sre_parse.py", line 406, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/proj/python/cpython/Lib/sre_parse.py", line 610, in _parse
    source.tell() - here + len(this))
sre_constants.error: multiple repeat at position 4

As a workaround, I found I can wrap the inner repetition in (?:...):

>>> re.compile("(?:a{4})*")
re.compile('(?:a{4})*')

The problems with the workaround are (a) it is far from obvious, and (b) it 
adds more complicated syntax. Either this limitation should be documented, or 
if there is no good reason for it, it should be lifted. It is not clear if my 
workaround is entirely valid, or if I just found a way to bypass some sanity 
check.
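One way to check that the workaround behaves as intended is to test it against runs of "a" of different lengths. The helper accepts is hypothetical, written only for this sketch.

```python
import re

multiple_of_four = re.compile("(?:a{4})*")

def accepts(n):
    """True if a run of n 'a' characters is matched in full by the pattern."""
    return multiple_of_four.fullmatch("a" * n) is not None

# Only lengths that are multiples of four should be accepted.
print([n for n in range(13) if accepts(n)])  # [0, 4, 8, 12]
```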

My original use case was scanning a base-64 encoding for Issue 27799:

# Without the second level of brackets, this raises a "multiple repeat" error
chunk_re = br'(?: (?: [^A-Za-z0-9+/=]* [A-Za-z0-9+/=] ){4} )*'
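A sketch of how the pattern above might be used. It assumes the pattern is compiled with re.VERBOSE, which the embedded spaces suggest but the report does not state; chunk_pattern, flat_re and flat_error are illustrative names.

```python
import re

# Assumption: re.VERBOSE is intended, so whitespace outside character
# classes is ignored.
chunk_re = br'(?: (?: [^A-Za-z0-9+/=]* [A-Za-z0-9+/=] ){4} )*'
chunk_pattern = re.compile(chunk_re, re.VERBOSE)

# Each chunk is four base-64 alphabet bytes, each optionally preceded by
# non-alphabet bytes such as a newline.
print(chunk_pattern.match(b"QUJD\nREVG").group())  # b'QUJD\nREVG'
# An incomplete trailing chunk is left unmatched.
print(chunk_pattern.match(b"QUJDRE").group())      # b'QUJD'

# Dropping the outer group puts "*" directly after "{4}", which re rejects.
flat_re = br'(?: [^A-Za-z0-9+/=]* [A-Za-z0-9+/=] ){4}*'
try:
    re.compile(flat_re, re.VERBOSE)
except re.error as exc:
    flat_error = exc.msg
else:
    flat_error = None
print(flat_error)  # multiple repeat
```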

--
components: Regular Expressions
messages: 273107
nosy: ezio.melotti, martin.panter, mrabarnett
priority: normal
severity: normal
status: open
title: Regular expressions with multiple repeat codes
type: behavior
versions: Python 2.7, Python 3.5, Python 3.6
