[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Umm, I said that the attribution to Thompson was wrong; in fact it was correct. Thompson designed and documented the algorithm in 1968, long before the Aho/Sethi/Ullman green book... so the algorithm is more than 40 years old, and still not in Python, Perl

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Anyway, there are ways to speed up regexps, even without instructing the regexps with anti-backtracking syntaxes. See http://swtch.com/~rsc/regexp/regexp1.html (article dated January 2007), which discusses how Perl, PCRE (and PHP), Python, Java, Ruby, .NET

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: You said that this extension was not implemented anywhere, and you were wrong. I've found that it IS implemented in Perl 6! Look at this discussion: http://www.perlmonks.org/?node_id=602361 Look at how the matches in quantified capture groups are ret

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: >>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups() ('192', '1') > If I understood correctly what you are proposing, you would like it to return (['192'], ['168', '
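
For readers following the thread, here is a minimal sketch of the behavior under discussion. The first part reproduces the result quoted above with the standard `re` module. The helper `all_octets` is a hypothetical name introduced here only for illustration: it emulates per-occurrence capture with a second pass and is not an existing `re` feature, nor the proposed implementation.

```python
import re

# Pattern quoted in the thread: one captured number, then a repeated group.
PATTERN = r'^(\d{1,3})(?:\.(\d{1,3})){3}$'

match = re.match(PATTERN, '192.168.0.1')

# Current behavior: the repeated group keeps only its last occurrence.
current = match.groups()
print(current)  # ('192', '1')


def all_octets(text):
    """Hypothetical helper: validate with PATTERN, then collect every number.

    This is a two-pass workaround, not a feature of the re module.
    """
    if re.match(PATTERN, text) is None:
        return None
    return re.findall(r'\d{1,3}', text)


print(all_octets('192.168.0.1'))  # ['192', '168', '0', '1']
```

The second pass is exactly the extra step the proposal wants to avoid, which is why it is shown here only as a workaround.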

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: >> And anyway, my suggestion is certainly much more useful than atomic >> groups and possessive groups that have much lower use [...] >Then why no one implemented it yet? :) That's because they had to use something other than regexps to do

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: > a "general" regex (e.g. for an ipv6 address) I know this problem, and I have already written about this. It is not possible to parse it in a single regexp if it is written without using repetitions. But in that case, the regexp becomes really

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: > Even with your solution, in most of the cases you will need additional steps to assemble the results (at least in the cases with some kind of separator, where you have to join the first element with the followings). Yes, but this step is trivial and fu

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: > That's why I wrote 'without checking if they are in range(256)'; the fact that this regex matches invalid digits was not relevant in my example (and it's usually easier to convert the digits to int and check if 0 <= digits <= 255
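
The approach mentioned in the quoted text (match the digits loosely, then convert to int and check the range) might look like the sketch below. The function name `parse_ipv4` is an assumption made for this example, and `re.ASCII` is added so that `\d` only matches 0-9.

```python
import re

# Loose shape check: four dot-separated runs of 1 to 3 ASCII digits.
SHAPE = re.compile(r'\d{1,3}(?:\.\d{1,3}){3}', re.ASCII)


def parse_ipv4(text):
    """Hypothetical helper: return four ints in 0..255, or None if invalid."""
    if SHAPE.fullmatch(text) is None:
        return None
    numbers = [int(part) for part in text.split('.')]
    # Range check done on integers rather than inside the regex.
    if all(0 <= n <= 255 for n in numbers):
        return numbers
    return None


print(parse_ipv4('192.168.0.1'))  # [192, 168, 0, 1]
print(parse_ipv4('256.1.1.1'))    # None
```

Note that this sketch accepts leading zeros such as "010", so it does not settle the question of which textual forms should count as valid.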

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: ezio said: >>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups() ('192', '1') > If I understood correctly what you are proposing, you would like it to return (['192'], ['168

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: I had carefully read ALL that ezio said; this is clear from the fact that I have summarized my responses to ALL 4 points given by ezio. Capturing groups is a VERY useful feature of regular expressions, but they currently DON'T work as expected (in a u

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: And anyway, my suggestion is certainly much more useful than atomic groups and possessive groups that have much lower use, and which are already being tested in Perl but that Python (or PCRE, PHP, and most implementations of 'vi'/'ed',

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Summary of your points with my responses: > 1) it doesn't exist in any other implementation that I know; That's exactly why I proposed to discuss it with the developers of other implementations (I cited PCRE, Perl and PHP developers, ther

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: You're wrong, it WILL be compatible, because it is only conditioned by a FLAG. The flag is there specifically for instructing the parser to generate lists of values rather than single values. Without the regular compilation flag set, as I said, there wi

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: In addition, your suggested regexp for IPv4: '^(\d{1,3})(?:\.(\d{1,3})){3}$' is completely WRONG ! It will match INVALID IPv4 address formats like "000.000.000.000". Reread the RFCs... because "000.000.000.000" is CERTAINLY
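
The claim about "000.000.000.000" can be checked directly. In the sketch below, `LOOSE` is the pattern quoted in this message; `OCTET` and `STRICT` are assumptions written for this example (they reject leading zeros and values above 255) and are not the regexp from the original submission.

```python
import re

# Pattern quoted in the message above.
LOOSE = r'^(\d{1,3})(?:\.(\d{1,3})){3}$'

# Assumed stricter alternative: 0-255 with no leading zeros.
OCTET = r'(?:0|[1-9][0-9]?|1[0-9]{2}|2[0-4][0-9]|25[0-5])'
STRICT = '^' + OCTET + r'(?:\.' + OCTET + r'){3}$'

for candidate in ('000.000.000.000', '192.168.0.1', '256.0.0.1'):
    loose_ok = re.match(LOOSE, candidate) is not None
    strict_ok = re.match(STRICT, candidate) is not None
    print(candidate, loose_ok, strict_ok)
```

Whether leading zeros should be rejected depends on which specification is being followed; the stricter pattern simply encodes one possible reading.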

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Note that I used the IPv4 address format only as an example. There are plenty of other more complex cases for which we really need to capture the multiple occurrences of a capturing group within a repetition. I'm NOT asking you how to parse it using MUL

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Implementation details: Currently, the capturing groups behave quite randomly in the values returned by MatchObject, when backtracking occurs in a repetition. This proposal will help fix the behavior, because it will also be much easier to backtrack

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Rationale for the compilation flag: You could think that the compilation flag should not be needed. However, not using it would mean that a LOT of existing regular expressions that already contain capturing groups in repetitions, and for which the

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: I'd like to add that the same behavior should also affect the span(index) method of MatchObject, which should not just return a single (start, end) pair, but should in this case return a list of pairs, one for each occurrence, when t
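
To illustrate the point about `span()`, the sketch below shows what the existing method returns for a repeated group, and one way to recover a pair per occurrence with `re.finditer`. The second pass is an emulation assumed for this example, not the proposed API.

```python
import re

PATTERN = r'^(\d{1,3})(?:\.(\d{1,3})){3}$'
text = '192.168.0.1'

match = re.match(PATTERN, text)

# Existing behavior: a single (start, end) pair, for the last occurrence only.
last_span = match.span(2)
print(last_span)  # (10, 11)

# Emulation: scan the matched text again and record every number's span.
every_span = [m.span() for m in re.finditer(r'\d{1,3}', text)]

# The first number belongs to group 1, the remaining three to group 2.
group2_spans = every_span[1:]
print(group2_spans)  # [(4, 7), (8, 9), (10, 11)]
```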

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
New submission from Philippe Verdy : For now, when capturing groups are used within repetitions, it is impossible to capture what they match individually within the list of matched repetitions. E.g. the following regular expression: (0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?)?)(?:\.(0|1[0-9]{0,2