Philippe Verdy added the comment:
Umm, I said that the attribution to Thompson was wrong; in fact it
was correct. Thompson designed and documented the algorithm in 1968,
long before the Aho/Sethi/Ullman green book... so the algorithm is more
than 40 years old, and still not in Python, Perl
Philippe Verdy added the comment:
Anyway, there are ways to speed up regexps, even without instructing the
regexps with anti-backtracking syntaxes.
See http://swtch.com/~rsc/regexp/regexp1.html
(article dated January 2007)
Which discusses how Perl, PCRE (and PHP), Python, Java, Ruby, .NET
Philippe Verdy added the comment:
You said that this extension was not implemented anywhere, and you were
wrong.
I've found that it IS implemented in Perl 6! Look at this discussion:
http://www.perlmonks.org/?node_id=602361
Look at how the matches in quantified capture groups are ret
Philippe Verdy added the comment:
>>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups()
('192', '1')
> If I understood correctly what you are proposing, you would like it to
return (['192'], ['168', '
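The behaviour quoted in this comment can be reproduced with the standard `re` module. The following is only a minimal sketch: the second step, rescanning the matched text with `re.findall`, is an assumed workaround using today's API and is not the extension proposed in the thread.

```python
import re

# The pattern quoted above, written as a raw string so the backslashes
# reach the regex engine unchanged.
pattern = re.compile(r'^(\d{1,3})(?:\.(\d{1,3})){3}$')
address = '192.168.0.1'

match = pattern.match(address)
# Group 2 sits inside the {3} repetition, so only its last occurrence is kept.
last_only = match.groups()

# Assumed workaround: scan the already-validated text a second time
# to recover every octet individually.
all_octets = re.findall(r'\d{1,3}', match.group(0))

print(last_only)    # ('192', '1')
print(all_octets)   # ['192', '168', '0', '1']
```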
Philippe Verdy added the comment:
>> And anyway, my suggestion is certainly much more useful than atomic
>> groups and possessive groups that have much lower use [...]
>Then why no one implemented it yet? :)
That's because they had to use something other than regexps to do
Philippe Verdy added the comment:
> a "general" regex (e.g. for an ipv6 address)
I know this problem, and I have already written about this. It is not
possible to parse it in a single regexp if it is written without using
repetitions. But in that case, the regexp becomes really
Philippe Verdy added the comment:
> Even with your solution, in most of the cases you will need additional
steps to assemble the results (at least in the cases with some kind of
separator, where you have to join the first element with the
followings).
Yes, but this step is trivial and fu
Philippe Verdy added the comment:
> That's why I wrote 'without checking if they are in range(256)'; the
fact that this regex matches invalid digits was not relevant in my
example (and it's usually easier to convert the digits to int and check
if 0 <= digits <= 255
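The approach mentioned in the quoted text, matching the digits loosely and then checking the range after converting to int, might look roughly like this. The function name `parse_ipv4` is hypothetical and exists only for illustration; the pattern spells out the four groups instead of using a repetition so that each octet is captured.

```python
import re

# Hypothetical helper: loose regex match first, numeric range check second.
def parse_ipv4(text):
    # re.ASCII restricts \d to 0-9; fullmatch requires the whole string to match.
    m = re.fullmatch(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})', text, re.ASCII)
    if m is None:
        return None
    octets = [int(g) for g in m.groups()]
    # The range check that the regex deliberately leaves out.
    if all(0 <= o <= 255 for o in octets):
        return octets
    return None

print(parse_ipv4('192.168.0.1'))   # [192, 168, 0, 1]
print(parse_ipv4('256.1.1.1'))     # None
```

Note that this sketch still accepts zero-padded forms such as "000.000.000.000"; whether those should count as valid is disputed elsewhere in the thread.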
Philippe Verdy added the comment:
ezio said:
>>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups()
('192', '1')
> If I understood correctly what you are proposing, you would like it to
return (['192'], ['168
Philippe Verdy added the comment:
I had carefully read ALL that ezio said; this is clear from the fact that
I have summarized my responses to ALL 4 of the points given by ezio.
Capturing groups is a VERY useful feature of regular expressions, but
they currently DON'T work as expected (in a u
Philippe Verdy added the comment:
And anyway, my suggestion is certainly much more useful than atomic groups
and possessive groups that have much lower use, and which are already
being tested in Perl but that Python (or PCRE, PHP, and most
implementations of 'vi'/'ed',
Philippe Verdy added the comment:
Summary of your points with my responses:
> 1) it doesn't exist in any other implementation that I know;
That's exactly why I proposed to discuss it with the developers of other
implementations (I cited PCRE, Perl and PHP developers, ther
Philippe Verdy added the comment:
You're wrong, it WILL be compatible, because it is only conditioned by a
FLAG. The flag is there specifically for instructing the parser to
generate lists of values rather than single values.
Without the regular compilation flag set, as I said, there wi
Philippe Verdy added the comment:
In addition, your suggested regexp for IPv4:
'^(\d{1,3})(?:\.(\d{1,3})){3}$'
is completely WRONG! It will match INVALID IPv4 address formats like
"000.000.000.000". Reread the RFCs... because "000.000.000.000" is
CERTAINLY
Philippe Verdy added the comment:
Note that I used the IPv4 address format only as an example. There are
plenty of other more complex cases for which we really need to capture the
multiple occurrences of a capturing group within a repetition.
I'm NOT asking you how to parse it using MUL
Philippe Verdy added the comment:
Implementation details:
Currently, the capturing groups behave quite randomly in the values returned by
MatchObject, when backtracking occurs in a repetition. This
proposal will help fix the behavior, because it will also be much easier
to backtrack
Philippe Verdy added the comment:
Rationale for the compilation flag:
You could think that the compilation flag should not be needed. However,
not using it would mean that a LOT of existing regular expressions that
already contain capturing groups in repetitions, and for which the
Philippe Verdy added the comment:
I'd like to add that the same behavior should also affect the span(index)
method of MatchObject, that should also not just return a single (start,
end) pair, but that should in this case return a list of pairs, one for
each occurrence, when t
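To illustrate the difference this comment describes, here is a small sketch of what `span()` returns today for a group inside a repetition, next to a list of per-occurrence pairs. The list is built with `re.finditer` over the same string, which is an assumed stand-in for the proposed behaviour, not an implementation of it; it also includes the first octet, which belongs to group 1 in the original pattern.

```python
import re

address = '192.168.0.1'
match = re.match(r'^(\d{1,3})(?:\.(\d{1,3})){3}$', address)

# Current behaviour: a single (start, end) pair for the last occurrence of group 2.
last_span = match.span(2)

# Assumed stand-in for "one pair per occurrence": collect the span of
# every octet by scanning the string again.
occurrence_spans = [m.span() for m in re.finditer(r'\d{1,3}', address)]

print(last_span)          # (10, 11)
print(occurrence_spans)   # [(0, 3), (4, 7), (8, 9), (10, 11)]
```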
New submission from Philippe Verdy:
For now, when capturing groups are used within repetitions, it is impossible to
capture what they match individually within the list of matched repetitions.
E.g. the following regular expression:
(0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?)?)(?:\.(0|1[0-9]{0,2
19 matches