Raymond Hettinger <[email protected]> added the comment:
> I cannot see why changing the order of the alternation should have this
> effect.
The first regex, r'(a|ab)*?b', looks for the first alternative group by
matching left-to-right [1] stopping at the first matching alternation "a".
Roughly, the regex simplifies to r'(a)*?b' giving 'a' in the captured group.
The second regex, r'(ab|a)*?b', looks for the first alternative group by
matching left-to-right [1] stopping at the first matching alternation "ab".
Roughly, the regex simplifies to r'(ab)*?b' giving '' in the captured group.
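To reproduce the comparison, something like the following minimal sketch should do. It assumes the test string is 'ab', which this comment does not restate, and the helper name show() is made up for illustration. The captured group for the second pattern is only printed, not asserted, since it may not be the same on every Python version.

```python
import re

def show(pattern, text="ab"):
    """Return (span, groups) for the first match of pattern in text.

    The default text 'ab' is an assumption; it is not restated in this comment.
    """
    m = re.search(pattern, text)
    return (m.span(), m.groups()) if m else None

# Same alternatives, opposite order.
for pattern in (r'(a|ab)*?b', r'(ab|a)*?b'):
    print(pattern, show(pattern))
```

Both patterns should report the same overall span; the interesting part is whether the captured group differs.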
From there, I'm not clear on how a non-greedy Kleene star works with capturing
groups and with the overall span(). A starting point would be to look at the
re.DEBUG output for each pattern [2][3].
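One way to get that output, sketched under the assumption that re.DEBUG writes its dump to standard output (it does in the CPython versions I am aware of). The helper name debug_dump() is hypothetical, and the exact layout of the dump varies between Python versions.

```python
import contextlib
import io
import re

def debug_dump(pattern):
    """Compile pattern with re.DEBUG and return the text it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()

for pattern in (r'(a|ab)*?b', r'(ab|a)*?b'):
    print(pattern)
    print(debug_dump(pattern))
```

Capturing the dump as a string makes it easy to diff the two patterns side by side.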
[1] From the re docs for the alternation operator:
As the target string is scanned, REs separated by '|' are tried from left to
right. When one pattern completely matches, that branch is accepted. This means
that once A matches, B will not be tested further, even if it would produce a
longer overall match. In other words, the '|' operator is never greedy.
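A small illustration of that documented rule, using a bare alternation with no repetition or capturing group (the input 'ab' is chosen here purely for illustration):

```python
import re

# REs separated by '|' are tried left to right; the first branch
# that matches is accepted, even if a later one would match more.
first = re.match(r'a|ab', 'ab').group()   # 'a' is tried first and matches
second = re.match(r'ab|a', 'ab').group()  # 'ab' is tried first and matches

print(first)   # a
print(second)  # ab
```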
[2] re.DEBUG output for r'(a|ab)*?b'
 0. INFO 4 0b0 1 MAXREPEAT (to 5)
 5: REPEAT 19 0 MAXREPEAT (to 25)
 9.   MARK 0
11.   LITERAL 0x61 ('a')
13.   BRANCH 3 (to 17)
15.     JUMP 7 (to 23)
17:   branch 5 (to 22)
18.     LITERAL 0x62 ('b')
20.     JUMP 2 (to 23)
22:   FAILURE
23:   MARK 1
25: MIN_UNTIL
26. LITERAL 0x62 ('b')
28. SUCCESS
[3] re.DEBUG output for r'(ab|a)*?b'
MIN_REPEAT 0 MAXREPEAT
  SUBPATTERN 1 0 0
    LITERAL 97
    BRANCH
      LITERAL 98
    OR
LITERAL 98

 0. INFO 4 0b0 1 MAXREPEAT (to 5)
 5: REPEAT 19 0 MAXREPEAT (to 25)
 9.   MARK 0
11.   LITERAL 0x61 ('a')
13.   BRANCH 5 (to 19)
15.     LITERAL 0x62 ('b')
17.     JUMP 5 (to 23)
19:   branch 3 (to 22)
20.     JUMP 2 (to 23)
22:   FAILURE
23:   MARK 1
25: MIN_UNTIL
26. LITERAL 0x62 ('b')
28. SUCCESS
----------
nosy: +rhettinger
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35859>
_______________________________________