Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:
> I cannot see why changing the order of the alternation should have this
> effect.

The first regex, r'(a|ab)*?b', looks for the first alternative group by
matching left-to-right [1] stopping at the first matching alternation "a".
Roughly, the regex simplifies to r'(a)*?b' giving 'a' in the captured group.

The second regex, r'(ab|a)*?b', looks for the first alternative group by
matching left-to-right [1] stopping at the first matching alternation "ab".
Roughly, the regex simplifies to r'(ab)*?b' giving '' in the captured group.

From there, I'm not clear on how a non-greedy Kleene star works with
capturing groups and with the overall span(). A starting point would be to
look at the re.DEBUG output for each pattern [2][3].


[1] From the re docs for the alternation operator:

    As the target string is scanned, REs separated by '|' are tried from
    left to right. When one pattern completely matches, that branch is
    accepted. This means that once A matches, B will not be tested further,
    even if it would produce a longer overall match. In other words, the
    '|' operator is never greedy.

[2] re.DEBUG output for r'(a|ab)*?b'

     0. INFO 4 0b0 1 MAXREPEAT (to 5)
     5: REPEAT 19 0 MAXREPEAT (to 25)
     9.   MARK 0
    11.   LITERAL 0x61 ('a')
    13.   BRANCH 3 (to 17)
    15.     JUMP 7 (to 23)
    17:   branch 5 (to 22)
    18.     LITERAL 0x62 ('b')
    20.     JUMP 2 (to 23)
    22:   FAILURE
    23:   MARK 1
    25: MIN_UNTIL
    26. LITERAL 0x62 ('b')
    28. SUCCESS

[3] re.DEBUG output for r'(ab|a)*?b'

    MIN_REPEAT 0 MAXREPEAT
      SUBPATTERN 1 0 0
        LITERAL 97
        BRANCH
          LITERAL 98
        OR
    LITERAL 98

     0. INFO 4 0b0 1 MAXREPEAT (to 5)
     5: REPEAT 19 0 MAXREPEAT (to 25)
     9.   MARK 0
    11.   LITERAL 0x61 ('a')
    13.   BRANCH 5 (to 19)
    15.     LITERAL 0x62 ('b')
    17.     JUMP 5 (to 23)
    19:   branch 3 (to 22)
    20.     JUMP 2 (to 23)
    22:   FAILURE
    23:   MARK 1
    25: MIN_UNTIL
    26. LITERAL 0x62 ('b')
    28. SUCCESS

----------
nosy: +rhettinger

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________
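
The comparison discussed in this message can be reproduced with a short
script. This is only a minimal sketch and not part of the original comment:
the helper name capture_report and the constants TEXT and PATTERNS are
illustrative, and the input string 'ab' is an assumption, since the message
does not state which string was searched. The script reports the overall
match, the contents of group 1, and the span of group 1 for each pattern, and
then dumps the re.DEBUG output like that shown in [2] and [3]. The value
captured for the second pattern may differ between Python versions, so no
expected output is given for it here.

```python
import re

# Assumed input string; the message does not say what was searched.
TEXT = "ab"

# The two patterns from the message, differing only in alternation order.
PATTERNS = [r"(a|ab)*?b", r"(ab|a)*?b"]


def capture_report(pattern, text=TEXT):
    """Return (whole match, group 1, span of group 1), or None if no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.group(1), m.span(1)


if __name__ == "__main__":
    for pattern in PATTERNS:
        print(pattern, "->", capture_report(pattern))
        # Compiling with re.DEBUG prints the parse tree and compiled opcodes.
        re.compile(pattern, re.DEBUG)
```

Running the script under different interpreter versions and comparing the
group 1 column is a quick way to see whether the order of the alternatives
still changes the capture.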