Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

> I cannot see why changing the order of the alternation should have this 
> effect.

The first regex, r'(a|ab)*?b', looks for the first alternative group by 
matching left-to-right [1] stopping at the first matching alternation "a".  
Roughly, the regex simplifies to r'(a)*?b' giving 'a' in the captured group.

The second regex, r'(ab|a)*?b', looks for the first  alternative group by 
matching left-to-right [1] stopping at the first matching alternation "ab".  
Roughly, the regex simplifies to r'(ab)*?b' giving '' in the captured group.

>From there, I'm not clear on how a non-greedy kleene-star works with capturing 
>groups and with the overall span().  A starting point would be to look at the 
>re.DEBUG output for each pattern [2][3].

[1] From the re docs for the alternation operator:
As the target string is scanned, REs separated by '|' are tried from left to 
right. When one pattern completely matches, that branch is accepted. This means 
that once A matches, B will not be tested further, even if it would produce a 
longer overall match. In other words, the '|' operator is never greedy.

[2] re.DEBUG output for r'(a|ab)*?b'
 0. INFO 4 0b0 1 MAXREPEAT (to 5)
 5: REPEAT 19 0 MAXREPEAT (to 25)
 9.   MARK 0
11.   LITERAL 0x61 ('a')
13.   BRANCH 3 (to 17)
15.     JUMP 7 (to 23)
17:   branch 5 (to 22)
18.     LITERAL 0x62 ('b')
20.     JUMP 2 (to 23)
22:   FAILURE
23:   MARK 1
25: MIN_UNTIL
26. LITERAL 0x62 ('b')
28. SUCCESS

[3] re.DEBUG output for r'(ab|a)*?b'
MIN_REPEAT 0 MAXREPEAT
  SUBPATTERN 1 0 0
    LITERAL 97
    BRANCH
      LITERAL 98
    OR
LITERAL 98

 0. INFO 4 0b0 1 MAXREPEAT (to 5)
 5: REPEAT 19 0 MAXREPEAT (to 25)
 9.   MARK 0
11.   LITERAL 0x61 ('a')
13.   BRANCH 5 (to 19)
15.     LITERAL 0x62 ('b')
17.     JUMP 5 (to 23)
19:   branch 3 (to 22)
20.     JUMP 2 (to 23)
22:   FAILURE
23:   MARK 1
25: MIN_UNTIL
26. LITERAL 0x62 ('b')
28. SUCCESS

----------
nosy: +rhettinger

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to