[issue35496] left-to-right violation in match order

2018-12-16 Thread Steve Newcomb


Steve Newcomb added the comment:

I'm very grateful for your time and attention, and sorry to have distracted 
you.  You're correct when you say:  

Steven D'Aprano: ...the rightmost alternative matches from position 1 of the 
text, while the leftmost alternative doesn't match until position 8. So 
starting from position 0, the IPV6 check matches first, and so wins.

I see now that what I was trying to do is simply not possible. I was looking 
for a way to do a kind of hat trick: to keep a matched substring (":::") 
out of matchObject.group(0).  I guess I just don't get to do that.  

It would be a nice feature to add: a "consume-and-forget" or "suppress" 
extension group type. Non-capturing groups forget about themselves, but they 
don't suppress their matched contents.  It's a nice thing to be able to do 
because some software accepts regular expressions as configuration items but 
doesn't allow configuration of selection among the groups that may appear 
within it.  (I admit there aren't many occasions when suppression of substrings 
from group(0) is really necessary, but I think they do occur.)
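As a small illustration of the distinction drawn above, the sketch below shows that a non-capturing group still contributes its text to group(0), and that a fixed-width lookbehind can keep a prefix out of group(0). The sample text and the simplified IPv4 pattern are assumptions made for this example, not taken from the attached scripts. Note that a lookbehind does not consume anything, so it is not the "consume-and-forget" group proposed here: in an alternation, another branch could still match the ":::" at an earlier position.

```python
import re

# Hypothetical sample text and a deliberately loose IPv4 pattern,
# chosen only for illustration.
text = 'addr=:::208.123.4.22'
ipv4 = r'\d{1,3}(?:\.\d{1,3}){3}'

# A non-capturing group does not create a numbered group, but the text it
# matches is still part of the overall match.
grouped = re.search(r'(?::{3})' + ipv4, text)
print(grouped.group(0))  # :::208.123.4.22

# A lookbehind asserts that ":::" precedes the match without consuming it,
# so the prefix stays out of group(0). Python's re requires the lookbehind
# to have a fixed width.
looked = re.search(r'(?<=:{3})' + ipv4, text)
print(looked.group(0))  # 208.123.4.22
```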

--
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35496] left-to-right violation in match order

2018-12-15 Thread Steven D'Aprano


Steven D'Aprano added the comment:

> See attached script, which is self-explanatory.

I'm glad one of us thinks so, because I find it clear as mud.

I spent *way* longer on this than I should have, but I simplified your sample 
code to the best of my ability. (See attached.) As far as I can tell, your code 
and mine do roughly the same thing, but please check that you agree.

I agree that with the IPV6 portion of the regex removed, it matches on 
"208.123.4.22", but with the IPV6 portion included, it matches on 
":::208.123.4.22". But I'm not sure that's a bug. I think it is working as 
designed. For example:


py> import re
py> text = 'green pepper'
py> re.search('pepper|green pepper', text).group(0)
'green pepper'


seems to be analogous to your example, but simpler. Do you agree? If not, it 
would also help a lot if you could find a simpler regex that demonstrates the 
issue. See http://www.sscce.org/

In your case, I believe that the rightmost alternative matches from position 1 
of the text, while the leftmost alternative doesn't match until position 8. So 
starting from position 0, the IPV6 check matches first, and so wins.

It is possible you were expecting that the IPV4 check would be tested against 
position 0, then position 1, then position 2, then ... and so on until the end 
of the string, and only then the IPV6 check tested against position 0, then 1 
etc.
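The search order described above can be sketched as a conceptual model. The helper first_match below is hypothetical and written only for this illustration; it is not how the re engine is implemented, but it produces the same result for simple alternations: every alternative is tried, left to right, at one position before the search moves on to the next position.

```python
import re

def first_match(alternatives, text):
    """Conceptual model of re.search over 'a|b|...' (illustration only)."""
    for pos in range(len(text) + 1):
        # At each starting position, try the alternatives left to right.
        for alt in alternatives:
            m = re.compile(alt).match(text, pos)
            if m:
                return m
    return None

text = 'green pepper'

# The second alternative wins because it matches at position 0, while the
# first alternative cannot match until position 6.
print(first_match(['pepper', 'green pepper'], text).group(0))  # green pepper
print(re.search('pepper|green pepper', text).group(0))         # green pepper

# Left-to-right preference applies only among alternatives that can match
# at the same starting position.
print(re.search('green|green pepper', text).group(0))          # green
```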

--
nosy: +steven.daprano
Added file: https://bugs.python.org/file48000/logcheck3.py




[issue35496] left-to-right violation in match order

2018-12-14 Thread Steve Newcomb


New submission from Steve Newcomb:

Documentation for the re module insists that matches are made left-to-right 
within the alternatives delimited by an "or" | group.  I seem to have found a 
case where the rightmost alternative is matched unless it (and only it) is 
commented out.  See attached script, which is self-explanatory.

--
files: left-to-right_violation_in_python3_re_match.py
messages: 331838
nosy: steve.newcomb
priority: normal
severity: normal
status: open
title: left-to-right violation in match order
type: behavior
versions: Python 3.6
Added file: 
https://bugs.python.org/file47997/left-to-right_violation_in_python3_re_match.py
