[issue17668] re.split loses characters matching ungrouped parts of a pattern

R. David Murray Mon, 08 Apr 2013 23:25:08 -0700

R. David Murray added the comment:

Only group the stuff you want to see in the result:


>>> re.split(r'(^>.*$)', '>Homo sapiens catenin (cadherin-associated)')
['', '>Homo sapiens catenin (cadherin-associated)', '']

>>> re.split(r'^(>.*)$', '>Homo sapiens catenin (cadherin-associated)')
['', '>Homo sapiens catenin (cadherin-associated)', '']
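
To make the rule concrete, here is a minimal sketch on a made-up input (the string "name = value" and the variable names are illustrative assumptions, not taken from the original report). It shows that whatever the pattern matches outside a capturing group is consumed by the split and does not appear in the result:

```python
import re

# Illustrative input, not from the original report.
line = "name = value"

# No group in the pattern: the separator is dropped entirely.
plain = re.split(r"\s*=\s*", line)

# The whole pattern is grouped: the full separator, spaces included, is kept.
whole = re.split(r"(\s*=\s*)", line)

# Only "=" is grouped: the ungrouped whitespace around it is matched and lost.
partial = re.split(r"\s*(=)\s*", line)

print(plain)
print(whole)
print(partial)
```

So if every character has to survive the split, the group needs to cover the entire pattern.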

If you are using grouping to get alternatives, you can use a non-capturing 
group:

>>> re.split(r'(ca(?:t|d))', '>Homo sapiens catenin (cadherin-associated)')
['>Homo sapiens ', 'cat', 'enin (', 'cad', 'herin-associated)']
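
For comparison, a small sketch of what happens if the inner group is left as a capturing group (the variable names are only for illustration). Each capturing group contributes its own item per match, so the alternative shows up a second time:

```python
import re

header = ">Homo sapiens catenin (cadherin-associated)"

# Inner group captures too: each match adds both "cat"/"cad" and "t"/"d".
capturing = re.split(r"(ca(t|d))", header)

# Inner group is non-capturing: each match adds only "cat"/"cad".
non_capturing = re.split(r"(ca(?:t|d))", header)

print(capturing)
print(non_capturing)
```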

(By the way, I'm a bit confused as to what exactly you are splitting in your 
original example, since you seem to be matching the whole string, and only if 
it is the whole string.  On the other hand, regular expressions regularly 
confuse me... :)

Indeed, I do not think it is worth complicating the interface to handle the 
unusual case of accepting and applying unknown regexes.  The one change I could 
see as a possibility would be to allow all of the groups matched by the split 
regex to appear together in the result as a single sublist per match.  But I'm 
not the maintainer of this module either :)
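
A rough sketch of how the sublist idea could behave, written as a standalone helper rather than a change to re.split. The name split_grouped is hypothetical and nothing like it exists in the re module. It is built on re.finditer and makes no attempt to mirror re.split's handling of maxsplit or empty matches:

```python
import re

def split_grouped(pattern, string):
    """Hypothetical helper: split like re.split, but collect each match's
    groups into one sublist instead of flattening them into the result."""
    result = []
    pos = 0
    for match in re.finditer(pattern, string):
        # Text between the previous match and this one.
        result.append(string[pos:match.start()])
        # All groups of this match, kept together as a single item.
        result.append(list(match.groups()))
        pos = match.end()
    # Whatever remains after the last match.
    result.append(string[pos:])
    return result

header = ">Homo sapiens catenin (cadherin-associated)"
parts = split_grouped(r"(ca)(t|d)", header)
print(parts)
```

With this shape the text pieces always sit at even indexes and the group sublists at odd ones, no matter how many groups the pattern has.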

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue17668>
_______________________________________