Commit b7cbeb... findall() should return groups() not plain string

Phil Charlesworth Thu, 15 Mar 2012 08:18:54 -0700

Lex,
     I notice you have posted an amendment to re.findall(). I was just 
about to do the same.


I agree that groups need to be taken into account but I think that your 
code will fail if the regular expression doesn't define any groups. I 
have just tested it on an example from the Python 2.6.4 docs for the re 
module.

text = "He was carefully disguised but captured quickly by police."
patt2 = re.compile(r"\w+ly")

print patt2.findall(text)    # test with Python
output is:  ['carefully', 'quickly']

print patt2.lex_findall(text) # test with your code
output is: [(), ()]
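
The empty tuples can be reproduced with the standard re module. The following is only a minimal check in Python 3 syntax, not the lex_findall code itself; the names `pattern` and `first` are made up for illustration. It shows that a match from a pattern without groups reports an empty tuple from groups(), so anything built only from groups() has nothing to return:

```python
import re

text = "He was carefully disguised but captured quickly by police."
pattern = re.compile(r"\w+ly")

# Look at the first match of a pattern that defines no groups.
first = pattern.search(text)

print(first.group(0))   # the whole matched text: carefully
print(first.groups())   # an empty tuple, (), rather than None
```

So a findall() replacement needs a separate branch that falls back to the whole match when groups() is empty.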

The version I was going to post is:

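def new_findall(self, string, pos=0, endpos=None):
     # Return a list of all non-overlapping matches of pattern in string.
     if endpos is not None:
         string = string[:endpos]
     matches = []
     while pos <= len(string):
         m = self.search(string, pos)
         if m is None:
             break
         start, end = m.span()
         groups = m.groups()
         if not groups:
             # No groups defined: groups() is empty, so keep the whole match.
             matches.append(string[start:end])
         elif len(groups) == 1:
             # re.findall() returns plain strings for a single group.
             matches.append(groups[0] or '')
         else:
             # Unmatched groups are None; re.findall() reports them as ''.
             matches.append(tuple([group or '' for group in groups]))
         # Step past an empty match so the loop always advances.
         pos = end if end > start else end + 1
     return matches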

This combines the old code, which worked OK when there were no groups,
with new code to handle groups.
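
For comparison, here is a small sketch of what the standard re.findall() returns for each pattern shape, written in Python 3 syntax. The two grouped patterns are my own variations on the docs example, added only for illustration:

```python
import re

text = "He was carefully disguised but captured quickly by police."

# No groups: a list of whole-match strings.
no_groups = re.findall(r"\w+ly", text)
print(no_groups)     # ['carefully', 'quickly']

# Exactly one group: a list of strings for that group, not 1-tuples.
one_group = re.findall(r"(\w+)ly", text)
print(one_group)     # ['careful', 'quick']

# Two or more groups: a list of tuples with one item per group.
two_groups = re.findall(r"(\w+)(ly)", text)
print(two_groups)    # [('careful', 'ly'), ('quick', 'ly')]
```

Any replacement that aims to be compatible has to reproduce all three shapes.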

The line in my code which outputs the group data is slightly different
from yours for a reason.

Taking the regular expression used in the string.Template class:
\$(?:(\$)|([_a-z][_a-z0-9]*)|{([_a-z][_a-z0-9]*)}|())

and a string with placeholders:
Here is some $text which contains ${some} placeholders

Your amended code produces:
[(None, 'text', None, None), (None, None, 'some', None)]
but Python re.findall() produces:
[('', 'text', '', ''), ('', '', 'some', '')]

To convert all the None items to empty strings, I have used
the expression:
tuple([group or '' for group in m.groups()])
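
As a rough illustration in Python 3 syntax, the conversion can be checked against the standard re module using the pattern and sample string quoted above. The variable names are hypothetical. The sketch also uses the optional default argument of groups(), which should give the same result as the `or ''` expression:

```python
import re

# Pattern and sample text copied from the message above.
pattern = re.compile(r"\$(?:(\$)|([_a-z][_a-z0-9]*)|{([_a-z][_a-z0-9]*)}|())")
text = "Here is some $text which contains ${some} placeholders"

# Groups that did not take part in a match come back as None.
raw = [m.groups() for m in pattern.finditer(text)]

# Replace each None with an empty string, as in the expression above.
cleaned = [tuple(group or '' for group in m.groups())
           for m in pattern.finditer(text)]

# groups('') asks the match object to substitute '' for unmatched groups.
defaults = [m.groups('') for m in pattern.finditer(text)]

print(raw)
print(cleaned)
print(cleaned == defaults == pattern.findall(text))
```

Either form works where groups() accepts a default; the `or ''` version makes no assumption about that argument being supported.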

Regards,
Phil
