Lex,
I notice you have posted an amendment to re.findall(). I was just
about to do the same.
I agree that groups need to be taken into account but I think that your
code will fail if the regular expression doesn't define any groups. I
have just tested it on an example from the Python 2.6.4 docs for the re
module.
    text = "He was carefully disguised but captured quickly by police."
    patt2 = re.compile(r"\w+ly")
    print patt2.findall(text)      # test with Python
    output is: ['carefully', 'quickly']
    print patt2.lex_findall(text)  # test with your code
    output is: [(), ()]
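For what it is worth, the empty tuples are presumably what groups() hands back when the pattern defines no groups. A quick check against the standard re module (not your implementation, so treat it as an assumption that yours behaves the same way):

```python
import re

# Pattern with no capturing groups, as in the docs example above.
m = re.search(r"\w+ly", "He was carefully disguised")

no_groups = m.groups()      # an empty tuple, not None
whole_match = m.group(0)    # the full matched text
```

So a test for "no groups" has to check for an empty tuple rather than for None.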
The version I was going to post is:
    def new_findall(self, string, pos=0, endpos=None):
        # Return a list of all non-overlapping matches of pattern in string.
        if endpos is not None:
            string = string[:endpos]
        matches = []
        while pos <= len(string):
            m = self.search(string, pos)
            if m is None:
                break
            span = m.span()
            if not m.groups():
                # groups() is an empty tuple (never None) when no groups are defined
                matches.append(string[span[0]:span[1]])
            else:
                matches.append(tuple([group or '' for group in m.groups()]))
            # step past an empty match so the loop cannot stall
            pos = span[1] if span[1] > span[0] else span[1] + 1
        return matches
This combines the old code, which worked OK when there were no groups,
with new code to handle groups.
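To check the idea outside the class, here is a minimal standalone sketch built on the standard re module. The name findall_sketch is made up for illustration and is not part of any library. It also covers one case my version above does not: with exactly one group, re.findall() returns a list of plain strings rather than one-element tuples.

```python
import re

def findall_sketch(pattern, string, pos=0, endpos=None):
    """Hypothetical findall built on pattern.search(); a sketch only."""
    if endpos is None or endpos > len(string):
        endpos = len(string)
    results = []
    while pos <= endpos:
        m = pattern.search(string, pos, endpos)
        if m is None:
            break
        groups = m.groups('')          # unmatched groups become ''
        if not groups:
            results.append(m.group(0)) # no groups: whole match
        elif len(groups) == 1:
            results.append(groups[0])  # one group: plain string
        else:
            results.append(groups)     # several groups: tuple
        start, end = m.span()
        # Step past an empty match so the loop always advances.
        pos = end if end > start else end + 1
    return results
```

Comparing its result with pattern.findall() on the same text, for patterns with zero, one and several groups, is a quick way to see whether the three branches line up.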
The line in my code which outputs the group data is slightly different
from yours for a reason.
Taking the regular expression used in the string.Template class:
\$(?:(\$)|([_a-z][_a-z0-9]*)|{([_a-z][_a-z0-9]*)}|())
and a string with placeholders:
Here is some $text which contains ${some} placeholders
Your amended code produces:
    [(None, 'text', None, None), (None, None, 'some', None)]
but Python's re.findall() produces:
    [('', 'text', '', ''), ('', '', 'some', '')]
To convert all the None items to empty strings, I have used
the expression:
tuple([group or '' for group in m.groups()])
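As a cross-check with the standard re module: its match objects accept a default value in groups(), which gives the same result as the expression above. Whether your own match objects take that argument is an assumption I have not verified. The braces are escaped in this sketch purely for readability.

```python
import re

# Template-style pattern from above, with the braces escaped.
pattern = re.compile(r"\$(?:(\$)|([_a-z][_a-z0-9]*)|\{([_a-z][_a-z0-9]*)\}|())")
text = "Here is some $text which contains ${some} placeholders"

m = pattern.search(text)                               # first placeholder, $text
raw = m.groups()                                       # unmatched groups are None
via_or = tuple([group or '' for group in m.groups()])  # expression from this post
via_default = m.groups('')                             # same result via the default argument
expected = pattern.findall(text)                       # what re.findall() itself returns
```

Both via_or and via_default give ('', 'text', '', '') for the first match, which is the first tuple in the re.findall() output.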
Regards,
Phil