On 09/01/2011 16:49, Tom Anderson wrote:
Hello everyone, long time no see,
This is probably not a Python problem, but rather a regular
expressions problem.
I want, for the sake of argument, to match strings comprising any
number of occurrences of 'spa', each interspersed by any number of
occurrences of 'm'. 'Any number' includes zero, so the whole
pattern should match the empty string.
Here's the conversation Python and i had about it:
Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16)
[GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile("(spa|m*)*")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
What's going on here? Why is there nothing to repeat? Is the problem
having one *'d term inside another?
Now, i could actually rewrite this particular pattern as '(spa|m)*'.
But what i neglected to mention above is that i'm actually generating
patterns from structures of objects (representations of XML DTDs, as
it happens), and as it stands, patterns like this are a possibility.
Any thoughts on what i should do? Do i have to bite the bullet and
apply some cleverness in my pattern generation to avoid situations
like this?
Thanks,
tom
I think you want to anchor your list, or anything will match. Perhaps
re.compile('^(spa(m)+)*$')
is what you need.
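
To make the anchoring idea concrete, here is a minimal sketch. It assumes the intent is the '(spa|m)*' rewrite Tom mentions, which avoids the nested star. Python's re module takes the bare pattern, so slash delimiters would be treated as literal characters. The helper name is_spam_string is made up for illustration, and \Z is used as the end anchor so that a trailing newline is not accepted.

```python
import re

# Anchored form of the '(spa|m)*' rewrite: the whole string must be made
# of 'spa' and 'm' pieces. (?:...) groups without capturing; \Z matches
# only at the very end of the string.
PATTERN = re.compile(r"^(?:spa|m)*\Z")


def is_spam_string(s):
    """Hypothetical helper: True if s consists only of 'spa' and 'm' pieces."""
    return PATTERN.match(s) is not None


if __name__ == "__main__":
    for candidate in ["", "spammspa", "mmm", "spam!", "sp"]:
        print(repr(candidate), is_spam_string(candidate))
```

Without the anchors the pattern can match the empty string at the start of any input, so an unanchored match would succeed on everything.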
Regards
Ian