On 09/01/2011 16:49, Tom Anderson wrote:
Hello everyone, long time no see,

This is probably not a Python problem, but rather a regular expressions problem.

I want, for the sake of argument, to match strings comprising any number of occurrences of 'spa', each interspersed by any number of occurrences of 'm'. 'Any number' includes zero, so the whole pattern should match the empty string.

Here's the conversation Python and i had about it:

Python 2.6.4 (r264:75706, Jun  4 2010, 18:20:16)
[GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile("(spa|m*)*")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

What's going on here? Why is there nothing to repeat? Is the problem having one *'d term inside another?

Now, i could actually rewrite this particular pattern as '(spa|m)*'. But what i neglected to mention above is that i'm actually generating patterns from structures of objects (representations of XML DTDs, as it happens), and as it stands, patterns like this are a possibility.

Any thoughts on what i should do? Do i have to bite the bullet and apply some cleverness in my pattern generation to avoid situations like this?

Thanks,
tom

I think you want to anchor your pattern; otherwise it will match any string at all, because it can always match the empty string. Perhaps

re.compile('^(spa(m)+)*$')

is what you need.
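
To illustrate the anchoring point, here is a rough sketch using the '(spa|m)*' rewrite mentioned in the question rather than the pattern above. The helper name matches() and the sample strings are made up for the example. Without the anchors, re.match succeeds on any input because the starred group can match zero times at position 0.

```python
import re

# Anchored form of the '(spa|m)*' rewrite from the question.
anchored = re.compile(r'^(spa|m)*$')
# Same pattern without anchors: it can always match the empty string.
unanchored = re.compile(r'(spa|m)*')


def matches(pattern, text):
    """Hypothetical helper: True if the pattern matches at the start of text."""
    return pattern.match(text) is not None


# Made-up sample inputs: some built only from 'spa' and 'm', some not.
samples = ['', 'spa', 'spam', 'mmspaspam', 'eggs', 'spax']

for s in samples:
    print(repr(s), matches(anchored, s), matches(unanchored, s))
```

The unanchored column should come out True for every sample, including 'eggs', which is why the anchors matter if you want to validate the whole string.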

Regards

Ian
--
http://mail.python.org/mailman/listinfo/python-list
