<gburde...@gmail.com> wrote in message
news:f98a6057-c35f-4843-9efb-7f36b05b6...@g19g2000yqo.googlegroups.com...
> If I do this:
> import re
> a=re.search(r'hello.*?money', 'hello how are you hello funny money')
> I would expect a.group(0) to be "hello funny money", since .*? is a
> non-greedy match. But instead, I get the whole sentence, "hello how
> are you hello funny money".
> Is this expected behavior? How can I specify the correct regexp so
> that I get "hello funny money"?
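A non-greedy match matches the fewest characters needed before the text
*after* it can match. It does not change where the match starts: the search
still begins at the leftmost position where a match is possible, which here
is the first hello. For example: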
import re
a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
a.group(0) # non-greedy stops at the first money
'hello how are you hello funny money'
a=re.search(r'hello.*money','hello how are you hello funny money and more money')
a.group(0) # greedy keeps going to the last money
'hello how are you hello funny money and more money'
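Comparing the match spans shows that both variants start at the same place and differ only in where they end. A small sketch; the names text, lazy and greedy are only for illustration:

```python
import re

text = 'hello how are you hello funny money and more money'

lazy = re.search(r'hello.*?money', text)
greedy = re.search(r'hello.*money', text)

# Both matches begin at the first hello; only the end position differs.
print(lazy.start(), lazy.end())
print(greedy.start(), greedy.end())
```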
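This is why it is difficult to use regular expressions to match nested
objects like parentheses or XML tags. In your case you'll need something
extra to not match the first hello. A negative lookbehind for the start of
the string works here, because the hello you want to skip is the very first
thing in the string: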
a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
a.group(0)
'hello funny money'
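If the hello to be skipped is not guaranteed to sit at the start of the string, two other patterns may be worth trying. This is a minimal sketch on the same sample text, not a general solution; the names text, tempered and last are only for illustration:

```python
import re

text = 'hello how are you hello funny money'

# Option 1: consume characters one at a time, but never at a position
# where another "hello" begins, so the match cannot span a later hello.
tempered = re.search(r'hello(?:(?!hello).)*?money', text)
print(tempered.group(0))

# Option 2: a greedy prefix pushes the captured group as far right as
# possible, so group(1) starts at the last hello that is followed by money.
last = re.search(r'.*(hello.*?money)', text)
print(last.group(1))
```

The first form returns the earliest hello...money pair with no hello in between; the second returns the pair that starts at the last usable hello. On this input the two agree.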
-Mark