<gburde...@gmail.com> wrote in message news:f98a6057-c35f-4843-9efb-7f36b05b6...@g19g2000yqo.googlegroups.com...
If I do this:

import re
a=re.search(r'hello.*?money',  'hello how are you hello funny money')

I would expect a.group(0) to be "hello funny money", since .*? is a
non-greedy match. But instead, I get the whole sentence, "hello how
are you hello funny money".

Is this expected behavior? How can I specify the correct regexp so
that I get "hello funny money" ?

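A non-greedy match matches the fewest characters it can before matching the text *after* the non-greedy part, but it does not change where the match begins. re.search still starts at the leftmost position where a match is possible, which here is the first hello; the ? only makes .* stop at the first money after that point instead of the last. For example: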

import re
a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
a.group(0)  # non-greedy stops at the first money
'hello how are you hello funny money'
a=re.search(r'hello.*money','hello how are you hello funny money and more money')
a.group(0)  # greedy keeps going to the last money
'hello how are you hello funny money and more money'
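
To make the starting position visible, here is a small sketch (the variable names are just for illustration) that compares where the two matches begin and end. Both begin at index 0, the first hello; only the end differs.

```python
import re

text = 'hello how are you hello funny money and more money'

lazy = re.search(r'hello.*?money', text)
greedy = re.search(r'hello.*money', text)

# Both matches start at the leftmost 'hello'; the quantifier
# only decides which 'money' ends the match.
print(lazy.start(), greedy.start())
print(lazy.group(0))
print(greedy.group(0))
```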

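This is why it is difficult to use regular expressions to match nested objects like parentheses or XML tags. In your case you'll need something extra to keep the match from starting at the first hello. Since that hello sits at the very start of the string, a negative lookbehind on ^ is enough: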

a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
a.group(0)
'hello funny money'
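
Note that the lookbehind above depends on the unwanted hello being at the very start of the string. If that can't be assumed, one possible alternative (a sketch only; the helper name shortest_hello_money is made up for this example) is to forbid another hello inside the match with a negative lookahead on every character:

```python
import re

# Each character consumed between 'hello' and 'money' must not
# begin another 'hello', so the match cannot span two of them.
PATTERN = re.compile(r'hello(?:(?!hello).)*?money')

def shortest_hello_money(text):
    """Return the first 'hello ... money' with no 'hello' inside it, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(shortest_hello_money('hello how are you hello funny money'))
print(shortest_hello_money('well, hello how are you hello funny money'))
```

Both calls should give 'hello funny money', while the (?<!^) version would start at the first hello in the second string because that hello is no longer at position 0.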

-Mark
