I'm having trouble getting a re pattern to stop matching once it has consumed what I want. Using this string as an example, the goal is to match only "CAPS":

>>> import re
>>> s = "only the word in CAPS should be matched"

So let's say I want to specify when to begin my pattern by using a lookbehind:

>>> x = re.compile(r"(?<=\bin)")  # zero-width: matches the position right after the word "in"

So that's straightforward. Now let's say I don't want to use a lookahead to specify the end of my pattern; I simply want the match to stop after the word following "in". I would expect this to work, but it doesn't:

>>> x = re.compile(r"(?<=\bin).+\b")  # this consumes everything after "in", all the way to the end of the string

In the above example I would think that the word-boundary assertion "\b" would mark a stopping point. Is ".+\b" not saying, "keep matching characters until a word boundary has been reached"?
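
For reference, here is a minimal sketch of what that pattern matches on the example string. The lazy variant (".+?") and the names "greedy" and "lazy" are additions for comparison only and are not part of the session above. The results suggest that ".+" is greedy: it runs to the end of the string first, and "\b" is already satisfied there, so nothing forces it to give anything back.

```python
import re

s = "only the word in CAPS should be matched"

# Greedy: ".+" consumes to the end of the string; "\b" holds there
# (right after the final "d"), so no backtracking is needed.
greedy = re.search(r"(?<=\bin).+\b", s).group()
print(repr(greedy))  # ' CAPS should be matched'

# Lazy: ".+?" takes a single character (the space after "in"),
# and "\b" already holds just before the "C" of "CAPS".
lazy = re.search(r"(?<=\bin).+?\b", s).group()
print(repr(lazy))    # ' '
```

Note that making the quantifier lazy does not help by itself here: the first word boundary after the space comes before "CAPS", not after it.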

Even stranger are the results I get from:

>>> x=re.compile(r"(?<=\bin).+\s") #keep matching characters until a whitespace has been reached(?)
>>> r = x.sub("[EMAIL PROTECTED]", s)
>>> print r
only the word [EMAIL PROTECTED]

For some reason, this time it decides to consume three words instead of one.
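
A small sketch that inspects the match directly, rather than through sub(), may make the three-word result easier to see. The variable names "m", "matched" and "span" are illustrative only. On this string, the greedy ".+" appears to run to the end and then back off only as far as the last whitespace character, which is the space before "matched".

```python
import re

s = "only the word in CAPS should be matched"

m = re.search(r"(?<=\bin).+\s", s)

# The match starts right after "in" and ends with the LAST whitespace
# that still lets "\s" succeed, not the first one.
matched = m.group()
span = m.span()
print(repr(matched))  # ' CAPS should be '
print(span)           # (16, 32)
```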

My question is simply this: after specifying a start point, how do I make a match stop after it has found one word, and one word only? As always, all help is appreciated.
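
One possible approach, offered as a sketch and not as the definitive answer: move the separating whitespace into the lookbehind and then match a single run of word characters with "\w+", which stops by itself at the first non-word character. The name "one_word" and the replacement text "XXXX" are placeholders chosen for illustration. This assumes the target is separated from "in" by exactly one whitespace character, since Python's re requires fixed-width lookbehinds.

```python
import re

s = "only the word in CAPS should be matched"

# "(?<=\bin\s)" asserts that the word "in" plus one whitespace character
# come immediately before the match; "\w+" then matches one word only.
one_word = re.compile(r"(?<=\bin\s)\w+")

print(one_word.search(s).group())  # CAPS
print(one_word.sub("XXXX", s))     # only the word in XXXX should be matched
```

If the target could contain punctuation, "\S+" (a run of non-whitespace characters) might be a better fit than "\w+".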
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
