I'm getting some strange results using the "or" operator. In every test I run, both sides of the "|" metacharacter get matched, not one or the other as all the documentation says should happen (the parser supposedly scans left to right, uses the first match it finds, and ignores the rest). It should only go beyond the "|" if no match was found before it, no?

Correct me if I'm wrong, but your regex is saying: "Match dog, unless it's followed by cat. If it is followed by cat, there is no match on this side of the '|', so we advance past it and look at the alternative expression, which says to match in front of cat."

However, if I run .sub using your regex on a string containing both dog and cat, both are replaced.
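A quick way to check what the quoted pattern actually replaces is to run it through `re.sub` on a few sample strings. This is a minimal sketch assuming Python 3; the replacement string `"X"` and the sample strings are arbitrary choices for illustration, not taken from the thread:

```python
import re

# Danny's pattern from this thread: "dog" when it is NOT followed by "cat",
# OR the empty position immediately after "dogcat".
pattern = re.compile(r"""dog(?!cat)
                         | (?<=dogcat)""", re.VERBOSE)

# re.sub keeps searching after each match and replaces every
# non-overlapping match in the string, not just the first one.
for text in ["dogman", "dogcatcher", "dog and cat"]:
    print(repr(text), "->", repr(pattern.sub("X", text)))
```

This prints `'dogman' -> 'Xman'`, `'dogcatcher' -> 'dogcatXcher'` and `'dog and cat' -> 'X and cat'`. In the second case the right-hand alternative is a lookbehind that consumes no characters, so `X` is inserted at an empty position rather than substituted for any text.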

A simple example will show what I mean:

>>> import re
>>> x = re.compile(r"(A) | (B)")
>>> s = "X R A Y B E"
>>> r = x.sub("13", s)
>>> print(r)
X R 13Y13 E

...so unless I'm misunderstanding it, "B" is supposed to be ignored if "A" is matched, yet both get matched. I get the same result if I put "A" and "B" within the same group.
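A minimal sketch, assuming Python 3, of what the engine does with this example. Without `re.VERBOSE` the spaces around `|` are part of the pattern, so the two alternatives are really `"A "` and `" B"`. The left-to-right preference between alternatives applies at each starting position separately, and `sub` (like `finditer`) keeps scanning after every match. The `re.match(r"A|AB", ...)` line and the `count=1` call are illustrations added here, not part of the original test:

```python
import re

s = "X R A Y B E"

# Without re.VERBOSE the spaces are literal: the alternatives are "A " and " B".
x = re.compile(r"(A) | (B)")

# finditer reports every non-overlapping match, scanning left to right.
for m in x.finditer(s):
    print(m.span(), repr(m.group()), m.groups())
# (4, 6) 'A ' ('A', None)
# (7, 9) ' B' (None, 'B')

# "First alternative wins" applies at ONE starting position only:
print(re.match(r"A|AB", "AB").group())   # prints: A  ("AB" is never tried)

# To stop after the first replacement, limit the count explicitly.
print(x.sub("13", s, count=1))           # prints: X R 13Y B E
```

So "A " is replaced at position 4, and the search then resumes and independently finds " B" at position 7, which is why both sides appear to match.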


On Mar 8, 2005, at 6:47 PM, Danny Yoo wrote:

Regular expressions are a little evil at times; here's what I think you're
thinking of:


###
>>> import re
>>> pattern = re.compile(r"""dog(?!cat)
...                      | (?<=dogcat)""", re.VERBOSE)
>>> pattern.match('dogman').start()
0
>>> pattern.search('dogcatcher').start()

Hi Mike,

Gaaah, bad copy-and-paste.  The example with 'dogcatcher' actually does
come up with a result:

###
>>> pattern.search('dogcatcher').start()
6
###
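To see why the result is 6 rather than 0, one can inspect the match object directly. A minimal sketch, assuming Python 3:

```python
import re

pattern = re.compile(r"""dog(?!cat)
                         | (?<=dogcat)""", re.VERBOSE)

m = pattern.search("dogcatcher")
# The first alternative fails at index 0 because "dog" is followed by "cat".
# The second alternative is a lookbehind, which consumes no characters, so the
# match found is an *empty* string located just after "dogcat".
print(m.start(), m.end(), repr(m.group()))   # prints: 6 6 ''

# match() only tries position 0, where neither alternative succeeds.
print(pattern.match("dogcatcher"))           # prints: None
```

The zero-width match is also why `re.sub` with this pattern inserts text after "dogcat" instead of replacing "cat".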

Sorry about that!


_______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
