Changes by Rick Otten rottenwindf...@gmail.com:
--
components: Regular Expressions
nosy: Rick Otten, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: regex | behavior differs from documentation
type: behavior
versions: Python 2.7
Mark Shannon added the comment:
This looks like the expected behaviour to me.
re.sub matches the leftmost occurrence and the regular expression is greedy so
(x|xy) will always match xy if it can.
--
nosy: +Mark.Shannon
___
Python tracker
Rick Otten added the comment:
Can the documentation be updated to make this more clear?
I see now where the clause "As the target string is scanned, ..." is describing
what you have listed here.
A coworker and I both read the description several times and missed that. I
thought it first tried
Matthew Barnett added the comment:
@Mark is correct, it's not a bug.
In the first example:
It tries to match each alternative at position 0. Failure.
It tries to match each alternative at position 1. Failure.
It tries to match each alternative at position 2. Failure.
It tries to match each
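
The position-by-position scan described above can be reproduced with a short sketch. The pattern and the string are taken from the original submission; the third argument of the re.sub call is cut off there, so passing mystring is an assumption. The helper first_match_trace is hypothetical and exists only to make the scan visible.

```python
import re

# Pattern and input string as given in the original submission.
pattern = re.compile(r'incorporated| inc|llc|corporation|corp| co')
mystring = 'rwo incorporated'


def first_match_trace(regex, text):
    """Hypothetical helper: try the whole pattern at each start position,
    left to right, and stop at the first position where any alternative
    matches. Returns a list of (position, matched text or None)."""
    trace = []
    for pos in range(len(text) + 1):
        m = regex.match(text, pos)  # anchored attempt at this position only
        trace.append((pos, m.group(0) if m else None))
        if m:
            break
    return trace


trace = first_match_trace(pattern, mystring)
for pos, matched in trace:
    print(pos, repr(matched))

# Assumption: the truncated re.sub call was applied to mystring.
result = pattern.sub('', mystring)
print(repr(result))
```

The alternatives are tried left to right at each position, but the scan over positions comes first: ' inc' matches at position 3, before 'incorporated' ever gets a chance to match at position 4.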
New submission from Rick Otten:
The documentation states that | parsing goes from left to right. This
doesn't seem to be true when spaces (or \s) are involved.
Example:
In [40]: mystring
Out[40]: 'rwo incorporated'
In [41]: re.sub('incorporated| inc|llc|corporation|corp| co', '',