New submission from wesley chun <[email protected]>:

In the re docs, it states the following for the conditional regular expression syntax:

(?(id/name)yes-pattern|no-pattern)
    Will try to match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn't. no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching pattern, which will match with '<[email protected]>' as well as '[email protected]', but not with '<[email protected]'.

This regex is incomplete, as it also allows '[email protected]>':

>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<[email protected]>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '[email protected]'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<[email protected]'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '[email protected]>'))
True

This error has existed since this feature was added in 2.4...
http://docs.python.org/release/2.4.4/lib/re-syntax.html

... through the 3.3 docs...
http://docs.python.org/dev/py3k/library/re.html#regular-expression-syntax

The fix is to add the end-of-string anchor '$' as the no-pattern, so that all 4 cases behave as documented:

>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<[email protected]>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '[email protected]'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<[email protected]'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '[email protected]>'))
False

If accepted, I propose this patch (also attached):

$ svn diff re.rst
Index: re.rst
===================================================================
--- re.rst      (revision 88499)
+++ re.rst      (working copy)
@@ -297,9 +297,9 @@
 ``(?(id/name)yes-pattern|no-pattern)``
    Will try to match with ``yes-pattern`` if the group with given *id* or *name*
    exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is optional and
-   can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)`` is a poor email
+   can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email
    matching pattern, which will match with ``'<[email protected]>'`` as well as
-   ``'[email protected]'``, but not with ``'<[email protected]'``.
+   ``'[email protected]'``, but not with ``'<[email protected]'`` nor ``'[email protected]>'`` .

----------
assignee: docs@python
components: Documentation, Regular Expressions
files: re.rst
messages: 129041
nosy: docs@python, wesley.chun
priority: normal
severity: normal
status: open
title: incorrect pattern in the re module docs for conditional regex
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file20833/re.rst

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue11283>
_______________________________________
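
For anyone who wants to reproduce the report, here is a minimal sketch that prints what each pattern actually consumes, rather than only True/False. The address "user@example.com" and the helper name matched_text are placeholders assumed for illustration (the addresses in the archived message are obscured); only re.match and the two patterns come from the report.

```python
import re

# Placeholder address, assumed for illustration only.
ADDR = "user@example.com"

# Pattern currently in the docs, and the pattern proposed in the patch.
OLD = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
NEW = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'


def matched_text(pattern, text):
    """Return the substring consumed by re.match, or None if there is no match."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


# The four cases from the report: both brackets, none, only '<', only '>'.
cases = ["<" + ADDR + ">", ADDR, "<" + ADDR, ADDR + ">"]

for text in cases:
    print(repr(text), "old:", matched_text(OLD, text), "new:", matched_text(NEW, text))
```

With the old pattern the last case still matches because re.match only anchors at the start of the string: group 1 did not participate, the omitted no-pattern matches the empty string, and the match simply stops before the trailing '>'. Adding '$' as the no-pattern forces the unbracketed form to end at the end of the string. Note that '$' also matches just before a trailing newline, so this remains a loose check.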
