New submission from ipolcak: The text about non-greedy matching in the documentation of the re library is misleading.
The docs for Python 2.7 (https://docs.python.org/2.7/library/re.html) and 3.6 (https://docs.python.org/3.6/library/re.html) say:

"The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against <a> b <c>, it will match the entire string, and not just <a>. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only <a>."

The docs for Python 3.4 (https://docs.python.org/3.4/library/re.html) offer a slightly different example:

"The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'."

However, in reality, if the shortest candidate does not let the rest of the pattern match, the non-greedy quantifier keeps consuming more characters, so the result can be the same as with a greedy match. See:

    >>> import re
    >>> a = re.compile(r"<.*?><span>")
    >>> a.match("<a> b <c><span>")
    <_sre.SRE_Match object; span=(0, 15), match='<a> b <c><span>'>
    >>> a.search("<a> b <c><span>")
    <_sre.SRE_Match object; span=(0, 15), match='<a> b <c><span>'>

So the '<.*?>' part of the regex matches '<a> b <c>' in this example.

I propose adding the following text to the documentation:

"However, note that even the non-greedy version can match additional text. For example, consider the RE '(<.*?>)<d>' matched against '<a> b <c><d>'. The match is successful and the unnamed group contains '<a> b <c>'."
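A small sketch (Python 3, standard re module only) contrasting greedy and non-greedy matching on the strings used in this report. The variable names and the extra string "<a><d> b <c><d>" are illustrative additions, not part of the report; the negated class [^>] at the end is a commonly used alternative, not something the report proposes.

```python
import re

# Example strings from the report (the docs example and the proposed text).
DOC_TEXT = "<a> b <c>"
PROPOSAL_TEXT = "<a> b <c><d>"

# Greedy: .* first consumes everything, then backtracks until ">" can match.
greedy = re.match(r"<.*>", DOC_TEXT).group()            # '<a> b <c>'

# Non-greedy: .*? first tries zero characters, so the first ">" ends the match.
lazy = re.match(r"<.*?>", DOC_TEXT).group()             # '<a>'

# When the minimal choice makes the rest of the pattern fail, the engine
# backtracks into .*? and lets it consume one more character at a time,
# so it can run past the first ">" until "<d>" follows.
lazy_group = re.match(r"(<.*?>)<d>", PROPOSAL_TEXT).group(1)    # '<a> b <c>'
greedy_group = re.match(r"(<.*>)<d>", PROPOSAL_TEXT).group(1)   # '<a> b <c>'

# Hypothetical extra input: here several splits let "<d>" follow, so the
# non-greedy and greedy groups differ.
MULTI = "<a><d> b <c><d>"
lazy_multi = re.match(r"(<.*?>)<d>", MULTI).group(1)     # '<a>'
greedy_multi = re.match(r"(<.*>)<d>", MULTI).group(1)    # '<a><d> b <c>'

# Excluding ">" from the repeated class keeps the group inside one tag:
# match() anchored at position 0 now fails, while search() finds "<c>" before "<d>".
no_match = re.match(r"(<[^>]*>)<d>", PROPOSAL_TEXT)           # None
one_tag = re.search(r"(<[^>]*>)<d>", PROPOSAL_TEXT).group(1)  # '<c>'
```

In short, "non-greedy" describes the order in which the engine tries repetition counts, not a guarantee that the overall match is the shortest one.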
----------
assignee: docs@python
components: Documentation
messages: 285619
nosy: docs@python, ipolcak
priority: normal
severity: normal
status: open
title: Misleading text in the documentation of re library for non-greedy match
type: behavior
versions: Python 2.7, Python 3.4, Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29291>
_______________________________________