New submission from Brandon Rhodes:

Regular expression re.MatchObject objects are conceptually sequences. Each contains at least one “group” string, possibly more, and the groups are integer-indexed starting at zero. Today, groups can be accessed in one of two ways.
(1) You can call the method match.group(N).

(2) You can call glist = match.groups() and then access each group as glist[N-1].

Note the obvious off-by-one discrepancy: .groups() does not include “group zero”, which contains the entire match, so its indexes are one less than the values you would pass to .group().

I propose that MatchObject gain a __getitem__(N) method whose return value for every N is the same as .group(N), since I think match[N] is quite an obvious syntax for asking for one particular group of an RE match.

The only objection I can see to this proposal is the asymmetry between Group Zero and all subsequent groups of a regular expression pattern: zero means “the whole thing”, whereas each of the others holds the content of one explicit set of parens. Looping over the elements match[0], match[1], ... of a pattern like this:

    r'(\d\d\d\d)/(\d\d)/(\d\d)'

will give you *first* the *entire* match, and only then turn its attention to the three parenthesized substrings.

My retort is that concentric groups can happen anyway. Group Zero, holding the entire match, is not really as special as the newcomer might suspect, because you can always wind up with groups inside of other groups; it is simply part of the semantics of regular expressions that groups might overlap or contain one another, as in:

    r'((\d\d)/(\d\d)) Description: (.*)'

Here, we see that concentricity is not a special property of Group Zero, but something that can happen quite naturally with other groups. The caller simply needs to imagine every regular expression being surrounded by an “automatic set of parentheses” to understand where Group Zero comes from, and how it will be ordered in the resulting sequence of groups relative to the subordinate groups within the string.

If one or two people voice agreement here in this issue, I will be very happy to offer a patch.
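To make the proposal concrete, here is a minimal sketch of the intended semantics. It assumes a hypothetical wrapper class, IndexableMatch, which is not part of the re module and exists only for illustration; the sample input strings are also made up. The wrapper forwards match[N] to match.group(N), so the two patterns above can be indexed and looped over as described.

```python
import re


class IndexableMatch:
    """Hypothetical wrapper (not part of re) sketching the proposed behaviour."""

    def __init__(self, match):
        self._match = match

    def __getitem__(self, n):
        # Proposed semantics: match[N] returns exactly what match.group(N) returns.
        # .group() raises IndexError for a nonexistent group number, which is
        # also what ends a plain "for group in match" loop over the wrapper.
        return self._match.group(n)


date = re.match(r'(\d\d\d\d)/(\d\d)/(\d\d)', '2013/11/09')

# Today: .group(N) counts from group zero, while .groups() starts at group one.
assert date.group(1) == date.groups()[0] == '2013'

m = IndexableMatch(date)
assert m[0] == '2013/11/09'  # group zero: the entire match
assert list(m) == ['2013/11/09', '2013', '11', '09']

# Nested groups: group 1 encloses groups 2 and 3, much as group zero
# encloses every other group.
nested = re.match(r'((\d\d)/(\d\d)) Description: (.*)',
                  '11/09 Description: example')
n = IndexableMatch(nested)
assert n[1] == '11/09'
assert (n[2], n[3], n[4]) == ('11', '09', 'example')
```

Looping over the wrapper yields the whole match first and then the parenthesized groups in the order their opening parens appear, which is the ordering discussed above.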
----------
components: Regular Expressions
messages: 202480
nosy: brandon-rhodes, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: MatchObject should offer __getitem__()
type: enhancement
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19536>
_______________________________________