Hello everybody,

I have a suggestion for the developers of the standard re module: please 
consider allowing match group name redefinitions, especially in alternatives.
While you may not see the point at first glance, let me try to justify the 
idea using a real-world example from my practice:

Imagine a company that uses certain codes for its products and product 
components (unfortunately, I'm not at liberty to disclose their true nature, 
but bear with me).
Let's say that the codes may have the following form:
r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"

So far so good. But now imagine that one particular type of code has a 
slightly different syntax:
r"(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"

As you can see, the prefix and postfix may be lowercase for code type E, 
and moreover, a space or dash is required before the postfix.
If I merged the definitions, I'd have to allow that syntax for the AB, C 
and D code types as well, but that would be incorrect and would
require post-matching checks.
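
To illustrate the over-acceptance, here is a minimal sketch. The MERGED pattern is only an assumption about what a merged definition might look like (it is not given above), and the sample code string is made up:

```python
import re

# Strict pattern for the AB, C and D code types, as given above.
STRICT_ABCD = re.compile(
    r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"
)

# Hypothetical merged pattern: one branch covering all types, so it has to
# tolerate lowercase letters and an optional separator before the postfix.
MERGED = re.compile(
    r"(?P<type>AB|C|D|E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)"
    r"[- ]?(?P<postfix>[A-Za-z])?"
)

sample = "AB-x12-y"  # made-up code using E-style syntax with type AB

strict_ok = STRICT_ABCD.fullmatch(sample) is not None
merged_ok = MERGED.fullmatch(sample) is not None
print(strict_ok, merged_ok)
```

The strict pattern rejects the sample while the merged one accepts it, so the type-specific rules would have to be re-checked after matching.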

Ideally I'd like to have the opportunity to define the regex as an alternative:
r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?|(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"

I can't, of course: compiling the regex fails with 
re.error: redefinition of group name.
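
The failure can be reproduced with a short sketch like the one below. The helper name try_compile is hypothetical, and the exact wording of the message after "redefinition of group name" may vary between Python versions:

```python
import re

# The combined pattern from above, with the group names reused in both
# alternatives.
PATTERN = (
    r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"
    r"|(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"
)


def try_compile(pattern):
    """Return the error message if compilation fails, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


error_message = try_compile(PATTERN)
print(error_message)
```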

But is that really a problem, especially in such alternatives?
If you imagine the regex as a FSA, the code type branches into completely 
independent sub-trees of the automaton state transitions.
There's no problem with efficiency; the regex might look a bit complex, but the 
matching is perfectly efficient---definitely more so than if I match multiple 
expressions.
Redefining match group names is, IMO, technically perfectly possible; 
note that in such alternatives, re-assignments won't really happen.
And finally, even if they did happen, what's the problem with that? It might 
be a logical error in the regex definition, of course, but that's the 
programmer's lookout in general...

So what do you think?
If match group name redefinition were allowed, I could just match a single 
regex, get the match group dict and read out the parsed parts of the codes by 
name: nice and easy.
Currently, my 2 choices are:
1/ Use uniquely named groups, which requires some sort of post-match group 
name consolidation, or
2/ Match multiple regular expressions, which is unnecessary
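
For comparison, both workarounds might look roughly like this. It is only a sketch: the _1/_2 suffixes, the function names and the use of fullmatch() are assumptions made for illustration, not something prescribed above:

```python
import re

FIELDS = ("type", "prefix", "number", "postfix")

# Choice 1: uniquely named groups (hypothetical _1/_2 suffixes per branch),
# consolidated back to the plain field names after matching.
COMBINED = re.compile(
    r"(?P<type_1>AB|C|D)[- ](?P<prefix_1>[A-Z])?(?P<number_1>\d+)(?P<postfix_1>[A-Z])?"
    r"|(?P<type_2>E)[- ](?P<prefix_2>[A-Za-z])?(?P<number_2>\d+)[- ](?P<postfix_2>[A-Za-z])?"
)


def parse_code(text):
    """Match the combined regex and merge the per-branch group names."""
    match = COMBINED.fullmatch(text)
    if match is None:
        return None
    groups = match.groupdict()
    # The type group is mandatory in each branch, so it tells us which
    # alternative actually matched.
    branch = "1" if groups["type_1"] is not None else "2"
    return {field: groups[f"{field}_{branch}"] for field in FIELDS}


# Choice 2: one regex per syntax, tried in turn.
PATTERNS = (
    re.compile(r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"),
    re.compile(r"(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"),
)


def parse_code_multi(text):
    """Try each pattern in order and return the first match's group dict."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match is not None:
            return match.groupdict()
    return None
```

Both functions return the same dict for a valid code; the difference is only in where the extra bookkeeping ends up.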

Therefore, I ask you to reconsider issuing the error, which I deem redundant 
and which, IMO, unnecessarily limits a justified use case.
Also note that dropping it won't break any old code: anything that worked 
before will continue to work with unchanged semantics, so such a change would 
be perfectly safe.

Thanks,

Best Regards

vasek
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/K2FXXQ2XG75FPDIJIDP4HHXXKCMYRP4I/
Code of Conduct: http://python.org/psf/codeofconduct/
