> On 2 Oct 2021, at 10:27, ven...@razdva.cz wrote:
> 
> Hello everybody,
> 
> I've got a suggestion for the developers of the standard re module: consider 
> allowing match group name redefinitions, especially in alternatives.
> While you may not see the point at first glance, let me try to justify it 
> using a real-world example from my practice:
> 
> Imagine a company, using certain codes for their products/product components 
> (unfortunately, I'm not at liberty to disclose the true
> nature of them, but bear with me).
> Let's say that they may have the following forms:
> r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"
> 
> So far so good. But now, imagine that a particular type of code has a bit 
> different syntax:
> r"(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"
> 
> As you can see, the prefix & postfix may be lowercase for code type E, 
> and moreover, a space or dash is required before the postfix.
> If I merged the definitions, I'd have to allow that syntax even for the AB, C 
> and D code types, but that would be incorrect and would
> require post-matching checks.
> 
> Ideally I'd like to have the opportunity to define the regex as an 
> alternative:
> r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?|(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"
> 
> I can't, of course: compiling the regex fails with 
> re.error: redefinition of group name.
> 
> But is that really a problem, especially in such alternatives?
> If you imagine the regex as an FSA, the code type branches into completely 
> independent sub-trees of the automaton state transitions.
> There's no problem with efficiency; the regex might look a bit complex, but 
> the matching is perfectly efficient---definitely more so than if I match 
> multiple expressions.
> The redefinition of the match group names is IMO technically perfectly 
> possible and note that in such alternatives, re-assignments won't really 
> happen.
> And finally, even if they did happen, what's the problem with that? It might 
> be a logical error in the regex definition of course, but that's the 
> programmer's lookout in general...
> 
> So what do you think?
> If match group name redefinition were allowed, I could just match a single 
> regex, get the match group dict and read out the parsed parts of the codes by 
> name---nice and easy.
> Currently, my 2 choices are:
> 1/ Use uniquely named groups, which requires some sort of post-match group 
> name consolidation, or
> 2/ Match multiple reg. expressions, which is unnecessary
> 
> Therefore, I ask you to reconsider issuing the error, which I deem redundant 
> and unnecessarily limiting for a justified use case, IMO.
> Also note that doing that won't break any old code---anything that worked 
> before will continue to work with unchanged semantics; so such a change would 
> be perfectly safe.
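
For reference, option 1/ above (uniquely named groups plus post-match
consolidation) could look roughly like the sketch below. The "_e" suffix and
the names CODE_RE and parse_code are made up for illustration, and the use of
fullmatch() is an assumption; the original post does not say how the regex is
applied.

```python
import re

# Groups in the E branch carry a hypothetical "_e" suffix so that every
# group name in the combined pattern is unique.
CODE_RE = re.compile(
    r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"
    r"|"
    r"(?P<type_e>E)[- ](?P<prefix_e>[A-Za-z])?(?P<number_e>\d+)[- ]"
    r"(?P<postfix_e>[A-Za-z])?"
)


def parse_code(text):
    """Return a dict keyed by the un-suffixed group names, or None."""
    m = CODE_RE.fullmatch(text)
    if m is None:
        return None
    merged = {}
    for name, value in m.groupdict().items():
        # Strip the "_e" suffix to get the shared logical name.
        base = name[:-2] if name.endswith("_e") else name
        # Only one branch can match, so a non-None value always wins.
        if value is not None or base not in merged:
            merged[base] = value
    return merged
```

With this, parse_code("AB-X123Y") and parse_code("E-a12-b") both return a dict
with the keys type, prefix, number and postfix, at the cost of the extra
merging loop the proposal would like to avoid.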

Faced with this problem, I would write a parser for the product codes that 
understands the syntax and breaks each code into pieces that make sense.
I would not use regex in the parser.
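
A minimal sketch of what such a hand-written parser might look like, following
the two syntaxes quoted above. The names ProductCode and parse_product_code
are hypothetical, and two things are assumptions: the whole string must be
consumed, and only ASCII digits count as the number (the regex \d would also
accept other Unicode digits).

```python
from typing import NamedTuple, Optional


class ProductCode(NamedTuple):
    type: str
    prefix: Optional[str]
    number: str
    postfix: Optional[str]


def parse_product_code(text: str) -> Optional[ProductCode]:
    # 1. Code type: one of the known literals at the start of the string.
    for code_type in ("AB", "C", "D", "E"):
        if text.startswith(code_type):
            break
    else:
        return None
    pos = len(code_type)
    is_e = code_type == "E"

    def is_letter(ch: str) -> bool:
        # Type E also accepts lowercase letters; the other types do not.
        return ("A" <= ch <= "Z") or (is_e and "a" <= ch <= "z")

    # 2. Mandatory separator (dash or space) after the type.
    if pos >= len(text) or text[pos] not in "- ":
        return None
    pos += 1

    # 3. Optional single-letter prefix.
    prefix = None
    if pos < len(text) and is_letter(text[pos]):
        prefix = text[pos]
        pos += 1

    # 4. One or more digits.
    start = pos
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    if pos == start:
        return None
    number = text[start:pos]

    # 5. Type E requires a separator before the (optional) postfix.
    if is_e:
        if pos >= len(text) or text[pos] not in "- ":
            return None
        pos += 1

    # 6. Optional single-letter postfix.
    postfix = None
    if pos < len(text) and is_letter(text[pos]):
        postfix = text[pos]
        pos += 1

    # Reject trailing input (an assumption, see above).
    if pos != len(text):
        return None
    return ProductCode(code_type, prefix, number, postfix)
```

The per-type rules live in ordinary branches (is_e), so a new code type with
its own quirks becomes another branch and not a longer pattern.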

Barry



> 
> Thanks,
> 
> Best Regards
> 
> vasek
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/K2FXXQ2XG75FPDIJIDP4HHXXKCMYRP4I/
> Code of Conduct: http://python.org/psf/codeofconduct/
> 

_______________________________________________
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HTDGZ6HW2Z32ZNBLIEKIOLJEIA4I3WR5/
