> On 2 Oct 2021, at 10:27, ven...@razdva.cz wrote:
>
> Hello everybody,
>
> I've got a suggestion for the std. re module developers: to consider allowing
> match group name redefinitions, especially in alternatives.
> While you may not see the point at first glance, let me try to justify the
> idea using a real-world example from my practice:
>
> Imagine a company, using certain codes for their products/product components
> (unfortunately, I'm not at liberty to disclose the true
> nature of them, but bear with me).
> Let's say that they may have the following forms:
> r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"
>
> So far so good. But now, imagine that a particular type of code has a bit
> different syntax:
> r"(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"
>
> As you can see, the prefix & postfix may be lowercase in case of code type E
> and moreover, a space or dash is required before the postfix.
> If I merged the definitions, I'd have to allow that syntax even for the AB, C
> and D code types---but that would've been incorrect and would
> require post-matching checks.
>
> Ideally I'd like to have the opportunity to define the regex as an
> alternative:
> r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?|(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"
>
> I can't, of course: compiling the regex fails with
> re.error: redefinition of group name.
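
For anyone who wants to reproduce the error described above, here is a
minimal sketch. It assumes a CPython release in which the std. re module
still rejects duplicate group names; the helper name try_compile is made up
for this example.

```python
import re

# The combined pattern from the message: two alternatives reusing the
# same group names (type, prefix, number, postfix).
PATTERN = (
    r"(?P<type>AB|C|D)[- ](?P<prefix>[A-Z])?(?P<number>\d+)(?P<postfix>[A-Z])?"
    r"|(?P<type>E)[- ](?P<prefix>[A-Za-z])?(?P<number>\d+)[- ](?P<postfix>[A-Za-z])?"
)


def try_compile(pattern):
    """Return the compile error message, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Hypothetical helper in use; the message names the offending group.
message = try_compile(PATTERN)
print(message)
```

The error is raised at compile time, so no input is ever matched against
the pattern.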
>
> But is that really a problem, especially in such alternatives?
> If you imagine the regex as a FSA, the code type branches into completely
> independent sub-trees of the automaton state transitions.
> There's no problem with efficiency; the regex might look a bit complex, but
> the matching is perfectly efficient---definitely more so than if I match
> multiple expressions.
> The redefinition of the match group names is IMO technically perfectly
> possible and note that in such alternatives, re-assignments won't really
> happen.
> And finally, even if they would happen, what's the problem with that? Might
> be a logical error in the regex definition of course, but that's the
> programmer's lookout in general...
>
> So what do you think?
> If the match group name redefinition was allowed, I could just match a single
> regex, getting match group dict and read out parsed parts of the codes by
> name---nice and easy.
> Currently, my 2 choices are:
> 1/ Use uniquely named groups, which requires a post-match consolidation of
> the group names, or
> 2/ Match multiple regular expressions, which is unnecessary.
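
Option 1/ can be sketched roughly as follows. The _1/_2 suffix scheme and
the function name parse_with_unique_groups are assumptions made for this
illustration, not something from the original message.

```python
import re

# Same two alternatives, with a per-branch suffix on every group name so
# that the pattern compiles. The suffixes (_1, _2) are an assumed convention.
CODE_RE = re.compile(
    r"(?P<type_1>AB|C|D)[- ](?P<prefix_1>[A-Z])?(?P<number_1>\d+)(?P<postfix_1>[A-Z])?"
    r"|(?P<type_2>E)[- ](?P<prefix_2>[A-Za-z])?(?P<number_2>\d+)[- ](?P<postfix_2>[A-Za-z])?"
)


def parse_with_unique_groups(text):
    """Match a whole code and merge the suffixed groups back into plain names."""
    m = CODE_RE.fullmatch(text)
    if m is None:
        return None
    # The type group is mandatory in each branch, so it tells us which
    # alternative matched.
    branch = "_1" if m.group("type_1") is not None else "_2"
    return {
        name[: -len(branch)]: value
        for name, value in m.groupdict().items()
        if name.endswith(branch)
    }
```

The consolidation step is the dict comprehension at the end: it keeps only
the groups of the branch that matched and strips the suffix, so callers see
plain type/prefix/number/postfix keys.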
>
> Therefore, I ask you to reconsider issuing the error, which I deem redundant
> and unnecessarily limiting a justified use-case, IMO.
> Also note that doing that won't break any old code---anything that worked
> before will continue to work with unchanged semantics; so such a change would
> be perfectly safe.
Faced with this problem, I would write a parser for the product codes that
understands the syntax and breaks each code into pieces that make sense.
I would not use regex in the parser.
Barry
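
A minimal sketch of what such a hand-written parser might look like,
assuming the code syntax is exactly what the two quoted patterns describe.
The RULES table and the name parse_product_code are hypothetical, and only
ASCII digits are accepted here (the regex \d also accepts other Unicode
digits).

```python
import string

# Per-type rules read off the two quoted patterns: the letters allowed as
# prefix/postfix, and whether a separator is required before the postfix.
RULES = {
    "AB": (string.ascii_uppercase, False),
    "C": (string.ascii_uppercase, False),
    "D": (string.ascii_uppercase, False),
    "E": (string.ascii_letters, True),
}
SEPARATORS = "- "


def parse_product_code(text):
    """Return a dict of the code's parts, or None if text is not a valid code."""
    for code_type, (letters, sep_before_postfix) in RULES.items():
        if text.startswith(code_type):
            break
    else:
        return None
    pos = len(code_type)

    # Mandatory separator after the type.
    if pos >= len(text) or text[pos] not in SEPARATORS:
        return None
    pos += 1

    # Optional single-letter prefix.
    prefix = None
    if pos < len(text) and text[pos] in letters:
        prefix = text[pos]
        pos += 1

    # Mandatory run of digits.
    start = pos
    while pos < len(text) and text[pos] in string.digits:
        pos += 1
    if pos == start:
        return None
    number = text[start:pos]

    # Type E requires a separator before the (still optional) postfix.
    if sep_before_postfix:
        if pos >= len(text) or text[pos] not in SEPARATORS:
            return None
        pos += 1

    # Optional single-letter postfix.
    postfix = None
    if pos < len(text) and text[pos] in letters:
        postfix = text[pos]
        pos += 1

    # Anything left over means the code is malformed.
    if pos != len(text):
        return None
    return {"type": code_type, "prefix": prefix, "number": number, "postfix": postfix}
```

Keeping the per-type differences in one table means a new code type is a
new table entry rather than another regex alternative.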
>
> Thanks,
>
> Best Regards
>
> vasek
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/K2FXXQ2XG75FPDIJIDP4HHXXKCMYRP4I/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/HTDGZ6HW2Z32ZNBLIEKIOLJEIA4I3WR5/