New submission from Tristan:
Since Python 3.7, sre_parse.parse() does not create SubPattern instances that
can be used to reproduce the original expression if it contains non-capturing
groups.
In Python 3.6:
>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
SUBPATTERN None 0 0
  BRANCH
    LITERAL 102
    LITERAL 111
    LITERAL 111
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 114
    LITERAL 32
  OR
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 122
In Python 3.7 and beyond:
>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
BRANCH
  LITERAL 102
  LITERAL 111
  LITERAL 111
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 114
  LITERAL 32
OR
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 122
This behaviour makes it impossible to write a correct colorizer for regular
expressions using the sre_parse module from Python 3.7 onwards. I'm not a regex
expert, so I cannot say whether this change has any effect on the matching
itself, but if I trust regex101, it will add a capturing group in place of the
non-capturing group.
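
The matching side of this can be checked without looking at the parse tree at
all, by asking the compiled pattern how many capturing groups it has. The
following is a minimal sketch using only the public re module; the names
PATTERN and compiled are illustrative and not part of the original report.

```python
import re

# The same expression as in the dumps above: only non-capturing groups.
PATTERN = "(?:foo (?:bar) | (?:baz))"

compiled = re.compile(PATTERN)

# Pattern.groups is the number of capturing groups in the compiled pattern.
print(compiled.groups)  # 0

# Both alternatives still match, and neither match exposes a captured group.
first = compiled.fullmatch("foo bar ")
second = compiled.fullmatch(" baz")
print(first.groups())   # ()
print(second.groups())  # ()
```

If this prints 0 and two empty tuples, the compiled pattern has no capturing
group, which would suggest that the difference is confined to the tree returned
by sre_parse.parse() and does not change what is matched or captured.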
----------
components: Regular Expressions
messages: 405327
nosy: ezio.melotti, mrabarnett, tristanlatr
priority: normal
severity: normal
status: open
title: From Python 3.7, sre_parse.parse() do not create SubPattern instances
that can be used to back reproduce original expression if containing
non-capturing groups
type: behavior
versions: Python 3.7
_______________________________________
Python tracker
<https://bugs.python.org/issue45674>
_______________________________________