In 8-Aug-07, at 12:47 PM, Nick Maclaren wrote: > >>> The other approach, which is to stick to true regular expressions, >>> and wholly or partially convert to DFAs, has already been rendered >>> impossible by even the limited Perl/PCRE extensions that Python >>> has adopted. >> >> Impossible? Surely, a sufficiently-competent re engine could detect >> when a DFA is possible to construct? > > I doubt it. While it isn't equivalent to the halting problem, it IS > an intractable one! There are two problems: > > Firstly, things like backreferences are an absolute no-no. They > are not regular, and REs with them in cannot be converted to DFAs. > That could be 'solved' by a parser that kicked out such constructions, > but it would get screams from many users. > > Secondly, anything involving explicit or implicit negation can lead > to (if I recall) a super-exponential explosion in the size of the > DFA. That could be 'solved' by imposing a limit, but few people > would be able to predict when it would bite.
Right. The analysis I envisioned would be more along the lines of "if troublesome RE extensions are used, do not attempt to construct a DFA". It could even be exposed via an alternate api (re.compile_dfa ()) that admitted a subset of the usual grammar. > Thirdly, I would require notice of the question of whether capturing > parentheses could be supported, and what semantic changes would be > to which were set and how. Capturing groups are rather integral to the python regex api and, as you say, a major difficulty for DFA-based implementations. Sounds like a task best left to a thirdparty package. -Mike _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com