On Sat, Oct 06, 2018 at 02:00:27PM -0700, Nathaniel Smith wrote:

> Fortunately, there's an elegant and natural solution: Just save the
> regex engine's internal state when it hits the end of the string, and
> then when more data arrives, use the saved state to pick up the search
> where we left off. Theoretically, any regex engine *could* support
> this – it's especially obvious for DFA-based matchers, but even
> backtrackers like Python's re could support it, basically by making
> the matching engine a coroutine that can suspend itself when it hits
> the end of the input, then resume it when new input arrives. Like, if
> you asked Knuth for the theoretically optimal design for this parser,
> I'm pretty sure this is what he'd tell you to use, and it's what
> people do when writing high-performance HTTP parsers in C.
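For anyone who wants to see the "save the engine's state" idea concretely, here is a minimal sketch in pure Python. It uses a hand-built DFA for the toy pattern ab*c (Python's re exposes no such hook, and the StreamingDFA class is purely illustrative): the current state is just an attribute, so matching resumes exactly where the previous chunk ended.

```python
class StreamingDFA:
    """Toy streaming matcher: the DFA state persists between feed() calls."""

    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # {(state, char): next_state}
        self.state = start              # saved between chunks
        self.accepting = accepting

    def feed(self, chunk):
        """Consume one chunk of input; return True once an accepting
        state has been reached. No buffering of past chunks is needed,
        because the state summarises everything seen so far."""
        for ch in chunk:
            if self.state is None:      # dead state: no match possible
                break
            self.state = self.transitions.get((self.state, ch))
        return self.state in self.accepting

# Hand-built DFA for the regex ab*c:
#   0 --a--> 1,  1 --b--> 1,  1 --c--> 2 (accepting)
dfa = StreamingDFA(
    transitions={(0, "a"): 1, (1, "b"): 1, (1, "c"): 2},
    start=0,
    accepting={2},
)

print(dfa.feed("ab"))   # False: mid-match, state saved
print(dfa.feed("bbb"))  # False: still looping on b*
print(dfa.feed("c"))    # True: the match completes across three chunks
```

The same interface would work for a backtracking engine if, as suggested above, the matcher were written as a coroutine that suspends at end-of-input instead of reporting failure.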
The message I take from this is:

- regex engines certainly can be written to support streaming data;
- but few of them are;
- and it is exceedingly unlikely that such support could be retro-fitted, easily or at all, to Python's existing re module.

Perhaps the solution is a lightweight streaming DFA regex parser? Does anyone know whether MRAB's regex library supports this?

https://pypi.org/project/regex/

> you can't write efficient
> character-by-character algorithms in Python

I'm sure that Python will never be as efficient as C in that regard (although PyPy might argue the point), but is there something we can do to ameliorate this? If we could make char-by-char processing only 10 times less efficient than C instead of 100 times (let's say...), perhaps that would help Ram (and you?) with your use-cases?

-- 
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/