On Feb 8, 2018 13:06, "Serhiy Storchaka" <storch...@gmail.com> wrote:

08.02.18 12:45, Franklin? Lee wrote:

> Could it be that re uses an optimization that can also be used in str?
> CPython uses a modified Boyer-Moore for str.find:
> https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h
> http://effbot.org/zone/stringlib.htm
> Maybe there's a minimum length after which it's better to precompute a
> table.
>

Yes, there is a special optimization in re here. It isn't free: you need to
spend some time preparing it, and you need a special object that keeps an
optimized representation for faster search. This makes it very unlikely to be
used in str, because you would need either to spend the time on compilation
on every search, or to use some kind of caching, which is not free either: it
adds complexity and increases memory consumption. Note also that in the case
of re the compiler is implemented in Python. This reduces the complexity.


The performance of the one-needle case isn't really relevant, though, is
it? This idea is for the multi-needle case, and my tests showed that re
performs even worse than a loop of `.find`s. How do re and .find scale with
both number and lengths of needles on your machine?
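
One possible way to measure this is sketched below. It is not the test
script referred to above; the function names, the haystack (all "a") and the
random needles are assumptions chosen so that nothing matches and every
search scans the whole string. The needle list is assumed to be non-empty.

```python
import random
import re
import timeit

def find_any_loop(haystack, needles):
    """Earliest index of any needle, using one str.find per needle, or -1."""
    best = -1
    for needle in needles:
        i = haystack.find(needle)
        if i != -1 and (best == -1 or i < best):
            best = i
    return best

def find_any_re(haystack, needles):
    """Earliest index of any needle, using one re alternation, or -1."""
    # re.compile caches patterns, so repeated calls do not recompile.
    pattern = re.compile("|".join(map(re.escape, needles)))
    m = pattern.search(haystack)
    return m.start() if m else -1

def bench(n_needles, needle_len, haystack_len=100_000, repeat=3):
    """Return (loop_seconds, re_seconds) for one hypothetical configuration."""
    rng = random.Random(0)  # fixed seed so runs are comparable
    haystack = "a" * haystack_len
    # Needles never contain "a", so no needle matches anywhere.
    needles = ["".join(rng.choices("bcdefghij", k=needle_len))
               for _ in range(n_needles)]
    t_loop = timeit.timeit(lambda: find_any_loop(haystack, needles), number=repeat)
    t_re = timeit.timeit(lambda: find_any_re(haystack, needles), number=repeat)
    return t_loop, t_re

if __name__ == "__main__":
    for n in (1, 10, 100):
        for length in (2, 10):
            t_loop, t_re = bench(n, length)
            print(f"needles={n:3d} len={length:2d} "
                  f"loop={t_loop:.4f}s re={t_re:.4f}s")
```

Varying `n_needles` and `needle_len` independently shows how each approach
scales along both axes; a haystack with partial matches would be a useful
second case, since an all-"a" haystack is friendly to both.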
