[Python-Dev] Re: string find(substring) vs. substring in string

Fredrik Lundh Wed, 16 Feb 2005 13:52:26 -0800

Guido van Rossum wrote:

> Which is exactly how s.find() wins this race. (I guess it loses when
> it's found by having to do the "find" lookup.) Maybe string_contains
> should just call string_find_internal()?


I somehow suspected that "in" did some extra work in case the "find"
failed; I guess I should have looked at the code instead. I didn't really
expect anyone to use a bad implementation of a brute-force algorithm
(O(nm), for a string of length n and a substring of length m) when the
library already contained a reasonably good version of the same algorithm.
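To make the O(nm) bound concrete: a brute-force search tries each of the roughly n alignments of the substring against the string and compares up to m characters at each one. The Python sketch below illustrates only that general idea. `brute_force_find` is a hypothetical name, and this is not the CPython C code discussed in the thread.

```python
def brute_force_find(text: str, pattern: str) -> int:
    """Return the lowest index of pattern in text, or -1 (like str.find)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # str.find also returns 0 for an empty pattern
    # Try every alignment; at each one, compare up to m characters.
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1


print(brute_force_find("hello world", "world"))  # 6
```

Consider a text made of many "a"s and a pattern of "a"s ending in "b". This input pushes the search toward the worst case, because nearly every alignment matches almost the whole pattern before failing.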

> And then there's the question of how the re module gets to be faster
> still; I suppose it doesn't bother with memcmp() at all.

the benchmark cheats (a bit) -- it builds a state machine (KMP-style) in
"compile", and uses that to search in O(n) time.
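A KMP-style matcher splits the work the same way a compiled regular expression does. First, an O(m) preprocessing step builds a table once; the table is equivalent to a small state machine. The search then scans the text left to right without ever backing up, in O(n) time. The following is a minimal Python sketch of the textbook Knuth-Morris-Pratt algorithm, not of the re module's internals. `kmp_table` and `kmp_find` are hypothetical names.

```python
from typing import List


def kmp_table(pattern: str) -> List[int]:
    """failure[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the state machine's fallback links)."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next character fits.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find(text: str, pattern: str, failure: List[int]) -> int:
    """Return the lowest index of pattern in text, or -1, scanning text once."""
    if not pattern:
        return 0
    k = 0  # number of pattern characters matched so far (the machine's state)
    for i, ch in enumerate(text):
        # On a mismatch, follow failure links instead of re-reading the text.
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return i - k + 1
    return -1


table = kmp_table("ababca")                   # built once, like a compiled pattern
print(kmp_find("abababca", "ababca", table))  # 2
```

The table is kept separate from the search to mirror the "compile" step. Presumably that setup cost only pays off when the same pattern is searched repeatedly.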

that approach won't fly for "in" and "find", of course, but it's definitely
possible to make them run a lot faster than RE (i.e. O(n/m) for most cases)...
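The post does not name the algorithm that would give O(n/m). One well-known technique with that typical behaviour is the Boyer-Moore-Horspool bad-character skip, sketched below in Python as an assumption about what is meant. Each window is compared right to left. On a mismatch, the window jumps ahead by up to m positions, so on typical text only about n/m positions are examined. The worst case is still O(nm). `horspool_find` is a hypothetical name.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the lowest index of pattern in text, or -1 (like str.find)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Skip table: for each character in pattern[:-1], the distance from its
    # last occurrence there to the end of the pattern. Others skip m.
    skip = {}
    for i in range(m - 1):
        skip[pattern[i]] = m - 1 - i
    pos = 0  # start of the current window in text
    while pos <= n - m:
        # Compare the window right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the skip for the text character under the pattern's last slot.
        pos += skip.get(text[pos + m - 1], m)
    return -1


print(horspool_find("hello world", "world"))  # 6
```

Longer patterns allow longer jumps, which is where the n/m behaviour comes from.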

but refactoring the contains code to use find_internal sounds like a good
first step.  any takers?
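In Python terms, the proposed first step makes the containment test a thin wrapper around the same search routine that find() uses. Both then share one implementation, and any later speedup helps both. The sketch below is hypothetical. The function names mirror the C functions mentioned in the thread, but the bodies are illustrative stand-ins, and the real change would be made in C.

```python
def string_find_internal(s: str, sub: str) -> int:
    """Hypothetical stand-in for the shared search routine used by find().
    Here it simply delegates to str.find; a faster algorithm could go here."""
    return s.find(sub)


def string_contains(s: str, sub: str) -> bool:
    """Hypothetical model of the containment test ("sub in s") after the
    refactoring: it reports whether the shared search found a match."""
    return string_find_internal(s, sub) != -1


print(string_contains("substring in string", "in string"))  # True
```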

</F> 



_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
