> Assuming stringobject.c:string_contains is the right function, the
> code looks like this:
>
> size = PyString_GET_SIZE(el);
> rhs = PyString_AS_STRING(el);
> lhs = PyString_AS_STRING(a);
>
> /* optimize for a single character */
> if (size == 1)
> return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL;
>
> end = lhs + (PyString_GET_SIZE(a) - size);
> while (lhs <= end) {
> if (memcmp(lhs++, rhs, size) == 0)
> return 1;
> }
>
> So it's doing a zillion memcmp()s. I don't think there's a more
> efficient way to do this with ANSI C; memmem() is a GNU extension that
> searches for blocks of memory. Perhaps saving some memcmps by writing
>
> if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0)
>
> would help.
Which is exactly how s.find() wins this race. (I guess it loses when
it's found by having to do the "find" lookup.) Maybe string_contains
should just call string_find_internal()?
And then there's the question of how the re module gets to be faster
still; I suppose it doesn't bother with memcmp() at all.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com