A.M. Kuchling wrote: >> time. But I would also hope that it would be smart enough to know that it >> doesn't need to look past the 2nd character in 'not the xyz' when it is >> searching for 'not there' (due to the lengths of the sequences). > > Assuming stringobject.c:string_contains is the right function, the > code looks like this: > > size = PyString_GET_SIZE(el); > rhs = PyString_AS_STRING(el); > lhs = PyString_AS_STRING(a); > > /* optimize for a single character */ > if (size == 1) > return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; > > end = lhs + (PyString_GET_SIZE(a) - size); > while (lhs <= end) { > if (memcmp(lhs++, rhs, size) == 0) > return 1; > } > > So it's doing a zillion memcmp()s. I don't think there's a more > efficient way to do this with ANSI C; memmem() is a GNU extension that > searches for blocks of memory.
oops. so whoever implemented contains didn't even bother to look at the find implementation... (which uses the same brute-force algorithm, but a better implementation...) > Perhaps saving some memcmps by writing > > if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) > > would help. memcmp still compiles to REP CMPB on many x86 compilers, and the setup overhead for memcmp sucks on modern x86 hardware; it's usually better to write your own bytewise comparision... (and the fact that we're still brute-force search algorithms in "find" is a bit embarrassing -- note that RE outperforms "in" by a factor of five.... guess it's time to finish the split/replace parts of stringlib and produce a patch... ;-) </F> _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com