Dennis Sweeney <[email protected]> added the comment:
I agree that skip could could do 1 better.
---
> I don't know whether it "should be" applied
I don't think I'm convinced: the second check fixes only the very specific case
when s[len(p):].startswith(p).
Perturbations of reproducer.py are still very slow with the patch:
- pattern2 = b"B" * BIG
+ pattern2 = b"B" * (BIG + 10)
In fact, it's even slower than before due to the double-checking.
---
I'd be curious how something like Crochemore and Perrin's "two-way" algorithm
would do (constant space, compares < 2n-m):
http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm
Apparently it's been used as glibc's memmem() since 2008. There, it
additionally computes a table, but only for long needles where it really pays
off:
https://code.woboq.org/userspace/glibc/string/str-two-way.h.html
https://code.woboq.org/userspace/glibc/string/memmem.c.html
Although there certainly seems to be a complexity rabbithole here that would be
easy to over-do.
I might look more into this.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com