Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:
Below is one of the tests that got run when I happened to import something, and I thought it was a good illustration of the Boyer-Moore bad character shift table. It's worth noting in particular that the table is the dominant force for speed in some common cases, with the two-way stuff only ever being checked once in this example. The shift table can be defeated with pathological strings, and that's where the two-way stuff begins to shine. Checking " 32 bit (ARM)" in "3.10.0a1+ (heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]". ======================== Two-way with needle=" 32 bit (ARM)" and haystack="(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]" Split " 32 bit (ARM)" into " 32 bit" and " (ARM)". needle is NOT completely periodic. Using period 8. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" Last character not found in string. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" Table says shift by 11. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" # Made the '3's line up Table says shift by 5. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" # Here made the spaces line up Last character not found in string. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" # Made the spaces > line up Checking the right half. No match. Jump forward without checking left half. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" # Made the spaces > line up Table says shift by 5. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" # Made the > spaces line up Table says shift by 5. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit (ARM)" # Made > the spaces line up Table says shift by 10. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit > (ARM)" # Made the spaces line up Table says shift by 4. > "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit > (Intel)]" > " 32 bit > (ARM)" # Made the lparens line up Last character not found in string. Reached end. Returning -1. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue41972> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com