Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:

Below is one of the tests that got run when I happened to import something, and 
I thought it was a good illustration of the Boyer-Moore bad character shift 
table.

It's worth noting in particular that the table is the dominant force for speed 
in some common cases, with the two-way stuff only ever being checked once in 
this example. The shift table can be defeated with pathological strings, and 
that's where the two-way stuff begins to shine.

Checking " 32 bit (ARM)" in "3.10.0a1+ (heads/two-way-dirty:cf4e398e94, Oct 18 
2020, 20:09:21) [MSC v.1927 32 bit (Intel)]".
========================
Two-way with needle=" 32 bit (ARM)" and 
haystack="(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 
32 bit (Intel)]"
Split " 32 bit (ARM)" into " 32 bit" and " (ARM)".
needle is NOT completely periodic.
Using period 8.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
> " 32 bit (ARM)"
Last character not found in string.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>              " 32 bit (ARM)"
Table says shift by 11.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                         " 32 bit (ARM)" # Made the '3's line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                              " 32 bit (ARM)" # Here made the spaces line up
Last character not found in string.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                                           " 32 bit (ARM)" # Made the spaces 
> line up
Checking the right half.
No match.
Jump forward without checking left half.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                                            " 32 bit (ARM)" # Made the spaces 
> line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                                                 " 32 bit (ARM)" # Made the 
> spaces line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                                                      " 32 bit (ARM)" # Made 
> the spaces line up
Table says shift by 10.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                                                                " 32 bit 
> (ARM)" # Made the spaces line up
Table says shift by 4.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit 
> (Intel)]"
>                                                                    " 32 bit 
> (ARM)" # Made the lparens line up
Last character not found in string.
Reached end. Returning -1.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to