Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:

> Dennis, I think that's expected, right? Two-way on its own can exploit 
> nothing about individual characters - it only preprocesses the needle to 
> break the possibility for quadratic-time behavior due to periods in the 
> needle.

Yes, that's correct.

> It sounds like you switched the current code's Sunday-ish approach to a more 
> Boyer-Moore-ish approach. That probably hurt, and especially for shorter 
> needles (where Sunday can yield a maximum shift of len(needle)+1, not 
> len(needle) - the "+1" is more significant the smaller len(needle)).

I'm definitely open to switching things up, but I'm not totally sure there 
would be a difference. In Sunday, the pattern is somehow scanned at each step, 
and then upon finding a mismatch, we look up in the table the first character 
outside the window. With the PR as written, when looking up in the table the 
last character of the window, we haven't actually produced a mismatch yet; the 
'continue' statements jump ahead *until* the last characters match (mod 128), 
and then the two-way scanning begins, which determines how the window gets 
shifted after that. But after this two-way-determined shift, the last character 
in the new window immediately gets looked up again in the table at the top of 
the loop, effectively letting the two-way shift immediately be extended to a 
longer shift. I suppose that the line `j += i - suffix + 1;` (which is very 
often an increment by 1) could be changed to a table lookup of the character 
just beyond the window, but I don't quite understand why that would b
 e better than incrementing j and then doing the table lookups at the top of 
the loop.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to