On Wed, Oct 14, 2020 at 9:56 AM Tim Peters <tim.pet...@gmail.com> wrote:

> [Guido]
> > Maybe someone reading this can finish the Wikipedia page on
> > Two-Way Search? The code example trails off with a function with
> > some incomprehensible remarks and then a TODO..
>
> Yes, the Wikipedia page is worse than useless in its current state,
> although some of the references it lists are helpful.  This is a much
> better summary:
>
> http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
>
> but, I believe, still far too telegraphic.
>

The key seems to be:

"""
The searching phase of the Two Way algorithm consists in first comparing
the character of xr from left to right, then the character of x from right
to left.
When a mismatch occurs when scanning the k-th character of xr, then a shift
of length k is performed.
When a mismatch occurs when scanning x, or when an occurrence of the
pattern is found, then a shift of length per(x) is performed.
Such a scheme leads to a quadratic worst case algorithm, this can be
avoided by a prefix memorization: when a shift of length per(x) is
performed the length of the matching prefix of the pattern at the beginning
of the window (namely m-per(x)) after the shift is memorized to avoid to
scan it again during the next attempt.
"""

The preprocessing comes down to splitting the original search string
("needle") in two parts, xl and xr, using some clever algorithm (IIUC the
wikipedia page does describe this, although my brain is currently too
addled to visualize it).

The original paper is by far the best account I've seen so far, with
> complete code and exhaustive explanations and proofs.  Even examples
> ;-)
>

I need a Python version though.


> But here's the thing: I don't believe this algorithm this _can_ be
> reduced to an elevator pitch.  Excruciating details appear to be
> fundamental at every step, and I've seen nothing yet anywhere
> approaching an "intuitive explanation" :-(  What happens instead is
> that you run it on the hardest cases you can dream up, and your jaw
> drops in amazement :-)
>

I am not able to dream up any hard cases -- like other posters, my own use
of substring search is usually looking for a short string in a relatively
short piece of text. I doubt even the current optimizations matter to my
uses. What are typical hard cases used for? DNA search? (That would be
cool!)

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HSLPDIFQ2YEWBBO3XJKYQHMP3LML3DNS/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to