On 10/13/20 9:54 AM, Tim Peters wrote:
> Due to the natures of the old and new algorithms, neither will be
> faster or slower in all cases, the newer one should never be
> dramatically slower than the old one, the new one will be
> spectacularly faster in some cases, and "in general" it seems
> impossible to guess in advance. It depends on the value the fancier
> preprocessing can extract from the pattern versus the higher
> preprocessing cost, and that in turn depends on details of exactly
> what the patterns and texts to be searched are.


My off-the-top-of-my-head reaction: I've always assumed that most string searching is tiny.  Like, searching for a <10-character string in a <256-character string.  That's what I'm always doing, at least.  So while I'd obviously like to see us address the pathological cases cited--and thanks to Dennis Sweeney for the PRs!--I'd hope that it doesn't make these small searches any slower.  Your email didn't give me a sense of how much additional preprocessing these new algorithms require; the fact that they're O(1) space suggests it isn't much.  But if you do lots and lots of dinky searches, any extra preprocessing cost would surely add up.
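
For what it's worth, here's a minimal way to put a number on that worry.  The workload below is made up to match the sizes I described; run it on interpreters built with and without the PR and compare:

    import timeit

    # A tiny-search workload: a <10-character needle in a <256-character
    # haystack, repeated many times so per-call setup cost dominates.
    haystack = "the quick brown fox jumps over the lazy dog " * 5  # ~220 chars
    needle = "lazy dog"

    t = timeit.timeit(lambda: haystack.find(needle), number=1_000_000)
    print(f"{t:.3f}s for 1,000,000 tiny searches")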

Looking at the PR, I see there's a special case for searching for a single character, and then cases for forward-search and reverse-search.  I was wondering if I'd find a heuristic like "if the string you're searching in is < 2k, use the old search algorithm".  Is that worth pursuing?  It doesn't seem like the maintenance cost would be that high; I don't think anybody has touched that code in years.  (No surprise, the Effbot did a good job.)
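
Concretely, the branch I have in mind is just a length check in front of the dispatch.  Everything below is a sketch: the names, the 2048 cutoff, and the pure-Python stand-ins are all hypothetical, and the real check would live in C next to the existing special cases:

    SMALL_HAYSTACK_CUTOFF = 2048  # the "< 2k" guess; the right value needs measuring

    def naive_find(haystack, needle):
        # The old-style search: near-zero setup cost, O(n*m) worst case.
        n, m = len(haystack), len(needle)
        for i in range(n - m + 1):
            if haystack[i:i + m] == needle:
                return i
        return -1

    def two_way_find(haystack, needle):
        # Stand-in for the PR's new algorithm: heavier preprocessing,
        # but O(n + m) worst case.  (Here it just defers to str.find.)
        return haystack.find(needle)

    def dispatch_find(haystack, needle):
        if len(needle) == 1:
            return haystack.find(needle)          # existing single-character path
        if len(haystack) < SMALL_HAYSTACK_CUTOFF:
            return naive_find(haystack, needle)   # small text: skip preprocessing
        return two_way_find(haystack, needle)     # large text: amortize the setup

That's one extra comparison per call, plus keeping the old loop around, which the PR effectively already does for the single-character case.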


/arry
