I'm sure that the large majority of the string searches I've done are in Larry's tiny category.
However, I also think that big data needs are increasing, and things like FASTA files can be enormously large texts that one has good reasons to search on. If there is a heuristic switch between algorithms, it seems like it should threshold on needle, not on haystack. Or possibly some more complex function of both. But if I understand the two-way approach correctly (which I probably don't), there's not really much gain for small needle. On Wed, Oct 14, 2020, 12:09 AM Larry Hastings <la...@hastings.org> wrote: > > On 10/13/20 9:54 AM, Tim Peters wrote: > > Due to the natures of the old and new algorithms, neither will be > faster or slower in all cases, the newer one should never be > dramatically slower than the old one, the new one will be > spectacularly faster in some cases, and "in general" it seems > impossible to guess in advance. It depends on the value the fancier > preprocessing can extract from the pattern versus the higher > preprocessing cost, and that in turn depends on details of exactly > what the patterns and texts to be searched are. > > > My off-the-top-of-my-head reaction: I've always assumed that most string > searching is tiny. Like, searching for a < 10 character string in a < 256 > character string. That's what I'm always doing, at least. So while I'd > obviously like to see us address the pathological cases cited--and thanks > to Dennis Sweeney for the PRs!--I'd hope that it doesn't make these small > searches any slower. Your email didn't give me a sense of how much > additional preprocessing these new algorithms require; the fact that > they're O(1) space suggests they aren't bad. But if you do lots and lots > of dinky searches surely it would add up. > > Looking at the PR, I see there's a special case for searching for a single > character, and then cases for forward-search and reverse-search. I was > wondering if I'd find a heuristic like "if the string you're searching in > is < 2k, use the old search algorithm". Is that worth pursuing? It > doesn't seem like the maintenance cost would be that high; I don't think > anybody has touched that code in years. (No surprise, the Effbot did a > good job.) > > > */arry* > _______________________________________________ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/D5GYO2FVUW762RGZ5NMYKYM7PPWWRDIS/ > Code of Conduct: http://python.org/psf/codeofconduct/ >
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4IEDAS5QAHF53IV5G3MRWPQAYBIOCWJ5/ Code of Conduct: http://python.org/psf/codeofconduct/