Tim Peters <t...@python.org> added the comment:

I'm sorry I haven't been able to give more time to this. I love what's been 
done, but am just overwhelmed :-(

The main thing here is to end quadratic-time disasters, without doing 
significant damage in other cases. Toward that end it would be fine to use 
"very high" cutoffs, and save tuning for later. It's clear that we _can_ get 
significant benefit even in many non-disastrous cases, but unclear exactly how 
to exploit that. But marking the issue report as "fixed" doesn't require 
resolving that - and I want to see this improvement in the next Python release.

An idea that hasn't been tried: in the world of sorting, quicksort is usually 
the fastest method - but it's prone to quadratic-time disasters. OTOH, heapsort 
never suffers disasters, but is almost always significantly slower in ordinary 
cases. So the more-or-less straightforward "introsort" compromises, by starting 
with a plain quicksort, but with "introspection" code added to measure how 
quickly it's making progress. If quicksort appears to be thrashing, it switches 
to heapsort.

It's possible an "introsearch" would work well too. Start with pure brute force 
(not even with the current preprocessing), and switch to something else if it's 
making slow progress. That's easy to measure: compare the number of character 
comparisons we've made so far to how many character positions we've advanced 
the search window so far. If that ratio exceeds (say) 3, we're heading out of 
linear-time territory, so should switch to a different approach.

With no preprocessing at all, that's likely to be faster than even the current 
code for very short needles, and in all cases where the needle is found at the 
start of the haystack.

This context is trickier, though, in that the current code, and pure C&P, can 
also enjoy sub-linear time cases. It's always something ;-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to