[Terry Reedy <tjre...@udel.edu>]
>> Tim Peters investigated and empirically determined that an
>> O(n*n) binary insort, as he optimized it on real machines, is faster
>> than O(n*logn) sorting for up to around 64 items.
[Nikolaus Rath <nikol...@rath.org>]
> Out of curiosity: is this test repeated periodically on different
> architectures? Or could it be that it only ever was true 10 years ago
> on Tim's Power Mac G5 (or whatever he used)?

It has little to do with architecture, but much to do with the relative
cost of comparisons versus pointer-copying.  Near the end of

    https://github.com/python/cpython/blob/master/Objects/listsort.txt

"""
BINSORT
A "binary insertion sort" is just like a textbook insertion sort, but
instead of locating the correct position of the next item via linear
(one at a time) search, an equivalent to Python's bisect.bisect_right
is used to find the correct position in logarithmic time.  Most texts
don't mention this variation, and those that do usually say it's not
worth the bother:  insertion sort remains quadratic (expected and worst
cases) either way.  Speeding the search doesn't reduce the quadratic
data movement costs.

But in CPython's case, comparisons are extraordinarily expensive
compared to moving data, and the details matter.  Moving objects is
just copying pointers.  Comparisons can be arbitrarily expensive (can
invoke arbitrary user-supplied Python code), but even in simple cases
(like 3 < 4) _all_ decisions are made at runtime:  what's the type of
the left comparand?  the type of the right?  do they need to be coerced
to a common type?  where's the code to compare these types?  And so on.
Even the simplest Python comparison triggers a large pile of C-level
pointer dereferences, conditionals, and function calls.

So cutting the number of compares is almost always measurably helpful
in CPython, and the savings swamp the quadratic-time data movement
costs for reasonable minrun values.
"""

Binsort does a close-to-optimal number of comparisons on randomly
ordered data, and that's the point.

Also, in the context of the overall sorting algorithm, binsort is used
to _extend_ the length of a naturally occurring "too short" run.
There's no need to sort the whole thing from scratch, because we
already know the prefix is sorted.  That makes binsort a more-or-less
obvious choice (it takes full advantage of knowing that the prefix is
already ordered).

As that doc also says:

"""
When N is a power of 2, testing on random data showed that minrun
values of 16, 32, 64 and 128 worked about equally well.  At 256 the
data-movement cost in binary insertion sort clearly hurt, and at 8 the
increase in the number of function calls clearly hurt.
"""

So it settled on forcing minrun into the range 32 <= minrun <= 64 (the
precise value depends on the number of elements in the entire array,
for reasons also explained in that doc).  That's far from either end
where the value clearly mattered.

If the full path through Python's expensive
PyObject_RichCompareBool(X, Y, Py_LT) has gotten significantly faster,
a smaller minrun range may make more sense now; or if it's gotten
significantly slower, a larger minrun range.

But, no, I don't believe anyone retests it.  IIRC, when the algorithm
was adopted in Java, they found a minrun range of 16 through 32 worked
marginally better for them, because _their_ spelling of
PyObject_RichCompareBool (for Java object comparison methods) is faster
than CPython's.
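In pure Python, the binsort described above looks roughly like this (a
sketch only, not CPython's actual implementation, which is C code in
Objects/listobject.c shuffling arrays of object pointers):

    from bisect import bisect_right

    def binary_insertion_sort(a, sorted_prefix_len=1):
        # Sort list `a` in place, given that a[:sorted_prefix_len] is
        # already sorted (in timsort, that's the natural run being
        # extended).
        for i in range(sorted_prefix_len, len(a)):
            item = a[i]
            # Binary search for the insertion point:  O(log i) compares
            # instead of a linear scan.  This is where the savings are.
            pos = bisect_right(a, item, 0, i)
            # Quadratic data movement remains, but in CPython that's
            # just pointer copying, which is cheap next to a compare.
            a[pos+1:i+1] = a[pos:i]
            a[pos] = item

Calling it with sorted_prefix_len set to the length of a natural run is
the "extend a too-short run" use:  the elements already known to be in
order are never compared against each other again; each new item is
just binary-searched into place.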
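And to put a number on "close-to-optimal":  wrap the elements in a
class that counts __lt__ calls (a toy stand-in for what each trip
through PyObject_RichCompareBool costs) and sort 64 random items with
the sketch above:

    import random

    class Counted:
        # Each __lt__ call stands in for one trip through CPython's
        # PyObject_RichCompareBool machinery.
        lt_calls = 0
        def __init__(self, value):
            self.value = value
        def __lt__(self, other):
            Counted.lt_calls += 1
            return self.value < other.value

    random.seed(123)
    data = [Counted(random.random()) for _ in range(64)]
    binary_insertion_sort(data)
    # A bit over 300 compares for 64 random items; the information-
    # theoretic minimum for any comparison sort is log2(64!) ~= 296.
    print(Counted.lt_calls)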
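For completeness, the clamping rule listsort.txt describes amounts to:
take the six most significant bits of N, and add 1 if any of the
remaining bits are set.  A Python transcription (CPython's C version is
merge_compute_minrun in Objects/listobject.c):

    def compute_minrun(n):
        # Keep the six most significant bits of n; r becomes 1 if any
        # of the bits shifted out are set.  For n < 64 this returns n
        # itself (the whole array is handled by binsort alone);
        # otherwise the result lands in 32 <= minrun <= 64.
        r = 0
        while n >= 64:
            r |= n & 1
            n >>= 1
        return n + r

So compute_minrun(2048) is 32 and compute_minrun(2049) is 33, while
anything below 64 is returned unchanged.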