Cyker Way <cyker...@gmail.com> added the comment:

Thank you very much for the great discussion here, especially Tim's great 
threads in *python-ideas* that give neat and insightful answers to this problem 
in different ways:

-   <https://mail.python.org/pipermail/python-ideas/2016-October/043045.html>

-   <https://mail.python.org/pipermail/python-ideas/2016-October/043126.html>

Since this topic is closed, future discussions probably should go to other 
python forums. But it might be good to draw some conclusions here for future 
reference.

First of all, either single-pass sort with a vector key or multi-pass sort with 
a scalar key may work better, depending on the input. However, in most cases, 
using multi-pass sort for such problem is the right way to go in the current 
python implementation. The multi-pass sort algorithm typically runs 2x faster 
or so than a single-pass sort algorithm. This is likely due to constants rather 
than asymptotic complexity. But when measured in real time, multi-pass sort 
algorithm clearly wins in most cases.

If your input is so special that it aligns much better with single-pass sort 
algorithms (overwhelming the constant advantage of multi-pass sort algorithm), 
you may use a single-pass sort algorithm. But there are actually different ways 
of implementing so. The algorithm posted in Tim's second thread on python-ideas 
is in fact different from mine in this bug thread, where Tim used a wrapper 
class for the keys and I used a wrapper class for the scalars. Since there are 
`n` keys but can be as many as `n * m` scalars, my method would be using more 
wrapper objects. So I expected it to run slower than Tim's. To my surprise, 
sometimes it works better. The reason is later found to be easy to understand: 
Wrapper objects are created only when a column needs to be reversed. The number 
of columns to be reversed is actually a factor controlling the running time. To 
give a fair evaluation, I created 500000 random rows with 8 columns and control 
the number of columns to be reversed. The evaluation s
 hows my algorithm could have running time 80%-120% that of Tim's (excluding 
the special case where no column needs to be reversed). This shows a new 
direction of optimizing such algorithms.

Finally, this issue was rejected because the added benefits were deemed not 
enough to complicate the implementation. Considering the current multi-pass 
sort algorithm has a huge advantage and is easy to implement in python, this 
decision is fair. Users who care less about performance may write a key adapter 
in their own code if they want to stick with builtin sort functions. Users who 
do care about performance can use single-pass sort techniques mentioned in this 
issue in case multi-pass sort doesn't work well with their data.

----------
Added file: https://bugs.python.org/file47880/performance-2.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35010>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to