[issue21592] Make statistics.median run in linear time

Julian Taylor Sun, 01 Jun 2014 05:21:13 -0700

Julian Taylor added the comment:

in the case of the median you can archive similar performance to a multiselect 
by simply calling min([len(data) // 2 + 1]) for the second order statistic 
which you need for the averaging of even number of elements.


maybe an interesting datapoint would be to compare with numpys selection 
algorithm which is a intromultiselect (implemented in C for native datattypes).
It uses a standard median of 3 quickselect with a cutoff in recursion depth to 
median of median of group of 5.
the multiselect is implemented using a sorted list of kth order statistics and 
reducing the search space for each kth by maintaining a stack of all visited 
pivots.
E.g. if you search for 30 and 100, when during the search for 30 one has 
visited pivot 70 and 110, the search for 100 only needs to select in l[70:110].
The not particularly readable implementation is in: 
./numpy/core/src/npysort/selection.c.src
unfortunately for object types it currently falls back to quicksort so you 
can't directly compare performance with the pure python variants.

----------
nosy: +jtaylor

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21592>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue21592] Make statistics.median run in linear time

Reply via email to