FWIW, here is a timing: >>> many_nums = [randint(10, 100) for _ in range(1_000_000)] >>> %timeit statistics.median_low(many_nums) 87.2 ms ± 654 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) >>> %timeit median(many_nums) 282 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I think almost all the slowdown is because `sorted()` is a C function. In big-O terms, mine should be an improvement since it does part of a Quicksort in partitioning elements, but it doesn't actually bother sorting the smaller partition. It *does* make one pass through to find the min or max though. I'm pretty sure that even in plain-Python, someone smarter than me, like Tim Peters (or others who watch here), could speed up my version by 2x. On Thu, Dec 26, 2019 at 4:13 PM David Mertz <me...@gnosis.cx> wrote: > Here is an implementation that: > > A. Only relies on '<' > B. Has no special knowledge of NaN > C. Does not use sorted() or .sort() for anything > D. Is Pandas-like in choosing among only comparable items > E. Only picks an element from the original iterator, does not take mean of > two candidates > (I.e. kinda like statistic.median_low() without the NaN bug) > F. Is barely tested and not-at-all benchmarked. I'm sure Timsort is way > faster than this kinda-like-quicksort approach. > > >>> even_names = ['Bob', 'Alice', 'Charlie', 'Zack', 'Xavier', 'Mary'] > >>> odd_names = names + ['Francis'] > >>> nums = [2, 3, 4, nan, 1] > >>> %paste > def median(it, *, _offset=0): > smalls = [] > bigs = [] > go_big = False > it = iter(it) > candidate = next(it) > for x in it: > if x < candidate: > smalls.append(x) > elif candidate < x: > bigs.append(x) > elif candidate == x: > if go_big: > bigs.append(x) > else: > smalls.append(x) > go_big = not go_big > > if abs(len(smalls) - len(bigs) + _offset) <= 1: > return candidate > elif len(smalls) >= len(bigs): > offset = -len(bigs) > if bigs: > smalls.append(min(bigs)) > return median(smalls, _offset=offset) > else: > offset = len(smalls) > if smalls: > bigs.append(max(smalls)) > return median(bigs, _offset=offset) > > ## -- End pasted text -- > >>> statistics.median_low(odd_names), median(odd_names) > ('Francis', 'Francis') > >>> statistics.median_low(even_names), median(even_names) > ('Charlie', 'Charlie') > >>> statistics.median_low(nums), median(nums) > (4, 2) > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZNM2Q3FIG23EPPKI2SLHC6LLRNS7VKFW/ Code of Conduct: http://python.org/psf/codeofconduct/