Oh... mine doesn't handle unbalanced duplicate values correctly :-).  Maybe
I'll figure out how to fix it.  But I'm not really proposing the
implementation anyway, just making the abstract point that we don't need
sorted()

On Thu, Dec 26, 2019 at 4:34 PM David Mertz <me...@gnosis.cx> wrote:

> FWIW, here is a timing:
>
> >>> many_nums = [randint(10, 100) for _ in range(1_000_000)]
> >>> %timeit statistics.median_low(many_nums)
> 87.2 ms ± 654 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> >>> %timeit median(many_nums)
> 282 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>
> I think almost all the slowdown is because `sorted()` is a C function.  In
> big-O terms, mine should be an improvement since it does part of a
> Quicksort in partitioning elements, but it doesn't actually bother sorting
> the smaller partition.  It *does* make one pass through to find the min
> or max though.
>
> I'm pretty sure that even in plain-Python, someone smarter than me, like
> Tim Peters (or others who watch here), could speed up my version by 2x.
>
>
> On Thu, Dec 26, 2019 at 4:13 PM David Mertz <me...@gnosis.cx> wrote:
>
>> Here is an implementation that:
>>
>> A. Only relies on '<'
>> B. Has no special knowledge of NaN
>> C. Does not use sorted() or .sort() for anything
>> D. Is Pandas-like in choosing among only comparable items
>> E. Only picks an element from the original iterator, does not take mean
>> of two candidates
>>     (I.e. kinda like statistic.median_low() without the NaN bug)
>> F. Is barely tested and not-at-all benchmarked.  I'm sure Timsort is way
>> faster than this kinda-like-quicksort approach.
>>
>> >>> even_names = ['Bob', 'Alice', 'Charlie', 'Zack', 'Xavier', 'Mary']
>> >>> odd_names = names + ['Francis']
>> >>> nums = [2, 3, 4, nan, 1]
>> >>> %paste
>> def median(it, *, _offset=0):
>>     smalls = []
>>     bigs = []
>>     go_big = False
>>     it = iter(it)
>>     candidate = next(it)
>>     for x in it:
>>         if x < candidate:
>>             smalls.append(x)
>>         elif candidate < x:
>>             bigs.append(x)
>>         elif candidate == x:
>>             if go_big:
>>                 bigs.append(x)
>>             else:
>>                 smalls.append(x)
>>             go_big = not go_big
>>
>>     if abs(len(smalls) - len(bigs) + _offset) <= 1:
>>         return candidate
>>     elif len(smalls) >= len(bigs):
>>         offset = -len(bigs)
>>         if bigs:
>>             smalls.append(min(bigs))
>>         return median(smalls, _offset=offset)
>>     else:
>>         offset = len(smalls)
>>         if smalls:
>>             bigs.append(max(smalls))
>>         return median(bigs, _offset=offset)
>>
>> ## -- End pasted text --
>> >>> statistics.median_low(odd_names), median(odd_names)
>> ('Francis', 'Francis')
>> >>> statistics.median_low(even_names), median(even_names)
>> ('Charlie', 'Charlie')
>> >>> statistics.median_low(nums), median(nums)
>> (4, 2)
>>
>>
>> --
>> Keeping medicines from the bloodstreams of the sick; food
>> from the bellies of the hungry; books from the hands of the
>> uneducated; technology from the underdeveloped; and putting
>> advocates of freedom in prisons.  Intellectual property is
>> to the 21st century what the slave trade was to the 16th.
>>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IW56I34YOZWQTDYMVV4USPCLSSW474WU/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to