FWIW, although no one cares, I "withdraw" my proposed implementation.
While it bugs me that I'm not sure what error I made in dealing with
duplicate values in an iterable, on reflection I think the whole idea is
wrong.

That is, I don't like the weirdness of the behavior of statistics.median.
But my partitioning approach doesn't guard against every possible
comparison of two items anyway; checking every pair would always take
quadratic time.  I only perform the comparisons that one particular
program flow happens to reach, not all of them.  "Incomparability" can be
a property of any pair of objects, in principle.

However, I also realize the completely general question is irrelevant.
NaNs really are just special in arising innocuously from relatively normal
numeric operations.  If I make some custom class IncomparableToEverything,
it's my problem if I stick it in a list of things I want the median of.

So we could get the Pandas-style behavior simply by calling median like so:

    statistics.median(x for x in it if not math.isnan(x))
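
A minimal sketch of that filter wrapped in a helper. The name
`median_skipna` is made up for illustration and is not part of the
statistics module; it assumes every item is something math.isnan()
accepts (ints, floats, and the like).

```python
import math
import statistics


def median_skipna(it):
    """Median of the non-NaN items of an iterable (hypothetical helper)."""
    # statistics.median() sorts its input internally, so a generator is fine.
    return statistics.median(x for x in it if not math.isnan(x))


data = [1.0, float("nan"), 3.0, 2.0]
print(median_skipna(data))  # 2.0
```

If every item is a NaN, the filtered input is empty and statistics.median
raises StatisticsError, which seems like a reasonable outcome for "skip".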

I still feel like having median (and friends) do that internally would be
worthwhile under some optional parameter.  But the default value of that
parameter is indeed non-obvious.  In a sort of Pandas way of using
arguments, we might get `on_nan=["skip"|"poison"|"raise"|"random"]`.
"Random" seems like the only wrong answer, but it is the status quo.
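
Roughly what I have in mind, as a sketch only.  The function name
`median_on_nan` and the `on_nan` parameter are hypothetical; nothing like
this exists in the statistics module today.  The default of "raise" is
picked arbitrarily for the sketch, since the right default is exactly the
open question.

```python
import math
import statistics

_POLICIES = ("skip", "poison", "raise", "random")


def median_on_nan(data, on_nan="raise"):
    """Median with an explicit NaN policy (hypothetical wrapper)."""
    if on_nan not in _POLICIES:
        raise ValueError(f"on_nan must be one of {_POLICIES}")
    values = list(data)
    if any(math.isnan(x) for x in values):
        if on_nan == "raise":
            raise ValueError("NaN encountered in data")
        if on_nan == "poison":
            # Any NaN in the input makes the result NaN.
            return math.nan
        if on_nan == "skip":
            values = [x for x in values if not math.isnan(x)]
        # "random": fall through to the status quo, where the result
        # depends on where the NaNs happen to sit in the input.
    return statistics.median(values)


nan = float("nan")
print(median_on_nan([1, nan, 3], on_nan="skip"))    # 2.0
print(median_on_nan([1, nan, 3], on_nan="poison"))  # nan
```

The extra pass to look for NaNs is linear, so it does not change the
overall cost, which is dominated by the sort.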

On Thu, Dec 26, 2019 at 4:34 PM David Mertz <me...@gnosis.cx> wrote:

> FWIW, here is a timing:
>
> >>> many_nums = [randint(10, 100) for _ in range(1_000_000)]
> >>> %timeit statistics.median_low(many_nums)
> 87.2 ms ± 654 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> >>> %timeit median(many_nums)
> 282 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>
> I think almost all the slowdown is because `sorted()` is a C function.  In
> big-O terms, mine should be an improvement since it does part of a
> Quicksort in partitioning elements, but it doesn't actually bother sorting
> the smaller partition.  It *does* make one pass through to find the min
> or max though.
>
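
For anyone who wants to see the general idea of partitioning without fully
sorting, here is a generic quickselect sketch.  This is NOT the
implementation I timed above, just a textbook version of the technique;
the names `quickselect` and `median_low_select` are made up here.  It
assumes the items are totally ordered, so NaNs will break it.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest item (0-based) without a full sort."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        n_equal = sum(1 for x in items if x == pivot)
        if k < len(lows):
            # The answer is in the smaller-than-pivot partition.
            items = lows
        elif k < len(lows) + n_equal:
            # The answer is the pivot itself (duplicates included).
            return pivot
        else:
            # Discard lows and pivots; only the larger partition matters.
            k -= len(lows) + n_equal
            items = [x for x in items if x > pivot]


def median_low_select(data):
    """Low median via quickselect (illustrative, assumes no NaNs)."""
    items = list(data)
    if not items:
        raise ValueError("no median for empty data")
    return quickselect(items, (len(items) - 1) // 2)


print(median_low_select([4, 1, 3, 2]))  # 2
```

Expected time is linear rather than O(n log n), but as the timing shows,
pure-Python loops can easily lose to a C-implemented `sorted()` at sizes
like this.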

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/42GTSIJ6HBGDFTSUMMZDSANFVCHJEIZC/