I’m going to strongly support David Mertz’s point here:

It is a well-known axiom of computing that returning an *incorrect* result
is a very bad thing.

The right answer for the median of a sequence of floats that contains
some NaNs is up for debate. As David points out, there are (at least) three
reasonable answers:
- NaN
- the median that ignores the NaNs
- an Exception
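
A minimal sketch of those three options as one function. The name `median_nan_policy` is hypothetical, and the policy strings just borrow SciPy's `nan_policy` vocabulary ("propagate", "omit", "raise"); nothing like this exists in the statistics module:

```python
import math
import statistics


def median_nan_policy(data, nan_policy="raise"):
    """Median with explicit NaN handling (hypothetical helper, not a stdlib API)."""
    values = list(data)
    # x != x is True only for NaN-like values (float or Decimal NaN).
    first_nan = next((x for x in values if x != x), None)
    if first_nan is None:
        return statistics.median(values)
    if nan_policy == "propagate":
        # Answer 1: the result is NaN; reuse the input's NaN to keep its type.
        return first_nan
    if nan_policy == "omit":
        # Answer 2: treat NaNs as missing values and drop them.
        return statistics.median([x for x in values if x == x])
    if nan_policy == "raise":
        # Answer 3: refuse to guess.
        raise ValueError("median() of data containing NaN")
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")
```

With "omit", data that is entirely NaN ends up as an empty list, so `statistics.median()` raises its usual `StatisticsError` for empty data.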

But a result that essentially treats NaN as having an arbitrary value,
depending on where it is in the sequence, is not correct by any
definition.
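
A small demonstration with the standard `statistics.median()`. The outputs shown are what CPython's sort produces for these two inputs; they depend on sort internals, which is exactly the problem:

```python
import statistics

nan = float("nan")

# The same five values; only the position of the NaN differs.
nan_last = [1.0, 2.0, 3.0, 4.0, nan]
nan_first = [nan, 1.0, 2.0, 3.0, 4.0]

# median() sorts its input, but every "<" comparison involving NaN is
# False, so CPython's sort sees each list as already in order and the NaN
# stays where it was, shifting which value lands in the middle.
median_last = statistics.median(nan_last)
median_first = statistics.median(nan_first)

print(median_last)   # 3.0
print(median_first)  # 2.0
```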

And while it’s nice that median supports duck typing, floats are probably
the most (certainly the second most) common type used with it, so
special-case code for them is justified where needed.
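
That special-case code doesn't have to break duck typing. A sketch, assuming a hypothetical helper `is_nan`: among the standard numeric types, NaN is the only value that compares unequal to itself, so `x != x` catches float and Decimal NaNs without converting anything to float, and is simply False for int, Fraction and other ordinary numbers:

```python
from decimal import Decimal
from fractions import Fraction


def is_nan(x):
    # Hypothetical helper: True only for values that are unequal to themselves.
    return x != x


samples = [1, 2.5, Fraction(1, 3), Decimal("1.5"), float("nan"), Decimal("NaN")]
flags = [is_nan(x) for x in samples]
print(flags)  # [False, False, False, False, True, True]
```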

Python is not a high-performance computational language. If you are
crunching enough numbers that a NaN check is going to make a difference,
you probably should be using numpy or some other higher-performance
statistics library.

So I think this should be addressed, and it's well worth a performance hit
to do so.

Which does bring us to how to do it. I think there are two possible
approaches:

1) do checks in the median function itself -- I'm going to suggest that
that's the place to do it -- and it's probably a good idea to review the
statistics module for other places where NaNs will cause problems.
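
One way to sketch option 1 so the same check can be applied across the module: a hypothetical `reject_nans` wrapper (not part of the statistics module) that scans the data once and raises before the wrapped function ever sorts it:

```python
import functools
import statistics


def reject_nans(func):
    """Hypothetical decorator: raise ValueError if the data contains a NaN."""
    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        values = list(data)  # materialise iterators so they can be scanned, then used
        if any(x != x for x in values):
            raise ValueError(f"{func.__name__}() got data containing NaN")
        return func(values, *args, **kwargs)
    return wrapper


median = reject_nans(statistics.median)
median_low = reject_nans(statistics.median_low)
median_high = reject_nans(statistics.median_high)
```

The extra cost is one linear pass, which is small next to the O(n log n) sort these functions already do.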

2) in the sort function(s) -- it would be nice if sort did something
"smarter" with NaNs, but if that smarter thing is something other than an
Exception, median() should still check for NaNs. If they are all at the end
or beginning, you will at least get consistent results, but still not
really meaningful ones. I think NaNs should either be treated as missing
values or not allowed at all.
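
To illustrate "consistent but not meaningful": a sketch using a hypothetical sort key that always puts NaNs last. Tuples compare the NaN flag first, so the order no longer depends on where the NaN started, but the middle element of five values is still biased upward:

```python
def nan_last_key(x):
    # Hypothetical key: (is_nan, value). False sorts before True, so every
    # NaN lands after every number, whatever the input order.
    return (x != x, x)


def middle(values):
    # Middle element of an odd-length list after a NaN-last sort.
    ordered = sorted(values, key=nan_last_key)
    return ordered[len(ordered) // 2]


nan = float("nan")
a = middle([1.0, 2.0, 3.0, 4.0, nan])
b = middle([nan, 1.0, 2.0, 3.0, 4.0])
print(a, b)  # 3.0 3.0
```

Both orders now agree, but 3.0 is neither NaN nor the median of the four real values (2.5).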

NOTE: a while back someone suggested that the sort function(s) check
for simple C data types in the keys and, if they are e.g. floats or ints,
use fast C code for the sorting; he claimed it would make sorting much
faster for these common cases. I don't know what came of that, but if it is
implemented, then adding NaN handling there might make some sense.
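
A pure-Python sketch of that idea, assuming the pre-scan is a single loop over the keys: the same pass that checks "are these all floats?" can count NaNs almost for free, and a float-only sort could then partition them out. The names `prescan` and `sort_floats_nan_last` are made up for illustration; a real version would live in C inside `list.sort()`. (For what it's worth, CPython 3.7 did add a key-type pre-scan to `list.sort()`, switching to a specialised comparison for common cases such as all-float keys.)

```python
def prescan(keys):
    """Hypothetical single pass: returns (all_float, number_of_float_nans)."""
    all_float = True
    nan_count = 0
    for k in keys:
        if type(k) is not float:
            all_float = False
        elif k != k:
            nan_count += 1
    return all_float, nan_count


def sort_floats_nan_last(values):
    # Hypothetical: only the homogeneous-float case gets special NaN handling.
    all_float, nan_count = prescan(values)
    if all_float and nan_count:
        numbers = sorted(x for x in values if x == x)
        return numbers + [x for x in values if x != x]
    return sorted(values)
```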

-CHB
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ENCEOW4QVETPCIEQ5VE2ERTJEXHSDFDG/
Code of Conduct: http://python.org/psf/codeofconduct/
