On 12/30/19 12:45 PM, David Mertz wrote:
> On Mon, Dec 30, 2019 at 12:37 PM Richard Damon
> <rich...@damon-family.org> wrote:
>> My preference is that the interpretation that NaN means Missing Data
>> isn't appropriate for the statistics module.
> You need to tell the entire PyData ecosystem, the entire R ecosystem,
> and pretty much all of Data Science that they are wrong, then. I
> would generally prefer a different sentinel value as well, but you are
> saying to refuse to interoperate with hundreds of millions of lines of
> code that do not meet the rule you have now declared.
> I suppose purity beats practicality, though.
First, for R and other languages where arrays of data are single-typed,
NaN is a somewhat reasonable (or at least the least wrong) value. That is
the environment where the convention started. There, practicality beats
purity, and once you decide you need a No Data value, NaN is
better than -99999 (one of the other historical choices).
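A minimal sketch of why NaN beats a numeric sentinel, using only the standard library statistics module. The readings and the SENTINEL name are invented for illustration:

```python
import math
import statistics

# Hypothetical readings; -99999.0 stands in for the old "no data" code.
SENTINEL = -99999.0
readings_sentinel = [20.5, 21.0, SENTINEL, 22.5]
readings_nan = [20.5, 21.0, math.nan, 22.5]

# The sentinel is averaged in as if it were real data: a silently wrong answer.
mean_sentinel = statistics.mean(readings_sentinel)
# NaN propagates through the sum, so the missing value stays visible.
mean_nan = statistics.mean(readings_nan)

print(mean_sentinel)  # -24983.75
print(mean_nan)       # nan
```

The sentinel produces a plausible-looking number that is wrong; NaN at least tells you something is missing.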
In the domain of advanced statistical packages that derive from that
history, I can accept that usage by people who understand its
implications and who use packages adapted to Python from that domain. The
statistics module does NOT come from that history, and its documentation
explicitly refers people to packages intended for those uses.
I would note that if median ignored NaNs, then so should functions like
mean and stdev, which don't; they return NaNs. This would be an argument
that the 'poison' option should perhaps be the default for median
if a NaN policy is added.
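Roughly what such a policy could look like, as a minimal sketch under stated assumptions. The function median_with_nan_policy and the "omit" and "raise" options are hypothetical (only 'poison' comes from this discussion; the other two are loosely modeled on SciPy's nan_policy values), and only float NaNs are detected:

```python
import math
import statistics


def _is_float_nan(x):
    # This sketch only detects float NaNs; Decimal('NaN') is not handled.
    return isinstance(x, float) and math.isnan(x)


def median_with_nan_policy(data, nan_policy="poison"):
    """Hypothetical wrapper: statistics.median plus an explicit NaN policy.

    nan_policy (hypothetical, not a real statistics parameter):
      "poison" - return NaN if any value is NaN, as mean() does
      "omit"   - drop NaNs, then take the median of what remains
      "raise"  - raise ValueError if any value is NaN
    """
    values = list(data)
    if nan_policy not in ("poison", "omit", "raise"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    if not any(_is_float_nan(x) for x in values):
        # No NaNs present, so every policy agrees with plain median().
        return statistics.median(values)
    if nan_policy == "poison":
        return math.nan
    if nan_policy == "raise":
        raise ValueError("data contains NaN")
    # "omit": raises StatisticsError if nothing is left after dropping NaNs.
    return statistics.median([x for x in values if not _is_float_nan(x)])


data = [1.0, 2.0, math.nan, 4.0]
print(median_with_nan_policy(data))          # nan
print(median_with_nan_policy(data, "omit"))  # 2.0
```

Making "poison" the default keeps median consistent with mean, so a NaN in the input shows up in the output instead of yielding an order-dependent answer.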
--
Richard Damon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/P3CIUBRFRW6CBZYNLY772A7OSZTX24ND/