[David Mertz <me...@gnosis.cx>]
> I think consistent NaN-poisoning would be excellent behavior. It will
> always make sense for median (and its variants).
>
>> >>> statistics.mode([2, 2, nan, nan, nan])
>> nan
>> >>> statistics.mode([2, 2, inf - inf, inf - inf, inf - inf])
>> 2
>
> But in the mode case, I'm not sure we should ALWAYS treat a NaN as
> poisoning the result.
I am: I thought about the following but didn't write about it because
it's too strained to be of actual sane use ;-)

> If NaN means "missing value" then sometimes it could change things,
> and we shouldn't guess. But what if it cannot?
>
> >>> statistics.mode([9, 9, 9, 9, nan1, nan2, nan3])
>
> No matter what missing value we take those nans to maybe-possibly
> represent, 9 is still the most common element. This is only true when
> the most common thing occurs at least as often as the 2nd most common
> thing PLUS the number of all NaNs. But in that case, 9 really is the
> mode.

See "too strained" above. It's equally true that, e.g., the _median_ of
your list above:

    [9, 9, 9, 9, nan1, nan2, nan3]

is also 9 regardless of what values are plugged in for the nans. That
may be easier to realize at first with a simpler list, like

    [5, 5, nan]

It sounds essentially useless to me, just theoretically possible to
make a mess of implementations to cater to.

"The right" (obvious, unsurprising, useful, easy to implement, easy to
understand) non-exceptional behavior in the presence of NaNs is to
pretend they weren't in the list to begin with. But I'd rather people
ask for that _if_ that's what they want.
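Asking for that explicitly is already a one-liner on the caller's side.
A minimal sketch of the ignore-the-NaNs approach (the helper name
drop_nans is made up here for illustration; it is not something the
statistics module provides):

    import math
    import statistics

    def drop_nans(data):
        # Caller-side filtering: pretend the NaNs were never in the list.
        return [x for x in data
                if not (isinstance(x, float) and math.isnan(x))]

    data = [9, 9, 9, 9, float("nan"), float("nan"), float("nan")]
    print(statistics.mode(drop_nans(data)))    # 9
    print(statistics.median(drop_nans(data)))  # 9.0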
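As a side note on the two mode() examples quoted at the top: the
difference comes down to object identity. A minimal sketch, assuming
only ordinary dict/Counter lookup semantics (identity is checked before
equality) rather than the statistics module's actual code:

    from collections import Counter

    nan = float("nan")
    inf = float("inf")

    # The same NaN object repeated: the lookup short-circuits on
    # identity, so all three occurrences land in one bucket (count 3).
    print(Counter([2, 2, nan, nan, nan]).most_common(1))
    # -> [(nan, 3)]

    # Each inf - inf builds a fresh NaN object, and NaN != NaN, so the
    # three NaNs are counted as three distinct keys (count 1 each).
    print(Counter([2, 2, inf - inf, inf - inf, inf - inf]).most_common(1))
    # -> [(2, 2)]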