[David Mertz <me...@gnosis.cx>]
> I think consistent NaN-poisoning would be excellent behavior. It will
> always make sense for median (and its variants).
>
>> >>> statistics.mode([2, 2, nan, nan, nan])
>> nan
>> >>> statistics.mode([2, 2, inf - inf, inf - inf, inf - inf])
>> 2
>
> But in the mode case, I'm not sure we should ALWAYS treat a NaN as
> poisoning the result.
I am: I thought about the following but didn't write about it because
it's too strained to be of actual sane use ;-)

> If NaN means "missing value" then sometimes it could change things,
> and we shouldn't guess. But what if it cannot?
>
> >>> statistics.mode([9, 9, 9, 9, nan1, nan2, nan3])
>
> No matter what missing value we take those nans to maybe-possibly
> represent, 9 is still the most common element. This is only true when
> the most common thing occurs at least as often as the 2nd most common
> thing PLUS the number of all NaNs. But in that case, 9 really is the
> mode.

See "too strained" above. It's equally true that, e.g., the _median_ of
your list above:

    [9, 9, 9, 9, nan1, nan2, nan3]

is also 9 regardless of what values are plugged in for the nans. That
may be easier to realize at first with a simpler list, like

    [5, 5, nan]

It sounds essentially useless to me, just theoretically possible to
make a mess of implementations to cater to.

"The right" (obvious, unsurprising, useful, easy to implement, easy to
understand) non-exceptional behavior in the presence of NaNs is to
pretend they weren't in the list to begin with. But I'd rather people
ask for that _if_ that's what they want.
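Asking for that explicitly is already a one-liner on the caller's side.
A minimal sketch of the ignore-the-NaNs approach (the helper name
drop_nans is made up here for illustration; it is not something the
statistics module provides):

    import math
    import statistics

    def drop_nans(data):
        # Caller-side filtering: pretend the NaNs were never in the list.
        return [x for x in data
                if not (isinstance(x, float) and math.isnan(x))]

    data = [9, 9, 9, 9, float("nan"), float("nan"), float("nan")]
    print(statistics.mode(drop_nans(data)))    # 9
    print(statistics.median(drop_nans(data)))  # 9.0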
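As a side note on the two mode() examples quoted at the top: the
difference comes down to object identity. A minimal sketch, assuming
only ordinary dict/Counter lookup semantics (identity is checked before
equality) rather than the statistics module's actual code:

    from collections import Counter

    nan = float("nan")
    inf = float("inf")

    # The same NaN object repeated: the lookup short-circuits on
    # identity, so all three occurrences land in one bucket (count 3).
    print(Counter([2, 2, nan, nan, nan]).most_common(1))
    # -> [(nan, 3)]

    # Each inf - inf builds a fresh NaN object, and NaN != NaN, so the
    # three NaNs are counted as three distinct keys (count 1 each).
    print(Counter([2, 2, inf - inf, inf - inf, inf - inf]).most_common(1))
    # -> [(2, 2)]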