[Python-ideas] Re: Fix statistics.median()?

Andrew Barnert via Python-ideas Thu, 26 Dec 2019 14:27:14 -0800

On Dec 26, 2019, at 12:36, Richard Damon <rich...@damon-family.org> wrote:
> 
> On 12/26/19 2:10 PM, Andrew Barnert via Python-ideas wrote:
>>> On Dec 26, 2019, at 10:58, Richard Damon <rich...@damon-family.org> wrote:
>>> Note, that NaN values are somewhat rare in most programs, I think they can 
>>> only come about by explicitly requesting them (like float("nan") ) or 
>>> perhaps with some of the more advanced math packages
>> You can get them easily just from math itself.
>> 
>> Or, once you can get infinite values, you can easily get nan values with 
>> just basic arithmetic:
>> 
>>     >>> 1e1000 - 1e1000
>>     nan
>> 
> I guess I didn't try hard enough to get a Nan. But once the Newbie has hit 
> infinities, NO answer is right.


I don’t think that’s true. Surely the median of (-inf, 1, 2, 3, inf, inf, inf) 
is well defined and can only be 3?
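A quick check against the stdlib, as a sketch using `math.inf` for the infinities. Seven values, so after sorting the median is just the element at index 3:

```python
import math
import statistics

inf = math.inf
data = [-inf, 1, 2, 3, inf, inf, inf]

# Odd count: statistics.median sorts and returns the middle element,
# so the infinities at either end never enter any arithmetic.
result = statistics.median(data)
print(result)
```

Nothing is added or averaged here, so the infinities can't produce a nan.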

The only case where it’s a problem is when all the values are infinite and 
exactly half of them are positive, in which case the median has to be halfway 
between -inf and inf. But even then, the only reasonable answers are nan or an 
exception.
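Sketching both cases with `statistics.median`, which averages the two middle values when the count is even:

```python
import math
import statistics

inf = math.inf

# All infinite, exactly half positive: the middle pair is (-inf, inf),
# and (-inf + inf) / 2 is nan.
half_and_half = statistics.median([-inf, -inf, inf, inf])
print(half_and_half)

# One finite value is enough to break the tie: the middle pair is (1, inf),
# and (1 + inf) / 2 is simply inf.
one_finite = statistics.median([-inf, 1, inf, inf])
print(one_finite)
```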

> The number could have been 1e1000 - 1e999 (and thus should be big) or 1e999 - 
> 1e1000 (and thus should be very negative) or 1e1000 - 1e1000 (and thus should 
> be zero), which is why we get a NaN here.

Well, here both numbers are clearly 1e1000, and the right answer is 0. The 
problem is that (in systems where float is IEEE double) that number can’t be 
represented as a float in the first place, so Python approximates it with inf, 
so you (inaccurately, but predictably and understandably) get nan instead of 0. 
It’s like a very extreme case of “float rounding error”.
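To make the "rounding error" point concrete (assuming IEEE doubles, as above): the literal overflows to inf before the subtraction ever happens, while an exact type gets 0.

```python
import math
from decimal import Decimal

# 1e1000 is far beyond the largest double (about 1.8e308),
# so the literal itself is rounded to inf.
approx = 1e1000
float_diff = approx - approx   # inf - inf -> nan

# Types that can represent 10**1000 exactly give the "right" answer, 0.
int_diff = 10**1000 - 10**1000
dec_diff = Decimal("1e1000") - Decimal("1e1000")

print(approx, float_diff, int_diff, dec_diff == 0)
```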

If you have actual infinite values instead, then nan or an exception is the 
only appropriate answer in the first place, because subtraction is undefined. 
(Assuming you’re taking the floats as an approximate model of the affinely 
extended reals. If you’re taking them as a model of themselves, then it is well 
defined, as nan.)
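For comparison, here's a sketch of both behaviors for a genuinely infinite subtraction: float quietly returns nan, while Decimal's default context traps InvalidOperation and raises.

```python
import math
from decimal import Decimal, InvalidOperation, localcontext

# IEEE float: inf - inf is defined to be nan, with no exception.
float_result = math.inf - math.inf

# Decimal's default context traps InvalidOperation, so this raises.
try:
    Decimal("Infinity") - Decimal("Infinity")
except InvalidOperation:
    decimal_raised = True
else:
    decimal_raised = False

# With the trap turned off, Decimal also answers NaN.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    quiet = Decimal("Infinity") - Decimal("Infinity")

print(float_result, decimal_raised, quiet)
```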

> If you are really worried about a median with values like this confusing 
> someone, then we should handle the issue MUCH earlier, maybe even trapping 
> the overflow with an error message unless taken out of 'newbie' mode.

This amounts to an argument that in ‘newbie’ mode there should be no inf or nan 
values in float in the first place, and anything that returns one should 
instead raise an OverflowError or a ValueError (“math domain error”). Which is 
what many math functions already do, but I don’t think anyone has tried to divide 
existing functions into ‘newbie’ mode and ‘float programmer mode’ functions, so 
trying to do the same with new higher-level functions on top of them is probably 
a mug’s game. (You can use Decimal with an appropriate context to get that kind 
of behavior, but I don’t think any newbie would know how to even begin doing 
that…)
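A rough sketch of the split as it exists today: plain float arithmetic overflows silently, many `math` functions raise, and a Decimal context can be configured to raise too. The `Emax = 308` setting is just an illustrative choice to mimic the double range, not any standard "newbie" setting.

```python
import math
from decimal import Decimal, Overflow, localcontext

# Plain float arithmetic overflows silently to inf.
silent = 1e308 * 10

# Many math functions raise instead of returning inf or nan.
errors = []
for func, arg in [(math.exp, 1000), (math.sqrt, -1)]:
    try:
        func(arg)
    except (OverflowError, ValueError) as exc:
        errors.append(type(exc).__name__)

# Illustrative Decimal context: cap the exponent near the double range
# and trap Overflow (already trapped by default; made explicit here).
with localcontext() as ctx:
    ctx.Emax = 308
    ctx.traps[Overflow] = True
    try:
        Decimal(10) ** 400
    except Overflow:
        decimal_overflowed = True
    else:
        decimal_overflowed = False

print(silent, errors, decimal_overflowed)
```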

Plus, as mentioned at the top, taking a median with some infinite values 
usually makes perfect sense, in a way a newbie can understand. It’s not the 
same as taking a median with some nan values, which behaves in a way that only 
makes sense if you think through how sorting works.
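Here's a small illustration of that, assuming CPython's current sort and `statistics.median`: every comparison involving nan is False, so `sorted()` leaves these short lists in their original order, and which element lands in the middle depends only on where the nan started.

```python
import math
import statistics

nan = math.nan

# Same three values, three input orders.
orderings = [
    [nan, 1.0, 2.0],
    [1.0, nan, 2.0],
    [1.0, 2.0, nan],
]

# nan < x and x < nan are both False, so the sort sees each list as
# already in order, and the "middle" element is whatever was at index 1.
results = [statistics.median(data) for data in orderings]
for data, med in zip(orderings, results):
    print(data, "->", med)
```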

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/C427NQXUSCPTP3HSIMEQQRFERALFXM5X/
Code of Conduct: http://python.org/psf/codeofconduct/
