On 12/26/19 5:23 PM, Andrew Barnert via Python-ideas wrote:
On Dec 26, 2019, at 12:36, Richard Damon <rich...@damon-family.org> wrote:
On 12/26/19 2:10 PM, Andrew Barnert via Python-ideas wrote:
On Dec 26, 2019, at 10:58, Richard Damon <rich...@damon-family.org> wrote:
Note that NaN values are somewhat rare in most programs. I think they can only
come about by explicitly requesting them (like float("nan")) or perhaps with
some of the more advanced math packages.
You can get them easily just from math itself.

Or, once you can get infinite values, you can easily get nan values with just 
basic arithmetic:

     >>> 1e1000 - 1e1000
     nan
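For illustration, here are a few more ways plain float arithmetic ends up at inf and then nan without anyone asking for float("nan"). This is a minimal sketch, assuming a Python build where float is an IEEE 754 double; the variable names are only for the example.

```python
import math

# A float literal too large for an IEEE 754 double is read as infinity.
big = 1e1000
# Float multiplication overflows to infinity silently (no exception).
overflowed = 1e308 * 10

# Once infinities exist, ordinary arithmetic produces NaN.
difference = big - big        # inf - inf
product = overflowed * 0.0    # inf * 0
ratio = big / overflowed      # inf / inf

print(big, overflowed)
print(difference, product, ratio)
```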

I guess I didn't try hard enough to get a NaN. But once the newbie has hit
infinities, NO answer is right.
I don’t think that’s true. Surely the median of (-inf, 1, 2, 3, inf, inf, inf) 
is well defined and can only be 3?

The only case where it’s a problem is when all the values are infinite and 
exactly half of them are positive, in which case the median has to be halfway 
between -inf and inf. But even then, the only reasonable answers are nan or an 
exception.
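The two cases described above can be checked with the standard statistics module. This is a small sketch, assuming IEEE double floats; the names mixed and balanced are only for the example.

```python
import math
from statistics import median

inf = math.inf

# Odd count with some infinities: the middle element after sorting is finite.
mixed = median([-inf, 1, 2, 3, inf, inf, inf])

# Even count, all values infinite, exactly half of them positive: the two
# middle values are -inf and inf, and their mean (-inf + inf) / 2 is nan.
balanced = median([-inf, -inf, inf, inf])

print(mixed)     # 3
print(balanced)  # nan
```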
But you seem to assume that computing the median is likely the only thing the program does. Once the programmer has overflowed values and then done some math with the overflowed values, all types of strangeness start to appear. (And you don't even need to get to infinities to get strangeness with math: for many values of x, x is equal to x+1.)
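As an illustration of the x equal to x+1 remark, here is a minimal sketch, assuming IEEE double floats (53 bits of precision); the variable names are only for the example.

```python
# Above 2**53 a double can no longer represent every integer, so adding 1
# rounds straight back to the starting value.
x = 2.0 ** 53
absorbed = (x + 1 == x)

# The same thing happens at any scale: an addend far below the precision
# of the larger operand simply disappears.
big = 1e300
absorbed_big = (big + 1e200 == big)

print(absorbed, absorbed_big)  # True True
```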

The number could have been 1e1000 - 1e999 (and thus should be big) or 1e999 - 
1e1000 (and thus should be very negative) or 1e1000 - 1e1000 (and thus should 
be zero), which is why we get a NaN here.
Well, here both numbers are clearly 1e1000, and the right answer is 0. The 
problem is that (in systems where float is IEEE double) that number can’t be 
represented as a float in the first place, so Python approximates it with inf, 
so you (inaccurately, but predictably and understandably) get nan instead of 0. 
It’s like a very extreme case of “float rounding error”.
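To make the "extreme rounding error" reading concrete, here is a sketch that compares float with decimal.Decimal, which can hold 1e1000 exactly under its default context. It assumes IEEE double floats; Decimal is used here only as a reference point.

```python
import math
from decimal import Decimal

# As a float, 1e1000 is out of range and is approximated by infinity,
# so the subtraction becomes inf - inf.
as_float = 1e1000
float_diff = as_float - as_float

# As a Decimal, 1e1000 is represented exactly and the difference is zero.
exact = Decimal("1e1000")
exact_diff = exact - exact

print(as_float, float_diff)  # inf nan
print(exact_diff == 0)       # True
```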

If you have actual infinite values instead, then nan or an exception is the 
only appropriate answer in the first place, because subtraction is undefined. 
(Assuming you’re taking the floats as an approximate model of the affinely 
extended reals. If you’re taking them as a model of themselves, then it is well 
defined, as nan.)
And that is my point: IF we have decided that we need to protect the newbie, then at the point where we converted 1e1000 to inf we had already put him on the path to problems. Fixing just median is like taking a leaky boat and bailing ONE bucket of water out of it.

If you are really worried about a median with values like this confusing 
someone, then we should handle the issue MUCH earlier, maybe even trapping the 
overflow with an error message unless taken out of 'newbie' mode.
This amounts to an argument that in ‘newbie’ mode there should be no inf or nan
values in float in the first place, and anything that returns one should
instead raise an OverflowError or a ValueError ("math domain error"). That is
in fact what many functions already do, but I don’t think anyone has tried to
divide existing functions into ‘newbie’ mode and ‘float programmer mode’
functions, so trying to do the same with new higher-level functions on top of
them is probably a mug’s game. (You can use Decimal with an appropriate context
to get that kind of behavior, but I don’t think any newbie would know how to
even begin doing that…)
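A sketch of both points: some existing operations raise while others quietly return inf, and a Decimal context can be set up to trap instead. The helper describe and the context named strict are hypothetical and exist only for this example. The limits Emax=308 and prec=17 are an assumption chosen to roughly mimic a double, not anything the decimal module prescribes.

```python
import math
from decimal import (Context, Decimal, DivisionByZero, InvalidOperation,
                     Overflow, localcontext)


def describe(operation):
    """Run a zero-argument callable; return repr(result) or the exception name."""
    try:
        return repr(operation())
    except (ArithmeticError, ValueError) as exc:
        return type(exc).__name__


# Existing float operations are inconsistent about raising.
float_behaviour = {
    "math.exp(1000)": describe(lambda: math.exp(1000)),  # raises
    "2.0 ** 10000": describe(lambda: 2.0 ** 10000),      # raises
    "math.sqrt(-1)": describe(lambda: math.sqrt(-1)),    # raises
    "1e308 * 10": describe(lambda: 1e308 * 10),          # silently inf
}

# A Decimal context that traps overflow and invalid operations.
strict = Context(prec=17, Emax=308, Emin=-308,
                 traps=[Overflow, InvalidOperation, DivisionByZero])
with localcontext(strict):
    decimal_overflow = describe(lambda: Decimal("1e308") * 10)
    decimal_inf_diff = describe(
        lambda: Decimal("Infinity") - Decimal("Infinity"))

for name, outcome in float_behaviour.items():
    print(name, "->", outcome)
print(decimal_overflow, decimal_inf_diff)
```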

Plus, as mentioned at the top, taking a median with some infinite values 
usually makes perfectly good sense that a newbie can understand. It’s not the 
same as taking a median with some nan values, which behaves in a way that only 
makes sense if you think through how sorting works.
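By contrast, here is a sketch of why nan is harder to reason about. Every ordering comparison involving nan is False, so sorting has nothing consistent to work with, and the median can depend on where the nan sat in the input. The sample lists are made up for the example, and the printed medians are deliberately left unstated, since they depend on the details of the sort implementation.

```python
import math
from statistics import median

nan = math.nan

# nan is neither less than, greater than, nor equal to anything.
comparisons = (nan < 1.0, nan > 1.0, nan == nan)
print(comparisons)  # (False, False, False)

# The same values in different input orders; the reported medians need
# not agree, because the sort cannot place nan consistently.
for data in ([nan, 1, 2, 3, 4], [1, 2, 3, 4, nan], [1, 2, nan, 3, 4]):
    print(data, "->", median(data))
```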

As I have been saying, *median* is the wrong place to fix this, as there are many similar traps in the system. If we really want to protect the newbie from this sort of error, and not treat it as a teachable moment, then we need to make a more fundamental change.

One option is to make floating-point math safer by default, and to require some special statement in the program to enable the extra features. One problem is that you can't totally protect against these issues: as long as you use floats, the values will not always be precise and round-off errors will accumulate. But perhaps making overflow 'noisy' by signalling would catch some of the more confusing parts (and Python already does some of this; for example, 0/0 is an error, not a NaN). The cost would be a bit of efficiency, as there would need to be either a test for whether we are in simple or advanced mode or a check for overflow at each operation, and people who know what they are doing and WANT full IEEE support would need to add something to their program (or maybe their environment).
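A rough sketch of what such a per-operation check might look like, written as an ordinary function rather than an interpreter mode. The name checked is hypothetical and nothing like it exists in Python today; the extra isnan/isinf tests on every result are the efficiency cost mentioned above.

```python
import math


def checked(value: float) -> float:
    """Hypothetical 'newbie mode' guard: raise instead of returning inf or nan."""
    if math.isnan(value):
        raise ArithmeticError("result is not a number (nan)")
    if math.isinf(value):
        raise OverflowError("result overflowed to infinity")
    return value


# Ordinary results pass through unchanged.
print(checked(0.1 + 0.2))

# An overflow that plain floats would hide becomes a visible error.
try:
    checked(1e308 * 10)
except OverflowError as exc:
    print("OverflowError:", exc)
```

Note that this only catches overflow and nan. Round-off such as 0.1 + 0.2 not being exactly 0.3 still passes through silently.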

--
Richard Damon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JAAHYKOVIR5AMIF4NY2WFPRNZDFUCW4P/
Code of Conduct: http://python.org/psf/codeofconduct/