On 12/26/19 5:23 PM, Andrew Barnert via Python-ideas wrote:
On Dec 26, 2019, at 12:36, Richard Damon <rich...@damon-family.org> wrote:
On 12/26/19 2:10 PM, Andrew Barnert via Python-ideas wrote:
On Dec 26, 2019, at 10:58, Richard Damon <rich...@damon-family.org> wrote:
Note that NaN values are somewhat rare in most programs. I think they can only
come about by explicitly requesting them (like float("nan")) or perhaps with
some of the more advanced math packages.
You can get them easily just from math itself.
Or, once you can get infinite values, you can easily get nan values with just
basic arithmetic:
>>> 1e1000 - 1e1000
nan
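A few more routes to nan using nothing but float arithmetic, as a small sketch. It assumes float is an IEEE 754 double, as discussed in this thread; the names `inf` and `results` are only for illustration.

```python
import math

inf = float("inf")  # the literal 1e1000 also evaluates to inf

# Each of these has no well-defined real-number answer, so IEEE 754 gives nan.
results = {
    "inf - inf": inf - inf,
    "inf * 0": inf * 0.0,
    "inf / inf": inf / inf,
    "1e1000 - 1e1000": 1e1000 - 1e1000,
}

for expr, value in results.items():
    print(f"{expr:>16} -> {value}")
```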
I guess I didn't try hard enough to get a NaN. But once the newbie has hit
infinities, NO answer is right.
I don’t think that’s true. Surely the median of (-inf, 1, 2, 3, inf, inf, inf)
is well defined and can only be 3?
The only case where it’s a problem is when all the values are infinite and
exactly half of them are positive, in which case the median has to be halfway
between -inf and inf. But even then, the only reasonable answers are nan or an
exception.
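Both cases can be checked with statistics.median as it stands today. This is only a sketch of current behaviour, not a proposal; the variable names are illustrative.

```python
import math
from statistics import median

inf = math.inf

# Odd count with a finite middle element: the median is simply that element.
mixed = median([-inf, 1, 2, 3, inf, inf, inf])
print(mixed)

# All values infinite, exactly half positive: the two middle values are
# -inf and inf, and their average (-inf + inf) / 2 is nan.
balanced = median([-inf, -inf, inf, inf])
print(balanced)
```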
But you seem to assume that computing the median is the only thing the
program does. Once the programmer has overflowed values and then done
some math with the overflowed values, all types of strangeness start to
appear (and you don't even need to get to infinities to get strangeness
with math: for many values of x, we can have x being equal to x+1).
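For a concrete instance of x == x+1 without any infinities, here is a short sketch assuming IEEE 754 doubles (53 bits of significand):

```python
# Above 2**53 the gap between adjacent doubles is at least 2, so adding 1
# can round straight back to the original value.
x = 2.0 ** 53
same_at_2_53 = (x + 1 == x)
print(same_at_2_53)

y = 1e16
same_at_1e16 = (y + 1 == y)
print(same_at_1e16)

# Below that threshold the addition is exact.
z = 1e15
same_at_1e15 = (z + 1 == z)
print(same_at_1e15)
```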
The number could have been 1e1000 - 1e999 (and thus should be big) or 1e999 -
1e1000 (and thus should be very negative) or 1e1000 - 1e1000 (and thus should
be zero), which is why we get a NaN here.
Well, here both numbers are clearly 1e1000, and the right answer is 0. The
problem is that (in systems where float is IEEE double) that number can’t be
represented as a float in the first place, so Python approximates it with inf,
so you (inaccurately, but predictably and understandably) get nan instead of 0.
It’s like a very extreme case of “float rounding error”.
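To see the approximation happen, the float result can be compared with exact arithmetic. A sketch, assuming IEEE doubles; Fraction is used here only as a stand-in for "the real numbers":

```python
import sys
from fractions import Fraction

# The largest finite double is about 1.8e308, so 1e1000 cannot be stored.
print(sys.float_info.max)
print(1e1000)            # the literal is already inf by the time it is a float

float_result = 1e1000 - 1e1000
print(float_result)      # inf - inf

# Exact rational arithmetic has no such limit and gives the "right" answer.
exact_result = Fraction(10) ** 1000 - Fraction(10) ** 1000
print(exact_result)
```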
If you have actual infinite values instead, then nan or an exception is the
only appropriate answer in the first place, because subtraction is undefined.
(Assuming you’re taking the floats as an approximate model of the affinely
extended reals. If you’re taking them as a model of themselves, then it is well
defined, as nan.)
And that is my point: IF we have decided that we need to protect the
newbie, then at the point we have converted 1e1000 to inf we have already
put him on the path to problems. Fixing just median is like taking a leaky
boat and bailing ONE bucket of water out of it.
If you are really worried about a median with values like this confusing
someone, then we should handle the issue MUCH earlier, maybe even trapping the
overflow with an error message unless taken out of 'newbie' mode.
This amounts to an argument that in ‘newbie’ mode there should be no inf or nan
values in float in the first place, and anything that returns one should
instead raise an OverflowError or MathDomainError. Which is what many
functions actually do, but I don’t think anyone has tried to divide existing
functions into ‘newbie’ mode and ‘float programmer mode’ functions, so trying
to do the same with new higher-level functions on top of them is probably a
mug’s game. (You can use Decimal with an appropriate context to get that kind
of behavior, but I don’t think any newbie would know how to even begin doing
that…)
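A sketch of how uneven the current behaviour is, and of the Decimal route mentioned above. Note that in CPython the math module reports domain errors as ValueError; there is no exception literally named MathDomainError. The helper `outcome` and the choice of Emax=308 (to roughly mimic the range of a double) are assumptions made for this illustration.

```python
import math
from decimal import Context, Decimal, InvalidOperation, Overflow

def outcome(func):
    """Return repr of func()'s result, or the name of the exception it raised."""
    try:
        return repr(func())
    except (ArithmeticError, ValueError) as exc:
        return type(exc).__name__

big = 1e308

# Some float operations raise, others quietly return inf.
exp_result = outcome(lambda: math.exp(1000))    # raises
sqrt_result = outcome(lambda: math.sqrt(-1.0))  # raises
mul_result = outcome(lambda: big * 10)          # returns inf

# A Decimal context that traps overflow and invalid operations.
strict = Context(Emax=308, traps=[Overflow, InvalidOperation])
dec_overflow = outcome(lambda: strict.multiply(Decimal("1e308"), Decimal(10)))
dec_invalid = outcome(
    lambda: strict.subtract(Decimal("Infinity"), Decimal("Infinity"))
)

for name, value in [
    ("math.exp(1000)", exp_result),
    ("math.sqrt(-1.0)", sqrt_result),
    ("1e308 * 10", mul_result),
    ("Decimal 1e308 * 10", dec_overflow),
    ("Decimal inf - inf", dec_invalid),
]:
    print(f"{name:>20} -> {value}")
```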
Plus, as mentioned at the top, taking a median with some infinite values
usually makes perfectly good sense that a newbie can understand. It’s not the
same as taking a median with some nan values, which behaves in a way that only
makes sense if you think through how sorting works.
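The nan case can be illustrated like this. It is a sketch of what CPython does today, where median sorts the data and every comparison against nan is false, so the exact results depend on the sort implementation and on where the nan starts out:

```python
from statistics import median

nan = float("nan")

# Same values, different input order. Because nan never compares less than
# or greater than anything, the sort leaves it wherever it happened to be.
nan_first = median([nan, 1, 2, 3])
nan_last = median([1, 2, 3, nan])

print(sorted([nan, 1, 2, 3]), nan_first)
print(sorted([1, 2, 3, nan]), nan_last)
```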
As I have been saying, *median* is the wrong place to fix this, as
there are many similar traps in the system. If we really want to protect
the newbie from this sort of error, and not treat it as a teachable
moment, then we need to make a more fundamental change.
One option is to make floating-point math safer by default, and require
some special statement added to the program to enable the extra features.
One problem is that you can't totally protect against these issues: as
long as you use floats, the values of numbers will not always be precise
and round-off errors will accumulate. But perhaps making overflow 'noisy'
by signalling would catch some of the more confusing parts (and Python
already does some of this: 0/0 is an error, not a NaN). The cost of this
would be a bit of efficiency, as each operation would need either a test
of whether we are in simple or advanced mode or a check for overflow, and
people who know what they are doing and WANT the support for full IEEE
mode would need to add something to their program (or environment,
maybe).
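Roughly what 'noisy' overflow could look like, as a purely hypothetical sketch: `checked` is a made-up helper, not an existing or proposed Python feature, and a real 'newbie mode' would have to live inside the float operations themselves rather than in a wrapper.

```python
import math

def checked(value: float) -> float:
    """Hypothetical guard: raise instead of letting inf or nan through."""
    if not math.isfinite(value):
        raise ArithmeticError(f"non-finite float result: {value!r}")
    return value

# Python already refuses 0/0 for floats instead of returning nan.
zero = 0.0
try:
    zero / zero
    zero_div_raised = False
except ZeroDivisionError:
    zero_div_raised = True

# Overflow in plain multiplication is silent today...
big = 1e308
silent = big * 10

# ...but the guard turns it into an error at the point it happens.
try:
    checked(big * 10)
    overflow_raised = False
except ArithmeticError:
    overflow_raised = True

print(zero_div_raised, silent, overflow_raised)
```

The extra isfinite test on every result is the efficiency cost referred to above.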
--
Richard Damon
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/JAAHYKOVIR5AMIF4NY2WFPRNZDFUCW4P/