[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-13 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> Maybe add two key functions for making NaN either the smallest
> or the largest number?

SQL does this with NULLS FIRST or NULLS LAST but it is a nuisance.  I think it 
better to opt for simplicity and choose a default.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-13 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> There's also a somewhat arbitrary choice to be made here: 
> do we consider NaNs to be negative or positive?
>  That is, do we want NaNs to sort to the beginning of the 
> list, or the end?

I had suggested putting NaNs at the beginning because that is how None was 
sorted in Python 2.  But that doesn't really need to be a guide.  Likely the 
best choice is to follow numpy's decision, placing NaNs at the end.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-11 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Maybe add two key functions for making NaN either the smallest or the largest 
number?

min(1, NAN, key=nan_is_smallest) -> NAN
max(1, NAN, key=nan_is_smallest) -> 1
sorted([1, NAN, 2], key=nan_is_smallest) -> [NAN, 1, 2]
min(1, NAN, key=nan_is_largest) -> 1
max(1, NAN, key=nan_is_largest) -> NAN
sorted([1, NAN, 2], key=nan_is_largest) -> [1, 2, NAN]

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-11 Thread Mark Dickinson


Mark Dickinson  added the comment:

> That is, do we want NaNs to sort to the beginning of the list, or the end?

FWIW, NumPy chooses to sort NaNs to the end of the list: 
https://numpy.org/doc/stable/reference/generated/numpy.sort.html

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-11 Thread Mark Dickinson


Mark Dickinson  added the comment:

> We should consider biting the bullet and revising the default NaN sort order.

If we went that route, I think we wouldn't need to consider payload or 
identity. We could just do:

NaN < NaN  ->  False
NaN < non-NaN  ->  True
non-NaN < NaN  ->  False
non-NaN < non-NaN  ->  usual numeric comparison

That then satisfies the axioms for a total ordering, albeit that the implied 
equality isn't Python's == (and it can't be, if we're going to keep the 
property that NaN != NaN): all NaNs compare equal for the equality determined 
by the above order, and the stability of the sort means that those NaNs will 
retain their order relative to each other in the sorted output.

Making `NaN < non-NaN` return `True` (which happens under both the proposal 
above and Raymond's more elaborate proposal) _would_ be a break with IEEE 754, 
though.

There's also a somewhat arbitrary choice to be made here: do we consider NaNs 
to be negative or positive? That is, do we want NaNs to sort to the beginning 
of the list, or the end?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-10 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

We should consider biting the bullet and revising the default NaN sort order.  
It has been a perpetual irritant.

1) Have NaNs always compare to less than any other float value.
2) When comparing two distinct NaNs, use the NaN payload
   and fall back to object id if necessary.
3) When comparing two identical NaNs, always return False.

That preserves the IEEE-754 guarantee that NaNs never compare equal to one 
another -- which is one of the few properties that people rely on.

IEEE-754 doesn't concern itself at all with the notion of object identity, so 
giving two NaNs a deterministic but unequal ordering isn't at odds with the 
standard.

IMHO, making min(), max(), and sorted() both deterministic and commutative is a 
big enough win to warrant foregoing one of the less important properties of 
NaN.  User care about determinism and commutativity more than they care that 
float('NaN') > 10 and float('NaN') < 10 both return False.

“Logic clearly dictates that the needs of the many outweigh the needs of the 
few.” -- Spock, The Wrath of Khan ;-)

Python is practical language, so we should consider making a choice based on 
pragmatism rather than strict standard compliance.  Bug reports like this are 
the tip of the iceberg.  Given the prevalence of NaNs being used as a 
placeholder for missing data, it is likely that people struggle with this every 
day.

The original design of NaNs was to let invalid intermediate results flow 
through a calculation and not require error testing at every step.  But in 
practice, NaNs are not commonly used this way.  Mainly, they are used as the 
float equivalent of None in an array of floats.  We should support this 
reality.  IMO it is a small disaster that NaNs completely wreck sorting and do 
so silently.  From user POV, an arcane property of NaNs is an inadequate 
justification for the misbehavior.

--
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-10 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

The idea of math.ieee754_total_order looks interesting, but how would it work 
with min/max?

In 
https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf
 there is a comparison of several standards and implementations of symmetric 
and associative min/max. There are two options: propagate NaN and ignore it 
(treat it as missing data). It differs between different standards and 
implementations but in particular standard or implementation the same rule is 
used for min and max.

* If ieee754_total_order(NAN) < ieee754_total_order(1), then min(1, NAN) -> NAN 
("propagate") and max(1, NAN) -> 1 ("missing data").
* If ieee754_total_order(NAN) > ieee754_total_order(1), then min(1, NAN) -> 1 
("missing data") and max(1, NAN) -> NAN ("propagate").

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-10 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

math.ieee754_total_order sounds good to me, but how would you get the minimum?

Wikipedia doesn't have much on the 2019 revision to IEEE-754 except to say that 
the min/max rules have been overhauled again, but without giving much detail.

https://en.wikipedia.org/wiki/IEEE_754

Also relevant:

https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-10 Thread Steven D'Aprano


Change by Steven D'Aprano :


--
nosy: +steven.daprano

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-10 Thread Mark Dickinson


Mark Dickinson  added the comment:

See also #11986, which this report is essentially a duplicate of.

I think it may be time to implement `math.ieee754_total_order` (after suitable 
bikeshedding about the name), so that one can do

   sorted(my_list_of_floats, key=math.ieee754_total_order)

and

   max(my_list_of_floats, key=math.ieee754_total_order)

--
nosy: +mark.dickinson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-10 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

implantation --> implementation

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-10 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

This is essentially the same issue, but with sorted(): 
https://bugs.python.org/issue36095

The trouble is that the implementation of min() is roughly equivalent to:

iterable = iter(iterable)
current_min = next(iterable)
for x in iterable:
if x < current_min:
current_min = x
return current_min

NaN < number and number < NaN both return false. Thus including NaNs make 
floats not totally ordered, so there's no perfect way to  sort them or take a 
max/min. This leads to some unexpected result, as you've identified. So you 
could say that the real "bug" is in floats more broadly.

This behavior was originally implemented to comply with IEEE 754. Even if that 
was a mistake, it would be very difficult to change this behavior in case 
someone's code relies on exactly using this implantation. I also don't like the 
idea of special-casing the behavior of min/max for floats, since right now the 
behavior is completely agnostic of the types of the items.

--
nosy: +Dennis Sweeney

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-09 Thread Joël Larose

Joël Larose  added the comment:

The same problem occurs if the argument is a `list`.  The same inconsistency 
happens depending on the position in the list that `nan` happens to be.

>>> max([5, nan, 3, 0, 8, -10])
8

>>> min([5, nan, 3, 0, 8, -10])
-10

>>> min([nan, 5, 3, 0, 8, -10])
nan

>>> max([nan, 5, 3, 0, 8, -10])
nan

Passing a `tuple` with the same values produces the same inconsistency.

For the examples above, replacing the lists with sets with the same values 
(i.e. replace [] with {}) always results in `nan`.  This may have to do with 
the hash value of `nan` always making the first value in iteration be `nan` 
given the sample space.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-09 Thread Joël Larose

Joël Larose  added the comment:

Forgot to mention the environment:
Python version: 3.9.0
OS: Windows 10

I have not tested this on any other version of python.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44370] Inconsistent results for min() and max() with math.nan as argument

2021-06-09 Thread Joël Larose

New submission from Joël Larose :

If `math.nan` is the first argument for either max() or min(), the result is 
always `nan`, regardless of the other values in the results.  However, if `nan` 
is in any other position in the arguments list, the result is always what you 
would expect if `nan` was not in the arguments list.

E.g.
from math import nan, inf

>>> max(nan, 5, 3, 0)
nan

>>> min(nan, 5, 3, 0)
nan

>>> min(nan, 5, 3, 0, inf, -inf)
nan

>>> max(nan, 5, 3, 0, inf, -inf)
nan

>>> max(inf, 5, 3, 0, nan, -inf)
inf

>>> max(8, inf, 5, 3, 0, nan, -inf)
inf

>>> min(8, inf, 5, 3, 0, nan, -inf)
-inf

>>> min(8, nan, inf, 5, 3, 0, nan, -inf)
-inf

>>> max(8, nan, inf, 5, 3, 0, nan, -inf)
inf

>>> max(8, 5, 3, 0, nan)
8

As you can see, the results for min() and max() are inconsistent for `nan` 
depending on whether it's the first argument or not.

I would prefer for min() and max() to ignore `nan` unless all arguments are 
`nan`.  However, the other path for consistent behaviour is to return `nan` if 
any of the arguments are `nan`.

--
components: Interpreter Core
messages: 395497
nosy: joel.larose
priority: normal
severity: normal
status: open
title: Inconsistent results for min() and max() with math.nan as argument
type: behavior
versions: Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com