[issue36095] Better NaN sorting.

2019-12-23 Thread Marco Sulla


Marco Sulla  added the comment:

Excuse me, ignore my previous post.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-23 Thread Marco Sulla


Marco Sulla  added the comment:

marco@buzz:~$ python3.9
Python 3.9.0a0 (heads/master-dirty:d8ca2354ed, Oct 30 2019, 20:25:01) 
[GCC 9.2.1 20190909] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal as Dec, BasicContext as Bctx
>>> a = Dec("1981", Bctx)
>>> b = Dec("nan", Bctx)
>>> a.max(b)
Decimal('1981')
>>> b.max(a)
Decimal('1981')
>>> Bctx.max(a, b)
Decimal('1981')
>>> Bctx.max(b, a)
Decimal('1981')

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Tim Peters


Tim Peters  added the comment:

Sorry, but I'm not replying here any more:  the issue tracker is not for asking 
questions or brainstorming ideas.  Take this to python-ideas, please.  If a 
concrete proposal that gains traction results, _then_ the issue tracker may be 
an appropriate place (but a new issue - this issue is closed).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Marco Sulla


Marco Sulla  added the comment:

Excuse me, I had an epiphany.

NaN returns False for every comparison.

So in teory any element of the iterable should result minor that NaN.

So NaN should treated as the highest element, and should be at the end of the 
sorted result!

Indeed this is the behavior in Java. NaNs are in the end of the sorted iterator.

On the contrary, Python sorting does not move the NaN from its position.

Why?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Tim Peters


Tim Peters  added the comment:

Ah, so you're proposing a hack that would catch some, but not all, cases where 
a total ordering doesn't exist.  Why?  The issue tracker isn't a suitable place 
to discuss it regardless, so if you want to pursue it I'd suggest taking it to 
python-ideas.  If you do, some points to think about in advance:

- As above, it only catches some cases.  For example, given list [a, b], where 
a < b and b < a, no total ordering exists but the hack won't detect anything 
amiss.

- As has been said twice already, list.sort() promises to use only "<".  This 
isn't disposable.  User-written classes _rely_ on it, which was the point:  for 
instances to participate in sorting, the class only needs to define __lt__ 
(and, although it's not currently documented, they can also participate in the 
`bisect` and `heapq` module facilities, which also restrict themselves to "<" 
compares).

- Assuming "a < b" is false about half the time, the hack would require sorting 
to do about 50% more comparisons.  That's a huge expense.  All in all, I've 
spent at least a year of my life minimizing the number of compares Python's 
sort needs to do, and that would more than wipe out all the progress I've made 
since 1991 ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Marco Sulla


Marco Sulla  added the comment:

> No idea what "are minor that another object" could possibly mean.

Oh my god... a < b?

> I don't know what purpose would be served by checking ">=" too

Well, it's very simple. Since the sorting algorithm checks if a < b, if this 
check fails, I propose to check also a >= b. If this is false too, the iterable 
contains an unorderable object. From this point, the check will never done 
again, an unorderable object is sufficient to raise the warning.

The check a >= b is *not* for ordering the iterable, is only for checking if 
the elements are orderable or not, and raise the warning.

Furthermore, I suppose that if someone is sure that its iterable is 
unorderable-free and want a fine-grained boost to speed, a flag can added. If 
true, sorting will not use the algorithm with the check, but the old algorithm.

> You haven't addressed any of the points he (Dickinson) raised

Dickinson said you have to check for total preorder. If you have understood my 
idea, this is not needed at all.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Tim Peters


Tim Peters  added the comment:

Marco, your

> I suppose the sorting function checks if the objects of
> the iterable are minor that another object

was incoherent to me.  No idea what "are minor that another object" could 
possibly mean.

As Mark explained, the mathematical meaning of "orderable" is more expensive to 
check than it is to do sorting.

Mark also explained that list.sort() guarantees to use only "<" (__lt__) 
comparisons.  Because your message was incoherent to me (see above), I don't 
know what purpose would be served by checking ">=" too, but if there _is_ a 
coherent purpose, list.sort() cannot use ">=" regardless.

Reply to Mark's message instead of this one?  You haven't addressed any of the 
points he raised, and they're all deal-breakers.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Marco Sulla


Marco Sulla  added the comment:

Anyway, Java by default puts NaNs at the end of the iterable:

https://onlinegdb.com/SJjuiXE0S

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Marco Sulla


Marco Sulla  added the comment:

Excuse me, but have you, Dickinson and Peters, read how I propose to check if 
the object is orderable or not? 

I explained it in a very detailed way, and this does not change the float 
comparison. And does not need to check first if the iterable it totally 
preordered.

Can you please read my post?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Tim Peters


Tim Peters  added the comment:

Closing as Mark suggested, but as "not a bug" rather than "won't fix".  That 
floats aren't totally ordered is a consequence of IEEE 754 semantics, not 
Python's whim.  As Mark said, if people want a total_ordering function for 
floats, that should be opened as a different issue - we're not going to change 
the default float comparison logic.

--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed
versions: +Python 2.7 -Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Mark Dickinson


Mark Dickinson  added the comment:

Tim, Raymond: I propose that we close this issue as "won't fix". The suggestion 
to add `math.total_ordering` could be broken out into a separate issue.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Mark Dickinson


Mark Dickinson  added the comment:

> IMHO sorting functions should emit a warning if they contains an unorderable 
> objects

How do you define "unorderable", and how do you propose to detect "unorderable 
objects" efficiently in practice within the sorting algorithm?

What you propose isn't practical in full generality. The case in which `sorted` 
can't give a well-defined result is the case where the collection of elements 
being sorted, under the given ordering, is not a total preorder. Being a total 
preorder is a property of the whole collection being sorted, not of individual 
objects: it's not necessarily true that if the collection is not totally 
preordered then there's any one item you can point to as being "unorderable".

Checking for a total preorder is much more expensive than simply sorting (at 
best it would be a quadratic time post-sort check), so that's not a reasonable 
check to make.

So by checking for unorderable objects, whatever those are, you're only solving 
a small part of the overall problem, at the expense of extra computation. 
You're also likely breaking the existing guarantees (depending on how you do 
this checking) that `sort` and `sorted` only rely on `<` comparisons; that 
would be a backwards compatibility break.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Marco Sulla


Marco Sulla  added the comment:

Excuse me, a little errata:

> After the current object a is checked against the object b, if 
> `all_orderables` is true [...]

must be change to

> After the current object a is checked against the object b, ***if the check 
> returns false and*** if `all_orderables` is true [...]

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-12-15 Thread Marco Sulla


Marco Sulla  added the comment:

I'm in favor of a `math.total_ordering` function, but IMHO sorting functions 
should emit a warning if they contains an unorderable objects.

This can be done easily: I suppose the sorting function checks if the objects 
of the iterable are minor that another object. And soon or later, this check 
will be done for all objects in the iterable.

What I propose is simply to add a flag `all_orderables`, true by default.

After the current object a is checked against the object b, if `all_orderables` 
is true, another check will be performed: `a >= b`?

If this return false, `all_orderables` will be set to false.

At the end of sorting, if `all_orderables` is false, a warning will be emitted.

This way, no old code will break, but developers will be informed of the 
problem. Because "Errors should never pass silently" :)

--
nosy: +Marco Sulla

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-24 Thread Tim Peters


Tim Peters  added the comment:

> *Clearly*, negative NaNs should be sorted to the start
> of the list, while positive NaNs are sorted to the end
> of the list. (Yes, I'm joking, but only a little bit: that's
> the result you'd get if you used IEEE 754's totalOrder
> to sort.

So not a joke at all!  If a revision of the standard mandates a specific total 
ordering, it's certain that people will eventually ask for it be supported.

> But it's difficult to see how it would be useful in
> practice, given that people usually don't know about
> or care about the sign that their NaN has.)

The standard "wastes" an enormous number of bit patterns on NaNs, catering to 
some hypothetical system that uses those bits to encode diagnostic information 
(e.g., maybe the source code line number of the operation that produced the 
NaN), and this is just one more "wasted" bit.  If any implementation actually 
used any of these possibilities, I'm sure semi-plausible ;-) use cases would 
follow.

I bet this was driven by implementation - it's _almost_ what you'd get if you 
viewed the float bit patterns as signed integers.  Seems the only flaw with 
that is that the negative floats are in "reverse order" then.  Which can be 
repaired easily enough; e.g.,

import struct
def totkey(x, *, MASK=(1 << 63) - 1, pack=struct.pack):
asint = int.from_bytes(pack(">d", x), "big", signed=True)
if asint < 0:
asint ^= MASK
return asint

Then

xs = [2.0, 1.0, 0.0, -0.0, -1.0, -2.0,
  math.inf, -math.inf, math.nan, -math.nan]
print(sorted(xs, key=totkey))

displays

[nan, -inf, -2.0, -1.0, -0.0, 0.0, 1.0, 2.0, inf, nan]

That makes negative NaNs smallest, positive NaNs largest, puts -0 before +0, 
and in all other respects I checked ;-) appears to match the totalOrder 
requirements.  On most boxes, it will even consider (positive) signaling NaNs 
to "be smaller" than (positive) quiet NaNs, which totalOrder also seems to 
require (presumably again not because it's obviously useful, but because it 
_follows_ from a very simple implementation like the one above).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-24 Thread Brandt Bucher

Brandt Bucher  added the comment:

One other idea I had considered was having a new magic method intended to be 
used for elementwise comparisons such as these (I’m thinking specifically of 
the old __cmp__/tp_compare). Sorting routines could check for this method on 
each element, but fall back on __lt__ if it is missing or NotImplemented. 

This would allow us to define an ordering for types independent of the rich 
comparison behavior. Simply defining it for floats and adding quick patches for 
list.sort/min/max/... would provide the language (and users) with that 
flexibility.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-24 Thread Mark Dickinson


Mark Dickinson  added the comment:

Agreed with Raymond and Tim here that the sorting functionality itself 
shouldn't be special-cased for nans.

[Tim]
> So people who go down this path can't get two steps before making a fork in 
> the road ;-)

Well, both those solutions are wrong. *Clearly*, negative NaNs should be sorted 
to the start of the list, while positive NaNs are sorted to the end of the 
list. (Yes, I'm joking, but only a little bit: that's the result you'd get if 
you used IEEE 754's totalOrder to sort. But it's difficult to see how it would 
be useful in practice, given that people usually don't know about or care about 
the sign that their NaN has.)

Maybe the solution would be to provide an official "math.total_order" function 
that can be used as a key:

sorted(my_list_of_floats_with_nans_in_it, key=math.total_order)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Tim Peters

Tim Peters  added the comment:

If we could roll back the clock, I'd impose a total ordering on floats in 
Python and supply some other way to spell 754's comparison operators (which 
users would have to ask for explicitly if they wanted to hassle with the 
near-useless and too-often-surprising "unordered" outcome).

But, too late.  As is, I don't want to mess up every context that _uses_ 
comparisons under the covers to make a special case out of NaNs.  Floats are a 
partially ordered type, and live with the consequences.

Which are approximately none for me ;-)  I rarely have a use for a NaN to begin 
with, and never in a list I intend to sort.

That said, the one thing that gives me pause is this line from numpy's docs:

   https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
"In numpy versions >= 1.4.0 nan values are sorted to the end."

But that's not the end of it:


https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
...
na_position : {‘first’, ‘last’}, default ‘last’
first puts NaNs at the beginning, last puts NaNs at the end

So people who go down this path can't get two steps before making a fork in the 
road ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

This issue isn't sorting.  The issue is with what NaNs do.  You can propose to 
"fix" NaNs rather than special casing everything else in Python that compares 
two values.

A NaN could be given a deterministic sort order relative to other floats (much 
as None used to compare less than everything else),  That said, you'll find 
people who argue than NaNs aren't broken and are doing exactly what they're 
supposed to do.

Please take this to python-ideas.  From my point of view, the current proposal 
is a non-starter.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Brandt Bucher

Brandt Bucher  added the comment:

Thanks for the feedback. I agree with you on the iffy delegation issue. 
However, this is problem that I feel deserves a fix... the behavior (silently 
producing garbage results) is just so un-pythonic. It’s been made clear in 
other issues that a warning isn’t right here, and sorting is guaranteed to use 
__lt__ (which NaN defines well for use in all other contexts). I honestly don’t 
see any other way. If that means fixing min, max, and friends too, I think it’s 
worth it.

Special cases justify special-casing, especially when doing so doesn’t break 
the rules. NaN isn’t like any other value of any other type; It’s practically 
defined by its ability to ruin valid comparison operations, even with fellow 
floats, for which ordering is otherwise well-defined.

For what it’s worth, I don’t know if anybody who’s tried to sort sets. Every 
Python programmer I know has blindly sorted floats at some point.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

-1 I am flat opposed to special casing the list.sort() code for the specific 
case of NaNs.  

This is an incorrect delegation of responsibility.  The sorted objects are 
responsible for saying how they will sort and whether than is. deterministic.  
Note, float('NaN') isn't the only case.  Sets objects only have a partial 
ordering.

Also note that list.sort() isn't the only tool that compares objects.  There is 
also min(), max(), heaps, bisect, etc.  For the most part, we've able to keep 
those all it sync with one another by a clear separation of responsibilities 
(objects decide how they are compared versus tools that use comparisons).

At the very least, this would need a python-ideas discussion. FWIW, the usual 
solution to this problem is to strip the NaN values using math.isnan().

--
nosy: +mark.dickinson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Brandt Bucher


Brandt Bucher  added the comment:

As a design decision, I consciously chose "no". However, it would be 
straightforward to make the change to support float subclasses:

#define ISNAN(N) (N->ob_type == &PyFloat_Type && Py_IS_NAN(PyFloat_AsDouble(N)))

becomes

#define ISNAN(N) (PyFloat_Check(N) && Py_IS_NAN(PyFloat_AsDouble(N)))

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Does it work for subclasses of float? For other floating point types like 
numpy.float32?

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Cheryl Sabella


Cheryl Sabella  added the comment:

This has been brought up in the past, for example, see #12286.

--
nosy: +cheryl.sabella

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +rhettinger, tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Brandt Bucher


Change by Brandt Bucher :


--
keywords: +patch
pull_requests: +12031
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36095] Better NaN sorting.

2019-02-23 Thread Brandt Bucher


New submission from Brandt Bucher :

Sorting sequences containing NaN values produces an incompletely sorted result. 
Further, because of the complexity of the timsort, this incomplete sort often 
silently produces unintuitive, unstable-seeming results that are extremely 
sensitive to the ordering of the inputs:

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2.0, 3, nan, 2]
>>> sorted(reversed([3, 1, 2, float('nan'), 2.0, 2, 2.0]))
[1, 2.0, 2, 2.0, nan, 2, 3]

The patch I have provided addresses these issues, including for lists 
containing nested lists/tuples with NaN values. Specifically, it stably sorts 
NaNs to the end of the list with no changes to the timsort itself (just the 
element-wise comparison functions):

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2, 2.0, 3, nan]
>>> sorted([[3], [1], [2], [float('nan')], [2.0], [2], [2.0]])
[[1], [2], [2.0], [2], [2.0], [3], [nan]]

It also includes a new regression test for this behavior.

Some other benefits to this patch:

* These changes generally result in a sorting performance improvement across 
data types. The largest increases here are for nested lists, since we add a new 
unsafe_list_compare function. Other speed increases are due to 
safe_object_compare's delegation to unsafe comparison functions for objects of 
the same type. Specifically, the speed impact (positive is faster, negative is 
slower) is between:

* -3% and +3% (10 elements, no PGO)
* 0% and +4% (10 elements, PGO)
* 0% and +9% (1000 elements, no PGO)
* -1% and +9% (1000 elements, PGO)

* The current weird NaN-sorting behavior is not documented, so this is not a 
breaking change.

* IEEE754 compliance is maintained. The result is still a stable (arguably, 
more stable), nondecreasing ordering of the original list.

--
components: Interpreter Core
messages: 336401
nosy: brandtbucher
priority: normal
severity: normal
status: open
title: Better NaN sorting.
type: behavior
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com