New submission from Brandt Bucher <brandtbuc...@gmail.com>:

Sorting sequences containing NaN values produces an incompletely sorted result. 
Further, because of the complexity of the timsort, this incomplete sort often 
silently produces unintuitive, unstable-seeming results that are extremely 
sensitive to the ordering of the inputs:

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2.0, 3, nan, 2]
>>> sorted(reversed([3, 1, 2, float('nan'), 2.0, 2, 2.0]))
[1, 2.0, 2, 2.0, nan, 2, 3]

The patch I have provided addresses these issues, including for lists 
containing nested lists/tuples with NaN values. Specifically, it stably sorts 
NaNs to the end of the list with no changes to the timsort itself (just the 
element-wise comparison functions):

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2, 2.0, 3, nan]
>>> sorted([[3], [1], [2], [float('nan')], [2.0], [2], [2.0]])
[[1], [2], [2.0], [2], [2.0], [3], [nan]]

It also includes a new regression test for this behavior.

Some other benefits to this patch:

* These changes generally result in a sorting performance improvement across 
data types. The largest increases here are for nested lists, since we add a new 
unsafe_list_compare function. Other speed increases are due to 
safe_object_compare's delegation to unsafe comparison functions for objects of 
the same type. Specifically, the speed impact (positive is faster, negative is 
slower) is between:

    * -3% and +3% (10 elements, no PGO)
    * 0% and +4% (10 elements, PGO)
    * 0% and +9% (1000 elements, no PGO)
    * -1% and +9% (1000 elements, PGO)

* The current weird NaN-sorting behavior is not documented, so this is not a 
breaking change.

* IEEE754 compliance is maintained. The result is still a stable (arguably, 
more stable), nondecreasing ordering of the original list.

----------
components: Interpreter Core
messages: 336401
nosy: brandtbucher
priority: normal
severity: normal
status: open
title: Better NaN sorting.
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36095>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to