[Steven D'Aprano <st...@pearwood.info>]
> Sorting doesn't require a total order. Sorting only requires a weak
> order where the only operator required is the "comes before" operator,
> or less than. That's precisely how sorting in Python is implemented.

Let's be careful here.  Python's list.sort() guarantees that _if_ the
elements are totally orderable, it will deliver the unique stable
permutation consistent with that ordering.  Otherwise, the only thing
it guarantees is that the output will be _some_ permutation of the
input.

There was no attempt to do something "useful" in the absence of a
total order - whatever happens then is purely down to implementation
accidents.

The reason is sticks to __lt__ is pragmatic, not theoretical:  at the
time it was designed, rich comparisons weren't yet implemented, and
all comparisons were implemented via __cmp__ methods.  But rich
comparisons were scheduled for a future release, and sticking to
__lt__ was my deliberate nod to making it as easy as possible then for
classes that wanted to define their own order to participate in
sorting.  Implement __lt__, and you're good to go.

I don't think it's documented, but the guts of the bisect and heapq
modules were reworked (I think by Raymond) to stick to __lt__ too, for
the same reason.

But, e.g.,

>>> sorted([{1, 2}, {3}, {1}])
[{1, 2}, {3}, {1}]

despite that {1} < {1, 2}

As far as sorted() is concerned, that's "garbage in garbage out".
What happens today (and since the current list.sort() was written -
but may not be so tomorrow):

First it asks whether {3} < {1, 2}.  It gets back False, so concludes
the first two elements are already in order.  Then it asks whether {1}
< {3}.  Again False, so it concludes those two are also in order.

And that's it.  It's done.  It never compares {1, 2} to {1} at all.
By comparing only adjacent pairs of elements, it concludes that the
entire input consists of one non-descending run.  For totally ordered
input elements, that means "it's already sorted".


> Here is an interesting discussion of a practical use-case of sorting
> data with a partial order:
>
> https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-am-forking-the-compiler/

list.sort() is of no use for finding a linear ordering that respects a
collection of."A must precede B" constraints  But Python 3.9
introduces a capable new functools.TopologicalSorter class that is
good for that.  Although, unlike the approach in that blog, it doesn't
cater to cycles - if there's a cycle in the input, it raises
CycleError.


>> but we really need to use a
>> type that has these exceptional values. Imagine that sort/median was
>> defined to type check its parameter,

> No need to imagine it, sort already type-checks its arguments:
>
>     py> sorted([1, 3, 5, "Hello", 2])
>     TypeError: '<' not supported between instances of 'str' and 'int'

Yes and no ;-)  list.sort() makes no attempt to check element types
for the sake of checking them.  In the example, it's just passing on
an exception raised by asking whether "Hello" < 5 (as in the example
above, it first compares adjacent pairs to identify the longest
initial run). If, e.g., we sorted a list like [a, b, c], where

    (b < a) == (c < b) == False

it would stop then, even if comparing `a` with `c` would raise an exception.

> [snipped stuff about NaNs]
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IY6C4E3M5472IDPDG24KYNLK2AZNW3DI/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to