I started writing up a SortedDict use case I have, but it's very
elaborate and I expect it would just end with endless pointless
argument about other approaches I _could_ take. But I already know all
those ;-)

So let's look at something conceptually "dead easy" instead: priority
queues. They're a basic building block in algorithms from many fields.

In the distributed Python, `heapq` is the only thing that comes close
to being reasonably efficient and scalable for that.

However:

- It only supports min-heaps. It's not that max-heaps aren't also
useful (indeed, _under the covers_ CPython also implements max-heaps
for its own internal use). It's more that the original API simply
doesn't generalize cleanly.

- So there's a body of word-of-mouth "tricks" for making a min-heap
"act like" a max-heap. For integer or float priorities, it suffices to
use the negation of "the real" priority. For other kinds of values, it
gets trickier. In the end, you can wrap each item in a class that
implements `x.__lt__(y)` by doing `y.value < x.value`. But few people
appear to be aware of that, while nobody wants to endure the bother
;-)

- It's often the case that "the priority" needs to be computed from an
item's value. `heapq` has no direct support for that. Instead there's
another body of word-of-mouth tricks akin to the old
decorate-sort-undecorate approach sorting used. Rather than store the
values in the heap, you store `(priority, value)` tuples. But then
various unpleasant things can happen if two priorities are tied: in
that case tuple comparison falls back to comparing the values. But,
e.g., value comparison may be very expensive, or the values may not
support __lt__ at all. So instead you store `(priority,
unique_integer, value)` triples. And hope that you don't really need a
max-heap, lest it get even more obscure.

Using SortedContainers, none of those are issues. It's all
straightforward and obvious. If you have a list L always sorted by
priority, extracting from a "min queue" just requires L.pop(0), or
from a "max queue" L.pop(-1) (or L.pop() - -1 is the default, same as
for the Python list.pop()). No tricks are needed. Fine too if
sometimes you want to extract the min, while at other times the max.
Or, e.g., peek at the 5 highest-priority values and pick the one that
best fits resources available at the time.

In cases of computed priorities, the SortedKeyList constructor allows
specifying a function to be used to compute the keys used to determine
the list's ordering. Again straightforward and obvious.

    from sortedcontainers import SortedKeyList
    L = SortedKeyList([(i, j) for i in range(1, 5)
                              for j in range(1, 5)],
                      key=lambda t: t[0]/t[1])

>>> [t for t in L]
[(1, 4), (1, 3), (1, 2), (2, 4), (2, 3), (3, 4), (1, 1),
 (2, 2), (3, 3), (4, 4), (4, 3), (3, 2), (2, 1), (4, 2),
  (3, 1), (4, 1)]

That said, in most of my code I end up _removing_ uses of
SortedContainers, in favor of faster ways of getting the job done. The
package isn't the last resort for me, it's the first. I can write code
in straightforward ways that scale well from the start. If I need more
speed later, fine, I can puzzle out faster ways to get the task done.
If you're proud of that you _start_ by plotting ways to minimize the
number of sorts you do, you're coding in C++, not Python ;-)

So I suggest people are staring at the wrong end when asking for use
cases that can't be done without the package. Those are necessarily
non-trivial. Having a sorted collection is "more than good enough" for
many tasks. As above, e.g., there's no reason to imagine that Grant
Jenks had priority queues in mind at all, yet the flavors of sorted
lists he implemented are very easy to use for that task.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XXAXVV3KJ4ORQ7K6ELSS4R7S5725ASRE/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to