I think Richard's point was two-fold: People usually don't want or need this kind of thing /except/ when they have some very specific performance requirements, in which case they probably want to also be very specific about the kind of container they are using rather than using an abstract "sorted container".

It is true that people sensitive to performance may really care about the way dict is implemented, but there are a great many uses for associative arrays in general. I knew about sortedcontainers and I also don't remember ever seeing a situation where I needed one or recommended its use. Maybe it would be useful in leetcode-style interview questions (which may by itself be a good reason to include it in the standard library, since a lot of those interviewers will let you use `collections.deque` or `collections.OrderedDict` without implementing any sort of linked listed, but they won't let you import a module that provides a trie or something)?

I'm fairly neutral on this proposal. I do think the standard library is a good place for these sorts of fundamental data structures (even abstract ones), but I do agree with Richard that they are pretty niche. In almost any situation where you'd want / need something like this, I feel like adding a PyPI dependency would not be a big deal.

One thing that I do think might be valuable is if the plan were to re-implement these as C extensions. Maintaining and distributing C extensions on PyPI is in many ways more difficult than bundling them directly into CPython. That said, doing so adds maintenance burden and implementation cost (and it's a different proposition than merely adopting an existing library), so I'm probably -0.0 on the whole thing — the absence of these types of containers in the standard library is not an obvious lack, and getting them from PyPI in the situations where you actually need them doesn't seem like a big deal, particularly when it's a pure Python impementation.

On 11/11/21 21:43, Steven D'Aprano wrote:
On Thu, Nov 11, 2021 at 11:01:32AM -0800, Richard Levasseur wrote:

Should the stdlib have e.g. SortedList? Probably not, because the use cases
of such data types are too niche to a one-size-fits-all implementation, and
there are too many implementations with too many of their own settings.
Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
because those are somewhat fundamental data structures and implementing,
testing, and verifying them is very much non-trivial. While niche, having
them at your immediate disposal is very powerful.
By that reasoning, we shouldn't have dicts, for exactly the same reason:
for anyone wanting an associative array, there are so many implementation
variants to choose from:

- hash table with linear probing
- hash table with chaining
- AVL tree
- red-black tree
- judy array
- splay tree
- treap
- scapegoat tree

and many more, and most of them can be tuned.

If Python didn't already have dict, your argument against SortedList
would equally apply to it: they are "fundamental data structures and
implementing, testing, and verifying them is very much non-trivial".

So if your argument is correct, that would imply that standardizing on
one single dict implementation, one that isn't even tunable by the
caller, was a mistake. We should have provided a dozen different hash
tables, trees and treaps.

But... we didn't, and I don't think that Python is a worse language
because we only have one associative array implementation in the stdlib.

Whenever I need an associative array, I don't lose sleep over whether I
could get 2% better performance for hits, at a cost of 17% worse
performance for misses, by swapping over to some other implementation. I
just reach for dict, knowing that it will almost always be good enough.


Last year, for fun, after wishing there was a SortedSet in the standard
lib, I ended up implementing a Red-Black Tree and BTree based sorted
dictionary/set[1]. After then trying to use them for my use case[2], I
found that, in order to fully and truly exploit their benefits, the basic
Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
would let me, e.g. binary search to a particular spot and then iterate, or
give me a range between two points, etc.
I believe that sortedcontainers.SortedSet provides that functionality
via the irange() method.

http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontainers.SortedList.irange

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HI3IW4XEANYTUOEWVRMH2WYZLGBSQEPD/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to