On 9/26/07, Mark Summerfield <[EMAIL PROTECTED]> wrote: > On 2007-09-26, [EMAIL PROTECTED] wrote: > > Mark> I have put a new version (incorporating another implementation > > Mark> idea from Paul Hankin) on PyPI: > > > > Mark> http://pypi.python.org/pypi/sorteddict > > > > From that: > > > > The main benefit of sorteddicts is that you never have to explicitly > > sort. > > > > Surely there must be something more than that. Wrapping sorted() around a > > keys() or values() call is a pretty trivial operation. I didn't see that > > the implementation saved anything. > > Assuming you have a good sorteddict implementation (i.e., based on a > balanced tree or a skiplist, not the one I've put up which is just > showing the API) you can gain significant performance benefits.
That depends very much on the use case, and in general I strongly doubt it. I haven't looked this up in Knuth, but I believe that in a sorted dict implementation, the best performance you can get for random access and random insertions is O(log N), which is always beat by the O(1) of a hash table. This translates in O(N log N) for inserting N elements into a sorted dict, vs. O(N) in a hash table. Sorted traversal is O(N) for the sorted dict and O(N log N) for the hash table. So in order to gain a "significant performance benefit" you'd have to have one pass of insertions and two traversals with a small number of insertions or deletions in between (otherwise the sorted result from the hash table could just be cached). I don't believe that this pattern is common enough, but I don't know your application. > For example, if you have a large dataset that you need to traverse quite > frequently in sorted order, calling sorted() each time will be expensive > compared to simply traversing an intrinsically sorted data structure. > > When I program in C++/Qt I use QMap (a sorteddict) very often; the STL > equivalent is called map. Both the Qt and STL libraries have dict > equivalents (QHash and unordered_map), but my impression is that the > sorted data structures are used far more frequently than the unsorted > versions. Perhaps out of ignorance? Or perhaps the hash implementations have suboptimal implementations? Or perhaps because no equivalent to sorted() exists? > If you primarily program in Python, using dict + sorted() is very > natural because they are built into the language. But using a sorted > data structure and never sorting is a very common practice in other > languages. Ah, now the real reason you want this so badly is finally clear: simply because you're more familiar with it. :-) Is the number of elements in a typical use case large enough that the performance difference even matters? -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com