Steven D'Aprano <[EMAIL PROTECTED]> writes:
> I've never seen the point of a sorted dictionary, it's easy to just say:
>     for key, value in sorted(D.items())
You might want just a subrange of the dictionary (say the 100th
through 150th items in sorted order) without having to sort the
entire dictionary.

Also, with the right data structures you can support fast
non-mutating updates, i.e.

    e = d.updated(x=3)

is equivalent to:

    e = dict((x,y) for x,y in d.iteritems())   # copy all entries from d
    e['x'] = 3                                 # do the update

but the sorted version can work in O(log n) time instead of O(n).
The new dictionary shares most of its contents with the old one; the
price is that you can no longer mutate one without affecting the
other, so better not do any in-place updates.

Use case (inspired by Happs State, which was inspired by Prevayler):
you are writing a web app that has to remember a lot of request data
and you don't want to use an SQL database (too much overhead).  Your
app gets requests containing name/value pairs.  It responds with the
old value and then updates its stored value.  I.e. your server
process looks like:

    d = {}
    while True:
        name, new_value = get_request()
        old_value = d.get(name, None)
        d[name] = new_value
        send_response(old_value)

This is very efficient (just in-memory dictionary access, no SQL
garbage) but has an obvious problem: what if the computer crashes?
You have to store the stuff on disk too.  So you put:

    log_to_disk(name, new_value)

right after the get_request.  If the computer crashes, reboot it and
play back the log from the beginning.

But that's no good either: the server could run for years nonstop and
get billions of updates, and you don't want to play them all back
from the beginning.  You want to take a checkpoint now and then, so
you can restart from there:

    d = {}
    while True:
        name, new_value = get_request()
        log_to_disk(name, new_value, timestamp())
        if checkpoint_now():    # do this once an hour or so
            log_to_disk(pickle(d), timestamp())
        old_value = d.get(name, None)
        d[name] = new_value
        send_response(old_value)

Now to restore, you find the newest pickle, read it in, and then
apply any logged updates with a newer timestamp, i.e. just the last
hour's worth.

But this is no good either, because d can have millions of items, and
your server freezes while you're pickling and writing it.  You want a
separate thread doing that--except d is busily mutating while that
operation is in progress!  You can go crazy with locks, rollback
buffers, and all the insanity that SQL servers use, or with the
sorted immutable dictionary you can simply write:

    d = frozendict()
    while True:
        name, new_value = get_request()
        log_to_disk(name, new_value, timestamp())
        if checkpoint_now():    # do this once an hour or so
            in_new_thread(log_to_disk, (pickle(d), timestamp()))
        old_value = d.get(name, None)    # look mom, no freeze-up
        d = d.updated(name, new_value)
        send_response(old_value)

By making d.updated() create a new immutable dict that's separate
from d, you no longer have to worry about locks or synchronization or
any of that crap.  When the pickler is done with the old d, its
references are gone and the GC reclaims its storage.  Of course you
can do more elaborate checkpointing on subsets of the data etc. with
the same methods.

Really it's the presence of the O(log n) .updated() operation, rather
than the sorting, that makes this type of structure interesting, but
the usual implementation involves balanced tree structures, so the
sorting comes naturally.
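For concreteness, here is a toy sketch (mine, not any particular
library's frozendict) of the path-copying idea behind an O(log n)
.updated(): the update copies only the nodes along the search path
and shares everything else with the old tree.  To keep it short it
uses a plain unbalanced binary search tree, so the bound is only
O(log n) on average over random keys; a real implementation would use
a balanced tree (AVL, red-black, a treap, ...), which is also where
the sorted iteration comes from.

    class _Node(object):
        __slots__ = ('key', 'value', 'left', 'right')
        def __init__(self, key, value, left=None, right=None):
            self.key, self.value = key, value
            self.left, self.right = left, right

    class frozendict(object):
        """Immutable mapping; updated() returns a new mapping that
        shares structure with this one."""
        def __init__(self, root=None):
            self._root = root

        def updated(self, key, value):
            # copies O(depth) nodes, shares the rest with the old tree
            return frozendict(self._insert(self._root, key, value))

        def _insert(self, node, key, value):
            if node is None:
                return _Node(key, value)
            if key < node.key:
                return _Node(node.key, node.value,
                             self._insert(node.left, key, value), node.right)
            elif key > node.key:
                return _Node(node.key, node.value,
                             node.left, self._insert(node.right, key, value))
            else:
                return _Node(key, value, node.left, node.right)

        def get(self, key, default=None):
            node = self._root
            while node is not None:
                if key < node.key:
                    node = node.left
                elif key > node.key:
                    node = node.right
                else:
                    return node.value
            return default

        def items(self):
            # in-order traversal: items come out in sorted key order
            def walk(node):
                if node is not None:
                    for item in walk(node.left):
                        yield item
                    yield (node.key, node.value)
                    for item in walk(node.right):
                        yield item
            return walk(self._root)

    d = frozendict()
    e = d.updated('x', 3)
    print(d.get('x'))       # None -- the old snapshot is untouched
    print(e.get('x'))       # 3
    print(list(e.items()))  # [('x', 3)]

The old snapshot d is exactly what the checkpointing thread in the
loop above would hand to the pickler, while the main loop carries on
with e.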