Dan Schult <dsch...@colgate.edu> added the comment: On Apr 11, 2009, at 8:15 AM, Martin v. Löwis <rep...@bugs.python.org>@psf.upfronthosting.co.za @psf.upfronthosting.co.za> wrote: > Martin v. Löwis <mar...@v.loewis.de> added the comment: > >> By the way, defaultdict is NOT like setdefault--it is like get(). >> Missing entries do no get set. > > Why do you say that? > > __missing__(...) > __missing__(key) # Called by __getitem__ for missing key; > pseudo-code: > if self.default_factory is None: raise KeyError((key,)) > self[key] = value = self.default_factory() > return value > > In all cases of setdefault that I know of, replacing this with > a defaultdict would be appropriate. The only case where it wouldn't > work is if the default value depends on the key. The missing key reports the default object, but doesn't set that key or value object in the dict. So you cannot then call up that object and do something that depends on the key. So the default value may not depend on the key but still need to be different for each key due to later changes.
For example, to represent a graph structure, it is helpful to use a dict-of-dicts. The first time a node is looked up you want to add it as a key in the graph dict with an empty "nbr dict" as the value. the "nbr dict" is keyed by neighbor to the edge weight/object. So the default object is always a fresh nbr dict... but what gets added to that nbr dict depends on which node it is. I can come up with examples for lists too... But as you pointed out before, using a list or dict in setdefault requires creating the object before the call anyway... I'm beginning to question whether setdefault should ever be used... Still comes back to--why have it there inefficiently. Besides I'm still interested in being able to do this sort of thing at least through the C-API... Raymond Hettinger <rhettin...@users.sourceforge.net> added the comment: >> from my perspective creating an internal SetItem adds another >> function handling the data structure just as setdefault would > Incorrect comparison. Your in-lining manipulated the ep structure > directly (not a good thing). In contrast, adding an alternative > _PyDict_SetItemWithHash uses the insertdict() function, fully > isolating > itself of from the table implementation. The whole purpose of having > insertdict() and lookdict() is to isolate the data structure internals > from all externally visible functions. Good... I am understanding better what you mean by "handling the data structure". I agree that lookdict, insertdict and resizedict should be the three that change the data structure. But the way those 3 are currently written, you can't change an entry without looking it up. insertdict_clean almost does it, but it assumes there aren't any dummy entries. > The double call to a very simple user defined __hash__ adds .07 to > call > that takes on .05 with the double call to builtin object hash. So, we > could save half of the the .07 just by elimating the double call to > __hash__. With more complex hash functions the overall speedup is > huge > (essentially cutting the total work almost in half). This is a good example because it shows how the hash can matter. I think there are similar examples where the lookup is most of the time of dict access. I don't know enough about what causes collisions and how dicts are optimized to quickly come up with such an example. But such an example must exist or Tim Peters wouldn't have spent so much effort optimizing lookup.... besides his comments at the top of lookup suggest that we do exactly what I am suggesting to do. Get the entry from lookup and then "the caller can (if it wishes) add the <key, value> pair to the returned PyDictEntry*. Presently there is no way to do this with C-API except to write my own data structure manipulation. Wouldn't it be better to encapsulate this in the C-API in a standard way instead of having everybody writing their own C-extension doing setdefault the right way? (ok..ok.. "everybody" here refers to probably one person (me)... but others have thought of it. I've even seen the double lookup as a justification for never using setdefault) I'll look for an example that has lots of collisions. Maybe that would help justify a lookup-less insertdict. Dan ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue5730> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com