"Raymond Hettinger" <[EMAIL PROTECTED]> wrote: > [Martin v. Löwis] > > This kind of invariant doesn't take into account > > that there might be a default value. > > Precisely. Therefore, a defaultdict subclass violates the Liskov > Substitution > Principle.
    class defaultdict(dict):
        def __getitem__(self, key):
            try:
                return dict.__getitem__(self, key)
            except KeyError:
                return self.on_missing(key)

        def on_missing(self, key):
            # No usable default: behave exactly like a plain dict.
            if not hasattr(self, 'default') or not callable(self.default):
                raise KeyError(key)
            # Otherwise create the default, store it, and return it.
            r = self[key] = self.default()
            return r

In my opinion, the above implementation as a subclass "does the right
thing" with regard to __delitem__, __contains__, get, pop, popitem,
__len__, has_key, and anything else I can think of.  Does it violate the
Liskov Substitution Principle?  Yes, but only if user code relies on
dd[key] raising a KeyError when the key is missing.  This is easily
remedied by removing the default when it is not needed, at which point
you get your Liskov substitution back.

> Of course, the __del__ followed __contains__ sequence is not the only
> invariant that is thrown-off.  There are plenty of examples.  Here's one
> that is absolutely basic to the method's contract:
>
>     k, v = dd.popitem()
>     assert k not in dd
>
> Any code that was expecting a dictionary and uses popitem() as a means of
> looping over and consuming entries will fail.

    >>> a = defaultdict()
    >>> a.default = list
    >>> a['hello']
    []
    >>> k, v = a.popitem()
    >>> assert k not in a
    >>>

Seems to work for the above implementation.

> No one should kid themselves that a default dictionary is a drop-in
> substitute.  Much of the dict's API has an ambiguous meaning when applied
> to defaultdicts.

Actually, if one is careful, the dict's API is completely unchanged,
except for direct access to the object via b = a[i].

    >>> del a['hello']
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    KeyError: 'hello'
    >>> 'hello' in a
    False
    >>> a.get('hello')
    >>> a.pop('hello')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    KeyError: 'pop(): dictionary is empty'
    >>> a.popitem()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    KeyError: 'popitem(): dictionary is empty'
    >>> len(a)
    0
    >>> a.has_key('hello')
    False

> If all keys are in-theory predefined, what is the meaning of len(dd)?

It depends on the sequence of actions.  Play around with the above
defaultdict implementation.  From what I understood of Guido's original
post, this is essentially what he was proposing, only implemented in C.

> Should dd.items() include any entries where the value is equal to the
> default or should the collection never store those?

The former: it should store any value which was stored via 'dd[k]=v', or
any default value created via access by 'v=dd[k]'.

> If the former, then how do you access
> the entries without looping over the whole contents?

Presumably one is looking for a single kind of default (empty list, 0,
etc.) because one wanted to accumulate into them, similar to one of the
following...

    for item, value in input:
        try:
            d[item] += value
            #or d[item].append(value)
        except KeyError:
            d[item] = value
            #or d[item] = [value]

which becomes

    for item, value in input:
        dd[item] += value
        #or dd[item].append(value)

Once accumulation has occurred, iteration over them via .iteritems(),
.items(), .popitem(), etc., would progress exactly the same way as with
a regular dictionary.  If the code which is using the accumulated data
does things like...

    for key in wanted_keys:
        try:
            value = dd[key]
        except KeyError:
            continue
        #do something nontrivial with value

rather than...

    for key in wanted_keys:
        if key not in dd:
            continue
        value = dd[key]
        #do something nontrivial with value

Then the user has at least three options to make it 'work right':
1. User can change to testing membership with 'in' rather than relying
   on a KeyError.
2. User could remember to remove the default.
3. User can create a copy of the default dictionary via dict(dd) and pass
   it into the code which relies on the non-defaulting dictionary.

> If the latter, then do you
> worry that "dd[v]=k" does not imply "(k,v) in dd.items()"?

I personally wouldn't want the latter.

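For anyone who wants to try the three options, here is a minimal sketch.
It assumes a current Python 3 interpreter rather than the Python 2 used in
the sessions above, and the names DefaultingDict and consume are made up
for illustration; they are not part of any proposal in this thread.

```python
class DefaultingDict(dict):
    """Hypothetical Python 3 port of the subclass posted earlier."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)

    def on_missing(self, key):
        # Without a callable 'default' attribute, act like a plain dict.
        default = getattr(self, 'default', None)
        if not callable(default):
            raise KeyError(key)
        value = self[key] = default()
        return value


def consume(mapping, wanted_keys):
    """Legacy-style consumer that relies on KeyError for missing keys."""
    found = []
    for key in wanted_keys:
        try:
            value = mapping[key]
        except KeyError:
            continue
        found.append((key, value))
    return found


# Accumulate with a default of int() == 0.
dd = DefaultingDict()
dd.default = int
for item, value in [('a', 1), ('b', 2), ('a', 3)]:
    dd[item] += value

# Option 1: test membership with 'in', which never creates an entry.
with_in = [(k, dd[k]) for k in ('a', 'z') if k in dd]

# Option 3: hand the legacy code a plain dict copy.
with_copy = consume(dict(dd), ('a', 'z'))

# Option 2: remove the default, after which dd[key] raises KeyError again.
del dd.default
without_default = consume(dd, ('a', 'z'))
```

All three routes leave the missing key 'z' out of the dictionary. Note
that option 2 has to happen before the legacy code runs; with the default
still in place, consume(dd, ...) would quietly add 'z' with a value of 0.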
My post probably hasn't convinced you, but much of the confusion, I
believe, is based on Martin's original belief that 'k in dd' should
always return true if there is a default.  One can argue that way, but
then you end up on the circular train of thought that gets you to "you
can't do anything useful if that is the case, .popitem() doesn't work,
len() is undefined, ...".  Keep it simple, keep it sane.

 - Josiah

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com