I would like to get everyone's thoughts on two new dictionary methods: def count(self, value, qty=1): try: self[key] += qty except KeyError: self[key] = qty
def appendlist(self, key, *values): try: self[key].extend(values) except KeyError: self[key] = list(values) The rationale is to replace the awkward and slow existing idioms for dictionary based accumulation: d[key] = d.get(key, 0) + qty d.setdefault(key, []).extend(values) In simplest form, those two statements would now be coded more readably as: d.count(key) d.appendlist(key, value) In their multi-value forms, they would now be coded as: d.count(key, qty) d.appendlist(key, *values) The error messages returned by the new methods are the same as those returned by the existing idioms. The get() method would continue to exist because it is useful for applications other than accumulation. The setdefault() method would continue to exist but would likely not make it into Py3.0. PROBLEMS BEING SOLVED --------------------- The readability issues with the existing constructs are: * They are awkward to teach, create, read, and review. * Their wording tends to hide the real meaning (accumulation). * The meaning of setdefault() 's method name is not self-evident. The performance issues with the existing constructs are: * They translate into many opcodes which slows them considerably. * The get() idiom requires two dictionary lookups of the same key. * The setdefault() idiom instantiates a new, empty list prior to every call. * That new list is often not needed and is immediately discarded. * The setdefault() idiom requires an attribute lookup for extend/append. * The setdefault() idiom makes two function calls. The latter issues are evident from a disassembly: >>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec')) 1 0 LOAD_NAME 0 (d) 3 LOAD_ATTR 1 (get) 6 LOAD_NAME 2 (key) 9 LOAD_CONST 0 (0) 12 CALL_FUNCTION 2 15 LOAD_NAME 3 (qty) 18 BINARY_ADD 19 LOAD_NAME 0 (d) 22 LOAD_NAME 2 (key) 25 STORE_SUBSCR 26 LOAD_CONST 1 (None) 29 RETURN_VALUE >>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec')) 1 0 LOAD_NAME 0 (d) 3 LOAD_ATTR 1 (setdefault) 6 LOAD_NAME 2 (key) 9 BUILD_LIST 0 12 CALL_FUNCTION 2 15 LOAD_ATTR 3 (extend) 18 LOAD_NAME 4 (values) 21 CALL_FUNCTION 1 24 POP_TOP 25 LOAD_CONST 0 (None) 28 RETURN_VALUE In contrast, the proposed methods use only a single attribute lookup and function call, they use only one dictionary lookup, they use very few opcodes, and they directly access the accumulation functions, PyNumber_Add() or PyList_Append(). IOW, the performance improvement matches the readability improvement. ISSUES ------ The proposed names could possibly be improved (perhaps tally() is more active and clear than count()). The appendlist() method is not as versatile as setdefault() which can be used with other object types (perhaps for creating dictionaries of dictionaries). However, most uses I've seen are with lists. For other uses, plain Python code suffices in terms of speed, clarity, and avoiding unnecessary instantiation of empty containers: if key not in d: d.key = {subkey:value} else: d[key][subkey] = value Raymond Hettinger -- http://mail.python.org/mailman/listinfo/python-list