New submission from Jake McGuire <j...@youtube.com>: Instance attribute names are normally interned - this is done in PyObject_SetAttr (among other places). Unpickling (in pickle and cPickle) directly updates __dict__ on the instance object. This bypasses the interning so you end up with many copies of the strings representing your attribute names, which wastes a lot of space, both in RAM and in pickles of sequences of objects created from pickles. Note that the native python memcached client uses pickle to serialize objects.
>>> import pickle >>> class C(object): ... def __init__(self, x): ... self.long_attribute_name = x ... >>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None), pickle.HIGHEST_PROTOCOL)) for i in range(100)], pickle.HIGHEST_PROTOCOL)) 3658 >>> len(pickle.dumps([C(None) for i in range(100)], pickle.HIGHEST_PROTOCOL)) 1441 >>> Interning the strings on unpickling makes the pickles smaller, and at least for cPickle actually makes unpickling sequences of many objects slightly faster. I have included proposed patches to cPickle.c and pickle.py, and would appreciate any feedback. ---------- components: Library (Lib) files: cPickle.c.diff keywords: patch messages: 80670 nosy: jakemcguire severity: normal status: open title: unpickling does not intern attribute names type: resource usage Added file: http://bugs.python.org/file12879/cPickle.c.diff _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue5084> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com