Currently, when a thread loads a non-ghost object into its object cache, the
object's state comes straight from being unpickled. That means that if two
threads load the exact same object, every (immutable) string contained in the
object state is allocated in duplicate (or, in general, once per active
thread).
If, instead, all unpickled strings were made canonical via a weak
dictionary, there would be only one copy in memory no matter the
thread count, e.g.:
string = weak_string_map.setdefault(string, string)
If the returned string was a different (canonical) copy, the duplicate
would immediately be ready for garbage collection.
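A minimal sketch of that canonicalization step (a plain dict stands in for the proposed weak_string_map here, since CPython's built-in str cannot be weakly referenced directly; a real implementation would need a wrapper type or a periodic sweep to get the weak behavior):

```python
_canonical_strings = {}  # stand-in for the proposed weak_string_map


def canonical(s):
    """Return the single shared copy of s, registering it on first use."""
    return _canonical_strings.setdefault(s, s)


# Two equal strings built independently are distinct objects...
a = "".join(["plone", "-", "page"])
b = "".join(["plone", "-", "page"])
assert a is not b

# ...but canonicalization maps both to one shared copy; the duplicate
# becomes unreferenced and is ready for garbage collection.
assert canonical(a) is canonical(b)
```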
This is a real memory win. I experimented with the approach in Plone,
using the pure-Python pickle implementation and interning all byte
strings (via ``intern``) directly in the unpickling routine, to the
same effect:
n = mloads('i' + self.read(4))  # unpack the 4-byte string length
string = self.read(n)           # read the raw byte string
string = intern(string)         # canonicalize it
With 20 active threads, each having rendered the Plone 4 front page,
this approach reduced memory usage by 70 MB. Note that unicode
strings aren't internable in Python 2 (but the alternative technique
of using a weak mapping should work fine).
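For reference, that limitation is specific to Python 2's ``intern`` builtin; CPython 3's ``sys.intern`` does accept (unicode) str objects, a quick check:

```python
import sys

# Build the string at runtime so it starts out as a fresh, non-interned
# object, then intern it.
a = "".join(["plone", "-", "page"])
b = sys.intern(a)

# Interning an equal literal resolves to the same single copy.
c = sys.intern("plone-page")
assert b is c
```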
In a long-running operation, dirty objects should be invalidated after
the transaction, to prevent future data redundancy.
An implementation needs a hook to use a special reconstructor
function for strings. Currently there is a technical impediment:
BTrees and Persistent objects have their own internal way of saving
strings. In my experiments, the ``persistent_id`` function was not
called for string objects (which differs from the behavior of a
regular ``cPickle.Pickler.dump``).
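To illustrate the contrast: a regular standard-library Pickler does consult ``persistent_id`` for string objects, which is enough to build a canonicalizing loader on top of the persistent-id machinery. A sketch using Python 3's pickle (the InterningPickler/InterningUnpickler names and the module-level canonical map are illustrative, not part of any ZODB API):

```python
import io
import pickle

_canonical = {}  # shared canonical-string map (illustrative)


class InterningPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Divert every str through the persistent-id channel. Returning
        # the string itself as the pid avoids re-entering this hook for
        # the elements of a wrapper object.
        if isinstance(obj, str):
            return obj
        return None  # everything else is pickled normally


class InterningUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Canonicalize on load: equal strings from independent loads
        # end up as one shared object.
        return _canonical.setdefault(pid, pid)


def dumps(obj):
    buf = io.BytesIO()
    InterningPickler(buf, protocol=2).dump(obj)
    return buf.getvalue()


def loads(data):
    return InterningUnpickler(io.BytesIO(data)).load()


a = loads(dumps(["shared-key", 1]))
b = loads(dumps(["shared-key", 2]))
assert a[0] is b[0]  # one canonical copy across independent loads
```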
For more information about ZODB, see the ZODB Wiki.
ZODB-Dev mailing list - ZODB-Dev@zope.org