[Dieter Maurer]
The newest pickle formats can also handle class references
a bit more efficiently -- at least when a single transaction
modifies many objects of the same class.


[Chris Withers]
I know ZC was involved in the work to introduce these new pickle
formats, but are they actually used in ZODB yet?

[Dieter]
I think the optimization I referred to (if you have several instances
of the same class in a pickle, then they all share a single
instance of the class name and reference this one) is already
used.

I'm not sure what you have in mind there.  It's always been true that
pickle was _able_ to reuse common bits, but this is effectively
disabled in ZODB on a cross-persistent-object basis by (from a recent
serialize.py):

   def _dump(self, classmeta, state):
       # To reuse the existing cStringIO object, we must reset
       # the file position to 0 and truncate the file after the
       # new pickle is written.
       self._file.seek(0)
       self._p.clear_memo()
       self._p.dump(classmeta)
       self._p.dump(state)
       self._file.truncate()
       return self._file.getvalue()

The "self._p.clear_memo()" there makes the pickler forget everything
it's done, so that the pickle for a persistent object is
self-contained.
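
A minimal sketch of that effect, assuming Python 3 and using the standard
library's collections.OrderedDict as a stand-in for a persistent class
(the helper names dump_records and loads_alone are made up for this
illustration and are not part of ZODB):

```python
import collections
import io
import pickle


def dump_records(objs, clear_memo):
    """Pickle each object as its own record, reusing one Pickler."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=2)
    records = []
    for obj in objs:
        # Reuse the same buffer for every record.
        buf.seek(0)
        buf.truncate()
        if clear_memo:
            # Forget everything pickled so far, as _dump() does.
            pickler.clear_memo()
        pickler.dump(obj)
        records.append(buf.getvalue())
    return records


def loads_alone(record):
    """Return True if the record can be unpickled with no other context."""
    try:
        pickle.loads(record)
        return True
    except Exception:
        # The exact exception type varies across Python versions.
        return False


objs = [collections.OrderedDict([("n", i)]) for i in range(3)]
shared = dump_records(objs, clear_memo=False)
isolated = dump_records(objs, clear_memo=True)

# With the memo kept, later records point back at the first one and
# cannot be loaded by themselves.
print([loads_alone(r) for r in shared])
# With the memo cleared, every record repeats the class name and
# loads independently.
print([loads_alone(r) for r in isolated])
```

The records written with the memo kept are smaller after the first one,
but only the first can be read back in isolation, which is not
acceptable when each persistent object is stored as its own data record.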

For example, if you store an OOBTree whose internal state contains 100
OOBuckets, the string "BTrees._OOBucket" appears 100 times in the data
record, and the string "OOBTree" even more.  Jeremy once analyzed a
customer Data.fs and incidentally discovered that about half the space
was consumed by repetitions of such BTree-related strings; no idea
whether that's typical, although I wouldn't be surprised if it were.
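
A rough way to see the repetition, again assuming Python 3 and using
collections.OrderedDict as a stand-in for OOBucket (the BTrees package
is not needed to show the effect, and the numbers say nothing about a
real Data.fs):

```python
import collections
import pickle

NAME = b"OrderedDict"
buckets = [collections.OrderedDict([(i, i)]) for i in range(100)]

# One self-contained pickle per object, as with per-object data records.
separate = b"".join(pickle.dumps(b, protocol=2) for b in buckets)

# One pickle for the whole list: the memo is shared across the objects.
combined = pickle.dumps(buckets, protocol=2)

print(separate.count(NAME), len(separate))
print(combined.count(NAME), len(combined))
```

The class name shows up once per record in the first case and only once
in the second, and the difference in total size is almost entirely
those repeated module and class strings.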

An entirely new gimmick was introduced in pickle protocol 2, the
"extension registry" described in PEP 307:

   http://www.python.org/dev/peps/pep-0307/

That _allows_ an application to register "popular" module and class
string names that pickles can reference later via teensy 2- or 3-byte
(independent of string length) opcodes.  In effect, such strings are
stored in the _implementation_ of pickle instead of inside pickles.
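
A small sketch of the registry in use, assuming Python 3, where the
module is named copyreg (copy_reg in Python 2).  The code 240 is an
arbitrary pick from the range the PEP sets aside for private use, and
collections.OrderedDict is just a convenient class to register:

```python
import collections
import copyreg
import pickle

# 240-255 is the private-use range of extension codes in PEP 307.
MODULE, NAME, CODE = "collections", "OrderedDict", 240

obj = collections.OrderedDict(a=1)

# Without a registration the module and class names are spelled out.
plain = pickle.dumps(obj, protocol=2)

copyreg.add_extension(MODULE, NAME, CODE)
try:
    # With a registration the class is written as a short EXT opcode.
    compact = pickle.dumps(obj, protocol=2)
    restored = pickle.loads(compact)
finally:
    # Leave the process-wide registry as we found it.
    copyreg.remove_extension(MODULE, NAME, CODE)

print(len(plain), len(compact))
print(b"OrderedDict" in plain, b"OrderedDict" in compact)
```

The catch is that the compact pickle is only readable by a process
that has registered the same code for the same class, so the registry
contents become part of the storage format.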

AFAIK, nobody anywhere has used this yet, outside of Python's test
suite.  It was intended to be a simple, cheap approach to cutting
pickle bloat for apps motivated enough to set up the registry.  You'll
note that a quarter of the one-byte codes (128-191) are reserved for Zope :-)

You mean an optimization to make the pickle size for some
new style classes smaller. That's not yet used because it could
make the storage exchange between different Python versions impossible
(the older Python versions would not understand the new pickle protocol).

The need for protocol 2 is also why the extension registry (above)
can't be used so long as older Pythons are in the mix.
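
To illustrate the dependency on protocol 2, here is a sketch assuming
Python 3 (the helper max_proto is hypothetical, and 240 is again an
arbitrary private-use code).  It registers a class and then pickles the
same object under protocols 1 and 2:

```python
import collections
import copyreg
import pickle
import pickletools

MODULE, NAME, CODE = "collections", "OrderedDict", 240


def max_proto(data):
    """Highest protocol number required by any opcode in the pickle."""
    return max(op.proto for op, arg, pos in pickletools.genops(data))


obj = collections.OrderedDict(a=1)

copyreg.add_extension(MODULE, NAME, CODE)
try:
    legacy = pickle.dumps(obj, protocol=1)
    modern = pickle.dumps(obj, protocol=2)
finally:
    copyreg.remove_extension(MODULE, NAME, CODE)

# Protocol 1 ignores the registry and still spells out the class name.
print(b"OrderedDict" in legacy, max_proto(legacy))
# Protocol 2 uses the registry, at the cost of protocol 2 opcodes.
print(b"OrderedDict" in modern, max_proto(modern))
```

Only the protocol 2 pickle benefits from the registration, and it
contains opcodes that an unpickler limited to protocols 0 and 1 does
not know.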