On Wed, Apr 28, 2010 at 7:59 AM, Hanno Schlichting <ha...@hannosch.eu> wrote:
> The ZODB currently uses a hardcoded pickle protocol one. There are both
> the more efficient protocol two and, in Python 3, protocol three. Protocol
> two has seen various improvements in recent Python versions, triggered
> by its use in memcached.
> I'd be interested to work on changing the protocol. How should I approach
Do you know of specific benefits you expect from protocol 2? Any reason
you think it would be better in practice?
I've avoided going to protocol 2 for two reasons:
- It wasn't clear we'd get a benefit without deeper changes.
Those deeper changes might be of value, but only if we're
careful about how we make them.
In particular, we could replace class names in pickles with ints
if we had a registry mapping ints to class names.
This could provide a number of benefits beyond
smaller pickles, but it needs some thought to get right.
- I want zope.xmlpickle to work with ZODB database records and
it doesn't support protocol 2 yet. This doesn't have to block
moving to protocol 2, but I really would like to have this work
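
To make the registry idea concrete, here is a minimal sketch using the extension registry that the standard library already provides in copyreg. It is not ZODB code, and nothing in this thread says ZODB would use this mechanism. The helper name pickle_sizes and the code 240 (picked from the range set aside for private use) are assumptions for illustration. A class registered this way is written as a short integer code instead of its module and class name, but only at protocol 2 or higher:

```python
import copyreg
import pickle
from fractions import Fraction

# Hypothetical extension code, chosen for this sketch only.
EXT_CODE = 240


def pickle_sizes(obj, module, name, code=EXT_CODE):
    """Return (plain_size, compact_size) for obj pickled at protocol 2.

    plain_size is measured with the class referenced by name,
    compact_size with the class registered under an integer code.
    """
    plain = pickle.dumps(obj, protocol=2)
    copyreg.add_extension(module, name, code)
    try:
        compact = pickle.dumps(obj, protocol=2)
        # The reader needs the same registry entry to resolve the code.
        assert pickle.loads(compact) == obj
    finally:
        # Leave the process-wide registry as we found it.
        copyreg.remove_extension(module, name, code)
    return len(plain), len(compact)


plain_size, compact_size = pickle_sizes(Fraction(1, 3), "fractions", "Fraction")
print("by name: %d bytes, by code: %d bytes" % (plain_size, compact_size))
```

The catch, and part of why this needs thought, is that every process reading the data must agree on the same code-to-class mapping, so the mapping effectively becomes part of the storage format.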
> I can see three general approaches:
> 1. Hardcode the version to 2 in all places, instead of one.
> Pros: Easy to do, backwards compatible with all supported Python versions
> Cons: Still inflexible
> 2. Make the protocol version configurable
> Pros: Give control to the user, one could change the protocol used for
> storages or persistent caches independently
> Cons: More overhead, different protocol versions could have different bugs
> 3. Make the format configurable
> Shane made a proposal in this direction at some point. This would
> abstract the persistent format and allow for different serialization
> formats. As part of this one could also have different Pickle/Protocol
> Pros: Lots of flexibility, it might be possible to access the data
> from different languages
> Cons: Even more overhead
> If I am to look into any of these options, which one should I look
> into? Option 1 is obviously the easiest and I made a branch for this
> at some point already. I'm not particularly interested in option 3
> myself, as I haven't had the use-case.
I'm skeptical that there would be enough benefit for protocol 2 without
implementing a registry to take advantage of integer pickle codes.
The other benefit of protocol 2 has to do with the way instance pickles are
constructed and, for persistent objects, ZODB takes a very different
I suggest doing some realistic experiments to look at the impact of the
- Convert an interesting Zope 2 database from protocol 1 to protocol 2.
How does this affect database size?
- Do some sort of write and read benchmarks using the two protocols to
see if there's a meaningful benefit.
The above doesn't include a class registry, since I don't think
you're proposing one.
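
A rough sketch of what such an experiment could look like at the pickle level, assuming plain Python objects as stand-ins for database records. The function compare_protocols and the sample data are hypothetical; a realistic test would run against records from an actual database rather than synthetic dicts:

```python
import pickle
import time


def compare_protocols(objects, protocols=(1, 2), rounds=5):
    """Return {protocol: {"bytes", "dump_s", "load_s"}} for the given objects."""
    results = {}
    for proto in protocols:
        blobs = [pickle.dumps(obj, protocol=proto) for obj in objects]
        # Sanity check: every object must survive the round trip.
        assert [pickle.loads(blob) for blob in blobs] == list(objects)

        start = time.perf_counter()
        for _ in range(rounds):
            for obj in objects:
                pickle.dumps(obj, protocol=proto)
        dump_s = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(rounds):
            for blob in blobs:
                pickle.loads(blob)
        load_s = time.perf_counter() - start

        results[proto] = {
            "bytes": sum(len(blob) for blob in blobs),
            "dump_s": dump_s,
            "load_s": load_s,
        }
    return results


# Synthetic stand-in records; real database records will behave differently.
sample = [
    {"id": i, "title": "doc %d" % i, "tags": ["a", "b"], "size": i * 1.5}
    for i in range(1000)
]

if __name__ == "__main__":
    for proto, stats in compare_protocols(sample).items():
        print("protocol %d: %d bytes, dump %.4fs, load %.4fs"
              % (proto, stats["bytes"], stats["dump_s"], stats["load_s"]))
```

Timings from a loop like this are noisy, so repeated runs on representative data are needed before drawing any conclusion about either protocol.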
BTW, I have almost no interest in a custom non-pickle protocol.
ZODB-Dev mailing list - ZODB-Dev@zope.org