Despite this change there is still a huge number
of unexplained calls to the 'persistent_id' method of the ObjectWriter
in serialize.py.

Why 'unexplained'? 'persistent_id' is called from the Pickler instance
being used in ObjectWriter._dump(). It is called for each and every
single object reachable from the main object, due to the way Pickler
works (I believe). Maybe persistent_id can be analysed and optimized
for the most common cases?

Note that there is an undocumented feature in cPickle that I added years ago to deal with this issue but never got around to pursuing. Maybe someone else would be able to spend the time to try it out and report back. If you set inst_persistent_id, rather than persistent_id, on a pickler, then the hook will only be called for instances. This should eliminate the vast majority of the calls. Note that this feature was added back when testing was minimal or non-existent, so it is untested. However, the implementation is simple enough. :)
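To make the call pattern discussed above concrete, here is a minimal sketch that counts how often a pickler's persistent_id hook fires, grouped by the type of the object it receives. It is an illustration only: it uses the standard pickle module rather than cPickle, it does not exercise inst_persistent_id, and the Item class and CountingPickler name are made up for the example (this is not ZODB's ObjectWriter).

```python
import io
import pickle
from collections import Counter


class Item:
    """Hypothetical stand-in for a persistent content object."""

    def __init__(self, title, size):
        self.title = title
        self.size = size


class CountingPickler(pickle.Pickler):
    """Pickler that records the type of every object passed to persistent_id."""

    def __init__(self, file, protocol=2):
        super().__init__(file, protocol)
        self.calls = Counter()

    def persistent_id(self, obj):
        # The hook sees every object the pickler visits, not just instances.
        self.calls[type(obj).__name__] += 1
        # Returning None tells the pickler to serialize the object normally.
        return None


buf = io.BytesIO()
pickler = CountingPickler(buf)
pickler.dump([Item("a", 1), Item("b", 2)])

total_calls = sum(pickler.calls.values())
for type_name, count in sorted(pickler.calls.items()):
    print(type_name, count)
print("total", total_calls)
```

Only the calls recorded under Item are for instances. The rest are for the list, the class, the state dictionaries and their string and integer contents, which is the kind of call a hook restricted to instances would skip.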

Do you mean that the ZODB has enough tests now that making the change and running the tests might already be good proof?

No, I mean that pickle and cPickle lack tests for this feature.

Or should we be more prudent?

It would be nice to try this out with ZODB to see if it makes much difference. If it does, then that would provide extra motivation for me to add the missing test.

Roché Compaan said he would try it out, but I just realized that he might have been waiting for me.

Laurent (cced) tried it today and it seems it does make a difference.

Our benchmark is running tonight with a larger amount of content.

We will be back with results tomorrow.

We can measure some benefit.

For tests on a ZODB prefilled with 100k instances of an Archetypes class:

update of an instance: 12% improvement
insert of an instance: 15% improvement

If it would,

What do you mean by 'If it would'?

If we can measure a benefit.


