I tried to analyze the overhead of changing content in Plone. It turns
out we write a lot of persistent objects back to the database, even
though the actual values of these objects haven't changed.

Digging deeper, I tried to understand what happens here:

1. persistent.__setattr__ always sets _p_changed to True and thus
causes the object to be written back
2. Some BTree buckets define the "VALUE_SAME" macro. If the macro is
available and the new value is the same as the old one, the change is
ignored
3. The VALUE_SAME macro is only defined for the int, long and float
value variants, not for the object-based ones
4. All code in Products.ZCatalog explicitly compares the old and new
values and skips assignments that don't change anything. I haven't
seen any other code doing this.

I assume a general "old == new" check is not safe, as __eq__ might not
be implemented correctly for all objects, and the comparison itself
might be expensive.
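
A small stdlib-only illustration of one way a plain equality check can
go wrong: Python treats some values as equal even though they are
stored differently, so skipping the write whenever old == new could
silently drop a change of type. Plain pickle only stands in for ZODB's
own serializer here.

```python
import pickle

# Python considers these three values equal ...
assert 1 == 1.0 == True

# ... but each one pickles to different bytes, so an object holding 1
# and one holding True do not have the same stored state.
distinct_pickles = {pickle.dumps(v) for v in (1, 1.0, True)}
print(len(distinct_pickles))  # 3
```

An "old == new" shortcut would therefore treat replacing 1 with True
as a no-op and keep the old value in the database.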

But I'm still curious if we could do something about this. Some ideas:

1. Encourage everyone to do the old == new check in all application
code before setting attributes on persistent objects.

Pros: This works today. You know what types of values you are dealing
with, so you can be certain when the check is safe to apply. You might
also avoid some computation when you store multiple values derived
from the same input.
Cons: It clutters all application code
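
A minimal sketch of what option 1 could look like in application code,
assuming a small helper (set_if_changed is a hypothetical name, not an
existing ZODB or Plone API). It is meant for attributes whose value
types the caller knows to be cheap and safe to compare.

```python
_MISSING = object()


def set_if_changed(obj, name, value):
    """Hypothetical helper: assign obj.<name> = value only if it differs.

    Returns True if an assignment happened. On a persistent.Persistent
    instance, skipping the assignment means __setattr__ never runs, so
    _p_changed is not set and the object is not written back.
    """
    current = getattr(obj, name, _MISSING)  # getattr also loads a ghost
    if current is not _MISSING and current == value:
        return False
    setattr(obj, name, value)
    return True


class Doc:  # plain stand-in for a persistent.Persistent subclass
    title = None


doc = Doc()
print(set_if_changed(doc, "title", "Hello"))  # True
print(set_if_changed(doc, "title", "Hello"))  # False
```

The helper keeps the clutter to one call per assignment, but every
call site still has to opt in.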

2. Create new persistent base classes which do the checking in their
__setattr__ methods

Pros: A lot less cluttering in the application code
Cons: All applications would need to use the new base classes.
Developers might not understand the difference between the variants
and use the "checking" versions, even though they store data which
isn't cheap to compare
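
A rough sketch of option 2 as a mixin, assuming it is listed before
persistent.Persistent in the bases so that its __setattr__ runs first
(SkipUnchangedMixin is a hypothetical name). To keep the example
runnable without ZODB, a small RecordingBase stands in for Persistent
and only records which assignments reached it.

```python
_MISSING = object()


class SkipUnchangedMixin:
    """Hypothetical: drop assignments that don't change the value."""

    def __setattr__(self, name, value):
        # _p_ and _v_ attributes always pass through unchanged.
        if not name.startswith(("_p_", "_v_")):
            # getattr rather than self.__dict__, so a ghost is loaded
            # before comparing instead of looking empty.
            current = getattr(self, name, _MISSING)
            if current is not _MISSING and current == value:
                return
        super().__setattr__(name, value)


class RecordingBase:
    # Stand-in for persistent.Persistent in this demo only: records the
    # attribute names whose assignment actually reached the base class.
    def __setattr__(self, name, value):
        self.__dict__.setdefault("_v_log", []).append(name)
        object.__setattr__(self, name, value)


class Doc(SkipUnchangedMixin, RecordingBase):
    pass


doc = Doc()
doc.title = "Hello"
doc.title = "Hello"  # skipped by the mixin
doc.title = "World"
print(doc._v_log)  # ['title', 'title']
```

With the real Persistent base, every skipped assignment is one fewer
_p_changed flip. The cons above still apply: the comparison runs for
every attribute, however expensive its __eq__ is.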

2.a. Create new base classes and do type checking for built-in types

Pros: Safer to use than always doing value comparisons
Cons: Still separate base classes and overhead of doing type checks
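
For option 2.a, the comparison could be restricted to a few built-in
types with exact type matches. The predicate below is a sketch under
that assumption (is_unchanged and CHEAP_TYPES are hypothetical names);
it would replace the plain "old == new" test in a checking __setattr__.

```python
# Only values of exactly these types are compared; everything else is
# always written, as persistent.Persistent does today. float is left
# out on purpose: 0.0 == -0.0 although the two pickle differently.
CHEAP_TYPES = frozenset({int, str, bytes, bool, type(None)})


def is_unchanged(old, new):
    """True if new can safely be treated as the same value as old."""
    t = type(new)
    # Exact type match avoids 1 == True and skips subclasses that
    # might override __eq__.
    return t in CHEAP_TYPES and type(old) is t and old == new


print(is_unchanged(1, 1))        # True
print(is_unchanged(1, True))     # False: different types
print(is_unchanged([1], [1]))    # False: lists are never compared
```

The type lookup is the extra overhead mentioned above, but it is a
constant-time check and never calls a user-defined __eq__.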

3. Compare object state at the level of the pickled binary data

This would need to work at the level of the ZODB connection. When
doing savepoints or commits, the registered objects flagged as
_p_changed would be checked before being added to the modified list.
In order to do this, we need to get the old value of the object,
either by loading it again from the database or by keeping a cache of
the non-modified state of all objects. The latter could be done in
persistent.__setattr__, where we add the pristine state of an object
into a separate cache before doing any changes to it. This probably
should be a cache with an upper limit, so we avoid running out of
memory for connections that change a lot of objects. The cache would
only need to hold the binary data and not unpickle it.

Pros: On the level of the binary data, the comparison is rather cheap
and safe to do
Cons: We either add more database reads or complex change tracking.
The change tracking would require more memory for keeping a copy of
the pristine object. Interactions with ghosted objects and the new
cache could be fragile.

4. Compare the binary data on the server side

Pros: We can get to the old state rather quickly and only need to deal
with binary string data
Cons: We make all write operations slower by adding read overhead,
especially those that really do change data. This won't work with
RelStorage. We only save disk space and cache invalidations, but still
do the bulk of the work and send the data over the network.
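
To make the cost of option 4 concrete, here is a toy in-memory
"server" that compares each incoming record with the stored one. It is
purely hypothetical (ComparingStorage is not a real ZODB storage) and
only counts operations: every store pays for a read, while the client
has already serialized and sent the data.

```python
class ComparingStorage:
    """Toy server-side store that skips byte-identical records."""

    def __init__(self):
        self._records = {}
        self.reads = 0
        self.writes = 0

    def store(self, oid, data):
        # The client has already pickled and sent `data` at this point.
        self.reads += 1  # extra read on every store, changed or not
        if self._records.get(oid) == data:
            return False  # no write, no invalidation
        self._records[oid] = data
        self.writes += 1
        return True


s = ComparingStorage()
s.store(b"\x00\x01", b"state-a")
s.store(b"\x00\x01", b"state-a")  # unchanged: skipped
s.store(b"\x00\x01", b"state-b")
print(s.reads, s.writes)  # 3 2
```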

I probably missed some approaches here. None of the approaches feels
like a good solution to me. Doing it server side (4) is a bad idea in
my book. Option 3 seems to be the most transparent and safe version,
but it is also the most complicated to write, given all its
interactions with other caches. It's also not clear what additional
responsibilities this would introduce for subclasses of persistent
that override various hooks.

Maybe option 1 is the easiest here, but it would need to be documented
as a best practice. Until now I hadn't realized the implications of
setting attributes to unchanged values.

ZODB-Dev mailing list  -  ZODB-Dev@zope.org