I was already investigating the possibility of splitting the RelStorage
packing process up into smaller chunks.

Due to the expected load on the Oracle cluster during a pack, we'll
have to run the pack at night, and we want to be absolutely certain that
the database is ready for normal site operations again the next day.
With a 40+GB database (it hasn't been packed for its entire run, more
than 2 years now) we are not confident packing will finish in one night.

To at least get a handle on how much work the packing is going to be,
and to have a nice stopping point, I looked at splitting pre-pack and
pack operations out into two separate steps. To my delight I saw that
the 1.5.0 beta already implements running only the pre-pack phase
(the --dry-run option). From there I created the attached patch, which
renames the dry-run option to a 'prepack only' option and adds a second
option that skips the pre-pack and just uses whatever is present in the
pack tables.

I haven't yet actually run this code, but the change isn't big. I
didn't find any relevant tests to update. Anyone want to venture some

Helge Tesdal and I also looked into the pack operation itself, and how
it uses a duty cycle to give other transactions a chance to commit
during pack. We think there might be a better pattern to handle the

Currently, with the default values, the pack operation will hold the
commit lock for 5 seconds, pack, then release the lock for 5 more
seconds, repeating until done. With various options you can alter
these timings, but the basic principle is the same. For Oracle, where
the commit lock has a time-out, this means that packing can fail
because the commit lock times out. For all backends, Oracle or
otherwise, commits elsewhere on a site cluster will have to wait long
periods of time before they can proceed, leading to severe delays on a
heavily trafficked website.
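
To make the pattern described above concrete, here is a simplified
sketch of a hold/release duty cycle. It is not RelStorage's actual code:
the function name duty_cycle_pack, its parameters, and the use of a
threading.Lock as a stand-in for the database commit lock are all
assumptions made for illustration.

```python
import threading
import time


def duty_cycle_pack(commit_lock, batches, hold_seconds=5.0, pause_seconds=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Alternate between packing under the lock and pausing without it.

    commit_lock is a stand-in (threading.Lock) for the database commit
    lock; batches yields zero-argument callables, each packing one batch.
    Both are hypothetical, chosen only to illustrate the timing pattern.
    """
    batches = iter(batches)
    done = 0
    exhausted = False
    while not exhausted:
        with commit_lock:  # every other committer is blocked in here
            deadline = clock() + hold_seconds
            while clock() < deadline:
                batch = next(batches, None)
                if batch is None:
                    exhausted = True
                    break
                batch()
                done += 1
        if not exhausted:
            sleep(pause_seconds)  # the only window for other commits
    return done
```

The point of the sketch is that the lock is held for the whole hold
phase regardless of whether anyone else is waiting for it.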

With the variable time-out for requesting a commit lock on Oracle,
however, there is a different option. I do not know if MySQL and
Postgres can support this too; I haven't looked into their lock
acquisition options, but the following relies on lock acquisition

Consider the following packing algorithm:

 * Use a short timeout (say 1 second) to request the commit lock.
 * If it doesn't time out:
    * run one batch update cycle (up to 100 transactions processed).
    * optionally clean out associated blobs
    * unlock
    * loop back up
 * If it does time out:
    * commit lock is busy, so back off by sleeping a bit
    * loop back up
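
The steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not a patch against RelStorage:
polite_pack, its parameters, and the threading.Lock standing in for the
database commit lock are hypothetical, and a real implementation would
request the lock through the database adapter instead.

```python
import threading
import time


def polite_pack(commit_lock, batches, lock_timeout=1.0, backoff=0.5,
                sleep=time.sleep):
    """Pack one batch per lock acquisition, yielding to other committers.

    commit_lock is a stand-in (threading.Lock) for the commit lock;
    batches yields zero-argument callables, each doing one batch update
    cycle. Returns (batches_done, times_backed_off).
    """
    batches = iter(batches)
    done = 0
    backoffs = 0
    batch = next(batches, None)
    while batch is not None:
        # Short timeout: give up quickly if someone else is committing.
        if commit_lock.acquire(timeout=lock_timeout):
            try:
                batch()  # one batch cycle; blob cleanup could follow here
            finally:
                commit_lock.release()
            done += 1
            batch = next(batches, None)
        else:
            # Commit lock is busy, so back off and retry the same batch.
            backoffs += 1
            sleep(backoff)
    return done, backoffs
```

Unlike a fixed duty cycle, the lock is held only for the duration of a
single batch, and the packer is the one that waits when there is
contention.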

By timing out the lock request quickly, you give commits from
non-packing Zope transactions the right of way. Packing truly becomes a
non-intrusive background operation. Is this a viable scenario?

Martijn Pieters

Attachment: twophasepack.patch


ZODB-Dev mailing list  -  ZODB-Dev@zope.org
