I started a new packing script for Relstorage (history free, postgresql). It is based on incoming reference counting. Why do we need this? First a bit story telling (fast forward down to section "new script" if you're bored)

The Problem

So we had a database with ~298.000.000 objects stored, grown over 4-5 months. The beast was never packed. We import/sync data on a weekly base: A complex Plone based product catalog optimized for web in 27 languages each ~18.000 products, each with several variants (stored in annotated btrees). Lots of stuff is deleted and recreated on import. So we had a lot of garbage in DB.

We're using history free Relstorage. Relstorage performs much better on concurrent writes and speeds up our import a lot. First on MySQL, now on PostgreSQL. Even if PostgreSQL does not solve all problems it seems to be the stable choice. At least for one time we had a frozen server. We blame MySQL, but well, its a CentOS and there are several other problems injecting side-effects (old unpatched kernel, etc, pp, ... we're dev not op, its enterprise level and decisions made in past are our constraints, carved in stone, more to say?)

Anyway: Our Plone with ZODB Relstorage performs very very well - with this many objects stored including all the garbage!

BUT...we want to get rid of the garbage left on every import.

Relstorage deploys with a packaging script. Running it on the ~300 million objects resulted in RAM-consumption >20GB with 300 days estimated time of completion (we did not try if this is true). Phew.

So something had to be done. Storage usage grew about 15-20GB per week. As a quick (took some days) fix we decided to try classical Data.fs packing (zeopack).

We blocked content managers from editing and stopped imports and cloned a snapshot of the virtual maschine with the DB.
At this point we had roughly 160GB in Postgres.
On the clone we converted the Relstorage to a Filesystem Storage. This took 40 hours.
Storage shrinked to a 55GB in size Data.fs
Then we started the classical packing and after some hours the DB was shrunken to 7.8 GB Data.fs still containing 44 million objects. Next we converted the Data.fs back to Relstorage (~6 hours) and switched live over to the packed DB.

Packing on the 44 million object relstorage with the default script shall take several days (3-4 days) only for the prepacking phase. It consumes less memory (~5GB) but is still insane.

New Script

After analyzing the current (generic - mysql, pgsql, oracle - history and history free) way to pack the DB I found no good way to optimize the code here. Other were there before me.

Trial 1:

So I tried to rethink the way packing is done and tried to follow the copy-and-switch method the filestorage uses: Copy all in-use zoids from object_state to a new table (traverse the graph starting at zoid 0), drop the old object_state, rename the new table to object_state. The copy process can be distributed. Several workers can do it in parallel until the database is at it max transactions/second.

I had this working. 4-6 workers running, a queue with zoids to process in DB, a master controlling the process and so on. But even with this approach the total time for a pack of the whole DB is after SQL optimizations(!) roughly 8 days for a 44 million object graph.

OK - Back to the drawing board.

Trial 2:

On thursday evening I had a good conservation (and some beer) with a buddy from the local linux user group and we discussed the problem. The outcome was to try it with reference counting.

The idea is simple:

- iterate over all transactions starting with the lowest
  transaction id (tid)
- for each transaction load the object states connected with tid
- for each state fetch its outgoing references and fill a table where
  all incoming references of an object are stored as an array.
  if an state has no references write it anyway to the table with empty
  outgoing references

After reaching the highest tid its easy: all entries with no incoming references are garbage. Delete them and remove all incoming references of this entry from other entries.

When new transactions are happening its enough to only process delta-transactions (newer than the last processed in the reference counting table). This is pretty fast if the delta small, i.e. one day only. On the deltas another check need to detect references gone meanwhile, but since the delta isnt that big this is probably not a problem.

I implemented this roughly now. Building the incoming reference counting table from the scratch takes less than 7 hours. Deleting itself is not implemented yet, but this is not that difficult :-)


The current code is at:

Did I miss something? Any opinions much appreciated!

Expect updates in this thread :)

Jens Klein
Klein & Partner KG, member of BlueDynamics Alliance

For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org

Reply via email to