Laurence Rowe wrote:
It's helpful to post your responses to the mailing list, that way when
someone else has a similar problem in the future they'll be able to
find the information.

Inheriting from Persistent is also necessary to control the
granularity of the database. Persistent objects are saved as separate
`records` by ZODB. Other objects do not have a _p_oid attribute and
have to be saved as part of their parent record.

I made the changes yesterday and there was a huge benefit. Originally, all entries were plain Python dictionaries stored as values of an IOBTree. The only change I made was from

scores[article['key']] = article

to

scores[article['key']] = PersistentMapping(article)

(where scores is the IOBTree).
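
As a rough illustration of why this one-line change matters, the sketch below uses only the standard library. It relies on pickle's persistent_id hook, which lets a pickler store some objects by reference instead of inline; ZODB builds on the same hook. FakePersistent, RecordPickler and record_size are hypothetical names invented for this example and are not part of ZODB, and the article contents are made up.

```python
import io
import pickle


class FakePersistent(dict):
    """Hypothetical stand-in for a persistent object that has its own oid."""

    def __init__(self, oid, data):
        super().__init__(data)
        self.oid = oid


class RecordPickler(pickle.Pickler):
    """Pickler that stores FakePersistent objects by reference only."""

    def persistent_id(self, obj):
        if isinstance(obj, FakePersistent):
            return obj.oid  # written as a reference, not inline
        return None  # everything else is pickled inline as usual


def record_size(parent):
    """Return the size in bytes of the record that holds `parent`."""
    buf = io.BytesIO()
    RecordPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(parent)
    return len(buf.getvalue())


def make_article(i):
    # Made-up article data; each body is a distinct string object.
    return {'key': i, 'body': ('text %d ' % i) * 100}


# Plain dicts are embedded in the parent's record.
plain_parent = {i: make_article(i) for i in range(50)}
# Objects with their own oid leave only a reference in the parent.
split_parent = {i: FakePersistent(i, make_article(i)) for i in range(50)}

plain_size = record_size(plain_parent)
split_size = record_size(split_parent)
print(plain_size, split_size)
```

The parent record holding plain dicts contains every article in full, so touching one entry means rewriting all of them. The parent holding objects with their own oid contains only references, so each article can be written and loaded separately.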

My cache size is 1000 items, and after every 10000 entries I commit the transaction, clear the caches, and garbage collect. At the end I pack the database to drop the history.
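
The batching described above could look roughly like the following sketch. It is not the actual script: load_in_batches and its parameters are hypothetical, and the ZODB-specific steps are passed in as callables so the loop itself depends only on the standard library. With ZODB, commit would typically be transaction.commit and minimize_cache would be the connection's cacheMinimize method; whether that matches how the caches were cleared here is an assumption.

```python
import gc


def load_in_batches(entries, store, commit, minimize_cache, batch_size=10000):
    """Store each entry, committing and freeing memory every batch_size entries.

    store(entry)      -- adds one entry to the database (hypothetical callable)
    commit()          -- commits the current transaction
    minimize_cache()  -- drops unmodified objects from the object cache
    Returns the number of entries processed.
    """
    count = 0
    for count, entry in enumerate(entries, 1):
        store(entry)
        if count % batch_size == 0:
            commit()          # flush this batch to storage
            minimize_cache()  # release cached objects that are now saved
            gc.collect()      # reclaim memory held in reference cycles
    commit()  # commit whatever remains after the last full batch
    return count
```

Packing the database to drop the history would be a separate step after the loop finishes.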

I'm dealing with a 20 GB XML file with 670000+ entries. The original version took about 2 1/4 days to run. The new version, about 6 1/2 hours. The dict version behaves as O(N^2) (or worse), while the PersistentMapping version is a steady O(N). The dict version is slightly faster for fewer than 100,000 items, but only by about 10 minutes or so.

The RAM usage for the dictionary version slowly increased to about 18 GB, while the PersistentMapping version stayed nearly constant, slowly increasing from 646 MB at 10000 records to 803 MB. (These numbers include the Python interpreter and everything else in the process.)

The final, packed versions are roughly the same size (4.24 GB for the dict version, 4.29 GB for the PersistentMapping). A greater gain is seen in the history; the old, pre-packing size is 91 GB for the dict, versus 4.6 GB for the PersistentMapping.
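
For a sense of scale, the ratios implied by the figures above can be worked out directly. This is plain arithmetic on the numbers quoted in this message; the variable names are illustrative only.

```python
# Run time: about 2 1/4 days versus about 6 1/2 hours.
dict_hours = 2.25 * 24
mapping_hours = 6.5
speedup = dict_hours / mapping_hours

# Unpacked database size: 91 GB versus 4.6 GB.
history_ratio = 91 / 4.6

# Packed database size: 4.24 GB versus 4.29 GB.
packed_overhead = (4.29 - 4.24) / 4.24

print(round(speedup, 1), round(history_ratio, 1), round(packed_overhead * 100, 1))
```

That works out to roughly an 8x shorter run and a roughly 20x smaller unpacked file, for about 1% more space after packing.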

Most importantly, I can open up the database and do simple things like get the number of entries and all the ids much quicker and with little memory usage.

Thanks for the help.

Now, my next step is to figure out how to best index this, for which I plan to use zc.catalog. Its SetIndex seems to be best for my situation.
