We have a large dataset of 650,000+ records that I'd like to examine easily in Python. I have figured out how to put this into a ZODB file that totals 4 GB in size. But I'm new to ZODB and very large databases, and have a few questions.

1. The data is in an IOBTree, so I can access each item once I know the key, but to get the list of keys I tried:

scores = root['scores']
ids = list(scores.iterkeys())

This seems to require the entire tree to be loaded into memory, which takes more RAM than I have.
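One way to keep memory bounded is to walk the keys in batches, resuming each batch from the last key seen; IOBTree supports this via the optional min/excludemin arguments to iterkeys(). Below is a pure-Python sketch of that pagination pattern, using bisect over a plain sorted list as a stand-in for the BTree (the helper name and batch size are illustrative, not part of any API):

```python
import bisect

def iter_keys_in_batches(sorted_keys, batch_size=1000):
    """Yield keys in key-order batches, resuming from the last key
    seen -- mimics IOBTree.iterkeys(min=last, excludemin=True)."""
    last = None
    while True:
        # Find where to resume: just past the last key already yielded.
        start = 0 if last is None else bisect.bisect_right(sorted_keys, last)
        batch = sorted_keys[start:start + batch_size]
        if not batch:
            return
        for key in batch:
            yield key
        last = batch[-1]

keys = list(range(10))
assert list(iter_keys_in_batches(keys, batch_size=3)) == keys
```

Between batches you can release loaded objects (e.g. by committing or aborting the transaction) so the resident set stays small regardless of tree size.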

Does your record class inherit from persistent.Persistent? 650k integer keys plus object pointers should only be on the order of 10 MB or so. It sounds to me like the record data is being stored in the BTree's buckets directly.
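A quick back-of-envelope check of that estimate (assuming, for illustration, roughly 8 bytes each for an integer key and an object pointer):

```python
# Rough in-memory size of the key structure alone: 650,000 entries,
# each an integer key plus an object reference (assumed 8 bytes each).
n_records = 650000
bytes_per_entry = 8 + 8  # key + pointer
total_mb = n_records * bytes_per_entry / 1e6
print(round(total_mb, 1))  # about 10.4 MB
```

So if iterating the keys blows past your RAM, the buckets must be carrying far more than keys and pointers.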

Something like this should lead to smaller bucket objects, where the record data is only loaded when you access the values of the BTree:

>>> from BTrees.IOBTree import IOBTree
>>> bt = IOBTree()
>>> from persistent import Persistent
>>> class Record(Persistent):
...     def __init__(self, data):
...         super(Record, self).__init__()
...         self.data = data
>>> rec = Record("my really long string data")
>>> bt[1] = rec


For more information about ZODB, see the ZODB Wiki:

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
