For huge inserts like that, have you looked at the more modern
alternatives such as Tokyo Cabinet or MongoDB?
I heard about an experiment to transfer 20 million text blobs into a
Tokyo Cabinet. The first 10 million inserts were superfast but after
that it started to take up to a second to insert each item.
I'm not familiar with how good they are, but I know they both have
indexing, and I'm confident they both have good Python APIs.
Or watch Bob Ippolito's PyCon 2009 talk on "Drop ACID".

2009/4/27 Hedley Roos <>:
> I've followed this thread with interest since I have a Zope site with
> tens of millions of entries in BTrees. It scales well, but it requires
> many tricks to make it work.
> Roche Compaan wrote these great pieces on ZODB, Data.fs size and
> scalability at 
> and 
> .
> My own in-house product is similar to GoogleAnalytics. I have to use a
> cascading BTree structure (a btree of btrees of btrees) to handle the
> volume. This is because BTrees do slow down the more items they
> contain. This is not a ZODB limitation or flaw - it is just how they
> work.
> My structure allows for fast inserts, but it also allows aggregation
> of data. So if my lowest level of BTrees stores hits for a particular
> hour in time then the containing BTree always knows exactly how many
> hits were made in a day. I update all parent BTrees as soon as an item
> is inserted. The cost of this operation is O(1) for every parent.
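
To make the cascading idea concrete, here is a rough sketch of how I read
it, not Hedley's actual code. Plain dicts stand in for the BTrees so it
runs without ZODB, and HitNode and record_hit are names made up for
illustration. Each level keeps a running count, so a day's total is
available without walking its hours.

```python
from datetime import datetime


class HitNode:
    """One level of the cascade: a running total plus child nodes."""

    def __init__(self):
        self.count = 0       # aggregate of every hit stored below this node
        self.children = {}   # assumption: a dict stands in for a BTree

    def child(self, key):
        # Create the child on first use, then reuse it.
        node = self.children.get(key)
        if node is None:
            node = self.children[key] = HitNode()
        return node


def record_hit(root, when):
    """Insert one hit and bump the counter on every parent level."""
    day = root.child(when.date())
    hour = day.child(when.hour)
    for node in (root, day, hour):
        node.count += 1      # constant work per level


root = HitNode()
record_hit(root, datetime(2009, 4, 27, 9, 15))
record_hit(root, datetime(2009, 4, 27, 9, 40))
record_hit(root, datetime(2009, 4, 27, 17, 5))

day = root.children[datetime(2009, 4, 27).date()]
print(day.count)              # hits for the whole day
print(day.children[9].count)  # hits for the 09:00 hour
```

With real BTrees the lookups get slower as each tree grows, which is
presumably why the data is split across several small trees.
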
> These are all details but every single one influenced my design.
> What is important is that you cannot just use the ZCatalog to index
> tens of millions of items since every index is a single BTree and will
> thus suffer the larger it gets. So you must roll your own to fit your
> problem domain.
> Data warehousing is probably a good idea as well.
> My problem domain allows me to defer inserts, so I have a queuerunner
> that commits larger transactions in batches. This is better than lots
> of small writes. This may of course not fit your model.
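
A minimal sketch of that kind of deferred, batched writing, assuming
nothing about the real queuerunner. BatchQueue and its commit callback
are hypothetical; in a ZODB setup the callback would presumably apply
the batch and then call transaction.commit() once.

```python
class BatchQueue:
    """Collect items and hand them to a commit callback in batches."""

    def __init__(self, commit, batch_size=1000):
        self.commit = commit          # hypothetical: called with a list of items
        self.batch_size = batch_size
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write out whatever is pending as one batch; no-op when empty.
        if self.pending:
            batch, self.pending = self.pending, []
            self.commit(batch)


committed = []                        # stands in for the database
queue = BatchQueue(committed.append, batch_size=3)
for n in range(7):
    queue.add(n)
queue.flush()                         # push out the final partial batch
print(committed)
```
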
> Familiarize yourself with TreeSets and set operations in Python (union
> etc.) since those tools form the backbone of cataloguing.
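
For anyone new to this, a small illustration with Python's built-in
sets. The index names and document ids are invented. As far as I know
the BTrees modules offer union and intersection functions that play the
same role for TreeSets, but this sketch sticks to plain sets.

```python
# Each "index" maps a value to the set of document ids that have it.
by_browser = {"firefox": {1, 2, 5, 8}, "safari": {3, 4}}
by_country = {"za": {2, 3, 8}, "se": {1, 4, 5}}

# AND query: documents matching both conditions (intersection).
firefox_in_za = by_browser["firefox"] & by_country["za"]

# OR query: documents matching either condition (union).
firefox_or_safari = by_browser["firefox"] | by_browser["safari"]

print(sorted(firefox_in_za))
print(sorted(firefox_or_safari))
```
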
> Hedley
> _______________________________________________
> Zope maillist  -
> **   No cross posts or HTML encoding!  **
> (Related lists -
> )

Peter Bengtsson,
