For huge inserts like that, have you looked at the more modern alternatives such as Tokyo Cabinet or MongoDB? I heard about an experiment to transfer 20 million text blobs into a Tokyo Cabinet. The first 10 million inserts were superfast, but after that it started to take up to a second to insert each item. I'm not familiar with how good they are, but I know they both have indexing, and I'm confident they both have good Python APIs. Or watch Bob Ippolito's PyCon 2009 talk, "Drop ACID".
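For what it's worth, batched inserts are easy to do from Python with pymongo. A minimal sketch, assuming a recent pymongo and a MongoDB server on localhost; the database, collection and field names are made up for illustration:

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
blobs = client.testdb.blobs

batch = []
for i in range(20000):
    batch.append({"_id": i, "text": "blob number %d" % i})
    if len(batch) == 1000:
        blobs.insert_many(batch)   # one round trip per 1000 documents
        batch = []
if batch:
    blobs.insert_many(batch)

# indexes can be created before or after the bulk load
blobs.create_index("text")

Batching keeps the number of round trips down; whether it avoids the kind of slowdown seen after 10 million items is another question.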
2009/4/27 Hedley Roos <hedleyr...@gmail.com>:
> I've followed this thread with interest since I have a Zope site with
> tens of millions of entries in BTrees. It scales well, but it requires
> many tricks to make it work.
>
> Roche Compaan wrote these great pieces on ZODB, Data.fs size and
> scalability at
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
> and
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
>
> My own in-house product is similar to Google Analytics. I have to use a
> cascading BTree structure (a btree of btrees of btrees) to handle the
> volume. This is because BTrees do slow down the more items they
> contain. This is not a ZODB limitation or flaw - it is just how they
> work.
>
> My structure allows for fast inserts, but it also allows aggregation
> of data. So if my lowest level of BTrees stores hits for a particular
> hour in time, then the containing BTree always knows exactly how many
> hits were made in a day. I update all parent BTrees as soon as an item
> is inserted. The cost of this operation is O(1) for every parent.
> These are all details, but every single one influenced my design.
>
> What is important is that you cannot just use the ZCatalog to index
> tens of millions of items, since every index is a single BTree and will
> thus suffer the larger it gets. So you must roll your own to fit your
> problem domain.
>
> Data warehousing is probably a good idea as well.
>
> My problem domain allows me to defer inserts, so I have a queue runner
> that commits larger transactions in batches. This is better than lots
> of small writes. This may of course not fit your model.
>
> Familiarize yourself with TreeSets and set operations in Python (union
> etc.) since those tools form the backbone of cataloguing.
>
> Hedley

--
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com

_______________________________________________
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )
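P.S. To make the cascading BTree approach Hedley describes above concrete, here is a rough sketch using ZODB's BTrees package. The day/hour key scheme and the running 'count' total are illustrative only, not his actual schema:

from BTrees.OOBTree import OOBTree
from BTrees.IIBTree import IITreeSet, union

root = OOBTree()  # day -> per-day node

def record_hit(day, hour, hit_id):
    day_node = root.get(day)
    if day_node is None:
        day_node = root[day] = OOBTree()
        day_node['count'] = 0
        day_node['hours'] = OOBTree()
    hours = day_node['hours']
    hour_set = hours.get(hour)
    if hour_set is None:
        hour_set = hours[hour] = IITreeSet()
    hour_set.insert(hit_id)
    # updating the parent aggregate is O(1) per insert, so the daily
    # total is always available without walking the hour trees
    day_node['count'] += 1

record_hit('2009-04-27', 13, 1)
record_hit('2009-04-27', 13, 2)
record_hit('2009-04-27', 14, 3)
print(root['2009-04-27']['count'])           # 3

# TreeSets support the set operations that back cataloguing
hours = root['2009-04-27']['hours']
print(list(union(hours[13], hours[14])))     # [1, 2, 3]

In a real deployment these objects would hang off the ZODB root, and inserts would be committed in larger batches, as Hedley does with his queue runner.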