On Wed, Dec 9, 2009 at 8:42 AM, Pedro Ferreira <jose.pedro.ferre...@cern.ch> wrote:
...
> We've modified
> Jim's script

Cool. Storage iterators are so simple and allow a wide variety of analyses.

> in order to find out which OIDs are being rewritten, and
> how much space they are taking, and this is a fragment of it:
>
> OID class_name total_size percent_size n_pickles min_size avg_size max_size
> '\x00\x00\x00\x00%T\x89{' BTrees.OOBTree.OOBucket 17402831841 30% 8683 1977885 2004241 2026518
> '\x00\x00\x00\x00%T\x89|' BTrees.OOBTree.OOBucket 14204430890 24% 8683 1616904 1635889 1651956
> '\x00\x00\x00\x00\x04dUH' MaKaC.common.indexes.StatusIndex 11955954522 20% 28513 418230 419315 420294
> '\x00\x00\x00\x00%\xa0%\x7f' BTrees.OOBTree.OOBucket 3532998238 6% 11238 307112 314379 320647
> '\x00\x00\x00\x00%\xa0%\x80' BTrees.OOBTree.OOBucket 2193843302 3% 11238 190816 195216 199007
> '\x00\x00\x00\x00\x04\x8e\xb6\x04' BTrees.OOBTree.OOBucket 1728216003 3% 1953 880615 884903 887285
> [...]
>
> As you can see, we have an OOBucket occupying more than 2MB (!) per
> write. That's almost 17GB only considering the last 1M transactions of
> the DB (we get ~3M transactions per week). We believe this bucket
> belongs to some OOBTree-based index that we are using, whose values are
> Python lists (maybe that was a bad choice to start with?). In any case,
> how do OOBuckets work?

Buckets themselves are essentially just sorted lists of key-value pairs.

> Is it a simple key space segmentation strategy,

In the case of BTrees, yes. I assume your OOBuckets are used within
OOBTrees. (?)

> or are the values taken into account as well?

No.

> Our theory is that an OOBTree simply divides the N keys in K buckets,
> and doesn't care about the contents.

Right.

> So, since we are adding very large
> lists as values, the tree remains unbalanced,

No, the trees tend to stay fairly well balanced wrt keys.

> and since new contents
> will be added to this last bucket,

Why would the contents only be added to one bucket?

> each rewrite will imply the addition
> of ~2MB to the file storage.

That's definitely a problem.

> Will the replacement of these lists with a persistent structure such as
> a PersistentList solve the issue?

It might help, but if the lists are very large, you'll still have a
problem because a persistent list is still stored in one database record.

Jim

--
Jim Fulton
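
The points above about buckets can be illustrated with a toy model. This is a minimal sketch, not ZODB's implementation: ToyBucket, MAX_KEYS, and record_size are hypothetical names, and pickling the whole bucket only approximates how a bucket holding plain (non-persistent) Python lists ends up as a single database record. It shows that the split decision looks at the key count alone, and that changing one value means re-serializing the entire record.

```python
import bisect
import pickle

# Hypothetical split threshold for this toy; real BTrees use their own limits.
MAX_KEYS = 4


class ToyBucket:
    """Toy stand-in for a bucket: parallel sorted lists of keys and values."""

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        # Keep the keys sorted; replace the value if the key already exists.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def needs_split(self):
        # Only the number of keys is consulted, never the size of the values.
        return len(self.keys) > MAX_KEYS

    def record_size(self):
        # Assumption: plain list values are serialized inline with the bucket,
        # so the pickle length approximates the size of one stored record.
        return len(pickle.dumps((self.keys, self.values)))


bucket = ToyBucket()
for key in ("a", "b", "c"):
    bucket.put(key, list(range(10000)))  # three large plain lists as values

size_before = bucket.record_size()

# Append a single element to one value, then "rewrite" the record.
bucket.put("b", bucket.values[1] + [10000])
size_after = bucket.record_size()

print("needs split:", bucket.needs_split())
print("record size before:", size_before)
print("record size after: ", size_after)
```

In this model the bucket never asks to be split, since it holds only three keys, yet every small change to one list costs a full rewrite of all three. Moving each list into its own record would shrink the bucket, but a very large list would then be rewritten whole in the same way.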