Yikes. I wonder if this overhead comes from Vocabulary updates... thanks
very much for doing this test.
Clearly we need to pin it down. This is very disappointing. :-( Any
further info you dig up is appreciated.
You didn't have any metadata stuff set up, did you? I imagine that even if
you did, it couldn't possibly account for 200K worth of extra stuff.
----- Original Message -----
From: "abel deuring" <[EMAIL PROTECTED]>
To: "Giovanni Maruzzelli" <[EMAIL PROTECTED]>
Cc: "Chris McDonough" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, June 26, 2001 2:40 PM
Subject: Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a
> Hi all,
> Giovanni Maruzzelli wrote:
> > We think that Abel is absolutely right:
> > if, in the same almost empty folder, we add and catalog an object with one
> > word (and we have now optimized and reduced the number of indexes to 11),
> > it makes a transaction of 73K, while if the object contains 300 words with
> > the same other indexes and properties, the transaction is 224K, and if all
> > else is the same but the object contains 535 words, the transaction is 331K.
> > And we are now using a catalog with only some 3000 documents indexed, with
> > a typical document length of around 1K.
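
A rough way to read Giovanni's three figures is as a marginal cost per
additional word. The sketch below only does that arithmetic on the numbers
quoted above; the function name is made up for illustration, and the sizes
are taken as round KB values exactly as reported.

```python
# (words in the document, transaction size in KB), as reported by Giovanni
samples = [(1, 73), (300, 224), (535, 331)]


def marginal_kb_per_word(a, b):
    """Extra transaction KB per extra word between two measurements."""
    (words_a, kb_a), (words_b, kb_b) = a, b
    return (kb_b - kb_a) / (words_b - words_a)


# Slope between each pair of consecutive measurements.
slopes = [marginal_kb_per_word(a, b) for a, b in zip(samples, samples[1:])]

for (a, b), slope in zip(zip(samples, samples[1:]), slopes):
    print(f"{a[0]:>3} -> {b[0]:>3} words: {slope:.3f} KB per extra word")
```

Both slopes come out at roughly half a KB per extra word, on top of the
73K seen for a one-word object. That is consistent with a cost that grows
with the number of words indexed, though three data points cannot say more.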
> Well, Chris certainly knows more about the internals of ZCatalog than I
> do, so we should not ignore his comments to my mail :)
> Chris McDonough wrote:
> > > If you now add a new document containing 5 of these frequent words, 5
> > > larger BTrees will be updated. [Chris, let me know if I'm now going
> > > to tell nonsense...] I assume that the entire updated BTrees = 120000
> > > bytes will be appended to the ZODB (ignoring the less frequent words) --
> > > even if the document contains only 1 kB of text.
> > Nah... I don't think so. At least I hope not! Each bucket in a BTree
> > is a separate persistent object. So only the sum of the data in the
> > updated buckets will be appended to the ZODB. So if you add an item to
> > a BTree, you don't add 24000+ bytes for each update. You just add the
> > amount of space taken up by the bucket... unfortunately I don't know
> > exactly how much this is, but I'd imagine it's pretty close to the
> > datasize with only a little overhead.
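
To make the bucket argument concrete, here is a toy model in plain Python.
It is not ZODB or BTrees code: the bucket size, the helper names and the use
of pickle to measure "stored size" are all assumptions for illustration, and
it never splits a bucket that grows too large. It only shows that if each
bucket is stored separately, one insert rewrites one bucket rather than the
whole structure.

```python
import bisect
import pickle

BUCKET_SIZE = 32  # hypothetical maximum number of keys per bucket


def make_buckets(keys, bucket_size=BUCKET_SIZE):
    """Split the sorted keys into consecutive buckets of bucket_size keys."""
    keys = sorted(keys)
    return [keys[i:i + bucket_size] for i in range(0, len(keys), bucket_size)]


def insert_key(buckets, key):
    """Insert key into the bucket that covers it; return that bucket's index."""
    # The first key of every bucket serves as the separator for the lookup.
    firsts = [bucket[0] for bucket in buckets]
    idx = max(bisect.bisect_right(firsts, key) - 1, 0)
    bisect.insort(buckets[idx], key)
    return idx


def stored_size(obj):
    """Stand-in for the size of one stored record (pickle length in bytes)."""
    return len(pickle.dumps(obj))


# 6000 made-up document ids standing in for the entries of one frequent word.
buckets = make_buckets(range(0, 12000, 2))
touched = insert_key(buckets, 4001)

whole_tree_bytes = sum(stored_size(b) for b in buckets)
one_bucket_bytes = stored_size(buckets[touched])
print("all buckets:", whole_tree_bytes, "bytes; touched bucket:",
      one_bucket_bytes, "bytes")
```

In this model the write for one insert is the size of a single bucket. A real
BTree may also rewrite parent nodes or split buckets, which this sketch
ignores.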
> OK, this made me curious, so I ran a test similar to the one by Giovanni.
> I started with a ZCatalog containing 21616 records; the catalog contains
> only one text index, no keyword index, no field index. I copied one of
> the indexed documents; the text is 2645 bytes long; wc tells me that it
> has 313 words. Next, I packed the data base in order to have a "clean
> start point". After packing, Data.fs has a size of 233661963 bytes.
> Then I cataloged the new object using my "lazy catalog". Since I have
> only one new document, this is basically the same as using
> CatalogAwareness. After indexing, the data base has grown to 233851090
> bytes -- an increase of 189127 bytes. Then I packed the data base again,
> resulting in a size of 233666237 bytes.
> So the "net increase" is indeed 233666237-233661963 = 4274 bytes, as you
> expected, but obviously a few more data base records need to be updated.
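
The arithmetic in Abel's test can be restated as a short script. The three
Data.fs sizes and the document length are the ones he reports; the variable
names are mine, and the ratio at the end is just a derived figure, not
something measured separately.

```python
# Data.fs sizes in bytes, as reported in Abel's test
size_packed_before = 233661963   # after the first pack ("clean start point")
size_after_indexing = 233851090  # after cataloging the one new document
size_packed_after = 233666237    # after packing again

document_bytes = 2645            # length of the indexed text

# Bytes appended to Data.fs by the indexing transaction(s).
appended = size_after_indexing - size_packed_before
# Bytes that remain once old object revisions have been packed away.
net_increase = size_packed_after - size_packed_before
# Bytes the second pack reclaimed.
reclaimed = appended - net_increase

print("appended by indexing:", appended)
print("net increase after pack:", net_increase)
print("reclaimed by pack:", reclaimed)
print("appended / net: %.1f" % (appended / net_increase))
print("net / document size: %.2f" % (net_increase / document_bytes))
```

So nearly all of the 189127 bytes written while indexing are superseded
revisions of existing records, which the pack removes, and only 4274 bytes
remain as net growth.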
> Zope-Dev maillist - [EMAIL PROTECTED]
> ** No cross posts or HTML encoding! **
> (Related lists -
> http://lists.zope.org/mailman/listinfo/zope )