Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-29 Thread Giovanni Maruzzelli

The Zope version we use contains the new btree catalog by default.

So, when we recreated the catalog from scratch, it was created as a btree
catalog.

The traces that you saw come from the new catalog (the btree one).

-giovanni
- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Giovanni Maruzzelli" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; "Chris McDonough"
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, June 28, 2001 6:27 PM
Subject: Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a
solution?)


> Giovanni Maruzzelli wrote:
> >
> > The catalog is a pristine 2.3.3b1 catalog.
>
> I'm sure that'll need upgrading then...
>
> > We have recreated the catalog from scratch because we tried
> > manage_convertBTrees, but it doesn't work for us; it returns an error
> > (and the same happens with 2.3.3 final):
> >
> > Error Type: TypeError
> > Error Value: second argument must be a class
>
> Weird... from your earlier posting it looked like you _had_ successfully
> upgraded and updated (BTrees.IOBTree in your traceback rather than
> IOBTree.IOBTree)
>
> cheers,
>
> Chris


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-26 Thread Giovanni Maruzzelli

We think that Abel is absolutely right:

if, in the same almost empty folder, we add and catalog an object with one
word (we have now optimized and reduced the number of indexes to 11), it
makes a transaction of 73K, while if the object contains 300 words with the
same other indexes and properties, the transaction is 224K, and if everything
is the same but the object contains 535 words, the transaction is 331K.
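
As a rough check on these three measurements, the marginal cost per extra word can be worked out directly. The sketch below is only arithmetic on the figures quoted above, written in modern Python 3 rather than the Python 1.5.2 used in this thread. The function name is made up for illustration, and it assumes "K" is the same unit in all three figures.

```python
# Transaction sizes reported above: (words in object, transaction size in K).
measurements = [(1, 73), (300, 224), (535, 331)]


def marginal_cost_per_word(points):
    """Return the size increase per extra word between consecutive points."""
    slopes = []
    for (w0, s0), (w1, s1) in zip(points, points[1:]):
        slopes.append((s1 - s0) / (w1 - w0))
    return slopes


for slope in marginal_cost_per_word(measurements):
    print(f"about {slope:.2f} K per additional word")
```

Both intervals come out at roughly half a kilobyte of transaction data per additional word, on top of the 73K already written for a one-word object.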

And we are now using a catalog with only some 3000 documents indexed, with
an average length of around 1K per document.

-giovanni

> Well, I'm not very familiar with the details of the sub-object
> management of ObjectManager and friends. Moreover, so far I have had a
> closer look only at UnTextIndex, not at UnIndex or UnKeywordIndex. So
> take my comments with a grain of salt.
>
> A text index (class SearchIndex.UnTextIndex) is definitely a cause of
> bloating if you use CatalogAware objects. An UnTextIndex maintains, for
> each word, a list of the documents in which this word appears. So, if a
> document to be indexed contains, say, 100 words, 100 IIBTrees
> (containing mappings documentId -> word score) will be updated (see
> UnTextIndex.insertForwardIndexEntry). If you have a larger number of
> documents, these mappings may be quite large: assume 10,000 documents,
> and assume that you have 10 words which appear in 30% of all documents.
> Hence, each of the IIBTrees for these words contains 3000 entries. (OK,
> one can try to keep this number of frequent words low by using a "good"
> stop word list, but at least for German, such a list is quite difficult
> to build. And one can argue that many "not really too frequent" words
> should be indexed in order to allow more precise phrase searches.) I don't
> know the details of how data is stored inside the BTrees, so I can give
> only a rough estimate of the memory requirements: with 32-bit integers,
> we have at least 8 bytes per IIBTree entry (documentId and score), so
> each of the 10 BTrees for the "frequent words" has a minimum length of
> 3000*8 = 24000 bytes.
>
> If you now add a new document containing 5 of these frequent words, 5
> larger BTrees will be updated. [Chris, let me know if I'm now
> talking nonsense...] I assume that the entire updated BTrees
> (5 * 24000 = 120000 bytes) will be appended to the ZODB (ignoring the
> less frequent words) -- even if the document contains only 1 kB of text.
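
The estimate in the quoted paragraphs can be reproduced with a few lines of arithmetic. This is only a sketch of that reasoning, not a measurement: the function and parameter names are invented here, and it assumes (as the quoted text does) that each updated mapping is written out in full and that an entry costs two 32-bit integers.

```python
def estimate_appended_bytes(total_docs, frequent_percent,
                            frequent_words_in_new_doc, bytes_per_entry=8):
    """Rough lower bound on bytes rewritten for the frequent words only.

    Assumption: every updated word mapping is appended in full, and each
    entry holds a documentId and a score as 32-bit integers.
    """
    entries_per_word = total_docs * frequent_percent // 100
    bytes_per_mapping = entries_per_word * bytes_per_entry
    return frequent_words_in_new_doc * bytes_per_mapping


# 10,000 documents, frequent words appear in 30% of them,
# and the new document contains 5 such words.
print(estimate_appended_bytes(10000, 30, 5))  # 5 * 3000 * 8 = 120000
```

Under these assumptions the cost depends on the size of the catalog, not on the size of the new document, which is the point of the argument above.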
>
> This is the reason why I'm working on some kind of "lazy cataloging".
> My approach is to use a Python class (or base class, if ZClasses are
> involved) which has a method manage_afterAdd. This method looks for
> superValues of a type like "lazyCatalog" (derived from ZCatalog), and
> inserts self.getPhysicalPath() into the update list of each
> "lazyCatalog" found.
>
> Later, a "lazyCatalog" can index all objects in this list. Then the
> bloating happens either in RAM (without subtransactions) or in a
> temporary file, if you use subtransactions.
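
The "lazy cataloging" idea described above can be illustrated without Zope at all. The sketch below is a toy model in modern Python 3: `LazyCatalog`, `LazyDocument`, `queue` and `index_pending` are hypothetical names, the explicit `catalogs` list stands in for the superValues lookup, and `manage_afterAdd` is simplified to take no arguments. It is not the implementation being discussed in the thread.

```python
class LazyCatalog:
    """Hypothetical stand-in for a ZCatalog subclass with an update list."""

    def __init__(self):
        self.pending_paths = []   # paths queued for later indexing
        self.indexed = {}         # path -> set of words, a toy "index"

    def queue(self, path):
        # Recording a path is cheap: no index structure is modified.
        if path not in self.pending_paths:
            self.pending_paths.append(path)

    def index_pending(self, resolve):
        """Index every queued object in one batch; return how many were done."""
        count = 0
        for path in self.pending_paths:
            obj = resolve(path)
            self.indexed[path] = set(obj.text.split())
            count += 1
        self.pending_paths = []
        return count


class LazyDocument:
    """Hypothetical document whose add hook only records its path."""

    def __init__(self, path, text, catalogs):
        self.path = path
        self.text = text
        self.catalogs = catalogs  # stands in for the superValues() lookup

    def manage_afterAdd(self):
        for catalog in self.catalogs:
            catalog.queue(self.path)


catalog = LazyCatalog()
docs = {}
for i, text in enumerate(["zope catalog bloat", "btree bucket update"]):
    doc = LazyDocument(("site", "doc%d" % i), text, [catalog])
    docs[doc.path] = doc
    doc.manage_afterAdd()  # only queues the path

print(len(catalog.pending_paths))               # objects waiting to be indexed
print(catalog.index_pending(docs.__getitem__))  # batch indexing happens here
```

The design choice is to trade freshness for write volume: searches do not see new objects until the batch runs, but the large index structures are rewritten once per batch instead of once per added object.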
>
> OK, another approach which might fit your (Giovanni's) needs better
> would be to use a database other than ZODB, but I'm afraid that even then
> "instant indexing" will be an expensive process if you have a large
> number of documents.
>
> Abel





Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-26 Thread Giovanni Maruzzelli

The catalog is a pristine 2.3.3b1 catalog.

We have recreated the catalog from scratch because we tried
manage_convertBTrees, but it doesn't work for us; it returns an error
(and the same happens with 2.3.3 final):

Error Type: TypeError
Error Value: second argument must be a class


Traceback (innermost last):
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/ZPublisher/Publish.py, line 223, in publish_module
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/ZPublisher/Publish.py, line 187, in publish
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/Zope/__init__.py, line 221, in zpublisher_exception_hook
    (Object: Traversable)
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/ZPublisher/Publish.py, line 171, in publish
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/ZPublisher/mapply.py, line 160, in mapply
    (Object: manage_convertBTrees)
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/ZPublisher/Publish.py, line 112, in call_object
    (Object: manage_convertBTrees)
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/Products/ZCatalog/ZCatalog.py, line 736, in manage_convertBTrees
    (Object: Traversable)
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/Products/ZCatalog/Catalog.py, line 204, in _convertBTrees
  File /fs1root/zope/Zope-2.3.3b1-src/lib/python/SearchIndex/UnTextIndex.py, line 211, in _convertBTrees
TypeError: (see above)



- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Giovanni Maruzzelli" <[EMAIL PROTECTED]>; "Chris McDonough"
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, June 26, 2001 5:59 PM
Subject: Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a
solution?)


> Toby Dickenson wrote:
> >
> > >INDEXES:
> > >  PrincipiaSearchSource Text Index 2,524
> > >  autore Keyword Index 4,055
> > >  bflow0 Field Index 4,055
> > >  bflow1 Field Index 4,055
> > >  bflow2 Field Index 4,055
> >
> > Aha! a clue.
> >
> > If that is the output of the 'Indexes' tab then I don't think you are
> > using the newest ZCatalog. A recent release (I'm not sure which,
> > 2.3.2?) has a new BTree implementation that reduces bloat by modifying
> > fewer buckets (it also doesn't have the column showing index size).
>
> Has the person concerned run the catalog update tool when they upgraded
> their Zope version?
>
> cheers,
>
> Chris





Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-26 Thread Giovanni Maruzzelli

I'm sorry to say that Toby is right about the version from which I
cut and pasted the following, but we are also using a newer version and
the problem is the same.

We're working our way through it with the "dump the first bytes of the raw
dump" feature of the new, magnificent tranalyzer from Toby (it really ought
to be a standard tool in the Zope distro), and we now have some hints about
what happens when you catalog something.

So, we are starting to optimize indexes and metadata, but the problem does
not seem to go away.

-giovanni


- Original Message -
From: "Toby Dickenson" <[EMAIL PROTECTED]>
To: "Giovanni Maruzzelli" <[EMAIL PROTECTED]>
Cc: "Chris McDonough" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, June 26, 2001 5:49 PM
Subject: Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a
solution?)


> >INDEXES:
> >  PrincipiaSearchSource Text Index 2,524
> >  autore Keyword Index 4,055
> >  bflow0 Field Index 4,055
> >  bflow1 Field Index 4,055
> >  bflow2 Field Index 4,055
>
>
> Aha! a clue.
>
> If that is the output of the 'Indexes' tab then I don't think you are
> using the newest ZCatalog. A recent release (I'm not sure which,
> 2.3.2?) has a new BTree implementation that reduces bloat by modifying
> fewer buckets (it also doesn't have the column showing index size).
>
> Toby Dickenson
> [EMAIL PROTECTED]





Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-26 Thread Giovanni Maruzzelli

I use Zope 2.3.3 with Python 1.5.2 on FreeBSD 3.

I'm not so picky about bloating, but adding a document of 1K adds some 400K,
and keeps growing.

How much does it consume for you (I know you cataloged some 50K documents)?

-giovanni
- Original Message -
Sent: Tuesday, June 26, 2001 1:48 PM
Subject: Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a
solution?)


> Giovanni, which Zope version are you running?
>
> On Tue, 26 Jun 2001, Chris McDonough wrote:
>
> > How many indexes do you have, what are the index types, and what do
> > they index?  Likewise, what about metadata?  In your last message, you
> > said there's about 20.  That's a heck of a lot of indexes.  Do you
> > need them all?
>
> In my installation I have about 30 or 40
> Position(Text)Index/KeywordIndex/FieldIndex.  They don't bloat much, so I
> don't think that's the problem.  (The problem might be that we have
> different views on what bloating is, though :)





[Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-26 Thread Giovanni Maruzzelli

Hi Chris,

I don't think this is an ObjectManager problem, even if it contributes to
the bloating.

We do break the content up into subfolders, but our subfolders easily grow
to contain some hundred objects.

Do you think that the number of indexes contributes to the bloating? If this
is important, we can try to compact them into a smaller number (e.g. the
boolean indexes can become a sort of bitmask, we can eliminate meta_type,
etc.).
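
The bitmask idea mentioned above could look something like the following sketch (modern Python 3). It assumes that the ten bflow0 to bflow9 properties listed below are the boolean ones, which the thread does not state explicitly; `pack_flags` and `unpack_flags` are hypothetical helper names, not part of Zope.

```python
FLAG_NAMES = ["bflow%d" % i for i in range(10)]  # assumed boolean properties


def pack_flags(flags):
    """Pack a {name: bool} mapping into one integer, bit i for FLAG_NAMES[i]."""
    mask = 0
    for bit, name in enumerate(FLAG_NAMES):
        if flags.get(name):
            mask |= 1 << bit
    return mask


def unpack_flags(mask):
    """Inverse of pack_flags: recover the {name: bool} mapping."""
    return {name: bool(mask & (1 << bit))
            for bit, name in enumerate(FLAG_NAMES)}


example = {"bflow0": True, "bflow3": True, "bflow9": True}
mask = pack_flags(example)
print(mask)  # 1 + 8 + 512 = 521
```

One trade-off to consider: a single index on the packed value matches whole combinations, so a query on one flag alone would have to enumerate every mask value with that bit set.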

These are our indexes (cut and pasted from the ZMI), followed by our
metadata:

INDEXES:
  PrincipiaSearchSource Text Index 2,524
  autore Keyword Index 4,055
  bflow0 Field Index 4,055
  bflow1 Field Index 4,055
  bflow2 Field Index 4,055
  bflow3 Field Index 4,055
  bflow4 Field Index 4,055
  bflow5 Field Index 4,055
  bflow6 Field Index 4,055
  bflow7 Field Index 4,055
  bflow8 Field Index 4,055
  bflow9 Field Index 4,055
  bobobase_modification_time Field Index 4,300
  dflow0 Field Index 4,055
  dflow1 Field Index 4,055
  id Field Index 4,300
  m_sflow0 Keyword Index 3,960
  m_sflow1 Keyword Index 3,960
  m_sflow2 Keyword Index 3,960
  meta_type Field Index 4,300
  pseudoId Text Index 4,054
  revisore Keyword Index 4,055
  title Text Index 3,844

METADATA:

  bobobase_modification_time
  id
  meta_type
  pseudoId
  title

- Original Message -
Sent: Tuesday, June 26, 2001 12:45 PM
Subject: Re: Zcatalog bloat problem (berkeleydb is a solution?)


>
> Hi Giovanni,
>
> How many indexes do you have, what are the index types, and what do they
> index?  Likewise, what about metadata?  In your last message, you said
> there's about 20.  That's a heck of a lot of indexes.  Do you need them
> all?
>
> I can see a potential reason for the problem you explain as "and I
> remind you that as the folder get populated, the size that is added to
> each transaction grows, a folder with one hundred objects adds some
> 100K"... It's true that "normal" folders (most ObjectManager-derived
> containers actually) cause database bloat within undoing storages when
> an object is added or removed from it.  This is because it keeps a list
> of contained subobject names in an "_objects" attribute, which is a
> tuple.  When an object is added, the tuple is rewritten in entirety.  So
> for instance, if you've got 100 items in your folder, and you add one
> more, you rewrite all the instance data for the folder itself, which
> includes the (large) _objects tuple (and of course, any other raw
> attributes, like properties).  Over time, this can be problematic.
>
> Shane's BTreeFolder Product attempts to ameliorate this problem a bit by
> keeping the data that is normally stored in the _objects tuple in its
> own persistent object (a btree).
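
The effect described in the quoted paragraphs, where adding one item rewrites the whole "_objects" tuple, can be illustrated with plain pickle. This is a toy model in modern Python 3, not ZODB itself: `folder_state_size` is a hypothetical helper, and the layout of the tuple entries (a dict with "id" and "meta_type") is an assumption made for illustration.

```python
import pickle


def folder_state_size(n_items):
    """Pickled size of a toy folder state holding n_items in an _objects tuple."""
    objects = tuple({"id": "doc%04d" % i, "meta_type": "DTML Document"}
                    for i in range(n_items))
    return len(pickle.dumps({"_objects": objects, "title": ""}))


# Each add re-serializes the whole state, so the cost of one add
# grows with the number of items already in the folder.
for n in (1, 100, 1000):
    print(n, folder_state_size(n))
```

Moving the list of subobjects into its own persistent structure, as described for BTreeFolder above, means an add no longer has to rewrite a record that grows with the folder.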
>
> Are you breaking the content up into subfolders?  This is recommended.
>
> I'm tempted to postulate that perhaps your problem isn't as much ZCatalog
> as it is ObjectManager overhead.
>
> - C
>
>
> Giovanni Maruzzelli wrote:
> >
> > Hello Zopistas,
> >
> > thank you all for your replies.
> >
> > Our doubts are still unresolved :-(
> >
> > With a clever hack that Toby Dickenson made to the very useful
> > tranalyzer, we were able to see what happens when we add or catalog
> > an object. (BTW, we don't use CatalogAware.)
> >
> > We can send the output of tranalyzer2 to anyone interested, but in short
> > this is what happens in an empty folder (and I remind you that as the
> > folder gets populated, the size that is added to each transaction
> > grows; a folder with one hundred objects adds some 100K):
> >
> > If we add a normal DTML document (no catalog involved) to an empty
> > folder, we have a very small increase in size: the size of the DTML
> > document and the size of the folder:
> >
> > TID: 33D853C2CE6CDBB @ 77396692 obs 2 len 729
> > By ciao
> > "/aacucu/addDTMLDocument"
> > OID: 40817 len 270 [OFS.Folder.Folder]
> > OID: 40818 len 309 [OFS.DTMLDocument.DTMLDocument]
> >
> > If we add an "Articolo" that is cataloged on the fly in the same empty
> > directory, we get bloat:
> >
> > TID: 33D853D722FA167 @ 77397437 obs 96 len 226568
> > By ciao
> > "/aacucu/Articolo_add"
> > OID: 40817 len 363 [OFS.Folder.Folder]
> > OID: 40819 len 598 [*ennPsHQQKY5zjxlQs1ebmA==.Articolo]
> > OID: 407b5 len 8074 [BTrees.IOBTree.IOBucket]
> > OID: 37aa9 len 39 [BTrees.Length.Length]
> > OID: 37b95 len 1483 [BTrees.OIBTree.OIBucket]
> > OID: 407b7 len 1739 [BTrees.IOBTree.IOBucket]
> > OID: 407b8 len 402 [BTrees.IIBTree.IISet]
> > OID: 407b9 len 399 [BTrees.IOBTree.IOBucket]
> > OID: 407b

[Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-26 Thread Giovanni Maruzzelli

Hello Zopistas,

thank you all for your replies.

Our doubts are still unresolved :-(

With a clever hack that Toby Dickenson made to the very useful tranalyzer,
we were able to see what happens when we add or catalog an object.
(BTW, we don't use CatalogAware.)

We can send the output of tranalyzer2 to anyone interested, but in short
this is what happens in an empty folder (and I remind you that as the
folder gets populated, the size that is added to each transaction grows;
a folder with one hundred objects adds some 100K):

If we add a normal DTML document (no catalog involved) to an empty folder,
we have a very small increase in size: the size of the DTML document and
the size of the folder:

TID: 33D853C2CE6CDBB @ 77396692 obs 2 len 729
By ciao
"/aacucu/addDTMLDocument"
OID: 40817 len 270 [OFS.Folder.Folder]
OID: 40818 len 309 [OFS.DTMLDocument.DTMLDocument]

If we add an "Articolo" that is cataloged on the fly in the same empty
directory, we get bloat:

TID: 33D853D722FA167 @ 77397437 obs 96 len 226568
By ciao
"/aacucu/Articolo_add"
OID: 40817 len 363 [OFS.Folder.Folder]
OID: 40819 len 598 [*ennPsHQQKY5zjxlQs1ebmA==.Articolo]
OID: 407b5 len 8074 [BTrees.IOBTree.IOBucket]
OID: 37aa9 len 39 [BTrees.Length.Length]
OID: 37b95 len 1483 [BTrees.OIBTree.OIBucket]
OID: 407b7 len 1739 [BTrees.IOBTree.IOBucket]
OID: 407b8 len 402 [BTrees.IIBTree.IISet]
OID: 407b9 len 399 [BTrees.IOBTree.IOBucket]
OID: 407ba len 402 [BTrees.IIBTree.IISet]
OID: 407bb len 3497 [BTrees.IOBTree.IOBucket]
OID: 407bc len 5871 [BTrees.OOBTree.OOBucket]
OID: 37ab2 len 39 [BTrees.Length.Length]
OID: 407c6 len 6279 [BTrees.IOBTree.IOBucket]
OID: 3d7bf len 312 [BTrees.IIBTree.IISet]
OID: 407c7 len 4507 [BTrees.IOBTree.IOBucket]
OID: 3c992 len 837 [BTrees.OOBTree.OOBucket]
OID: 37abe len 39 [BTrees.Length.Length]
OID: 407d2 len 696 [BTrees.IOBTree.IOBucket]
OID: 3cb4e len 572 [BTrees.IIBTree.IISet]
OID: 407d3 len 537 [BTrees.IOBTree.IOBucket]
OID: 40809 len 387 [BTrees.IIBTree.IISet]
OID: 407d4 len 507 [BTrees.IOBTree.IOBucket]
OID: 4080a len 387 [BTrees.IIBTree.IISet]
OID: 407d5 len 507 [BTrees.IOBTree.IOBucket]
OID: 4080b len 387 [BTrees.IIBTree.IISet]
OID: 407d6 len 507 [BTrees.IOBTree.IOBucket]
OID: 4080c len 387 [BTrees.IIBTree.IISet]
OID: 407d7 len 339 [BTrees.IOBTree.IOBucket]
OID: 4080d len 382 [BTrees.IIBTree.IISet]
OID: 407d8 len 339 [BTrees.IOBTree.IOBucket]
OID: 4080e len 382 [BTrees.IIBTree.IISet]
OID: 407d9 len 339 [BTrees.IOBTree.IOBucket]
OID: 3d064 len 597 [BTrees.IIBTree.IISet]
OID: 407da len 347 [BTrees.IOBTree.IOBucket]
OID: 4080f len 387 [BTrees.IIBTree.IISet]
OID: 407db len 339 [BTrees.IOBTree.IOBucket]
OID: 3d1ba len 642 [BTrees.IIBTree.IISet]
OID: 407dc len 339 [BTrees.IOBTree.IOBucket]
OID: 40810 len 372 [BTrees.IIBTree.IISet]
OID: 407dd len 339 [BTrees.IOBTree.IOBucket]
OID: 40811 len 372 [BTrees.IIBTree.IISet]
OID: 407de len 339 [BTrees.IOBTree.IOBucket]
OID: 37f11 len 977 [BTrees.IOBTree.IOBucket]
OID: 380de len 830 [BTrees.OIBTree.OIBucket]
OID: 37ac4 len 25537 [BTrees.IIBTree.IISet]
OID: 37ac7 len 9892 [BTrees.IIBTree.IISet]
OID: 37aca len 13947 [BTrees.IIBTree.IISet]
OID: 38922 len 387 [BTrees.IIBTree.IISet]
OID: 38643 len 827 [BTrees.IIBTree.IISet]
OID: 3894c len 92 [BTrees.IIBTree.IISet]
OID: 388ff len 24707 [BTrees.IIBTree.IISet]
OID: 38581 len 277 [BTrees.IIBTree.IISet]
OID: 3d7f7 len 319 [BTrees.IOBTree.IOBTree]
OID: 3d7f8 len 356 [BTrees.IOBTree.IOBTree]
OID: 40812 len 372 [BTrees.IIBTree.IISet]
OID: 407e0 len 339 [BTrees.IOBTree.IOBucket]
OID: 40813 len 387 [BTrees.IIBTree.IISet]
OID: 407e1 len 339 [BTrees.IOBTree.IOBucket]
OID: 40814 len 362 [BTrees.IIBTree.IISet]
OID: 407e2 len 507 [BTrees.IOBTree.IOBucket]
OID: 37eb9 len 981 [BTrees.IOBTree.IOBucket]
OID: 38197 len 804 [BTrees.OIBTree.OIBucket]
OID: 38ac7 len 7947 [BTrees.IIBTree.IISet]
OID: 387f6 len 97 [BTrees.IIBTree.IISet]
OID: 383f7 len 850 [BTrees.OOBTree.OOBucket]
OID: 4081a len 47 [BTrees.IIBTree.IISet]
OID: 38407 len 850 [BTrees.OOBTree.OOBucket]
OID: 4081b len 47 [BTrees.IIBTree.IISet]
OID: 388ac len 92 [BTrees.IIBTree.IISet]
OID: 387d4 len 152 [BTrees.IIBTree.IISet]
OID: 3868c len 152 [BTrees.IIBTree.IISet]
OID: 38681 len 142 [BTrees.IIBTree.IISet]
OID: 388b0 len 72 [BTrees.IIBTree.IISet]
OID: 384f1 len 52 [BTrees.IIBTree.IISet]
OID: 37ca6 len 586 [BTrees.IOBTree.IOBucket]
OID: 4081c len 686 [BTrees.IOBTree.IOBucket]
OID: 37ab8 len 39336 [BTrees.IOBTree.IOBTree]
OID: 381d8 len 594 [BTrees.OIBTree.OIBucket]
OID: 38ac9 len 1252 [BTrees.IIBTree.IISet]
OID: 38770 len 52 [BTrees.IIBTree.IISet]
OID: 37d94 len 1234 [BTrees.IOBTree.IOBucket]
OID: 3821d len 617 [BTrees.OIBTree.OIBucket]
OID: 38acb len 557 [BTrees.IIBTree.IISet]
OID: 38710 len 52 [BTrees.IIBTree.IISet]
OID: 386ac len 52 [BTrees.IIBTree.IISet]
OID: 38409 len 1019 [BTrees.OOBTree.OOBucket]
OID: 4081d len 47 [BTrees.IIBTree.IISet]
OID: 3870b len 52 [BTrees.IIBTree.IISet]
OID: 38403 len 816 [BTrees.OOBTree.OOBucket]
OID: 4081e len 47 [BTrees.IIBTree.IISet]
OID: 387fe len 57 [BTrees.IIBTree.II
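
A dump like the one above is easier to read when record lengths are summed per class. The following is a minimal sketch in modern Python 3 that assumes only the "OID: ... len ... [class]" line format visible in this message; `bytes_by_class` is a hypothetical helper and is not part of tranalyzer.

```python
import re
from collections import defaultdict

# Matches lines such as: OID: 40817 len 270 [OFS.Folder.Folder]
RECORD = re.compile(r"OID:\s+(\w+)\s+len\s+(\d+)\s+\[([^\]]+)\]")


def bytes_by_class(dump_text):
    """Sum record lengths per class name; lines that do not match are skipped."""
    totals = defaultdict(int)
    for line in dump_text.splitlines():
        match = RECORD.search(line)
        if match:
            totals[match.group(3)] += int(match.group(2))
    return dict(totals)


# The small DTML Document transaction quoted earlier in this message.
sample = """\
TID: 33D853C2CE6CDBB @ 77396692 obs 2 len 729
OID: 40817 len 270 [OFS.Folder.Folder]
OID: 40818 len 309 [OFS.DTMLDocument.DTMLDocument]
"""

for cls, size in sorted(bytes_by_class(sample).items(), key=lambda kv: -kv[1]):
    print(size, cls)
```

Run over the full "Articolo_add" transaction, the same function would show how much of the 226568 bytes goes to each BTrees class. Truncated lines without a closing bracket are ignored.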

[Zope-dev] Zcatalog bloat problem (berkeleydb is a solution?)

2001-06-25 Thread Giovanni Maruzzelli

Hello Zopistas,

we are developing a Zope 2.3.3 (py 1.5.2) application that will add, index
and reindex some tens of thousands of objects (ZClasses that are
DTMLDocuments on steroids) on some twenty properties each day, while
the absolute number of cataloged objects keeps growing (think of content
management for a big portal, where each day lots of content is added and
modified, and all the old content remains as a searchable archive and as
material to recycle in the future).

This seems in some respects a task similar to the one Erik Enge ran into a
couple of weeks ago.

We first derived from CatalogAware, then switched to managing the
cataloging, uncataloging and recataloging ourselves.

The ZODB still bloats at far too fast a pace.

***Maybe there's something obvious we missed***, but when you have some
4 thousand objects in the catalog, if you add and catalog one more object
the ZODB grows by about a couple of megabytes (while the object is some 1 K
of text, plus some twelve boolean, datetime and string properties). If we
pack the ZODB, Data.fs returns to an almost normal size (so the bloat is
made up of the transactions, as tranalyzer.py confirms).

Any hints on how to manage something like this?
We use text indexes, field indexes and keyword indexes (text indexes on
string properties, field indexes on booleans and datetimes, keyword indexes
on strings). Maybe one kind of index is to be avoided?

Erik, any thoughts?

We have almost decided to switch to the berkeleydb storage (the Minimal one)
to get rid of the bloating; we are testing with it, but it seems to have been
discontinued because of a lack of requests.

Any light on it? Is it production grade?

-giovanni


