I've been working on a Mailman archive/search interface in Zope. I
chose not to do the search mechanism in Zope because I was under the
impression that ZCatalog is great for object indexing but that it would
not be ideal for mass text indexing with 100K+ objects and 100MB+ of text.
The comments below seem to indicate that its only problems are with mass
indexing and transaction storage, both of which would be mitigated by
moving to an incremental indexing scheme.
But wouldn't you run into performance problems on searches, and on having
enough available memory to power the catalog search?
I guess what I'm looking for is a rule of thumb on catalog usage in terms of
number of objects/indexes versus a machine's specs.
BTW, a demo of my Mailman search interface is at
Michel Pelletier wrote:
> Andy Dawkins wrote:
> > Michel
> > In case you are not aware, we at NIP currently host a complete archive of
> > the Zope mailing lists that are publicly available.
> > We are using ZCatalog to index all the messages from the Mailing list
> > archives. To give you an idea of numbers, the Zope mailing list alone is
> > over 30,000 messages.
> > The problem we have is getting that many objects into the Catalog. If we
> > load the objects into the ZODB and then catalog them, the machine either runs
> > out of memory or, if we lower the subtransaction threshold, it runs out of
> > hard drive space.
> This is because you are indexing more content than you have virtual+tmp
> memory to store the transaction in. Zope is transactional, as I'm sure
> you know, so it has to store the transaction somewhere so it can roll it
> back if necessary, and memory+tmp storage is where that goes
> (subtransactions are swapped out to tmp).
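The pattern Michel is describing can be sketched in plain Python with no Zope dependency. The names `index_in_batches`, `index_one`, and `commit` are illustrative, not a real Zope API; in Zope 2 of that era the subtransaction commit was, roughly, `get_transaction().commit(1)` every N objects:

```python
# Illustrative sketch: bound peak memory during a bulk index by
# committing work in fixed-size batches, the same idea as Zope
# subtransactions swapping uncommitted state out to tmp.

def index_in_batches(documents, index_one, commit, batch_size=100):
    """Index documents, calling commit() after every batch_size items
    so uncommitted state never exceeds one batch."""
    pending = 0
    for doc in documents:
        index_one(doc)
        pending += 1
        if pending >= batch_size:
            commit()   # in Zope 2, roughly: get_transaction().commit(1)
            pending = 0
    if pending:
        commit()       # flush the final partial batch
```

The trade-off is the one Andy ran into: smaller batches keep memory flat but each commit writes to disk, so the database (or tmp) grows faster between packs.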
> > If we use CatalogAware to catalog the objects as they are imported, the
> > Catalog explodes to stupid sizes because CatalogAware doesn't support
> > subtransactions.
> Subtransactions are a storage thing and really don't have anything to
> do with CatalogAware. If you have a subtransaction threshold set, then
> subtransactions will be used for any cataloging operation, CatalogAware
> or not.
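In other words, the threshold lives on the catalog side, so it fires no matter who calls it. A rough sketch, using a made-up `ThresholdCatalog` class (not the real ZCatalog API) just to show the decoupling:

```python
# Hypothetical sketch: a catalog that commits a subtransaction once
# `threshold` objects have been catalogued, whether the caller is
# CatalogAware or calls catalog_object() directly.

class ThresholdCatalog:
    def __init__(self, threshold=1000, commit=lambda: None):
        self.threshold = threshold      # objects per subtransaction
        self._commit = commit           # injected commit hook
        self._uncommitted = 0
        self.docs = {}                  # stand-in for the real indexes

    def catalog_object(self, obj, uid):
        self.docs[uid] = obj
        self._uncommitted += 1
        if self._uncommitted >= self.threshold:
            self._commit()              # subtransaction boundary
            self._uncommitted = 0
```

Because the count is tracked inside the catalog, a CatalogAware object importing itself and a bulk loader both hit the same commit boundary.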
> > We could solve these issues by regularly packing the database during the
> > import, but it isn't a perfect solution.
> I'm not sure what you mean by these last two paragraphs; it seems like
> you have two problems:
> 1) you are mass indexing and running out of memory
> 2) you are indexing lots of content quickly and your database is growing
> The answer to 1 is to not mass index, but to index incrementally over time.
> The answer to 2 is to use a storage that does not store old revisions,
> like Berkeley storage.
> > Also as messages arrived over time the Catalog would once again explode
> > dramatically,
> > Basically, we (NIP) would like to know if you (Michel/DC) are planning to
> > improve ZCatalog/CatalogAware, if you are planning a successor to ZCatalog,
> > or basically any information that could be useful to us regarding the
> > current development and urgency of ZCatalog/CatalogAware.
> There isn't anything wrong with the Catalog (for this particular
> problem), or at least, there isn't anything in the Catalog to fix that
> would solve your problem. We've had customers index well over 50,000
> objects; you just have to understand the resource constraints and work
> with them: for example, don't mass index, use storages that scale to
> high-write environments, etc.
> > Thanks in advance for your assistance.
> Zope-Dev maillist - [EMAIL PROTECTED]
> ** No cross posts or HTML encoding! **
> (Related lists -
> http://lists.zope.org/mailman/listinfo/zope )