I've added catalog metadata as Seb suggested and it works fine.
Thank you very much.

Nguyen Quan Son
> Nguyen Quan Son wrote:
> > Hi,
> > I have a problem with performance and memory consumption when trying to do
> > some statistics, using the following code:
> > ...
> > docs = container.portal_catalog(meta_type='Document', ...)
> > for doc in docs:
> >     obj = doc.getObject()
> >     value = obj.attr
> >     ...
> >
> > With about 10,000 documents this Python script takes 10 minutes and more
> > than 500MB of memory, after that I had to restart Zope. I am running
> > Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
> > What's wrong with this code? Any suggestion is appreciated.

From: "John Barratt" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 10, 2003 6:41 PM
Subject: Re: [Zope-dev] Catalog performance

> If you can't use catalog metadata as Seb suggests (e.g. you are actually
> accessing many attributes, large values, etc.) and if indeed memory is
> the problem (which seems likely) then you can ghostify the objects that
> were ghosts to begin with, and it will save memory (unless all those
> objects are already in cache).
>
> The problem with this strategy though is that the doc.getObject() method
> used in your code activates the object and hence you won't know if it
> was a ghost already or not. To get around this you can shortcut this
> method and do something like:
>
> docs = container.portal_catalog(meta_type='Document', ...)
> for doc in docs:
>     obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
>     was_ghost = obj._p_changed is None
>     value = obj.attr
>     if was_ghost:
>         obj._p_deactivate()
>
> You can test this by running your code on a freshly restarted server,
> and checking the number of objects in cache. The number shouldn't change
> much after running the above method, but will increase dramatically if
> you just used 'obj = doc.getObject()' instead, or didn't do the
> deactivating of the objects.
> The lower number of objects in your cache
> should in turn keep your memory usage down, and prevent your computer
> paging through the request, and hence speed things up considerably!
>
> Another option would be to reduce the size of your cache so that the
> amount of memory your Zope instance consumes doesn't cause your computer
> to swap, though doing the above code changes will also help keep your
> cache with the 'right' objects in it as well, which in turn will further
> help with the performance of subsequent requests.
>
> Cheers,
>
> JB.

From: "Seb Bacon" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 10, 2003 6:18 PM
Subject: [Zope-dev] Re: Catalog performance

> With getObject(), you're loading entire objects into memory in order to
> grab a single attribute. This is very wasteful. Try putting the
> attribute into the metadata for the catalog and grabbing it from there.
> Then you can do:
>
> for doc in docs:
>     value = doc.attr
>
> seb
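
For readers without a Zope instance at hand, the three approaches discussed in this thread can be compared with a small toy model. This is only a sketch and not the ZCatalog or ZODB API: ToyDocument, ToyBrain, ToyCatalog, its cache set and its deactivate() method are hypothetical names invented for illustration, and the "cache" here simply records which objects have been loaded. It assumes that a metadata column is a copy of the attribute stored on the catalog record, as Seb describes.

```python
class ToyDocument:
    """Stand-in for a persistent document (hypothetical, not a Zope class)."""
    def __init__(self, path, attr):
        self.path = path
        self.attr = attr


class ToyBrain:
    """Stand-in for a catalog result record that carries copied metadata."""
    def __init__(self, catalog, path, metadata):
        self._catalog = catalog
        self.path = path
        # Metadata columns are copied onto the record when it is indexed.
        for name, value in metadata.items():
            setattr(self, name, value)

    def getObject(self):
        # Loading the real object is the expensive step.
        return self._catalog.load(self.path)


class ToyCatalog:
    def __init__(self, metadata_columns=()):
        self._storage = {}   # path -> full object ("on disk")
        self._records = {}   # path -> metadata copied at index time
        self._columns = tuple(metadata_columns)
        self.cache = set()   # paths of objects currently held in memory

    def index(self, doc):
        self._storage[doc.path] = doc
        self._records[doc.path] = {c: getattr(doc, c) for c in self._columns}

    def load(self, path):
        self.cache.add(path)
        return self._storage[path]

    def deactivate(self, path):
        self.cache.discard(path)

    def search(self):
        return [ToyBrain(self, p, md) for p, md in self._records.items()]


def build_catalog(n=1000):
    catalog = ToyCatalog(metadata_columns=("attr",))
    for i in range(n):
        catalog.index(ToyDocument("/docs/%d" % i, attr=i))
    return catalog


# 1. Code from the original question: load every object to read one value.
catalog = build_catalog()
total_objects = sum(brain.getObject().attr for brain in catalog.search())
cached_after_getobject = len(catalog.cache)

# 2. John's suggestion: release objects that were not loaded beforehand.
catalog = build_catalog()
total_deactivate = 0
for brain in catalog.search():
    was_ghost = brain.path not in catalog.cache
    total_deactivate += brain.getObject().attr
    if was_ghost:
        catalog.deactivate(brain.path)
cached_after_deactivate = len(catalog.cache)

# 3. Seb's suggestion: read the value from catalog metadata, no object load.
catalog = build_catalog()
total_metadata = sum(brain.attr for brain in catalog.search())
cached_after_metadata = len(catalog.cache)

print(cached_after_getobject, cached_after_deactivate, cached_after_metadata)
```

All three loops compute the same total; they differ only in how many objects are left in the toy cache afterwards. The real memory behaviour of a Zope instance depends on the ZODB cache settings and object sizes, which this model does not attempt to reproduce.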