I've added catalog metadata as Seb suggested and it works fine.
Thank you very much.
Nguyen Quan Son
Nguyen Quan Son wrote:
Hi,
I have a problem with performance and memory consumption when trying to compute
some statistics using the following code:
...
docs = container.portal_catalog(meta_type='Document', ...)
for doc in docs:
    obj = doc.getObject()
    value = obj.attr
...
With about 10,000 documents this Python script takes 10 minutes and more than
500MB of memory, after which I have to restart Zope.
I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
What's wrong with this code? Any suggestion is appreciated.
From: John Barratt [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 6:41 PM
Subject: Re: [Zope-dev] Catalog performance
If you can't use catalog metadata as Seb suggests (e.g. you are actually
accessing many attributes, large values, etc.) and if indeed memory is
the problem (which seems likely), then you can ghostify the objects that
were ghosts to begin with, and it will save memory (unless all those
objects are already in cache).
The problem with this strategy, though, is that the doc.getObject() method
used in your code activates the object, and hence you won't know whether it
was a ghost already. To get around this you can bypass this
method and do something like:
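docs = container.portal_catalog(meta_type='Document', ...)
for doc in docs:
    # Traverse to the object directly instead of calling getObject()
    obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
    # _p_changed is None while the object is still a ghost (state not loaded)
    was_ghost = obj._p_changed is None
    value = obj.attr
    if was_ghost:
        # Turn it back into a ghost so its state does not stay in the cache
        obj._p_deactivate()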
You can test this by running your code on a freshly restarted server
and checking the number of objects in cache. The number shouldn't change
much after running the above method, but will increase dramatically if
you just use 'obj = doc.getObject()' instead, or don't deactivate
the objects. The lower number of objects in your cache
should in turn keep your memory usage down and prevent your computer
from paging during the request, and hence speed things up considerably!
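To make the ghost bookkeeping concrete, here is a small sketch that runs without Zope. FakePersistent and collect_values are made-up names for illustration only; the class merely imitates the two things the loop relies on (_p_changed being None for a ghost, and _p_deactivate() dropping loaded state) and is not the real ZODB implementation.

```python
class FakePersistent:
    """Hypothetical stand-in for a persistent object (not the ZODB class)."""

    def __init__(self, attr):
        self._stored = attr      # what would live in the database
        self._p_changed = None   # None means "ghost": state not loaded

    @property
    def attr(self):
        # Reading an attribute of a ghost loads its state.
        if self._p_changed is None:
            self._p_changed = False
        return self._stored

    def _p_deactivate(self):
        # Drop the loaded state again, unless there are unsaved changes.
        if not self._p_changed:
            self._p_changed = None


def collect_values(objects):
    """Read .attr from each object, re-ghosting those that started as ghosts."""
    values = []
    for obj in objects:
        was_ghost = obj._p_changed is None
        values.append(obj.attr)
        if was_ghost:
            obj._p_deactivate()
    return values


docs = [FakePersistent(i) for i in range(5)]
docs[0].attr  # simulate one object that was already loaded before the loop
values = collect_values(docs)
# Count how many objects still hold loaded state after the loop.
loaded = sum(1 for d in docs if d._p_changed is not None)
```

In this toy run only the object that was loaded beforehand stays loaded; the others go back to being ghosts, which is the effect the loop above aims for in the real cache.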
Another option would be to reduce the size of your cache so that the
amount of memory your Zope instance consumes doesn't cause your computer
to swap. The code changes above will also help keep the 'right' objects
in your cache, which in turn will further
help with the performance of subsequent requests.
Cheers,
JB.
From: Seb Bacon [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 6:18 PM
Subject: [Zope-dev] Re: Catalog performance
With getObject(), you're loading entire objects into memory in order to
grab a single attribute. This is very wasteful. Try putting the
attribute into the metadata for the catalog and grabbing it from there.
Then you can do:
for doc in docs:
    value = doc.attr
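As a rough picture of why this is cheaper, the toy model below treats a metadata column as a copy of the attribute taken when the object is indexed. ToyCatalog, index_object and Doc are made-up names for this sketch and are not the ZCatalog API; the point is only that reading results touches the stored copies, not the documents.

```python
class ToyCatalog:
    """Toy model of a catalog with metadata columns (not the ZCatalog API)."""

    def __init__(self, columns):
        self.columns = columns
        self.records = {}   # path -> dict of copied metadata values

    def index_object(self, obj, path):
        # Copy the listed attributes out of the object at indexing time.
        self.records[path] = {
            name: getattr(obj, name, None) for name in self.columns
        }

    def search(self):
        # Results carry only the copied values, never the objects.
        return list(self.records.values())


class Doc:
    """Hypothetical document with a single attribute."""

    def __init__(self, attr):
        self.attr = attr


catalog = ToyCatalog(columns=["attr"])
doc = Doc(7)
catalog.index_object(doc, "/site/doc")
values = [rec["attr"] for rec in catalog.search()]  # no document is loaded

doc.attr = 8  # the stored copy is stale until the object is indexed again
stale = [rec["attr"] for rec in catalog.search()]
catalog.index_object(doc, "/site/doc")
fresh = [rec["attr"] for rec in catalog.search()]
```

The trade-off this hints at: a metadata value is only as current as the last time the object was indexed, so objects cataloged before the column existed need to be reindexed before the value shows up.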
seb