I've added catalog metadata as Seb suggested and it works fine.
Thank you very much.
Nguyen Quan Son

> Nguyen Quan Son wrote:
> > Hi,
> > I have a problem with performance and memory consumption when trying to compute
> > some statistics using the following code:
> > ...
> > docs = container.portal_catalog(meta_type='Document', ...)
> > for doc in docs:
> >     obj = doc.getObject()
> >     value = obj.attr
> >     ...
> >
> > With about 10,000 documents this Python script takes 10 minutes and more than
> > 500MB of memory, and afterwards I had to restart Zope.
> > I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
> > What's wrong with this code? Any suggestion is appreciated.


From: "John Barratt" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 10, 2003 6:41 PM
Subject: Re: [Zope-dev] Catalog performance


>
> If you can't use catalog metadata as Seb suggests (e.g. because you are
> actually accessing many attributes, or large values), and if memory is
> indeed the problem (which seems likely), then you can re-ghostify the
> objects that were ghosts to begin with. This saves memory (unless all
> those objects are already in the cache).
>
> The problem with this strategy, though, is that the doc.getObject()
> method used in your code activates the object, so you can't tell
> whether it was already a ghost. To get around this, you can bypass that
> method and do something like:
>
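> docs = container.portal_catalog(meta_type='Document', ...)
> for doc in docs:
>     # Traverse directly instead of calling getObject(), so that
>     # _p_changed can be inspected before the object is activated.
>     obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
>     # _p_changed is None only while the object is still a ghost.
>     was_ghost = obj._p_changed is None
>     value = obj.attr
>     if was_ghost:
>         # Drop the loaded state again to keep the cache small.
>         obj._p_deactivate()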
>
> You can test this by running your code on a freshly restarted server
> and checking the number of objects in the cache. The number shouldn't
> change much after running the code above, but it will increase
> dramatically if you just use 'obj = doc.getObject()' instead, or skip
> deactivating the objects. Fewer objects in the cache should in turn
> keep your memory usage down and stop your computer from paging during
> the request, which should speed things up considerably!
>
> Another option is to reduce the size of your cache so that the memory
> your Zope instance consumes doesn't cause your computer to swap.
> Still, the code changes above will also help keep the 'right' objects
> in your cache, which in turn further helps the performance of
> subsequent requests.
>
> Cheers,
>
> JB.
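John's re-ghosting trick can be illustrated without a running Zope. The sketch below is a toy model under stated assumptions: `ToyPersistent`, `ToyJar` and `collect` are hypothetical names that only imitate the ZODB convention that `_p_changed is None` means "ghost" and that `_p_deactivate()` drops the loaded state; the real `persistent` machinery does considerably more. It scans 10,000 objects twice, once re-ghosting and once not, and counts how many stay loaded. Note also that Zope's restricted Python (used by through-the-web Python Scripts) rejects attribute names starting with an underscore, so a real version of the loop would most likely have to live in trusted code such as an External Method.

```python
# Toy model of ZODB ghost semantics; ToyPersistent and ToyJar are hypothetical
# stand-ins, NOT the real persistent.Persistent / ZODB Connection classes.


class ToyPersistent:
    """Imitates the convention: _p_changed is None while the object is a ghost."""

    def __init__(self, jar, oid):
        self._jar = jar
        self._oid = oid
        self._state = None
        self._p_changed = None  # every object starts out as a ghost

    def _activate(self):
        # Load the state from the "database" only if we are still a ghost.
        if self._p_changed is None:
            self._state = self._jar.load(self._oid)
            self._p_changed = False  # loaded and unmodified

    def _p_deactivate(self):
        # Throw away the loaded state and turn back into a ghost.
        if self._p_changed is False:
            self._state = None
            self._p_changed = None

    @property
    def attr(self):
        self._activate()  # attribute access activates a ghost
        return self._state["attr"]


class ToyJar:
    """Holds one ToyPersistent per record, like a connection's object cache."""

    def __init__(self, records):
        self._records = records
        self.objects = {oid: ToyPersistent(self, oid) for oid in records}

    def load(self, oid):
        return dict(self._records[oid])

    def active_count(self):
        # Number of objects whose state is currently held in memory.
        return sum(1 for o in self.objects.values() if o._p_changed is not None)


def collect(jar, oids, ghostify):
    values = []
    for oid in oids:
        obj = jar.objects[oid]
        was_ghost = obj._p_changed is None  # check BEFORE touching obj.attr
        values.append(obj.attr)
        if ghostify and was_ghost:
            obj._p_deactivate()  # only re-ghost objects we activated ourselves
    return values


records = {oid: {"attr": oid * 2} for oid in range(10000)}

jar = ToyJar(records)
jar.objects[0].attr  # pretend object 0 was already in the cache
values = collect(jar, sorted(records), ghostify=True)
print(jar.active_count())  # 1

naive_jar = ToyJar(records)
collect(naive_jar, sorted(records), ghostify=False)
print(naive_jar.active_count())  # 10000
```

Checking `was_ghost` before reading the attribute is the key step: an object that was already loaded is left alone, which matches John's point about keeping the 'right' objects in the cache, while everything the scan itself loaded is dropped again.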


From: "Seb Bacon" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 10, 2003 6:18 PM
Subject: [Zope-dev] Re: Catalog performance


>
> With getObject(), you're loading entire objects into memory in order to
> grab a single attribute.  This is very wasteful.  Try putting the
> attribute into the metadata for the catalog and grabbing it from there.
> Then you can do:
>
>   for doc in docs:
>       # Read the value from the brain's metadata; the object is not loaded.
>       value = doc.attr
>
> seb
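Seb's suggestion, which Nguyen confirms works, comes down to copying the attribute into the catalog record when the object is cataloged, so a query never has to load the object. The sketch below is a toy model, not the ZCatalog API: `ToyCatalog`, `Brain` and `Document` are hypothetical names. In a real ZCatalog the column is added on the catalog's Metadata tab in the ZMI, and, as far as I know, existing entries only get the new value once they are re-cataloged. The sketch counts how many full objects each access path loads.

```python
# Toy model of catalog metadata; ToyCatalog and Brain are hypothetical
# stand-ins, NOT the real ZCatalog API.


class Document:
    def __init__(self, attr, body):
        self.attr = attr
        self.body = body  # stands in for the bulk of a real document


class Brain:
    """A catalog result that carries copies of the metadata columns."""

    def __init__(self, catalog, rid, metadata):
        self._catalog = catalog
        self._rid = rid
        for name, value in metadata.items():
            setattr(self, name, value)

    def getObject(self):
        # Loading the full object is the expensive path.
        return self._catalog.load_object(self._rid)


class ToyCatalog:
    def __init__(self, columns):
        self.columns = list(columns)
        self._objects = {}
        self._metadata = {}
        self.loads = 0  # how many full objects have been loaded

    def catalog_object(self, rid, obj):
        self._objects[rid] = obj
        # Metadata is copied when the object is cataloged, not at query time.
        self._metadata[rid] = {c: getattr(obj, c) for c in self.columns}

    def load_object(self, rid):
        self.loads += 1
        return self._objects[rid]

    def __call__(self):
        return [Brain(self, rid, md) for rid, md in self._metadata.items()]


catalog = ToyCatalog(columns=["attr"])
for rid in range(1000):
    catalog.catalog_object(rid, Document(attr=rid, body="x" * 1000))

# Reading from metadata: no full object is ever loaded.
total = sum(brain.attr for brain in catalog())
print(total, catalog.loads)  # 499500 0

# The getObject() route from the original script loads every object.
total_slow = sum(brain.getObject().attr for brain in catalog())
print(total_slow, catalog.loads)  # 499500 1000
```

The trade-off is that every metadata column is stored in every catalog record, which is why John suggests the re-ghosting approach instead when many attributes or large values are involved.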


_______________________________________________
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )
