[Zope-dev] Catalog performance

2003-09-10 Thread Nguyen Quan Son
Hi,
I have a problem with performance and memory consumption when trying to compute some 
statistics, using the following code:
...
docs = container.portal_catalog(meta_type='Document', ...)
for doc in docs:
    obj = doc.getObject()
    value = obj.attr
...

With about 10,000 documents, this Python script takes 10 minutes and uses more than 500MB of 
memory, after which I have to restart Zope.
I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
What's wrong with this code? Any suggestion is appreciated.
Nguyen Quan Son.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Catalog performance - SOLVED

2003-09-10 Thread Nguyen Quan Son
I've added catalog metadata as Seb suggested and it works fine.
Thank you very much.
Nguyen Quan Son


From: John Barratt [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 6:41 PM
Subject: Re: [Zope-dev] Catalog performance



 If you can't use catalog metadata as Seb suggests (e.g. because you are
 accessing many attributes or large values) and memory is indeed the
 problem (which seems likely), then you can re-ghostify the objects that
 were ghosts to begin with. This saves memory unless all those objects
 are already in the cache.

 The problem with this strategy, though, is that the doc.getObject()
 method used in your code activates the object, so you can't tell
 whether it was a ghost beforehand.  To get around this, you can bypass
 that method and do something like:

 docs = container.portal_catalog(meta_type='Document', ...)
 for doc in docs:
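     # Traverse to the object directly instead of calling getObject()
     obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
     # _p_changed is None while the object is still a ghost
     was_ghost = obj._p_changed is None
     value = obj.attr
     # Only ghostify objects that this loop activated
     if was_ghost:
         obj._p_deactivate()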

 You can test this by running your code on a freshly restarted server
 and checking the number of objects in the cache.  The number shouldn't
 change much after running the code above, but it will increase
 dramatically if you use 'obj = doc.getObject()' instead, or skip
 deactivating the objects.  The lower number of objects in your cache
 should in turn keep your memory usage down and stop your computer from
 paging during the request, which should speed things up considerably!
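
 The effect on the object count can be illustrated without a Zope
 instance. What follows is only a toy sketch: ToyPersistent, ToyLoader,
 scan and active_count are hypothetical names made up for illustration,
 and they merely imitate the ghost convention used above (_p_changed is
 None for a ghost, _p_deactivate() drops the loaded state). It is not
 the ZODB implementation.

```python
# Toy model only: hypothetical classes that imitate ghost/active states.

class ToyLoader:
    """Stand-in for the storage that holds the full object state."""
    def __init__(self, n):
        self.data = {i: {"attr": i, "body": "x" * 1000} for i in range(n)}

    def load(self, oid):
        return dict(self.data[oid])


class ToyPersistent:
    def __init__(self, loader, oid):
        self._loader = loader
        self._oid = oid
        self._p_changed = None   # None means "ghost": state not loaded
        self._state = None

    @property
    def attr(self):
        if self._p_changed is None:
            # First attribute access activates the object.
            self._state = self._loader.load(self._oid)
            self._p_changed = False
        return self._state["attr"]

    def _p_deactivate(self):
        # Drop the loaded state and turn back into a ghost.
        self._state = None
        self._p_changed = None


def active_count(objects):
    """Number of objects that currently hold loaded state."""
    return sum(1 for o in objects if o._p_changed is not None)


def scan(objects, deactivate):
    total = 0
    for obj in objects:
        was_ghost = obj._p_changed is None
        total += obj.attr
        # Only ghostify what this loop activated itself.
        if deactivate and was_ghost:
            obj._p_deactivate()
    return total


loader = ToyLoader(1000)
objects = [ToyPersistent(loader, i) for i in range(1000)]
objects[0].attr  # pretend one object was already in use before the scan

total_a = scan(objects, deactivate=True)
active_after_deactivating = active_count(objects)

total_b = scan(objects, deactivate=False)
active_without_deactivating = active_count(objects)
```

 In this toy run the deactivating scan leaves only the object that was
 already loaded, while the plain scan leaves every object loaded. One
 caveat, as far as I know: attributes starting with an underscore, such
 as _p_changed, are normally not reachable from a restricted Script
 (Python), so the real loop would probably have to live in an External
 Method or product code.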

 Another option would be to reduce the size of your cache so that the
 memory your Zope instance consumes doesn't cause your computer to swap.
 The code changes above will also help keep the 'right' objects in your
 cache, which in turn will further help the performance of subsequent
 requests.

 Cheers,

 JB.


From: Seb Bacon [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 6:18 PM
Subject: [Zope-dev] Re: Catalog performance



 With getObject(), you're loading entire objects into memory in order to
 grab a single attribute.  This is very wasteful.  Try putting the
 attribute into the metadata for the catalog and grabbing it from there.
 Then you can do:

 for doc in docs:
     value = doc.attr
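
 To make the difference concrete, here is a toy sketch of the idea. The
 classes Document, Brain and ToyCatalog are hypothetical stand-ins
 written for this illustration and are not the ZCatalog API. The only
 point is that a metadata value is copied onto the result record at
 indexing time, so reading it needs no object load.

```python
# Toy model only: hypothetical stand-ins, not the ZCatalog API.

class Document:
    """Stand-in for a heavyweight content object stored in the ZODB."""
    def __init__(self, path, attr, body):
        self.path = path
        self.attr = attr
        self.body = body  # large payload we do not want to load for statistics


class Brain:
    """Stand-in for a catalog result record: a path plus copied metadata."""
    def __init__(self, path, metadata, catalog):
        self._path = path
        self._catalog = catalog
        # Metadata values become plain attributes of the record.
        for name, value in metadata.items():
            setattr(self, name, value)

    def getObject(self):
        # Loading the real object is the expensive step we want to avoid.
        self._catalog.loads += 1
        return self._catalog.objects[self._path]


class ToyCatalog:
    def __init__(self, metadata_columns):
        self.metadata_columns = tuple(metadata_columns)
        self.objects = {}
        self.records = {}
        self.loads = 0  # counts how many full objects were fetched

    def catalog_object(self, obj):
        self.objects[obj.path] = obj
        # Copy the selected attributes once, at indexing time.
        self.records[obj.path] = {
            name: getattr(obj, name) for name in self.metadata_columns
        }

    def search(self):
        return [Brain(p, md, self) for p, md in self.records.items()]


catalog = ToyCatalog(metadata_columns=["attr"])
for i in range(1000):
    catalog.catalog_object(Document("/doc%d" % i, attr=i, body="x" * 1000))

# Slow pattern: one full object load per result.
slow_total = sum(brain.getObject().attr for brain in catalog.search())
loads_slow = catalog.loads

# Fast pattern: read the copied metadata straight off the record.
catalog.loads = 0
fast_total = sum(brain.attr for brain in catalog.search())
loads_fast = catalog.loads
```

 Both loops compute the same total, but only the first one touches the
 stored objects. In a real catalog the column has to be added first (on
 the catalog's Metadata tab, or with addColumn) and existing objects
 reindexed before the value shows up on the results. The value is a
 snapshot taken at indexing time.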

 seb

