Re: [Zope-dev] Re: Catalog performance

2003-09-11 Thread Toby Dickenson
On Thursday 11 September 2003 03:03, John Barratt wrote:

> I think ghosts
> are only 'removed' after a restart,

fyi, ghosts are removed from memory using reference counting.

-- 
Toby Dickenson


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


[Zope-dev] Re: Catalog performance

2003-09-10 Thread Seb Bacon
Nguyen Quan Son wrote:
> Hi,
> I have a problem with performance and memory consumption when trying to do
> some statistics, using following code:
> ...
> docs = container.portal_catalog(meta_type='Document', ...)
> for doc in docs:
>     obj = doc.getObject()
>     value = obj.attr
> ...
> With about 10.000 documents this Python script takes 10 minutes and more than
> 500MB of memory, after that I had to restart Zope. I am running Zope 2.6.1 +
> Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
> What's wrong with this code? Any suggestion is appreciated.

With getObject(), you're loading entire objects into memory in order to
grab a single attribute.  This is very wasteful.  Try putting the
attribute into the metadata for the catalog and grabbing it from there.
Then you can do:

    for doc in docs:
        value = doc.attr

seb
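
To make the difference concrete, here is a minimal sketch that runs without Zope. Everything in it (ObjectStore, Document, Brain, build_catalog) is a hypothetical stand-in, not the real catalog API; it only assumes that a catalog result can carry a copy of an attribute as metadata, so reading it does not load the object.

```python
class ObjectStore:
    """Hypothetical stand-in for the object database; counts full loads."""

    def __init__(self, objects):
        self._objects = objects
        self.loads = 0

    def load(self, oid):
        self.loads += 1
        return self._objects[oid]


class Document:
    """Hypothetical document with one small attribute and a large body."""

    def __init__(self, attr, body):
        self.attr = attr
        self.body = body


class Brain:
    """Hypothetical stand-in for a catalog result row."""

    def __init__(self, store, oid, metadata):
        self._store = store
        self._oid = oid
        # Metadata columns are copied onto the row at indexing time.
        for name, value in metadata.items():
            setattr(self, name, value)

    def getObject(self):
        # Fetching the real object costs a full load.
        return self._store.load(self._oid)


def build_catalog(store, oids, columns):
    """Index the objects once, copying the chosen columns into each row."""
    brains = []
    for oid in oids:
        ob = store._objects[oid]
        metadata = {column: getattr(ob, column) for column in columns}
        brains.append(Brain(store, oid, metadata))
    return brains


store = ObjectStore(
    {i: Document(attr=i * 10, body="x" * 1000) for i in range(3)}
)
docs = build_catalog(store, range(3), columns=["attr"])

# Original approach: one full object load per result.
slow_values = [doc.getObject().attr for doc in docs]
loads_after_slow = store.loads

# Metadata approach: same values, no further loads.
fast_values = [doc.attr for doc in docs]
loads_after_fast = store.loads
```

In this sketch the second loop reads the same values without touching the store again. The copy is only as fresh as the last time the object was indexed.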





[Zope-dev] Re: Catalog performance

2003-09-10 Thread Simon Michael
John Barratt wrote:
> the problem (which seems likely) then you can ghostify the objects that
> were ghosts to begin with, and it will save memory (unless all those
> objects are already in cache).

This is rather interesting, but I don't quite follow what's happening.
If you can say a little more, or suggest a doc reference, I'm all ears.





Re: [Zope-dev] Re: Catalog performance

2003-09-10 Thread John Barratt
Simon Michael wrote:

> John Barratt wrote:
>
>> the problem (which seems likely) then you can ghostify the objects
>> that were ghosts to begin with, and it will save memory (unless all
>> those objects are already in cache).
>
> This is rather interesting, but I don't quite follow what's happening.
> If you can say a little more, or suggest a doc reference, I'm all ears.

In general, when an object is first loaded from the ZODB it is in a
'ghost' state: it is only a shell and has no attributes yet.  When you
access (almost) any attribute on that object (e.g. value = ob.attr),
it gets activated: the contents are loaded automatically, and then the
value is returned.  This is when the real memory usage takes place.

So if you get an object from the ZODB and don't access any attributes,
it will remain in a ghosted state.  Accessing some core Python
attributes, such as __dict__, does *not* activate it, and neither do
the reserved persistent _p_* attributes.
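
The activation behaviour described above can be imitated in plain Python. This is only a rough sketch: the Ghostable class and load_state function are made up for illustration, and the real persistence machinery works differently under the hood. It assumes, as described, that _p_changed is None for a ghost and that _p_* attributes and __dict__ can be read without loading anything.

```python
class Ghostable:
    """Hypothetical imitation of a persistent object that starts as a ghost."""

    def __init__(self, oid, loader):
        self._p_oid = oid
        self._p_loader = loader   # assumed callable: oid -> dict of state
        self._p_changed = None    # None marks the ghost state

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. the state is not loaded.
        if name.startswith("_p_") or name.startswith("__"):
            raise AttributeError(name)
        if self.__dict__.get("_p_changed") is None:
            # Activate: pull the full state in, then retry the lookup.
            self.__dict__.update(self._p_loader(self._p_oid))
            self.__dict__["_p_changed"] = False
            if name in self.__dict__:
                return self.__dict__[name]
        raise AttributeError(name)


loads = []


def load_state(oid):
    """Hypothetical loader that records every activation."""
    loads.append(oid)
    return {"attr": "value for %s" % oid}


ob = Ghostable("0x01", load_state)

# Looking at _p_changed and __dict__ leaves the object a ghost.
before = (ob._p_changed, sorted(ob.__dict__), list(loads))

# Accessing an ordinary attribute activates it.
value = ob.attr
after = (ob._p_changed, list(loads))
```

Before the attribute access nothing has been loaded; afterwards the loader has been called exactly once and _p_changed is no longer None.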

If you look at the Cache Parameters tab of your Database in the Control 
Panel (at least with Zope 2.6.2, perhaps 2.6.1) you can see how many 
objects are in memory, and how many are just 'ghosts'.  I think ghosts 
are only 'removed' after a restart, and essentially just contain a 
_p_oid that references the object in the ZODB, ready for re-activation.

A general reference for the ZODB that explains more can be found here:

http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html

An example use (and good discussion) that is similar, can be found at 
the link below.  I found this after having problems with objects not 
de-ghostifying properly when just accessing __dict__ :

http://aspn.activestate.com/ASPN/Mail/Message/zodb-dev/913762

A grep through the Zope source code and some products will also find
many examples of 'deactivating' objects after a 'walk':

e.g. from OFS.ObjectManager:

    def manage_afterAdd(self, item, container):
        for object in self.objectValues():
            try: s=object._p_changed
            except: s=0
            if hasattr(aq_base(object), 'manage_afterAdd'):
                object.manage_afterAdd(item, container)
            if s is None: object._p_deactivate()

A change to my example code that would be advisable is wrapping the
_p_changed test in a try/except, in case the object is None or for
some reason isn't persistent and hence doesn't have a _p_changed.
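
Put together, the pattern might look something like the sketch below. The helper name walk_preserving_ghosts and the FakePersistent class are invented for this example so that it runs without Zope; the only things taken from the real objects are the _p_changed attribute (None for a ghost) and the _p_deactivate() method.

```python
def walk_preserving_ghosts(objects, visit):
    """Call visit(ob) on each object, then re-ghost those that began as ghosts."""
    results = []
    for ob in objects:
        try:
            was_ghost = ob._p_changed is None
        except AttributeError:
            # None, or not a persistent object: nothing to deactivate later.
            was_ghost = False
        results.append(visit(ob))
        if was_ghost:
            ob._p_deactivate()
    return results


class FakePersistent:
    """Hypothetical stand-in for a persistent object, for demonstration only."""

    def __init__(self, value):
        self._p_state = {"attr": value}   # pretend this lives in the database
        self._p_changed = None            # starts out as a ghost

    def __getattr__(self, name):
        state = self.__dict__.get("_p_state", {})
        if name in state:
            # Activate: copy the stored state onto the instance.
            self.__dict__.update(state)
            self.__dict__["_p_changed"] = False
            return state[name]
        raise AttributeError(name)

    def _p_deactivate(self):
        # Drop the loaded state and go back to being a ghost.
        for name in self._p_state:
            self.__dict__.pop(name, None)
        self._p_changed = None


ghost = FakePersistent(1)
active = FakePersistent(2)
active.attr                      # already in use elsewhere, so already active
plain = object()                 # not persistent at all

values = walk_preserving_ghosts(
    [ghost, active, plain],
    lambda ob: getattr(ob, "attr", None),
)
```

After the walk, the object that started as a ghost is a ghost again, while the one that was already active stays loaded, so the walk does not evict anything another part of the application was using.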

I hope this helps and makes sense!

Cheers,

JB.





[Zope-dev] Re: Catalog performance

2003-09-10 Thread Simon Michael
John - your post and the links helped a lot. Thanks!


