Re: [Zope-dev] catalog performance: query plan

2008-11-10 Thread Lennart Regebro
On Sun, Nov 9, 2008 at 19:58, Roché Compaan [EMAIL PROTECTED] wrote:
 Since I'm in full agreement that we need to fix indexes that are
 problematic, I started doing some benchmarks on the large data set that
 gave us so many headaches. It is probably not surprising that the more
 complex indexes are performing badly. DateRangeIndex, KeywordIndex and
 Plone's ExtendedPathIndex performed the worst. Below are some stats
 showing timings around the apply_index call in Catalog.py, collected
 while testing the application with real data:

ExtendedPathIndex doesn't need fixing, but we need to stop using it.
It's done to support navigation trees from the catalog, but navigation
should not be done via the same catalog you use for other things; it
should be done via a dedicated tool. That would simplify and speed
things up a lot. But OK, that's off-topic.
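As a rough illustration of what a "dedicated tool" for navigation might look like, here is a minimal sketch that keeps only a parent-path to child-paths mapping, entirely outside the catalog. The NavigationTree class and its methods are hypothetical; a real Zope version would presumably store this persistently and keep it updated from add, move and delete events.

```python
from collections import defaultdict


class NavigationTree:
    """Hypothetical navigation structure kept separate from the catalog.

    It only records which paths live under which parent, in insertion
    order, so building a navigation tree never needs a catalog query.
    """

    def __init__(self):
        # parent path -> ordered list of child paths
        self._children = defaultdict(list)

    @staticmethod
    def _parent(path):
        # '/news/a' -> '/news', '/news' -> '/'
        return path.rsplit('/', 1)[0] or '/'

    def add(self, path):
        self._children[self._parent(path)].append(path)

    def remove(self, path):
        """Remove a path together with its whole subtree."""
        self._children[self._parent(path)].remove(path)
        self._drop_subtree(path)

    def _drop_subtree(self, path):
        for child in self._children.pop(path, []):
            self._drop_subtree(child)

    def tree(self, root='/', depth=2):
        """Return nested (path, children) pairs down to the given depth."""
        if depth <= 0:
            return []
        return [(child, self.tree(child, depth - 1))
                for child in self._children.get(root, [])]
```

For example, after adding '/news', '/news/a' and '/events', `tree('/', 2)` returns `[('/news', [('/news/a', [])]), ('/events', [])]` without touching any index.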

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] catalog performance: query plan

2008-11-10 Thread Lennart Regebro
On Sun, Nov 9, 2008 at 22:29, Matt Hamilton [EMAIL PROTECTED] wrote:
 Lennart Regebro regebro at gmail.com writes:

 I would be interested in seeing a bunch of Gurus sit down at some
 sprint and try to come up with a catalog engine that is incremental
 and uses query plans. There is no reason that would not be stupidly
 fast. :) We can then make a new catalog that uses this engine but has
 the same API as the old one, to ship with some future version of Zope,
 say 2.12.

 There is the Plone Performance sprint we are hosting in Bristol, UK on
 the 11th-14th Dec.

 http://plone.org/events/sprints/bristol-performance-sprint

 Whilst it is billed as a Plone sprint, of course much of the speedups can be
 done at the Zope level, so Zope-only developers are more than welcome :)

 This is exactly the kind of thing that I like hacking on personally, so would
 love to see it worked on at the sprint.

Cool. I do not have time in December though, so some other time. And
if we could get Dieter Maurer and Helge Tesdal in on this, as they have
experience and understanding of the issues, that would be great. That's
probably going to take even more planning, so maybe for a future
performance sprint somewhere?



Re: [Zope-dev] catalog performance: query plan

2008-11-10 Thread Martin Aspeli
Hi Tres,

 Index Name           |Type             |Avg Time |Calls/second
 ==============================================================
 object_implements    |KeywordIndex     |0.2172234|         4.6
 
 This is clearly not the same issue as the other KeywordIndexes:  in
 fact, I am astonished that anybody would be using a KeywordIndex for
 this at all.  I would suspect that the real problem here is in the
 application, rather than the index itself.

Why? object_implements indexes a list of interface dotted names. Would 
another type of index be more appropriate?

 UID                  |FieldIndex       |0.0003070|      3257.1
 
 Note that this is the worst-case scenario for a FieldIndex:  there is
 exactly one value for every key.  This shouldn't be indexed at all, in
 fact, beyond a simple BTree (UID -> rid).

Good point. I wonder how many places we use a UID index. UID *metadata* 
is quite important, of course.
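To make Tres's point about the UID index concrete: when every value is unique, a FieldIndex's value-to-set-of-rids structure is pure overhead, and a one-to-one mapping is enough. This is a minimal sketch, assuming plain dicts as stand-ins for persistent BTrees (in ZODB one would presumably use an OIBTree for UID -> rid and an IOBTree for the reverse direction); the UIDLookup class is hypothetical.

```python
class UIDLookup:
    """Hypothetical one-to-one UID <-> rid mapping replacing a FieldIndex.

    Plain dicts stand in for persistent BTrees here.
    """

    def __init__(self):
        self._uid_to_rid = {}
        self._rid_to_uid = {}

    def index(self, rid, uid):
        # Forget whatever UID this rid had before (re-indexing).
        old_uid = self._rid_to_uid.get(rid)
        if old_uid is not None:
            del self._uid_to_rid[old_uid]
        # UIDs are assumed unique: if another rid held this UID, drop it.
        old_rid = self._uid_to_rid.get(uid)
        if old_rid is not None:
            del self._rid_to_uid[old_rid]
        self._uid_to_rid[uid] = rid
        self._rid_to_uid[rid] = uid

    def unindex(self, rid):
        uid = self._rid_to_uid.pop(rid, None)
        if uid is not None:
            del self._uid_to_rid[uid]

    def lookup(self, uid):
        """Return the rid for uid, or None: a single mapping probe."""
        return self._uid_to_rid.get(uid)
```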

 targetUID            |FieldIndex       |0.0002287|     4372.12
 
 I don't know what this one is used for, but it should probably be
 scrapped as well.

Me neither ... sounds bogus.

 Title                |ZCTextIndex      |0.0000128|    77809.46
 
 This should be removed:  there is no valid use case for doing a
 full-text search restricted only to the title.

I'm pretty amazed that this is a ZCTextIndex as well. I always thought 
it was a FieldIndex.

 Description          |ZCTextIndex      |0.0000116|    86241.39
 
 Again, should be removed.

Right.

 getEmail             |ZCTextIndex      |0.0000113|    87849.05
 
 Should *definitely* be removed:  how can you do full-text search on an
 e-mail address?

Surely this is application specific too? I don't think Plone has such an 
index.


 SearchableText       |TextIndex        |0.0000113|    88466.69
 
 Where did this one come from?  The 'SearchableText' above is a ZCTextIndex.

It certainly is in vanilla Plone.

Martin


-- 
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book



Re: [Zope-dev] catalog performance: query plan

2008-11-10 Thread Roché Compaan
On Mon, 2008-11-10 at 18:38 +0200, Hedley Roos wrote:
 Kinda pointless for me to continue since this is turning into a
 Plone-specific discussion on zope-dev. But at least the whole exercise
 has forced us to look in detail into how all these indexes affect
 performance with a zodb with many many objects.
 
 Roche investigated Tesdal's queryplan today and it seems to solve
 nearly all our performance problems. He'll have to elaborate.

Well that is not really true. What solved our performance problems is
not querying on object_implements and getEffective_or_created. I have
previously done benchmarks with query plan and it didn't make any
noticeable difference.

What might be true, and is becoming more likely, is that indexes are
used where they don't fit the use case, rather than that the indexes
themselves need optimisation.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [Zope-dev] catalog performance: query plan

2008-11-10 Thread Roché Compaan
On Mon, 2008-11-10 at 11:08 -0500, Tres Seaver wrote:
  Index Name           |Type             |Avg Time |Calls/second
  ==============================================================
  object_implements    |KeywordIndex     |0.2172234|         4.6
 
 This is clearly not the same issue as the other KeywordIndexes:  in
 fact, I am astonished that anybody would be using a KeywordIndex for
 this at all.  I would suspect that the real problem here is in the
 application, rather than the index itself.
 
  getEffective_or_creat|DateIndex        |0.1941770|        5.15
  effectiveRange       |DateRangeIndex   |0.0086295|      115.88
  allowedRolesAndUsers |KeywordIndex     |0.0069754|      143.36
 
 Hmm, I'm surprised there:  what query is being passed to 'apply_index'
 for this call?

Well it is not really performing badly at 6ms?

 
  path                 |ExtendedPathIndex|0.0040614|      246.22
 
 I don't trust the EPI implementation at all.
 
  portal_type          |FieldIndex       |0.0025984|      384.84
 
 This one is surprising:  its performance should be pretty similar to
 the other FieldIndexes (e.g., 'review_state') which map a controlled
 vocabulary onto the entire corpus.  Was the query different than
 'review_state' (e.g., multi-valued vs. single-valued)?

It's still not bad at 2ms. It has a lot more keys than review_state
though.

 
  SearchableText       |ZCTextIndex      |0.0007645|     1308.04
  sourceUID            |FieldIndex       |0.0004886|     2046.31
 
 Probably bogus, but I don't know how it is used.

I'm not really worried about indexes beyond this point - they're all
returning results in less than a millisecond.

 Can you provide information on the corpus / configuration / test plan
 you used to generate these results?

It's basically a Plone site with 300,000 Remember-based users and
roughly 150,000 documents and images indexed.



Re: [Zope-dev] catalog performance: query plan

2008-11-10 Thread Laurence Rowe
Lennart Regebro wrote:
 On Sun, Nov 9, 2008 at 19:58, Roché Compaan [EMAIL PROTECTED] wrote:
 Since I'm in full agreement that we need to fix indexes that are
 problematic, I started doing some benchmarks on the large data set that
 gave us so many headaches. It is probably not surprising that the more
 complex indexes are performing badly. DateRangeIndex, KeywordIndex and
 Plone's ExtendedPathIndex performed the worst. Below are some stats
 showing timings around the apply_index call in Catalog.py, collected
 while testing the application with real data:
 
 ExtendedPathIndex doesn't need fixing, but we need to stop using it.
 It's done to support navigation trees from the catalog, but navigation
 should not be done via the same catalog you use for other things; it
 should be done via a dedicated tool. That would simplify and speed
 things up a lot. But OK, that's off-topic.
 

I wonder if this could be replaced by zc.relationship / plone.relations?

There is potential for removing the five.intid / zope.app.keyreference 
layer of indirection if the actual oid was stored instead, with an index 
to a list of database names packed into the first byte. There would even 
be room to store a reference to the object's class (using the pickle 
protocol 2 registry to convert this to an integer) in the next two or 
three bytes if creating ghosts were useful. This would still leave at 
least 32 bits of space (4 billion) for the actual object id.

Without storing the aq_chain explicitly we would need to ensure that 
__parent__ pointers were pickled for all content objects. The objects 
themselves could be used instead of metadata rows (without a security 
check it would be as simple as loading the oid from the relevant db 
connection). So long as all the required metadata was stored on the 
object itself only one load would be required for each object.

If this same keyreference were used in the indexes of the catalog 
instead of rowids then result sets could be merged.

The downside is that the set intersections would require double the 
memory of the current 32-bit ids.
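The bit layout described above can be sketched with plain integer arithmetic. This assumes one byte for the database index, three bytes for a class code and the remaining 32 bits for the object id, giving a single 64-bit key; the function names are hypothetical, and real ZODB oids are 8-byte strings that would first have to be converted (for instance with `int.from_bytes(oid, 'big')`) and checked to fit in 32 bits.

```python
DB_BITS, CLASS_BITS, OID_BITS = 8, 24, 32   # assumed layout: 1 + 3 + 4 bytes


def pack_ref(db_index, class_code, oid):
    """Pack (database index, class code, oid) into one 64-bit integer key."""
    for value, bits, name in ((db_index, DB_BITS, 'db_index'),
                              (class_code, CLASS_BITS, 'class_code'),
                              (oid, OID_BITS, 'oid')):
        if not 0 <= value < (1 << bits):
            raise ValueError(f'{name} does not fit in {bits} bits')
    return (db_index << (CLASS_BITS + OID_BITS)) | (class_code << OID_BITS) | oid


def unpack_ref(ref):
    """Inverse of pack_ref: split a 64-bit key back into its three fields."""
    oid = ref & ((1 << OID_BITS) - 1)
    class_code = (ref >> OID_BITS) & ((1 << CLASS_BITS) - 1)
    db_index = ref >> (CLASS_BITS + OID_BITS)
    return db_index, class_code, oid
```

For example, `unpack_ref(pack_ref(3, 17, 123456))` gives back `(3, 17, 123456)`. Because each key is 64 bits wide rather than a 32-bit rid, a result set of such keys takes twice the memory, which is the downside mentioned above.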

Laurence



Re: [Zope-dev] catalog performance: query plan

2008-11-09 Thread Matt Hamilton
Lennart Regebro regebro at gmail.com writes:

 I would be interested in seeing a bunch of Gurus sit down at some
 sprint and try to come up with a catalog engine that is incremental
 and uses query plans. There is no reason that would not be stupidly
 fast. :) We can then make a new catalog that uses this engine but has
 the same API as the old one, to ship with some future version of Zope,
 say 2.12.

There is the Plone Performance sprint we are hosting in Bristol, UK on the
11th-14th Dec.

http://plone.org/events/sprints/bristol-performance-sprint

Whilst it is billed as a Plone sprint, of course much of the speedups can be
done at the Zope level, so Zope-only developers are more than welcome :)

This is exactly the kind of thing that I like hacking on personally, so would
love to see it worked on at the sprint.

-Matt

-- 
Matt Hamilton   [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.   Understand. Develop. Deliver
http://www.netsight.co.uk +44 (0)117 9090901
Web Design | Zope/Plone Development  Consulting | Co-location | Hosting






Re: [Zope-dev] catalog performance: query plan

2008-11-09 Thread Roché Compaan
On Mon, 2008-10-27 at 11:32 -0500, Alan Runyan wrote:
 I agree with Tres.  A lot more can be done with Indexes and Catalog
 without caching.
 
 The most exciting development in Catalog optimizations comes out of
 Jarn.  Helge Tesdal (iirc) did a buncha work at an RDBMS company when
 he was in college.  He has a prototype of a query plan for ZCatalog.
 
 http://www.jarn.com/blog/catalog-query-plan
 
 I would like to ask Roche and others to look at the Query Plan.

We looked at query plan but it didn't help us in any way. Some catalog
indexes are performing very badly and most of our content is in a
published state which doesn't help the query plan much.
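For readers who have not looked at it, the core idea of a query plan can be sketched in a few lines: remember how large each index's result was on earlier queries, evaluate the most selective index first, and stop as soon as the intersection is empty. This is a minimal illustration under those assumptions, not Helge Tesdal's implementation; the function name, the dict-of-sets index layout and the averaging rule are all hypothetical. It also hints at why a value such as a published review_state, which matches most of the corpus, gives the planner little to work with: it is ordered last and prunes almost nothing.

```python
def plan_and_intersect(query, indexes, stats):
    """Intersect index hits, most selective index first (toy query plan).

    query:   {index_name: value}
    indexes: {index_name: {value: set_of_rids}}
    stats:   {index_name: average of past result sizes}; updated in place
    """
    # Indexes with no history sort last (infinite expected size).
    order = sorted(query, key=lambda name: stats.get(name, float('inf')))
    result = None
    for name in order:
        hits = indexes[name].get(query[name], set())
        previous = stats.get(name)
        # Exponentially weighted average with weight 1/2 (an arbitrary choice).
        stats[name] = len(hits) if previous is None else (previous + len(hits)) / 2
        result = set(hits) if result is None else result & hits
        if not result:
            break  # empty intersection: remaining indexes cannot add matches
    return result if result is not None else set()
```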

 Caching is a total PITA because invalidation machinery becomes
 overwhelmingly complex and unwieldy quickly in production.
 

I agree but this was the only thing that we could do to even go into
production.

Since I'm in full agreement that we need to fix indexes that are
problematic, I started doing some benchmarks on the large data set that
gave us so many headaches. It is probably not surprising that the more
complex indexes are performing badly. DateRangeIndex, KeywordIndex and
Plone's ExtendedPathIndex performed the worst. Below are some stats
showing timings around the apply_index call in Catalog.py, collected
while testing the application with real data:

Index Name           |Type             |Avg Time |Calls/second
==============================================================
object_implements    |KeywordIndex     |0.2172234|         4.6
getEffective_or_creat|DateIndex        |0.1941770|        5.15
effectiveRange       |DateRangeIndex   |0.0086295|      115.88
allowedRolesAndUsers |KeywordIndex     |0.0069754|      143.36
path                 |ExtendedPathIndex|0.0040614|      246.22
portal_type          |FieldIndex       |0.0025984|      384.84
SearchableText       |ZCTextIndex      |0.0007645|     1308.04
sourceUID            |FieldIndex       |0.0004886|     2046.31
UID                  |FieldIndex       |0.0003070|      3257.1
targetUID            |FieldIndex       |0.0002287|     4372.12
exact_getUserId      |FieldIndex       |0.0001931|     5177.79
exact_getUserName    |FieldIndex       |0.0001816|     5504.39
relationship         |FieldIndex       |0.0000822|     12153.1
id                   |FieldIndex       |0.0000822|    12161.81
end                  |DateIndex        |0.0000623|    16027.48
getGroups            |FieldIndex       |0.0000278|    35973.45
getArtistTitle       |FieldIndex       |0.0000259|    38495.53
review_state         |FieldIndex       |0.0000259|    38582.22
Subject              |KeywordIndex     |0.0000253|    39413.57
getDaysOfTheWeek     |KeywordIndex     |0.0000247|    40465.98
meta_type            |FieldIndex       |0.0000199|    50116.64
exact_getGroupId     |FieldIndex       |0.0000162|    61417.51
getVideoURL          |FieldIndex       |0.0000155|     64447.5
year                 |FieldIndex       |0.0000155|    64460.43
Title                |FieldIndex       |0.0000136|    73381.01
getId                |FieldIndex       |0.0000131|    76056.97
Title                |ZCTextIndex      |0.0000128|    77809.46
startendrange        |DateRangeIndex   |0.0000127|    78485.82
expires              |DateIndex        |0.0000126|    79001.59
getObjPositionInParen|FieldIndex       |0.0000124|     80675.9
targetId             |FieldIndex       |0.0000122|    81418.68
effective            |DateIndex        |0.0000121|     82651.7
getProvince          |FieldIndex       |0.0000117|    85198.54
month                |FieldIndex       |0.0000116|    85762.56
Description          |ZCTextIndex      |0.0000116|    86241.39
Type                 |FieldIndex       |0.0000115|    86345.17
getLast_login_time   |DateIndex        |0.0000115|    86698.98
Creator              |FieldIndex       |0.0000113|    87840.03
getEmail             |ZCTextIndex      |0.0000113|    87849.05
cmf_uid              |FieldIndex       |0.0000113|    88352.13
getDuration          |FieldIndex       |0.0000113|    88454.29
SearchableText       |TextIndex        |0.0000113|    88466.69
sortable_title       |FieldIndex       |0.0000112|    88698.49
getRating            |FieldIndex       |0.0000112|     88747.5
getGenres            |KeywordIndex     |0.0000112|    88796.55
object_provides      |KeywordIndex     |0.0000112|    88919.43
getEventType         |KeywordIndex     |0.0000112|     88953.9
in_reply_to          |FieldIndex       |0.0000112|    89057.46
getReview_state      |FieldIndex       |0.0000112|    89124.63
is_folderish         |FieldIndex       |0.0000112|    89240.51
getRawRelatedItems   |KeywordIndex     |0.0000111|    89568.91
getThumbSize         |FieldIndex       |0.0000111|    89653.89
getStudioCamURL      |FieldIndex       |0.0000111|    89678.92
Date                 |DateIndex        |0.0000111|    89799.23
getHash              |FieldIndex       |0.0000111|    90111.54
getNumberOfComments  |FieldIndex       |0.0000110|    90141.88
start   
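The Calls/second column in the table above is simply the reciprocal of Avg Time (for example 1 / 0.2172234 ≈ 4.6). A minimal sketch of how such per-index figures could be collected, assuming you can wrap each index call made from Catalog.py yourself; the IndexTimer class and its timed() method are hypothetical and not part of ZCatalog.

```python
import time
from collections import defaultdict


class IndexTimer:
    """Accumulate wall-clock time per index and report averages."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)   # index name -> total seconds
        self.counts = defaultdict(int)     # index name -> number of calls
        self.types = {}                    # index name -> index type label

    def timed(self, name, index_type, func, *args, **kwargs):
        """Call func(*args, **kwargs), charging the elapsed time to name."""
        start = self.clock()
        try:
            return func(*args, **kwargs)
        finally:
            self.totals[name] += self.clock() - start
            self.counts[name] += 1
            self.types[name] = index_type

    def report(self):
        """Rows of (name, type, avg seconds, calls/second), slowest first."""
        rows = []
        for name, total in self.totals.items():
            avg = total / self.counts[name]
            per_second = 1.0 / avg if avg > 0 else float('inf')
            rows.append((name, self.types[name], avg, per_second))
        rows.sort(key=lambda row: row[2], reverse=True)
        return rows
```

The injectable clock is there only so the timer can be exercised deterministically; in practice the default `time.perf_counter` would be used.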

Re: [Zope-dev] catalog performance: query plan

2008-11-06 Thread Lennart Regebro
On Mon, Oct 27, 2008 at 17:32, Alan Runyan [EMAIL PROTECTED] wrote:
 I agree with Tres.  A lot more can be done with Indexes and Catalog
 without caching.

 The most exciting development in Catalog optimizations comes out of
 Jarn.  Helge Tesdal (iirc) did a buncha work at an RDBMS company when
 he was in college.  He has a prototype of a query plan for ZCatalog.

 http://www.jarn.com/blog/catalog-query-plan

 I would like to ask Roche and others to look at the Query Plan.

 Caching is a total PITA because invalidation machinery becomes
 overwhelmingly complex and unwieldy quickly in production.

I don't know very much about searching, but this definitely sounds
like a good idea. Also, especially when doing free text searching that
has large result sets, incremental searching is very beneficial. I
know Dieter Maurer has made a Zope2 implementation of this.
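Incremental searching in this sense means producing matches lazily instead of materialising every intermediate result set, so a caller that only shows the first batch can stop early. A minimal sketch, assuming each index can yield its record ids in ascending order without duplicates; this is a generic merge-intersection written with Python generators, not Dieter Maurer's implementation, and the function name is hypothetical.

```python
from itertools import count, islice


def intersect_sorted(*iterables):
    """Lazily yield values present in every ascending, duplicate-free iterable."""
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return
    try:
        current = [next(it) for it in iterators]
        while True:
            high = max(current)
            if all(value == high for value in current):
                yield high
                current = [next(it) for it in iterators]
                continue
            # Advance every iterator that is behind the current maximum.
            for i, it in enumerate(iterators):
                while current[i] < high:
                    current[i] = next(it)
    except StopIteration:
        # One input is exhausted, so no further common values can exist.
        return


# Two unbounded streams stand in for indexes' sorted rid sequences;
# only as much of each stream is read as the first batch needs.
first_batch = list(islice(intersect_sorted(count(0, 2), count(0, 3)), 3))
```

Because the generator works even on unbounded inputs, `first_batch` ends up as `[0, 6, 12]` after reading only a handful of values from each stream.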

I would be interested in seeing a bunch of Gurus sit down at some
sprint and try to come up with a catalog engine that is incremental
and uses query plans. There is no reason that would not be stupidly
fast. :) We can then make a new catalog that uses this engine but has
the same API as the old one, to ship with some future version of Zope,
say 2.12.



Re: [Zope-dev] Catalog performance

2003-09-21 Thread John Barratt
[EMAIL PROTECTED] wrote:
 Just my 2 cents' observation...
 I ran this code and monitored the 'Cache extreme detail' page in ZMI
 ControlPanel > DebugInfo.
 With this method, the object was not loaded. However, the intermediate
 objects that unrestrictedTraverse() passed through were loaded into memory.
 e.g. if doc.getPath() is '/x/y/z/myobject', myobject was not loaded but x,
 y, and z were loaded into memory.
This is a good point: for catalog-based retrieval of objects it may be
difficult (but not impossible) to avoid excess objects remaining in the
cache.  If, however, you are doing a walk of part of your ZODB tree
(unlike 'randomly' accessing objects from the ZODB like this), you could
ensure that you do a depth-first traversal; then, as you come back up the
tree, traversed nodes that weren't active before the walk would be
deactivated.

eg something like this external method :

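def traverseTree(self):
    ''' Traverse the tree and do something. '''
    # Note whether this node was a ghost before the walk touched it.
    was_ghost = self._p_changed is None

    # Depth first: handle all children before this node.
    for ob in self.objectValues():
        traverseTree(ob)
    # XXX Do something with self here :
    self.doSomething()
    if was_ghost:
        # Re-ghostify nodes that were not active before the walk.
        self._p_deactivate()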

This should ensure that any 'traversed over' nodes that were previously 
not active are de-activated.

JB.



Re: [Zope-dev] Catalog performance

2003-09-12 Thread Chris Withers
John Barratt wrote:
 Chris Withers wrote:

 John Barratt wrote:

 docs = container.portal_catalog(meta_type='Document', ...)
 for doc in docs:
     obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
     was_ghost = obj._p_changed is None
     value = obj.attr
     if was_ghost:
         obj._p_deactivate()
 Bear in mind though, that you can only do this in an external method...
 ..or product code ;-)

 Why can you only do this in an external method?  This idea (deactivating
 objects) is used quite extensively in many parts of core Zope, such as
 OFS.ObjectManager, as I mentioned in another post, and we use it in our
 product code quite a bit as well.
Nguyen was asking about a Python Script. You can't do these things there, as the
necessary methods don't have security declarations, and they start with
'_', which I think the Zope Security Policy denies access to...

Chris



Re: [Zope-dev] Catalog performance

2003-09-11 Thread John Barratt
Chris Withers wrote:

 John Barratt wrote:

 docs = container.portal_catalog(meta_type='Document', ...)
 for doc in docs:
     obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
     was_ghost = obj._p_changed is None
     value = obj.attr
     if was_ghost:
         obj._p_deactivate()


 Bear in mind though, that you can only do this in an external method...
Why can you only do this in an external method?  This idea (deactivating
objects) is used quite extensively in many parts of core Zope, such as
OFS.ObjectManager, as I mentioned in another post, and we use it in our
product code quite a bit as well.

JB.



Re: [Zope-dev] Catalog performance

2003-09-10 Thread Romain Slootmaekers
Nguyen Quan Son wrote:
 Hi,
 I have a problem with performance and memory consumption when trying to do some
 statistics, using the following code:
 ...
 docs = container.portal_catalog(meta_type='Document', ...)
 for doc in docs:
     obj = doc.getObject()
     value = obj.attr
     ...
 With about 10.000 documents this Python script takes 10 minutes and more than 500MB of
 memory, after which I had to restart Zope.
 I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
 What's wrong with this code? Any suggestion is appreciated.
 Nguyen Quan Son.
It's not the catalog that's slow:

t1 = time.time()
docs = container.portal_catalog(meta_type='Document', ...)
t2 = time.time()    # t2 - t1: time spent in the catalog query
for doc in docs:
    obj = doc.getObject()
    value = obj.attr
    ...
t3 = time.time()    # t3 - t2: time spent loading every object
Print out the times and you'll see that the finding is fast.
The problem is that you are inflating each and every document, one after
another, and that takes time.

Romain Slootmaekers.




Re: [Zope-dev] Catalog performance

2003-09-10 Thread John Barratt
Max M wrote:

 Nguyen Quan Son wrote:
  Hi,
  I have a problem with performance and memory consumption when trying
  to do some statistics, using the following code:
  ...
  docs = container.portal_catalog(meta_type='Document', ...)
  for doc in docs:
      obj = doc.getObject()
      value = obj.attr
      ...

  With about 10.000 documents this Python script takes 10 minutes and
  more than 500MB of memory, after which I had to restart Zope.
  I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
  What's wrong with this code? Any suggestion is appreciated.
  Nguyen Quan Son.

 Most likely you are filling the memory of your server so that you are
 swapping to disk.

 Try cutting the query into smaller pieces so that the memory doesn't get
 filled up.
If you can't use catalog metadata as Seb suggests (e.g. you are actually
accessing many attributes, large values, etc.) and if indeed memory is
the problem (which seems likely), then you can ghostify the objects that
were ghosts to begin with, and it will save memory (unless all those
objects are already in cache).

The problem with this strategy, though, is that the doc.getObject() method
used in your code activates the object, and hence you won't know if it
was a ghost already or not.  To get around this you can shortcut this
method and do something like:

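docs = container.portal_catalog(meta_type='Document', ...)
for doc in docs:
    obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
    # Record whether obj was a ghost before we touch its attributes.
    was_ghost = obj._p_changed is None
    value = obj.attr
    if was_ghost:
        # Turn it back into a ghost so the cache doesn't keep growing.
        obj._p_deactivate()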
You can test this by running your code on a freshly restarted server, 
and check the number of objects in cache.  The number shouldn't change 
much after running the above method, but will increase dramatically if 
you just used 'obj = doc.getObject()' instead, or didn't do the 
deactivating of the objects.  The lower number of objects in your cache 
should in turn keep your memory usage down, and prevent your computer 
paging through the request, and hence speed things up considerably!
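The ghost-restoring pattern above can be wrapped in a small context manager so the deactivation step is harder to forget. This is a sketch that relies only on the `_p_changed` / `_p_deactivate()` protocol already used in the code above; the name ghost_preserved is hypothetical, it is meant for read-only access, and (as Chris Withers notes elsewhere in this thread) code like this belongs in an External Method or product code, not a Python Script.

```python
from contextlib import contextmanager


@contextmanager
def ghost_preserved(obj):
    """Yield obj and, if it was a ghost on entry, turn it back into one on exit.

    Intended for read-only access from trusted code (External Method or
    product code).
    """
    # _p_changed is None only for ghosts.
    was_ghost = obj._p_changed is None
    try:
        yield obj
    finally:
        if was_ghost:
            obj._p_deactivate()
```

Inside the catalog loop, the last three lines of the body would then become `with ghost_preserved(obj): value = obj.attr`.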

Another option would be to reduce the size of your cache so that the 
amount of memory your zope instance consumes doesn't cause your computer 
to swap, though doing the above code changes will also help keep your 
cache with the 'right' objects in it as well, which in turn will further 
help with the performance of subsequent requests.

Cheers,

JB.







Re: [Zope-dev] Catalog performance - SOLVED

2003-09-10 Thread Nguyen Quan Son
I've added catalog metadata as Seb suggested and it works fine.
Thank you very much.
Nguyen Quan Son

 Nguyen Quan Son wrote:
  Hi,
  I have a problem with performance and memory consumption when trying to do some
  statistics, using the following code:
  ...
  docs = container.portal_catalog(meta_type='Document', ...)
  for doc in docs:
      obj = doc.getObject()
      value = obj.attr
      ...

  With about 10.000 documents this Python script takes 10 minutes and more than
  500MB of memory, after which I had to restart Zope.
  I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
  What's wrong with this code? Any suggestion is appreciated.


From: John Barratt [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 6:41 PM
Subject: Re: [Zope-dev] Catalog performance



 If you can't use catalog metadata as Seb suggests (e.g. you are actually
 accessing many attributes, large values, etc.) and if indeed memory is
 the problem (which seems likely), then you can ghostify the objects that
 were ghosts to begin with, and it will save memory (unless all those
 objects are already in cache).

 The problem with this strategy, though, is that the doc.getObject() method
 used in your code activates the object, and hence you won't know if it
 was a ghost already or not.  To get around this you can shortcut this
 method and do something like:

 docs = container.portal_catalog(meta_type='Document', ...)
 for doc in docs:
     obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
     was_ghost = obj._p_changed is None
     value = obj.attr
     if was_ghost:
         obj._p_deactivate()

 You can test this by running your code on a freshly restarted server,
 and check the number of objects in cache.  The number shouldn't change
 much after running the above method, but will increase dramatically if
 you just used 'obj = doc.getObject()' instead, or didn't do the
 deactivating of the objects.  The lower number of objects in your cache
 should in turn keep your memory usage down, and prevent your computer
 paging through the request, and hence speed things up considerably!

 Another option would be to reduce the size of your cache so that the
 amount of memory your zope instance consumes doesn't cause your computer
 to swap, though doing the above code changes will also help keep your
 cache with the 'right' objects in it as well, which in turn will further
 help with the performance of subsequent requests.

 Cheers,

 JB.


From: Seb Bacon [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 6:18 PM
Subject: [Zope-dev] Re: Catalog performance



 With getObject(), you're loading entire objects into memory in order to
 grab a single attribute.  This is very wasteful.  Try putting the
 attribute into the metadata for the catalog and grabbing it from there.
 Then you can do:

 for doc in docs:
     value = doc.attr

 seb
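Why catalog metadata helps can be shown with a toy model: the catalog copies selected attribute values into a metadata row when an object is indexed, so reading them from a search result never touches the object store. Everything here (ToyCatalog, its methods, the dict-based store) is a hypothetical stand-in, not the ZCatalog API; in Zope you would add the attribute as a metadata column of the catalog and reindex.

```python
from collections import namedtuple


class ToyCatalog:
    """Hypothetical toy catalog that keeps a metadata row per record id."""

    def __init__(self, columns, store):
        # Column names must be valid identifiers and must not be 'rid' or 'path'.
        self.columns = list(columns)
        self.Brain = namedtuple('Brain', ['rid', 'path'] + self.columns)
        self.store = store              # path -> object; stands in for the ZODB
        self.rows = {}                  # rid -> Brain (the metadata row)
        self.by_meta_type = {}          # a single "index": meta_type -> set of rids

    def catalog_object(self, rid, path):
        obj = self.store[path]          # the object is loaded once, at index time
        values = [getattr(obj, column, None) for column in self.columns]
        self.rows[rid] = self.Brain(rid, path, *values)
        self.by_meta_type.setdefault(obj.meta_type, set()).add(rid)

    def search(self, meta_type):
        """Return brains only; no object is loaded here."""
        rids = sorted(self.by_meta_type.get(meta_type, ()))
        return [self.rows[rid] for rid in rids]
```

Reading `brain.attr` from the results of `search('Document')` then costs a tuple lookup rather than an object load.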

