Re: [Zope-dev] catalog performance: query plan
On Sun, Nov 9, 2008 at 19:58, Roché Compaan [EMAIL PROTECTED] wrote: Since I'm in full agreement that we need to fix indexes that are problematic, I started doing some benchmarks on the large data set that gave us so many headaches. It is probably not surprising that the more complex indexes are performing badly. DateRangeIndex, KeywordIndex and Plone's ExtendedPathIndex performed the worst. Below are some stats showing timings around the apply_index call in Catalog.py that was done while testing the application with real data: ExtendedPathIndex doesn't need fixing, but we need to stop using it. It's done to support navigation trees from the catalog, but navigation should not be done via the same catalog as you do other things, but a dedicated tool. That would simplify and speed things up a lot. But OK, that's off-topic. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] catalog performance: query plan
On Sun, Nov 9, 2008 at 22:29, Matt Hamilton [EMAIL PROTECTED] wrote: Lennart Regebro regebro at gmail.com writes: I would be interested in seeing a bunch of Gurus sit down at some sprint and trying to come up with a catalog engine that is incremental and uses query plans. There is no reason that would not be stupidly fast. :) We can then make a new catalog that uses this engine but has the same API as the old one, to ship with some future version of Zope, say 2.12. There is the Plone Performance sprint we are hosting in Bristol, UK on the 11th - 14th Dec. http://plone.org/events/sprints/bristol-performance-sprint Whilst it is billed as a Plone sprint, of course much of the speedups can be done at the Zope level, so Zope-only developers are more than welcome :) This is exactly the kind of thing that I like hacking on personally, so would love to see it worked on at the sprint. Cool. I do not have time in December though, so some other time. And if we could get Dieter Maurer and Helge Tesdal in on this, as they has experience and understanding of the issues that would be great. That's probably going to take even more planning, so maybe for a future performance sprint somewhere? -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] catalog performance: query plan
Hi Tres, Index Name |Type |Avg Time |Calls/second == object_implements|KeywordIndex |0.2172234| 4.6 This is clearly not the same issue as the other KeywordIndexes: in fact, I am astonished that anybody would be using a KeywordIndex for this at all. I would suspect that the real problem here is in the appliation, rather than the index itself. Why? object_implements indexes a list of interface dotted names. Would another type of index be more appropriate? UID |FieldIndex |0.0003070| 3257.1 Note that this is the worst-case scenario for a FieldIndex: there is exactly one value for every key. This shouldn't be indexed at all, in fact, beyond a simple BTree (UID - rid). Good point. I wonder how many places we use a UID index. UID *metadata* is quite important, of course. targetUID|FieldIndex |0.0002287| 4372.12 I don't know what this one is used for, but it should probably be scrapped as well. Me neither ... sounds bogus. Title|ZCTextIndex |0.128|77809.46 This should be removed: there is no valid use case for doing a full-text search restricted only to the title. I'm pretty amazed that this is a ZCTextIndex as well. I always thought it was a FieldIndex. Description |ZCTextIndex |0.116|86241.39 Again, should be removed. Right. getEmail |ZCTextIndex |0.113|87849.05 Should *definitely* be removed: how can you do full-text search on an e-mail address? Surely this is application specific too? I don't think Plone has such an index. SearchableText |TextIndex|0.113|88466.69 Where did this one come from? The 'SearchableText' above is a ZCTextIndex. It certainly is in vanilla Plone. Martin -- Author of `Professional Plone Development`, a book for developers who want to work with Plone. See http://martinaspeli.net/plone-book ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] catalog performance: query plan
On Mon, 2008-11-10 at 18:38 +0200, Hedley Roos wrote: Kinda pointless for me to continue since this is turning into a Plone-specific discussion on zope-dev. But at least the whole exercise has forced us to look in detail into how all these indexes affect performance with a zodb with many many objects. Roche investigated Tesdal's queryplan today end it seems to solve nearly all our performance problems. He'll have to elaborate. Well that is not really true. What solved our performance problems is not querying on object_implements and getEffective_or_created. I have previously done benchmarks with query plan and it didn't make any noticeable difference. What might be true or is becoming more likely is that indexes are used where they don't fit the use case rather than that the indexes themselves need optimisation. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] catalog performance: query plan
On Mon, 2008-11-10 at 11:08 -0500, Tres Seaver wrote: Index Name |Type |Avg Time |Calls/second == object_implements|KeywordIndex |0.2172234| 4.6 This is clearly not the same issue as the other KeywordIndexes: in fact, I am astonished that anybody would be using a KeywordIndex for this at all. I would suspect that the real problem here is in the appliation, rather than the index itself. getEffective_or_creat|DateIndex|0.1941770|5.15 effectiveRange |DateRangeIndex |0.0086295| 115.88 allowedRolesAndUsers |KeywordIndex |0.0069754| 143.36 Hmm, I'm surprised there: what query is being passed to 'apply_index' for this call? Well it is not really performing badly at 6ms? path |ExtendedPathIndex|0.0040614| 246.22 I don't trust the EPI implementation at all. portal_type |FieldIndex |0.0025984| 384.84 This one is surprising: its performance should be pretty similar to the other FieldIndexes (e.g., 'review_state') which map a controlled vocabulary onto the entire corpus. Was the query different than 'review_state' (e.g., multi-valued vs. single-valued)? It's still not bad at 2ms. It has a lot more keys than review_state though. SearchableText |ZCTextIndex |0.0007645| 1308.04 sourceUID|FieldIndex |0.0004886| 2046.31 Probably bogus, but I don't know how it is used. I'm not really worried about indexes beyond this point - they're all returning results in less than a millisecond. Can you provide information on the corpus / configuration / test plan you used to generate these results? It's basically a Plone site with 300,000 remember based users and roughly 150,000 documents and images indexed. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] catalog performance: query plan
Lennart Regebro wrote: On Sun, Nov 9, 2008 at 19:58, Roché Compaan [EMAIL PROTECTED] wrote: Since I'm in full agreement that we need to fix indexes that are problematic, I started doing some benchmarks on the large data set that gave us so many headaches. It is probably not surprising that the more complex indexes are performing badly. DateRangeIndex, KeywordIndex and Plone's ExtendedPathIndex performed the worst. Below are some stats showing timings around the apply_index call in Catalog.py that was done while testing the application with real data: ExtendedPathIndex doesn't need fixing, but we need to stop using it. It's done to support navigation trees from the catalog, but navigation should not be done via the same catalog as you do other things, but a dedicated tool. That would simplify and speed things up a lot. But OK, that's off-topic. I wander if this could be replaced by zc.relationship / plone.relations? There is potential for removing the five.intid / zope.app.keyreference layer of indirection if the actual oid was stored instead, with an index to a list of database names packed into the first byte. There would even be room to store a reference to the objects class (using the pickle protocol 2 registry to convert this to an integer) in the next two or three bytes if creating ghosts were useful. This would still leave at least 32 bits of space (4 billion) for the actual object id. Without storing the aq_chain explicitly we would need to ensure that __parent__ pointers were pickled for all content objects. The objects themselves could be used instead of metadata rows (without a security check it would be as simple as loading the oid from the relevant db connection). So long as all the required metadata was stored on the object itself only one load would be required for each object. If this same keyreference were used in the indexes of the catalog instead of rowids then result sets could be merged. The downside is that the set intersections would require double the memory of the current 32 bit ids. Laurence ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] catalog performance: query plan
Lennart Regebro regebro at gmail.com writes: I would be interested in seeing a bunch of Gurus sit down at some sprint and trying to come up with a catalog engine that is incremental and uses query plans. There is no reason that would not be stupidly fast. :) We can then make a new catalog that uses this engine but has the same API as the old one, to ship with some future version of Zope, say 2.12. There is the Plone Performance sprint we are hosting in Bristol, UK on the 11th - 14th Dec. http://plone.org/events/sprints/bristol-performance-sprint Whilst it is billed as a Plone sprint, of course much of the speedups can be done at the Zope level, so Zope-only developers are more than welcome :) This is exactly the kind of thing that I like hacking on personally, so would love to see it worked on at the sprint. -Matt -- Matt Hamilton [EMAIL PROTECTED] Netsight Internet Solutions, Ltd. Understand. Develop. Deliver http://www.netsight.co.uk +44 (0)117 9090901 Web Design | Zope/Plone Development Consulting | Co-location | Hosting ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] catalog performance: query plan
On Mon, 2008-10-27 at 11:32 -0500, Alan Runyan wrote: I agree with Tres. A lot more can be done with Indexes and Catalog without caching. The most exiciting development in Catalog optimizations comes out Jarn. Helge Tesdal (iirc) did a buncha work at a RDBMS company when he was in college. He has a protoype of a query plan for ZCatalog. http://www.jarn.com/blog/catalog-query-plan I would like to ask Roche and others to look at the Query Plan. We looked at query plan but it didn't help us in any way. Some catalog indexes are performing very badly and most of our content is in a published state which doesn't help the query plan much. Caching is a total PITA because invalidation machinery becomes overwhelming complex and unwieldly quickly in production. I agree but this was the only thing that we could do to even go into production. Since I'm in full agreement that we need to fix indexes that are problematic, I started doing some benchmarks on the large data set that gave us so many headaches. It is probably not surprising that the more complex indexes are performing badly. DateRangeIndex, KeywordIndex and Plone's ExtendedPathIndex performed the worst. Below are some stats showing timings around the apply_index call in Catalog.py that was done while testing the application with real data: Index Name |Type |Avg Time |Calls/second == object_implements|KeywordIndex |0.2172234| 4.6 getEffective_or_creat|DateIndex|0.1941770|5.15 effectiveRange |DateRangeIndex |0.0086295| 115.88 allowedRolesAndUsers |KeywordIndex |0.0069754| 143.36 path |ExtendedPathIndex|0.0040614| 246.22 portal_type |FieldIndex |0.0025984| 384.84 SearchableText |ZCTextIndex |0.0007645| 1308.04 sourceUID|FieldIndex |0.0004886| 2046.31 UID |FieldIndex |0.0003070| 3257.1 targetUID|FieldIndex |0.0002287| 4372.12 exact_getUserId |FieldIndex |0.0001931| 5177.79 exact_getUserName|FieldIndex |0.0001816| 5504.39 relationship |FieldIndex |0.822| 12153.1 id |FieldIndex |0.822|12161.81 end |DateIndex|0.623|16027.48 getGroups|FieldIndex |0.278|35973.45 getArtistTitle |FieldIndex |0.259|38495.53 review_state |FieldIndex |0.259|38582.22 Subject |KeywordIndex |0.253|39413.57 getDaysOfTheWeek |KeywordIndex |0.247|40465.98 meta_type|FieldIndex |0.199|50116.64 exact_getGroupId |FieldIndex |0.162|61417.51 getVideoURL |FieldIndex |0.155| 64447.5 year |FieldIndex |0.155|64460.43 Title|FieldIndex |0.136|73381.01 getId|FieldIndex |0.131|76056.97 Title|ZCTextIndex |0.128|77809.46 startendrange|DateRangeIndex |0.127|78485.82 expires |DateIndex|0.126|79001.59 getObjPositionInParen|FieldIndex |0.124| 80675.9 targetId |FieldIndex |0.122|81418.68 effective|DateIndex|0.121| 82651.7 getProvince |FieldIndex |0.117|85198.54 month|FieldIndex |0.116|85762.56 Description |ZCTextIndex |0.116|86241.39 Type |FieldIndex |0.115|86345.17 getLast_login_time |DateIndex|0.115|86698.98 Creator |FieldIndex |0.113|87840.03 getEmail |ZCTextIndex |0.113|87849.05 cmf_uid |FieldIndex |0.113|88352.13 getDuration |FieldIndex |0.113|88454.29 SearchableText |TextIndex|0.113|88466.69 sortable_title |FieldIndex |0.112|88698.49 getRating|FieldIndex |0.112| 88747.5 getGenres|KeywordIndex |0.112|88796.55 object_provides |KeywordIndex |0.112|88919.43 getEventType |KeywordIndex |0.112| 88953.9 in_reply_to |FieldIndex |0.112|89057.46 getReview_state |FieldIndex |0.112|89124.63 is_folderish |FieldIndex |0.112|89240.51 getRawRelatedItems |KeywordIndex |0.111|89568.91 getThumbSize |FieldIndex |0.111|89653.89 getStudioCamURL |FieldIndex |0.111|89678.92 Date |DateIndex|0.111|89799.23 getHash |FieldIndex |0.111|90111.54 getNumberOfComments |FieldIndex |0.110|90141.88 start
Re: [Zope-dev] catalog performance: query plan
On Mon, Oct 27, 2008 at 17:32, Alan Runyan [EMAIL PROTECTED] wrote: I agree with Tres. A lot more can be done with Indexes and Catalog without caching. The most exiciting development in Catalog optimizations comes out Jarn. Helge Tesdal (iirc) did a buncha work at a RDBMS company when he was in college. He has a protoype of a query plan for ZCatalog. http://www.jarn.com/blog/catalog-query-plan I would like to ask Roche and others to look at the Query Plan. Caching is a total PITA because invalidation machinery becomes overwhelming complex and unwieldly quickly in production. I don't know very much about searching, but this definitely sounds like a good idea. Also, especially when doing free text searching that has large result sets, incremental searching is very beneficial. I know Dieter Maurer has made a Zope2 implementation of this. I would be interested in seeing a bunch of Gurus sit down at some sprint and trying to come up with a catalog engine that is incremental and uses query plans. There is no reason that would not be stupidly fast. :) We can then make a new catalog that uses this engine but has the same API as the old one, to ship with some future version of Zope, say 2.12. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
[Zope-dev] catalog performance: query plan
I agree with Tres. A lot more can be done with Indexes and Catalog without caching. The most exiciting development in Catalog optimizations comes out Jarn. Helge Tesdal (iirc) did a buncha work at a RDBMS company when he was in college. He has a protoype of a query plan for ZCatalog. http://www.jarn.com/blog/catalog-query-plan I would like to ask Roche and others to look at the Query Plan. Caching is a total PITA because invalidation machinery becomes overwhelming complex and unwieldly quickly in production. -- Alan Runyan Enfold Systems, Inc. http://www.enfoldsystems.com/ phone: +1.713.942.2377x111 fax: +1.832.201.8856 ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Catalog performance
[EMAIL PROTECTED] wrote: Just my 2 cents observation... I ran this code and monitored the page Cache extreme detail in ZMI ControlPanel DebugInfo. With this method, the object was not loaded. However the intermediate objects that the unrestrictedTraverse() passed by were loaded into memory. e.g. If doc.getPath() is '/x/y/z/myobject', myobject was not loaded but x, y, and z were loaded into memory. This is a good point, and for catalog based retrievals of objects may be difficult (but not impossible) to avoid excess objects remaining in the cache. If however you are doing a walk of part of your ZODB tree (unlike 'randomly' accessing objects from the ZODB like this) you could ensure that you do a depth first traversal, then as you come back up the tree, traversed nodes that weren't active before the walk would be deactivated. eg something like this external method : def traverseTree(self): ''' Traverse the tree and do something. ''' was_ghost = self._p_changed is None for ob in self.objectValues(): traverseTree(ob) # XXX Do something with self here : self.doSomething() if was_ghost:self._p_deactivate() This should ensure that any 'traversed over' nodes that were previously not active are de-activated. JB. ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
[Zope-dev] Catalog performance
John Barratt wrote: If you can't use catalog metadata as Seb suggests (eg. you are actually accessing many attributes, large values, etc.) and if indeeed memory is the problem (which seems likely) then you can ghostify the objects that were ghosts to begin with, and it will save memory (unless all those objects are already in cache). The problem with this strategy though is that doc.getObject() method used in your code activates the object and hence you won't know if it was a ghost already or not. To get around this you can shortcut this method and do something like : docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.aq_parent.unrestrictedTraverse(doc.getPath()) was_ghost = obj._p_changed is None value = obj.attr if was_ghost:obj._p_deactivate() Just my 2 cents observation... I ran this code and monitored the page Cache extreme detail in ZMI ControlPanel DebugInfo. With this method, the object was not loaded. However the intermediate objects that the unrestrictedTraverse() passed by were loaded into memory. e.g. If doc.getPath() is '/x/y/z/myobject', myobject was not loaded but x, y, and z were loaded into memory. I also tried the method suggested by Seb. This did not load myobject as well as x, y, and z into memory: http://mail.zope.org/pipermail/zope-dev/2003-September/020450.html The information on deactivating object into ghost state is very helpful. Thanks! cheers, zhi min ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Catalog performance
John Barratt wrote: Chris Withers wrote: John Barratt wrote: docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.aq_parent.unrestrictedTraverse(doc.getPath()) was_ghost = obj._p_changed is None value = obj.attr if was_ghost:obj._p_deactivate() Bear in mind though, that you can only do this in an external method... ..or product code ;-) Why can you only do this in an external method? This idea (deactivating objects) is used quite a extensively in many parts of core zope such as OFS.ObjectManager as I mentioned in another post, and we use it in our product code quite a bit as well. Nguyen was asking about a python script, you can't do these things there as the necessary methods don't have security declarations, and the methods start with '_', which I think the Zope Security Policy denies access to... Chris ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Catalog performance
Chris Withers wrote: John Barratt wrote: docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.aq_parent.unrestrictedTraverse(doc.getPath()) was_ghost = obj._p_changed is None value = obj.attr if was_ghost:obj._p_deactivate() Bear in mind though, that you can only do this in an external method... Why can you only do this in an external method? This idea (deactivating objects) is used quite a extensively in many parts of core zope such as OFS.ObjectManager as I mentioned in another post, and we use it in our product code quite a bit as well. JB. ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
[Zope-dev] Catalog performance
Hi, I have a problem with performance and memory consumption when trying to do some statistics, using following code: ... docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.getObject() value = obj.attr ... With about 10.000 documents this Python script takes 10 minutes and more than 500MB of memory, after that I had to restart Zope. I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM. What's wrong with this code? Any suggestion is appreciated. Nguyen Quan Son. ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Catalog performance
Nguyen Quan Son wrote: Hi, I have a problem with performance and memory consumption when trying to do some statistics, using following code: ... docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.getObject() value = obj.attr ... With about 10.000 documents this Python script takes 10 minutes and more than 500MB of memory, after that I had to restart Zope. I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM. What's wrong with this code? Any suggestion is appreciated. Nguyen Quan Son. it's not the catalog that's slow: t1=time.time() docs = container.portal_catalog(meta_type='Document', ...) t2=time.time() for doc in docs: obj = doc.getObject() value = obj.attr ... t3=time.time() print out the times and you'll see that the the finding is fast. the problem is that you are inflating each and every document one after another, and that takes time. Romain Slootmaekers. ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope ) ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Catalog performance
Max M wrote: Nguyen Quan Son wrote: Hi, I have a problem with performance and memory consumption when trying to do some statistics, using following code: ... docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.getObject() value = obj.attr ... With about 10.000 documents this Python script takes 10 minutes and more than 500MB of memory, after that I had to restart Zope. I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM. What's wrong with this code? Any suggestion is appreciated. Nguyen Quan Son. Most likely you are filling the memory of your server so that you are swapping to disk. Try cutting the query into smaller pieces so that the memory doesn't get filled up. If you can't use catalog metadata as Seb suggests (eg. you are actually accessing many attributes, large values, etc.) and if indeeed memory is the problem (which seems likely) then you can ghostify the objects that were ghosts to begin with, and it will save memory (unless all those objects are already in cache). The problem with this strategy though is that doc.getObject() method used in your code activates the object and hence you won't know if it was a ghost already or not. To get around this you can shortcut this method and do something like : docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.aq_parent.unrestrictedTraverse(doc.getPath()) was_ghost = obj._p_changed is None value = obj.attr if was_ghost:obj._p_deactivate() You can test this by running your code on a freshly restarted server, and check the number of objects in cache. The number shouldn't change much after running the above method, but will increase dramatically if you just used 'obj = doc.getObject()' instead, or didn't do the deactivating of the objects. The lower number of objects in your cache should in turn keep your memory usage down, and prevent your computer paging through the request, and hence speed things up considerably! Another option would be to reduce the size of your cache so that the amount of memory your zope instance consumes doesn't cause your computer to swap, though doing the above code changes will also help keep your cache with the 'right' objects in it as well, which in turn will further help with the performance of subsequent requests. Cheers, JB. ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Catalog performance - SOLVED
I've added catalog metadata as Seb suggested and it works fine. Thank you very much. Nguyen Quan Son Nguyen Quan Son wrote: Hi, I have a problem with performance and memory consumption when trying to do some statistics, using following code: ... docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.getObject() value = obj.attr ... With about 10.000 documents this Python script takes 10 minutes and more than 500MB of memory, after that I had to restart Zope. I am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM. What's wrong with this code? Any suggestion is appreciated. From: John Barratt [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, September 10, 2003 6:41 PM Subject: Re: [Zope-dev] Catalog performance If you can't use catalog metadata as Seb suggests (eg. you are actually accessing many attributes, large values, etc.) and if indeeed memory is the problem (which seems likely) then you can ghostify the objects that were ghosts to begin with, and it will save memory (unless all those objects are already in cache). The problem with this strategy though is that doc.getObject() method used in your code activates the object and hence you won't know if it was a ghost already or not. To get around this you can shortcut this method and do something like : docs = container.portal_catalog(meta_type='Document', ...) for doc in docs: obj = doc.aq_parent.unrestrictedTraverse(doc.getPath()) was_ghost = obj._p_changed is None value = obj.attr if was_ghost:obj._p_deactivate() You can test this by running your code on a freshly restarted server, and check the number of objects in cache. The number shouldn't change much after running the above method, but will increase dramatically if you just used 'obj = doc.getObject()' instead, or didn't do the deactivating of the objects. The lower number of objects in your cache should in turn keep your memory usage down, and prevent your computer paging through the request, and hence speed things up considerably! Another option would be to reduce the size of your cache so that the amount of memory your zope instance consumes doesn't cause your computer to swap, though doing the above code changes will also help keep your cache with the 'right' objects in it as well, which in turn will further help with the performance of subsequent requests. Cheers, JB. From: Seb Bacon [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, September 10, 2003 6:18 PM Subject: [Zope-dev] Re: Catalog performance With getObject(), you're loading entire objects into memory in order to grab a single attribute. This is very wasteful. Try putting the attribute into the metadata for the catalog and grabbing it from there. Then you can do: for doc in docs: value = doc.attr seb ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )