Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Saturday, 21.05.2005 at 17:38 +0200, Christian Heimes wrote:
> Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
> OFS.Image.Image class (and also Zope3's implementation) uses a
> so-called "possibly large data" class (Pdata) that is a subclass of
> Persistent. Pdata uses a simple and ingenious approach to minimize
> memory usage when storing large binary data in ZODB. The data is read
> from a [...]

Actually, Pdata has some drawbacks. When the blobsupport branch gets declared stable (I don't think that's going to happen in 3.4, but nobody has told me otherwise), we'll have really good blob support without this black magic.

Cheers,
Christian

--
gocept gmbh & co. kg - schalaunische str. 6 - 06366 koethen - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 3496 30 99 112 - fax +49 3496 30 99 118
zope and plone consulting and development
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
--On 29. Mai 2005 11:29:06 +0200 Christian Theune [EMAIL PROTECTED] wrote:
> Actually, Pdata has some drawbacks. When the blobsupport branch gets
> declared stable (I don't think that's going to happen in 3.4, but
> nobody has told me otherwise), we'll have really good blob support
> without this black magic.

The Pdata approach in general is not bad. I recently implemented a CVS-like file repository where we store binary content using a Pdata-like structure. Our largest files are around 100 MB, and the performance and efficiency are not bad, although they could be better. The bottleneck is either the ZEO communication or just the network: I reach about 3.5 MB/second while reading such a large file from the ZEO server.

-aj
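[For anyone who wants to reproduce that kind of measurement, here is a minimal sketch. The server address, the 'files' mapping, and the data/next attributes of the chunk objects are assumptions for illustration; they are not the actual repository code described above:]

    # Hypothetical throughput check for reading a Pdata-chained file via ZEO.
    # Assumes a ZEO server on localhost:8100 and a root mapping 'files';
    # none of these names come from the post above.
    import time
    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    db = DB(ClientStorage(('localhost', 8100)))
    conn = db.open()
    root = conn.root()

    start = time.time()
    total = 0
    chunk = root['files']['big.bin']   # head of a Pdata-style chunk chain
    while chunk is not None:
        total += len(chunk.data)       # touching .data loads the chunk from ZEO
        chunk = chunk.next
    elapsed = time.time() - start
    print('%.1f MB in %.1f s -> %.2f MB/s' % (
        total / 1e6, elapsed, total / 1e6 / elapsed))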
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On 5/29/05, Shane Hathaway [EMAIL PROTECTED] wrote:
> Would a multi-threaded ZEO server improve anything here? Especially
> with concurrent access?

It's possible. Although ZEO talks over the network using async sockets, it reads files synchronously, so I suspect it frequently sits around doing nothing for 10 ms at a time, waiting for the disk to read data. If your ZEO server has a load of 1.0 or more but low CPU usage, this is likely what is happening. The easiest way to overcome it is to buy gigabytes of RAM for the ZEO server--ideally, enough gigabytes to hold your whole database.

A related problem is that the ZEO cache on the client is on disk, too. You may end up waiting for a disk seek to get data out of the client cache even when the server already has it in memory; if the ZEO protocol were more efficient, that would be a drag.

Also, the design of ZEO clients tends to serialize communication with the ZEO server, so the throughput between client and server is likely to be limited significantly by network latency. ping is a good tool for measuring latency; 1 ms is good and 0.1 ms is excellent. There are ways to tune the network, and you can also reduce the effects of network latency by creating and load-balancing a lot of ZEO clients.

It's really too bad that ZEO only allows a single outstanding request. Restructuring the protocol to allow multiple simultaneous requests was on the task list years ago, but the protocol implementation is so complex that I doubt it will get done :-(. I can't help but think that building on top of an existing message/RPC layer would be profitable. (What's Twisted's RPC layer?) Or at least something less difficult to use than asyncore.

Jeremy
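[A back-of-the-envelope calculation shows how hard the latency ceiling bites when requests are serialized. The 64 KB chunk size is an assumption; the round-trip times mirror the figures in the post above:]

    # Ceiling on throughput for a strictly serialized request/response client:
    # at most one object load per network round trip.
    chunk_kb = 64.0                    # assumed average object/chunk size
    for rtt_ms in (0.1, 1.0, 10.0):    # excellent LAN, good LAN, disk-bound server
        mb_per_s = (chunk_kb / 1024.0) / (rtt_ms / 1000.0)
        print('RTT %5.1f ms -> at most %7.1f MB/s' % (rtt_ms, mb_per_s))

[With a few milliseconds per load, this lands right in the neighborhood of the 3.5 MB/s Andreas reported.]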
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On 5/21/05, DJTB [EMAIL PROTECTED] wrote:
> [posted to comp.lang.python, mailed to [EMAIL PROTECTED]]
> I'm having problems storing large amounts of objects in a ZODB. After
> committing changes to the database, elements are not cleared from
> memory. Since the number of objects I'd like to store in the ZODB is
> too large to fit in RAM, my program gets killed with signal 11 or
> signal 9...

[Following up to both places.]

The problem here is a common one with a first attempt at using ZODB: ZODB manages memory at the granularity of first-class persistent objects--that is, instances of classes that inherit from Persistent. ZODB can move such objects in and out of memory at transaction boundaries, which allows your application to use many more objects than it has physical memory for.

It looks like your application has a single persistent instance--the root ExtendedTupleTable--so there's no way for ZODB to manage the memory: that object and everything reachable from it must be in memory at all times. You need to restructure the program so that it has more first-class persistent objects. If, for example, the ExtendedTuple objects inherited from Persistent, then they could reside on disk except when you are manipulating them.

The ObjectInterning instance is another source of problems, because it's a dictionary that has an entry for every object you touch. The various other dictionaries in your program will also be memory hogs if they have very many entries. The typical way to structure a ZODB program is to use one of the BTrees implementation types instead of a dictionary, because a BTree does not keep all its keys and values in memory at one time. (Its internal organization is a large collection of first-class persistent objects representing the BTree's buckets and internal tree nodes.) You must use some care with BTrees, because the data structure maintains a total ordering on the keys (a dictionary does not). The ZODB/ZEO programming guide has a good section on BTrees: http://www.zope.org/Wikis/ZODB/guide/node6.html

Jeremy
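[To make that concrete, here is a minimal sketch of the restructuring. The class and attribute names are stand-ins for the poster's code, not his actual implementation, and IOBTree is just one reasonable choice for integer keys:]

    # Sketch: give ZODB first-class persistent objects it can swap in and out.
    from persistent import Persistent
    from BTrees.IOBTree import IOBTree   # integer keys -> object values
    import transaction

    class ExtendedTuple(Persistent):
        # A first-class persistent object: ZODB can ghost it individually.
        def __init__(self, values):
            self.values = tuple(values)

    class ExtendedTupleTable(Persistent):
        def __init__(self):
            # A BTree instead of a dict: its buckets are themselves
            # persistent objects, so they page in and out on demand.
            self.rows = IOBTree()

    def load(table, source):
        # Commit periodically so clean objects become evictable.
        for i, values in enumerate(source):
            table.rows[i] = ExtendedTuple(values)
            if i % 10000 == 9999:
                transaction.commit()
        transaction.commit()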
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
DJTB wrote:
> What should I do to make sure RAM is no longer a limiting factor? (In
> other words: the program should work with any (large) value of
> self.__range and self.__et_count, because in my case self.__et_count =
> 5000 is only a toy example...) I'm now working on a PC with 2.5 GB RAM
> and even that's not enough!

Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's OFS.Image.Image class (and also Zope3's implementation) uses a so-called "possibly large data" class (Pdata) that is a subclass of Persistent. Pdata uses a simple and ingenious approach to minimize memory usage when storing large binary data in ZODB: the data is read from a temporary file chunk by chunk. Each chunk is stored inside a Pdata object and committed in a subtransaction. The Pdata objects are linked in a simple linear chain, just like a linear list connected with pointers in old-style C. Try to understand the code; it might help to solve your problem.

In general: don't try to store large data in one block, like a single binary string. Use small, persistent chunks.

Christian
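[A stripped-down sketch of the pattern follows. It is an illustration, not the real OFS/Image.py code; the chunk size is arbitrary, and a savepoint stands in for the subtransaction commit the original uses:]

    # Minimal Pdata-style chain: store a big file as small persistent
    # chunks linked like a singly linked list in old-style C.
    from persistent import Persistent
    import transaction

    CHUNK_SIZE = 64 * 1024   # assumed; OFS.Image chooses its own size

    class Pdata(Persistent):
        def __init__(self, data):
            self.data = data
            self.next = None

    def store_file(fileobj, root, name):
        head = tail = None
        while True:
            data = fileobj.read(CHUNK_SIZE)
            if not data:
                break
            node = Pdata(data)
            if head is None:
                head = node
                root[name] = head          # make the chain reachable right away
            else:
                tail.next = node
            tail = node
            transaction.savepoint(True)    # lets clean chunks leave memory
        transaction.commit()

    def iter_chunks(head):
        # Reading works the same way: one small chunk in memory at a time.
        node = head
        while node is not None:
            yield node.data
            node = node.next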
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
DJTB wrote at 2005-5-21 13:00 +0200:
> ...
> I'm having problems storing large amounts of objects in a ZODB. After
> committing changes to the database, elements are not cleared from
> memory.

You can control how many objects the ZODB cache may contain. Note, however, that objects are usually flushed from the cache only at transaction boundaries. Furthermore, there are methods to flush individual objects from the cache (obj._p_invalidate()), to perform a cache cleanup mid-transaction (connection.cacheGC()), and to perform a full flush (connection.cacheMinimize()).

Note that an object can only be flushed from the cache when it was not modified in the current transaction. This is independent of the way you try to flush it (_p_invalidate, cacheGC or cacheMinimize).

--
Dieter
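[In code, the pattern looks something like this sketch. The batch size and the work function are invented for illustration; the cache methods are the ones named above:]

    # Keeping the ZODB object cache bounded during a long-running job.
    import transaction

    def process_all(conn, rows, work_on):
        # 'rows' might be a BTree of persistent objects; 'work_on' is
        # whatever per-object processing you need (both hypothetical).
        for i, key in enumerate(rows.keys()):
            work_on(rows[key])
            if i % 1000 == 999:
                transaction.commit()    # objects must be clean to be flushed
                conn.cacheGC()          # trim the cache toward its target size
                # conn.cacheMinimize()  # or: drop every clean object
                # a single clean object can also be evicted with
                # obj._p_invalidate()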