Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Christian Theune
On Saturday, 21.05.2005, at 17:38 +0200, Christian Heimes wrote:
 Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
 OFS.Image.Image class (and also Zope3's implementation) uses a
 so-called "possibly large data" class (Pdata) that is a subclass of
 Persistent.

 Pdata uses a simple and ingenious approach to minimize memory usage
 when storing large binary data in ZODB. The data is read from a
 [...]

Actually, Pdata has some drawbacks. When the blobsupport branch gets
declared stable (I think it's not gonna happen in 3.4, but nobody has
told me otherwise), we'll have really good blob support without this
black magic.

Cheers,
Christian

-- 
gocept gmbh & co. kg - schalaunische str. 6 - 06366 koethen - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 3496 30 99 112 -
fax +49 3496 30 99 118 - zope and plone consulting and development




Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Andreas Jung

--On 29 May 2005 11:29:06 +0200 Christian Theune [EMAIL PROTECTED] wrote:


 On Saturday, 21.05.2005, at 17:38 +0200, Christian Heimes wrote:

 Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
 OFS.Image.Image class (and also Zope3's implementation) uses a
 so-called "possibly large data" class (Pdata) that is a subclass of
 Persistent.

 Pdata uses a simple and ingenious approach to minimize memory usage
 when storing large binary data in ZODB. The data is read from a
 [...]


 Actually, Pdata has some drawbacks. When the blobsupport branch gets
 declared stable (I think it's not gonna happen in 3.4, but nobody has
 told me otherwise), we'll have really good blob support without this
 black magic.



The Pdata approach in general is not bad. I recently implemented a
CVS-like file repository where we store binary content using a
Pdata-like structure. Our largest files are around 100 MB, and the
performance and efficiency are not bad, although they could be better.
The bottleneck is either the ZEO communication or just the network:
I reach about 3.5 MB/second while reading such a large file from the
ZEO server.
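
The measurement itself is nothing fancy; a rough sketch (the chunk
attributes "data" and "next" and the file_record object here are
illustrative, not our actual repository code):

    import time

    def read_all(head):
        """Walk a linked chain of chunk objects, returning total bytes."""
        total = 0
        chunk = head
        while chunk is not None:
            total += len(chunk.data)
            chunk = chunk.next   # attribute access loads the next chunk from ZEO
        return total

    start = time.time()
    nbytes = read_all(file_record.head)   # hypothetical stored file record
    elapsed = time.time() - start
    print('%.2f MB/second' % (nbytes / (1024.0 * 1024.0) / elapsed))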


-aj




Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Jeremy Hylton
On 5/29/05, Shane Hathaway [EMAIL PROTECTED] wrote:
  Would a multi-threaded ZEO server improve anything here? Especially
  with concurrent access?
 
 It's possible.  Although ZEO talks over the network using async sockets,
 it reads files synchronously, so I suspect it will frequently sit around
 doing nothing for 10 ms, waiting for the disk to read data.  If your ZEO
 server has a load of 1.0 or more but low CPU usage, this is likely
 happening.  The easiest way to overcome this is to buy gigabytes of RAM
 for the ZEO server--ideally, enough gigabytes to hold your whole database.

A related problem is that the ZEO cache on the client is on disk, too.
You may end up waiting for a disk seek to get the data off disk on the
client. Even if the server has it in memory and the ZEO protocol were
more efficient, that client-side seek would still be a drag.

 Also, the design of ZEO clients tends to serialize communication with
 the ZEO server, so the throughput between client and server is likely to
 be limited significantly by network latency.  ping is a good tool for
 measuring latency; 1 ms is good and 0.1 ms is excellent.  There are ways
 to tune the network.  You can also reduce the effects of network latency
 by creating and load balancing a lot of ZEO clients.

It's really too bad that ZEO only allows a single outstanding request.
Restructuring the protocol to allow multiple simultaneous requests
was on the task list years ago, but the protocol implementation is so
complex I doubt it will get done :-(.  I can't help but think building
on top of an existing message/RPC layer would be profitable.  (What's
Twisted's RPC layer?)  Or at least something less difficult to use
than asyncore.
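
Twisted's RPC layer is Perspective Broker (twisted.spread.pb), and it
does allow several outstanding calls on one connection. A minimal
server sketch, untested and not tied to ZEO in any way:

    # Minimal Perspective Broker server: each remote_* method can be
    # invoked over the wire, and PB multiplexes concurrent calls.
    from twisted.spread import pb
    from twisted.internet import reactor

    class Storage(pb.Root):
        def remote_load(self, oid):
            # Stand-in for a real storage lookup; returns dummy data.
            return 'data for %r' % oid

    reactor.listenTCP(8789, pb.PBServerFactory(Storage()))
    reactor.run()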

Jeremy


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-22 Thread Jeremy Hylton
On 5/21/05, DJTB [EMAIL PROTECTED] wrote:
 [posted to comp.lang.python, mailed to [EMAIL PROTECTED]]

[Following up to both places.]

 I'm having problems storing large amounts of objects in a ZODB.
 After committing changes to the database, elements are not cleared from
 memory. Since the number of objects I'd like to store in the ZODB is too
 large to fit in RAM, my program gets killed with signal 11 or signal 9...

The problem here is a common one with a first attempt at using ZODB:
ZODB manages memory at the granularity of first-class persistent
objects -- that is, instances of classes that inherit from Persistent.
ZODB can move such objects in and out of memory at transaction
boundaries, which allows your application to use many more objects
than it has physical memory for.

It looks like your application has a single persistent instance -- the
root ExtendedTupleTable -- so there's no way for ZODB to manage the
memory.  That object and everything reachable from it must be in
memory at all times.

You need to restructure the program so that it has more first-class
persistent objects.  If, for example, the ExtendedTuple objects
inherited from Persistent, then they could reside on disk except when
you are manipulating them.
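
A minimal sketch of that restructuring (only the class name comes from
your program; the attribute is invented for illustration):

    from persistent import Persistent

    class ExtendedTuple(Persistent):
        """A first-class persistent object: ZODB can write it out and
        evict it from memory at transaction boundaries."""
        def __init__(self, values):
            self.values = tuple(values)

Each ExtendedTuple then gets its own database record, so loading the
table no longer drags every element into memory with it.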

The ObjectInterning instance is another source of problems, because
it's a dictionary that has an entry for every object you touch.  The
various other dictionaries in your program will also be memory hogs if
they have very many entries.  The typical way to structure a ZODB
program is to use one of the BTrees types instead of a dictionary,
because a BTree does not keep all its keys and values in memory at one
time.  (Its internal organization is a large collection of first-class
persistent objects representing the BTree buckets and internal tree
nodes.)

You must use some care with BTrees, because the data structure
maintains a total ordering on the keys, and a dictionary does not.
The ZODB/ZEO programming guide has a good section on BTrees here:
http://www.zope.org/Wikis/ZODB/guide/node6.html
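
A short sketch of the dictionary-to-BTree swap (the file name and keys
are made up):

    # Replace a plain dict with an OOBTree so keys and values live in
    # many small persistent buckets, loaded only when touched.
    import transaction
    from BTrees.OOBTree import OOBTree
    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    db = DB(FileStorage('data.fs'))
    conn = db.open()
    root = conn.root()

    root['index'] = index = OOBTree()
    for i in range(100000):
        index['key-%06d' % i] = i    # keys must have a total ordering
    transaction.commit()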

Jeremy


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-21 Thread Christian Heimes

DJTB wrote:

What should I do to make sure RAM is no longer a limiting factor?
(In other words, the program should work with any (large) value of
self.__range and self.__et_count, because in my case self.__et_count
= 5000 is only a toy example...)
I'm now working on a PC with 2.5 GB RAM and even that's not enough!


Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
OFS.Image.Image class (and also Zope3's implementation) uses a
so-called "possibly large data" class (Pdata) that is a subclass of
Persistent.


Pdata uses a simple and ingenious approach to minimize memory usage
when storing large binary data in ZODB. The data is read from a
temporary file chunk by chunk. Each chunk is stored inside a Pdata
object and committed in a subtransaction. The Pdata objects are linked
in a simple chain, just like a linked list connected with pointers in
old-style C.


Try to understand the code. It might help to solve your problem. In 
general: Don't try to store large data in one block like a binary 
string. Use small, persistent chunks.
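
A stripped-down sketch of the idea -- this is not the actual
OFS.Image code; the chunk size and names are arbitrary:

    from persistent import Persistent

    CHUNK_SIZE = 64 * 1024

    class Chunk(Persistent):
        def __init__(self, data):
            self.data = data
            self.next = None   # link to the next chunk, as in a C linked list

    def store_file(f):
        """Read f chunk by chunk; return the head of the chunk chain."""
        head = tail = None
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            chunk = Chunk(data)
            if tail is None:
                head = chunk
            else:
                tail.next = chunk   # assignment marks the old tail as changed
            tail = chunk
            # The real code commits a subtransaction here, so earlier
            # chunks can be flushed from memory as you go.
        return head

    # Usage (hypothetical): root['big'] = store_file(open('big.iso', 'rb'))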


Christian


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-21 Thread Dieter Maurer
DJTB wrote at 2005-5-21 13:00 +0200:
 ...
I'm having problems storing large amounts of objects in a ZODB.
After committing changes to the database, elements are not cleared from
memory.

You can control how many objects the ZODB cache may contain.

Note, however, that the objects are usually flushed from cache
only at transaction boundaries.

Furthermore, there are methods to flush individual objects
from the cache (obj._p_invalidate()), perform a cache cleanup
mid-transaction (connection.cacheGC()), and perform
a full flush (connection.cacheMinimize()).

Note that an object can only be flushed from the cache
when it was not modified in the current transaction.
This is independent of the way you try to flush it
(_p_invalidate, cacheGC or cacheMinimize).
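
For example (the file and key names are made up, and obj is assumed
not to be modified in the current transaction):

    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    db = DB(FileStorage('data.fs'), cache_size=2000)  # cap the object cache
    conn = db.open()
    obj = conn.root()['big_object']   # hypothetical persistent object

    obj._p_invalidate()    # ghostify this one object, freeing its state
    conn.cacheGC()         # mid-transaction cleanup toward cache_size
    conn.cacheMinimize()   # flush everything flushable from the cache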

-- 
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev