RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-06-01 Thread Christian Theune
On Wednesday, 2005-06-01, at 12:59 -0400, Chris McDonough wrote:
> How's Sat June 11?  I'll just mark that on my calendar as a day to write
> some blob tests, and I'll catch you (and whoever else might want to
> contribute) on irc.freenode.net between now and then to talk about what
> needs to be tested.

Sounds like a good plan to me.

-- 
gocept gmbh & co. kg - schalaunische str. 6 - 06366 koethen - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 3496 30 99 112 -
fax +49 3496 30 99 118 - zope and plone consulting and development


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-06-01 Thread Chris McDonough
On Wed, 2005-06-01 at 12:04 +0200, Christian Theune wrote:
> On Tuesday, 2005-05-31, at 16:02 -0400, Chris McDonough wrote:
> > Christian,
> > 
> > Can we pick a day next week to have a sprint via IRC?  Would you be
> > willing to help write some tests during that sprint?
> 
> Does "next week" include saturday/sunday? We just acquired a new client
> and next week is very busy. The weekend would be possible for me. Maybe
> a one-hour planning chat would be possible during the week, so people
> could get started. OTOH doing it all together might be even better. 
> 
> Eventually my friday afternoon (like 4pm over here) would mean the
> normal Friday to you, so we can leverage the time shift to get me more
> time. 8)
> 
> And yes, I'm willing to write tests, if someone hints me at the test
> cases. :)

How's Sat June 11?  I'll just mark that on my calendar as a day to write
some blob tests, and I'll catch you (and whoever else might want to
contribute) on irc.freenode.net between now and then to talk about what
needs to be tested.

- C




RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-06-01 Thread Christian Theune
On Tuesday, 2005-05-31, at 16:02 -0400, Chris McDonough wrote:
> Christian,
> 
> Can we pick a day next week to have a sprint via IRC?  Would you be
> willing to help write some tests during that sprint?

Does "next week" include saturday/sunday? We just acquired a new client
and next week is very busy. The weekend would be possible for me. Maybe
a one-hour planning chat would be possible during the week, so people
could get started. OTOH doing it all together might be even better. 

Eventually my friday afternoon (like 4pm over here) would mean the
normal Friday to you, so we can leverage the time shift to get me more
time. 8)

And yes, I'm willing to write tests, if someone hints me at the test
cases. :)

Christian





Re: [ZODB-Dev] ZODB memory problems

2005-05-31 Thread Shane Hathaway
Tim Peters wrote:
> [Jeremy Hylton]
> 
>>>It's really too bad that ZEO only allows a single outstanding request.
>>> Restructuring the protocol to allow multiple simultaneous requests
>>>was on the task list years ago, but the protocol implementation is so
>>>complex I doubt it will get done :-(.  I can't help but think building
>>>on top of an existing message/RPC layer would be profitable.  (What's
>>>twisted's RPC layer?)  Or at least something less difficult to use than
>>>asyncore.
> 
> 
> [Shane Hathaway]
> 
>>Do you think the RPC layer is the source of the problem?
> 
> 
> Probably depends on what "the problem" refers to?  If the protocol allows
> for at most one outstanding request, then that's clearly _a_ bottleneck,
> right?

Yes.  I meant to ask whether the RPC layer is currently the worst
bottleneck.  Lately I've been dealing with problems that require a
minimum of 50% utilization of a gigabit network connection, and the 3.5
MB/s figure Andreas quoted made me cringe. ;-)

> I get the impression that Jim thinks the ZEO protocol is simple.  I don't
> know -- because I haven't had to "fix bugs" in it recently, I know little
> about it.  It sure isn't obvious from staring at 8000+ lines of ZEO code,
> and Jeremy, Guido and I spent weeks a few years ago "fixing bugs" then.  I
> felt pretty lost the whole time, never sure how many threads there were,
> which code those threads may be executing, how exactly asyncore cooperated
> (or fought) with the threads, or even clear on which RPC calls were
> synchronous and which async.  There's so much machinery of various kinds
> that it's hard to divine the _intent_ of it all.  I remember that sometimes
> the letter "x" gets sent to a socket, and that's important .

Yes, it is mysterious.  Poor readability seems to be common with code
that deals with both events (i.e. asyncore events) and many threads.  I
used to mix them at a whim.  Lately I've used both events and threads,
but not in the same program, and I think it's done a lot of good for the
maintainability of what I've written.

> It was my ignorant impression at the time that asyncore didn't make much
> sense here, because mixing threads with asyncore is always a nightmare in
> reality, and a ZEO server doesn't intend to service hundreds of clients
> simultaneously regardless.

Having stared at ZEO for a while, I've convinced myself that the ZEO
client code has no reason to use asyncore.  A blocking socket and
makefile() seem like a much better fit.

I'm not sure whether the ZEO server should be event driven or threaded,
but being both is probably wrong.  Since it's event driven now, the ZEO
server may be less susceptible to concurrency gremlins than it would be
with threads.  However, last time I looked, the ZEO server uses a few
threads for miscellaneous work.

>>I feel like the way ZODB uses references in pickles is the main thing
>>that slows it down.  Even if you have a protocol that can request many
>>objects at once, the unpickling machinery only asks for one at a time.
> 
> 
> I'm unclear on what "the unpickling machinery" means.  The most obvious
> meaning is cPickle, but that doesn't ask for anything.  In general, no
> object's state gets unpickled before something in the application _needs_
> its state, so unpickling is currently driven by the application.  Maybe
> you're suggesting some form of "eager" unpickling/state-materialization?

Yes.  For each object, ZODB could store a list of referenced OIDs.  When
ZODB is about to unpickle an object, it could read the list of
referenced OIDs and tell its storage that it will need the pickles for
those objects very shortly (except the objects already loaded).  Then
the ZEO client code could make a single request for all of the
referenced objects that aren't already in the cache.
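
A toy sketch of that batching idea (the `load_batch` method and the per-object
reference lists are assumptions for illustration; ZEO has no such API):

```python
class DictStorage:
    """Stand-in for a ZEO storage that can answer batched load requests."""
    def __init__(self, data):
        self.data = data      # oid -> pickle bytes
        self.requests = 0     # count round trips to show the saving

    def load_batch(self, oids):
        self.requests += 1    # one network round trip, many objects
        return {oid: self.data[oid] for oid in oids}

class PrefetchingCache:
    def __init__(self, storage, references):
        self.storage = storage        # answers load_batch()
        self.references = references  # oid -> list of referenced oids
        self.cache = {}

    def load(self, oid):
        # Fetch the object plus any referenced objects we don't have yet,
        # all in a single batched request.
        wanted = [oid] + list(self.references.get(oid, ()))
        missing = [o for o in wanted if o not in self.cache]
        if missing:
            self.cache.update(self.storage.load_batch(missing))
        return self.cache[oid]
```

Loading a root object that references two children then costs one round trip
instead of three, and the children are already local when they are needed.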

Shane


RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-31 Thread Chris McDonough
Christian,

Can we pick a day next week to have a sprint via IRC?  Would you be
willing to help write some tests during that sprint?

On Tue, 2005-05-31 at 11:21 -0400, Tim Peters wrote:
> [Paul Winkler]
> >> Cool!  This is on ctheune-blobsupport branch, right? Did that ever get
> >> merged to trunk?
> 
> [Christian Theune]
> > AFAIK not. It was a "maybe" for the 3.4 release but really didn't have
> > enough public exposure to make it in there.
> 
> That's all correct.  3.4 is in beta now, so blob support will be a
> new-in-3.5 feature.
> 
> > ChrisM wanted to write some more unit tests for it which would justify
> > including it on the main branch. I hope I can make him do that at a
> > sprint over here in September ... :)
> 
> I hope there's action before September -- that would be half a year after
> the branch was created, and merging gets harder over time.  Maybe we could
> do a virtual sprint via zodb-dev <0.9 wink>.
> 



RE: [ZODB-Dev] ZODB memory problems

2005-05-31 Thread Tim Peters
...

[Toby Dickenson]
> pb: Perspective Broker. I thought that suggestion was crazy when Jeremy
> first presented it in his blog, but increasing exposure to other Twisted
> tools makes it seem more reasonable.

Just thought I'd guess at the probably-intended references:

http://www.python.org/~jeremy/weblog/030418.html

http://twistedmatrix.com/products/spread

I'm not sure that going from cutesy talk about pickles and jars to cutesy
talk about jelly and bananas would be a pure win on its own .

...

> Both problems seem interestingly non-trivial; a storage layer that will
> prefetch related pickles, and an rpc/storage layer that can express
> those prefetching requests with appropriate security, concurrency, and
> prioritization.

"Yup" to both (interesting, and non-trivial).



RE: [ZODB-Dev] ZODB memory problems

2005-05-31 Thread Tim Peters
[Jeremy Hylton]
>> It's really too bad that ZEO only allows a single outstanding request.
>>  Restructuring the protocol to allow multiple simultaneous requests
>> was on the task list years ago, but the protocol implementation is so
>> complex I doubt it will get done :-(.  I can't help but think building
>> on top of an existing message/RPC layer would be profitable.  (What's
>> twisted's RPC layer?)  Or at least something less difficult to use than
>> asyncore.

[Shane Hathaway]
> Do you think the RPC layer is the source of the problem?

Probably depends on what "the problem" refers to?  If the protocol allows
for at most one outstanding request, then that's clearly _a_ bottleneck,
right?

I get the impression that Jim thinks the ZEO protocol is simple.  I don't
know -- because I haven't had to "fix bugs" in it recently, I know little
about it.  It sure isn't obvious from staring at 8000+ lines of ZEO code,
and Jeremy, Guido and I spent weeks a few years ago "fixing bugs" then.  I
felt pretty lost the whole time, never sure how many threads there were,
which code those threads may be executing, how exactly asyncore cooperated
(or fought) with the threads, or even clear on which RPC calls were
synchronous and which async.  There's so much machinery of various kinds
that it's hard to divine the _intent_ of it all.  I remember that sometimes
the letter "x" gets sent to a socket, and that's important .

It was my ignorant impression at the time that asyncore didn't make much
sense here, because mixing threads with asyncore is always a nightmare in
reality, and a ZEO server doesn't intend to service hundreds of clients
simultaneously regardless.

Anyway, to the extent that the RPC machinery is mysterious, that's "a
problem" of its own.

> I feel like the way ZODB uses references in pickles is the main thing
> that slows it down.  Even if you have a protocol that can request many
> objects at once, the unpickling machinery only asks for one at a time.

I'm unclear on what "the unpickling machinery" means.  The most obvious
meaning is cPickle, but that doesn't ask for anything.  In general, no
object's state gets unpickled before something in the application _needs_
its state, so unpickling is currently driven by the application.  Maybe
you're suggesting some form of "eager" unpickling/state-materialization?



Re: [ZODB-Dev] ZODB memory problems

2005-05-31 Thread Toby Dickenson
On Tuesday 31 May 2005 18:43, Shane Hathaway wrote:
> Jeremy Hylton wrote:
> > It's really too bad that ZEO only allows a single outstanding request.
> >  Restructuring the protocol to allow multiple simultaneous requests
> > was on the task list years ago, but the protocol implementation is so
> > complex I doubt it will get done :-(.  I can't help but think building
> > on top of an existing message/RPC layer would be profitable.  (What's
> > twisted's RPC layer?

pb: Perspective Broker. I thought that suggestion was crazy when Jeremy first
presented it in his blog, but increasing exposure to other Twisted tools
makes it seem more reasonable.

In short, I've not yet been disappointed by anything coming from the Twisted
camp.

> > )  Or at least something less difficult to use 
> > than asyncore.
> 
> Do you think the RPC layer is the source of the problem?  I feel like
> the way ZODB uses references in pickles is the main thing that slows it
> down.  Even if you have a protocol that can request many objects at
> once, the unpickling machinery only asks for one at a time.

Both problems seem interestingly non-trivial; a storage layer that will 
prefetch related pickles, and an rpc/storage layer that can express those 
prefetching requests with appropriate security, concurrency, and 
prioritization.


-- 
Toby Dickenson


Re: [ZODB-Dev] ZODB memory problems

2005-05-31 Thread Shane Hathaway
Jeremy Hylton wrote:
> It's really too bad that ZEO only allows a single outstanding request.
>  Restructuring the protocol to allow multiple simultaneous requests
> was on the task list years ago, but the protocol implementation is so
> complex I doubt it will get done :-(.  I can't help but think building
> on top of an existing message/RPC layer would be profitable.  (What's
> twisted's RPC layer?)  Or at least something less difficult to use
> than asyncore.

Do you think the RPC layer is the source of the problem?  I feel like
the way ZODB uses references in pickles is the main thing that slows it
down.  Even if you have a protocol that can request many objects at
once, the unpickling machinery only asks for one at a time.

Shane


RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-31 Thread Christian Theune
On Tuesday, 2005-05-31, at 11:21 -0400, Tim Peters wrote:
> I hope there's action before September -- that would be half a year after
> the branch was created, and merging gets harder over time.  Maybe we could
> do a virtual sprint via zodb-dev <0.9 wink>.

Ok. There is not much to do except "find all the stuff that doesn't
work". I'm not the best person for that because I wrote all the stuff,
so I'm convinced it works anyway ...

Something that is still open: I introduced an inconsistency when doing
the ZEO support. The ZEO cache directory uses a different layout than
the ProxyBlobStorage, so the directory structures are incompatible. This
should be changed.

And if time permits (looking at the pretty stuffed calendar) I could
participate in a virtual sprint. At least we should set an official
plan (time and actions) for how to integrate it into an upcoming 3.5.

Cheers,
Christian





RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-31 Thread Tim Peters
[Paul Winkler]
>> Cool!  This is on ctheune-blobsupport branch, right? Did that ever get
>> merged to trunk?

[Christian Theune]
> AFAIK not. It was a "maybe" for the 3.4 release but really didn't have
> enough public exposure to make it in there.

That's all correct.  3.4 is in beta now, so blob support will be a
new-in-3.5 feature.

> ChrisM wanted to write some more unit tests for it which would justify
> including it on the main branch. I hope I can make him do that at a
> sprint over here in September ... :)

I hope there's action before September -- that would be half a year after
the branch was created, and merging gets harder over time.  Maybe we could
do a virtual sprint via zodb-dev <0.9 wink>.



Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-31 Thread Christian Theune
On Monday, 2005-05-30, at 23:59 -0400, Paul Winkler wrote:
> On Mon, May 30, 2005 at 11:11:52AM +0200, Christian Theune wrote:
> > * Blocking during large streams is solved by a protocol extension for
> > streaming blobs. Additionally you can make the ZEO-Clients share the
> > Blob data via a networked file system.
> > * The pData cache problem is solved as well
> 
> Cool!  This is on ctheune-blobsupport branch, right?
> Did that ever get merged to trunk?

AFAIK not. It was a "maybe" for the 3.4 release but really didn't have
enough public exposure to make it in there.

ChrisM wanted to write some more unit tests for it which would justify
including it on the main branch. I hope I can make him do that at a
sprint over here in September ... :)

Cheers,
Christian





Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-30 Thread Paul Winkler
On Mon, May 30, 2005 at 11:11:52AM +0200, Christian Theune wrote:
> * Blocking during large streams is solved by a protocol extension for
> streaming blobs. Additionally you can make the ZEO-Clients share the
> Blob data via a networked file system.
> * The pData cache problem is solved as well

Cool!  This is on ctheune-blobsupport branch, right?
Did that ever get merged to trunk?

-- 

Paul Winkler
http://www.slinkp.com


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-30 Thread Dieter Maurer
Jeremy Hylton wrote at 2005-5-29 21:04 -0400:
> ...
>What's
>twisted's RPC layer?)  Or at least something less difficult to use
>than asyncore.

"medusa" (on top of "asyncore") obviously supports multi-threaded
request execution. But, probably, it is more difficult than
"asyncore"...

-- 
Dieter


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-30 Thread Christian Theune
On Sunday, 2005-05-29, at 10:37 +0200, Tino Wildenhain wrote:
> > > Actually Pdata has some drawbacks. When the blobsupport branch gets
> > > declared stable (I think it's not gonna happen in 3.4, but nobody told
> > > me otherwise) we'll have really good blob support without this black
> > > magic.
> 
> Especially the ZEO handling of blobs could be improved IIRC.

* Blocking during large streams is solved by a protocol extension for
streaming blobs. Additionally you can make the ZEO-Clients share the
Blob data via a networked file system.
* The pData cache problem is solved as well





Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Jeremy Hylton
On 5/29/05, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> > Would a multi thread ZEO server improve anything here? Especially
> > with concurrent access?
> 
> It's possible.  Although ZEO talks over the network using async sockets,
> it reads files synchronously, so I suspect it will frequently sit around
> doing nothing for 10 ms, waiting for the disk to read data.  If your ZEO
> server has a load of 1.0 or more but low CPU usage, this is likely
> happening.  The easiest way to overcome this is to buy gigabytes of RAM
> for the ZEO server--ideally, enough gigabytes to hold your whole database.

A related problem is that the ZEO cache on the client is on disk, too.
You may end up waiting for a disk seek to get the data off disk on the
client.  Even if you had everything in memory on the server and the ZEO
protocol were more efficient, that client-side seek would still be a drag.

> Also, the design of ZEO clients tends to serialize communication with
> the ZEO server, so the throughput between client and server is likely to
> be limited significantly by network latency.  "ping" is a good tool for
> measuring latency; 1 ms is good and .1 ms is excellent.  There are ways
> to tune the network.  You can also reduce the effects of network latency
> by creating and load balancing a lot of ZEO clients.

It's really too bad that ZEO only allows a single outstanding request.
 Restructuring the protocol to allow multiple simultaneous requests
was on the task list years ago, but the protocol implementation is so
complex I doubt it will get done :-(.  I can't help but think building
on top of an existing message/RPC layer would be profitable.  (What's
twisted's RPC layer?)  Or at least something less difficult to use
than asyncore.

Jeremy


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Shane Hathaway
Tino Wildenhain wrote:
> On Sunday, 2005-05-29, at 09:51 +0200, Andreas Jung wrote:
>>The Pdata approach in general is not bad. I have implemented a CVS-like
>>file repository lately where we store binary content using a Pdata-like
>>structure. Our largest files are around 100 MB, and the performance and
>>efficiency are not bad, although they could be better. The bottleneck is
>>either the ZEO communication or just the network. I reach about
>>3.5 MB/second while reading such a large file from the ZEO server.
> 
> 
> That's not bad given that this might at least saturate most customers'
> downstream :)
> Would a multi-threaded ZEO server improve anything here? Especially
> with concurrent access?

It's possible.  Although ZEO talks over the network using async sockets,
it reads files synchronously, so I suspect it will frequently sit around
doing nothing for 10 ms, waiting for the disk to read data.  If your ZEO
server has a load of 1.0 or more but low CPU usage, this is likely
happening.  The easiest way to overcome this is to buy gigabytes of RAM
for the ZEO server--ideally, enough gigabytes to hold your whole database.

Also, the design of ZEO clients tends to serialize communication with
the ZEO server, so the throughput between client and server is likely to
be limited significantly by network latency.  "ping" is a good tool for
measuring latency; 1 ms is good and .1 ms is excellent.  There are ways
to tune the network.  You can also reduce the effects of network latency
by creating and load balancing a lot of ZEO clients.

Shane


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Tino Wildenhain
On Sunday, 2005-05-29, at 09:51 +0200, Andreas Jung wrote:
> 
> --On 29 May 2005 11:29:06 +0200 Christian Theune <[EMAIL PROTECTED]> wrote:
> 
> > On Saturday, 2005-05-21, at 17:38 +0200, Christian Heimes wrote:
> >> Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
> >> OFS.Image.Image class (and also Zope3's implementation) is using a
> >> so-called "possibly large data" class (Pdata) that is a subclass of
> >> Persistent.
> >>
> >> Pdata is using a simple and ingenious approach to minimize the memory
> >> usage when storing large binary data in ZODB. The data is read from a
> >> [...]
> >
> > Actually Pdata has some drawbacks. When the blobsupport branch gets
> > declared stable (I think it's not gonna happen in 3.4, but nobody told
> > me otherwise) we'll have really good blob support without this black
> > magic.

Especially the ZEO handling of blobs could be improved IIRC.
> 
> The Pdata approach in general is not bad. I have implemented a CVS-like
> file repository lately where we store binary content using a Pdata-like
> structure. Our largest files are around 100 MB, and the performance and
> efficiency are not bad, although they could be better. The bottleneck is
> either the ZEO communication or just the network. I reach about
> 3.5 MB/second while reading such a large file from the ZEO server.

That's not bad given that this might at least saturate most customers'
downstream :)
Would a multi-threaded ZEO server improve anything here? Especially
with concurrent access?




Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Andreas Jung



--On 29 May 2005 11:29:06 +0200 Christian Theune <[EMAIL PROTECTED]> wrote:


On Saturday, 2005-05-21, at 17:38 +0200, Christian Heimes wrote:

Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
OFS.Image.Image class (and also Zope3's implementation) is using a
so-called "possibly large data" class (Pdata) that is a subclass of
Persistent.

Pdata is using a simple and ingenious approach to minimize the memory
usage when storing large binary data in ZODB. The data is read from a
[...]


Actually Pdata has some drawbacks. When the blobsupport branch gets
declared stable (I think it's not gonna happen in 3.4, but nobody told
me otherwise) we'll have really good blob support without this black
magic.



The Pdata approach in general is not bad. I have implemented a CVS-like
file repository lately where we store binary content using a Pdata-like
structure. Our largest files are around 100 MB, and the performance and
efficiency are not bad, although they could be better. The bottleneck is
either the ZEO communication or just the network. I reach about
3.5 MB/second while reading such a large file from the ZEO server.


-aj




Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-29 Thread Christian Theune
On Saturday, 2005-05-21, at 17:38 +0200, Christian Heimes wrote:
> Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
> OFS.Image.Image class (and also Zope3's implementation) is using a
> so-called "possibly large data" class (Pdata) that is a subclass of
> Persistent.
> 
> Pdata is using a simple and ingenious approach to minimize the memory
> usage when storing large binary data in ZODB. The data is read from a
> [...]

Actually Pdata has some drawbacks. When the blobsupport branch gets
declared stable (I think it's not gonna happen in 3.4, but nobody told
me otherwise) we'll have really good blob support without this black
magic.
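
For readers who haven't looked at OFS/Image.py, the chunking idea behind
Pdata can be sketched roughly like this (a toy stand-in: the real Pdata node
subclasses Persistent, so each chunk is a separate database object):

```python
CHUNK_SIZE = 64 * 1024  # illustrative; not Zope's exact chunk size

class Chunk:
    """One link in the chain; the real Pdata also subclasses Persistent."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def split(data, chunk_size=CHUNK_SIZE):
    """Split a byte string into a linked chain of chunks; return the head."""
    head = None
    # Build back-to-front so each node can point at the one after it.
    for start in reversed(range(0, len(data), chunk_size)):
        head = Chunk(data[start:start + chunk_size], head)
    return head

def join(head):
    """Reassemble the chain; a real reader would stream chunk by chunk."""
    parts = []
    while head is not None:
        parts.append(head.data)
        head = head.next
    return b"".join(parts)
```

Because no single object ever holds the whole payload, neither pickling nor
loading needs the full file in memory as one contiguous string.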

Cheers,
Christian





RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-24 Thread Tim Peters
[Jeremy Hylton]
> ...
> It looks like your application has a single persistent instance -- the
> root ExtendedTupleTable -- so there's no way for ZODB to manage the
> memory.  That object and everything reachable from it must be in memory
> at all times.

Indeed, I tried running this program under ZODB 3.4b1 on Windows, and
about 4% of the way done it dies during one of the subtransaction
commits, with a StackError:  the object is so sprawling that a
megabyte C stack is blown by recursion while trying to serialize it.


RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-22 Thread Tim Peters
[Jeremy Hylton]
> ...
> The ObjectInterning instance is another source of problem, because it's
> a dictionary that has an entry for every object you touch.

Some vital context was missing in this post.  Originally, on c.l.py, DJTB
wasn't using ZODB at all.  In effect, he had about 5000 lists each
containing about 5000 "not small" integers, so Python created about 5000**2
= 25 million integer objects to hold them all, consuming 100s of megabytes
of RAM.  However, due to the semantics of the application, there were only
about 5000 _distinct_ integers.  What became the `ObjectInterning` class
here started as a suggestion to keep a dict of the distinct integers,
effectively "intern"ing them.  That cut the memory use by a factor of
thousands.

This has all gotten generalized and micro-optimized to the point that I
can't follow the code anymore.  Regardless, the same basic trick won't work
with ZODB (or via any other way of storing the data to disk and reading it
up again later):  if we write the same "not small" integer object out
100 times, then read them all back in, Python will again create 100
distinct integer objects to hold them.  Object identity doesn't survive for
"second class" persistent objects, and interning needs to be applied again
_every_ time one is created.
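The intern-via-a-dict trick is simple enough to sketch. This is a minimal illustration of the idea, not the actual `ObjectInterning` API; `intern_value` and the module-level `_interned` dict are hypothetical names:

```python
_interned = {}

def intern_value(value):
    # Return the canonical copy of `value`, storing it on first sight.
    # Requires that `value` is hashable (ints, strings, tuples, ...).
    return _interned.setdefault(value, value)

a = intern_value(int("100000000000000000000"))  # a "not small" integer
b = intern_value(10 ** 20)                      # equal value, built independently
assert a is b   # both names now refer to one shared object
```

As Tim notes, this only helps within one process lifetime: after objects are written out and read back in, interning must be applied again to every freshly created value.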

[DJTB]
> ... The only thing I can't change is that ExtendedTuple inherits
> from tuple

Let me suggest that you may be jumping in at the deep ends of too many pools
at once here.

> class ExtendedTuple(tuple):
>
>def __init__(self, els):
>tuple.__init__(self,els)

That line doesn't accomplish anything:  tuples are immutable, and by the
time __init__ is called the tuple contents are already set forever.  You
should probably be overriding tuple.__new__ instead.
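A minimal sketch of the suggested fix, assuming `ExtendedTuple` needs nothing beyond the elements themselves:

```python
class ExtendedTuple(tuple):
    # tuple is immutable: the elements must be supplied in __new__,
    # because by the time __init__ runs the contents are already fixed.
    def __new__(cls, els):
        return tuple.__new__(cls, els)

et = ExtendedTuple([1, 2, 3])
assert et == (1, 2, 3)
# With no __hash__ override, tuple.__hash__ is inherited and gives
# the same result Tim describes below.
assert hash(et) == hash((1, 2, 3))
```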

> ...
>def __hash__(self):
>return hash(tuple(self))

This method isn't needed.  If you leave it out, the base class
tuple.__hash__ will get called directly, and will compute the same result.



Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-22 Thread Jeremy Hylton
On 5/21/05, DJTB <[EMAIL PROTECTED]> wrote:
> [posted to comp.lang.python, mailed to [EMAIL PROTECTED]

[Following up to both places.]

> I'm having problems storing large amounts of objects in a ZODB.
> After committing changes to the database, elements are not cleared from
> memory. Since the number of objects I'd like to store in the ZODB is too
> large to fit in RAM, my program gets killed with signal 11 or signal 9...

The problem here is a common one with a first attempt at using ZODB:
ZODB manages memory at the granularity of first-class persistent
objects -- that is, instances of classes that inherit from Persistent.
ZODB can move such objects in and out of memory at transaction
boundaries, which allows your application to use many more objects
than physical memory can hold.

It looks like your application has a single persistent instance -- the
root ExtendedTupleTable -- so there's no way for ZODB to manage the
memory.  That object and everything reachable from it must be in
memory at all times.

You need to re-structure the program so that it has more first-class
persistent objects.  If, for example, the ExtendedTuple objects
inherited from Persistent, then they could reside on disk except when
you are manipulating them.
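One hedged sketch of that restructuring, setting aside for a moment the constraint that ExtendedTuple must stay a tuple subclass: wrap the tuple in a first-class persistent object. The fallback class lets the sketch run without ZODB installed; with ZODB, `persistent.Persistent` is the real base class:

```python
try:
    from persistent import Persistent   # the real ZODB base class
except ImportError:
    class Persistent(object):           # stand-in so the sketch runs anywhere
        pass

class ExtendedTuple(Persistent):
    """First-class persistent wrapper around an immutable tuple of elements.

    Because instances inherit from Persistent, ZODB can move each one
    to disk independently instead of keeping the whole table in RAM.
    """
    def __init__(self, els):
        self.data = tuple(els)          # the payload ZODB serializes

    def __eq__(self, other):
        return self.data == getattr(other, "data", other)

    def __hash__(self):
        return hash(self.data)
```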

The ObjectInterning instance is another source of problems, because
it's a dictionary that has an entry for every object you touch.  The
various other dictionaries in your program will also be memory hogs if
they have very many entries.  The typical way to structure a ZODB
program is to use one of the BTrees implementation types instead of a
dictionary, because a BTree does not keep all its keys and values in
memory at one time.  (Its internal organization is a large collection
of first-class persistent objects representing the BTree's buckets and
internal tree nodes.)

You must use some care with BTrees, because the data structure
maintains a total ordering on its keys (a plain dictionary does not).
The ZODB/ZEO programming guide has a good section on BTrees here:
http://www.zope.org/Wikis/ZODB/guide/node6.html
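In code, swapping a dict for a BTree is nearly a drop-in change. This sketch uses the object-keyed `OOBTree` from the BTrees package that ships with ZODB, falling back to a plain dict so it runs anywhere (with the fallback you lose the paging behaviour, of course):

```python
try:
    from BTrees.OOBTree import OOBTree   # object keys -> object values
except ImportError:
    OOBTree = dict                       # stand-in if ZODB is unavailable

index = OOBTree()
for i in range(5000):
    index[i] = i * i   # with a real BTree, buckets page in and out
                       # of memory independently at transaction boundaries

assert index[4999] == 4999 * 4999
```

Keys must support a consistent total ordering, which is the caveat mentioned above.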

Jeremy


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-21 Thread Dieter Maurer
DJTB wrote at 2005-5-21 13:00 +0200:
> ...
>I'm having problems storing large amounts of objects in a ZODB.
>After committing changes to the database, elements are not cleared from
>memory.

You can control how many objects the ZODB cache may contain.

Note, however, that the objects are usually flushed from cache
only at transaction boundaries.

Furthermore, there are methods to flush individual objects
from the cache ("obj._p_invalidate()"), perform a cache cleanup
mid-transaction ("connection.cacheGC()"), and perform
a full flush ("connection.cacheMinimize()").

Note that an object can be flushed from the cache only
if it has not been modified in the current transaction.
This holds regardless of the way you try to flush it
("_p_invalidate", "cacheGC" or "cacheMinimize").
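A hedged sketch of those knobs, assuming the ZODB 3.x API and wrapped so it degrades gracefully when ZODB is not installed:

```python
cache_demo_ran = False
try:
    from ZODB import DB
    from ZODB.MappingStorage import MappingStorage

    # cache_size is the target number of objects per connection cache.
    db = DB(MappingStorage(), cache_size=400)
    conn = db.open()
    root = conn.root()        # ... work with persistent objects ...

    root._p_invalidate()      # flush one (unmodified) object from the cache
    conn.cacheGC()            # mid-transaction cleanup down to cache_size
    conn.cacheMinimize()      # full flush of all unmodified objects

    conn.close()
    db.close()
    cache_demo_ran = True
except ImportError:
    pass   # ZODB not installed; the calls above are the ones named here
```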

-- 
Dieter


Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

2005-05-21 Thread Christian Heimes

DJTB wrote:

> What should I do to make sure RAM is no longer a limiting factor?
> (In other words: the program should work with any (large) value of
> self.__range and self.__et_count, because in my case self.__et_count = 5000
> is only a toy example...)
> I'm now working on a PC with 2.5 GB RAM and even that's not enough!


Grab the Zope2 sources and read lib/python/OFS/Image.py.  Zope's 
OFS.Image.Image class (and also Zope3's implementation) uses a 
so-called "possibly large data" class (Pdata), a subclass of Persistent.


Pdata uses a simple and ingenious approach to minimize memory 
usage when storing large binary data in ZODB.  The data is read from a 
temporary file chunk by chunk.  Each chunk is stored inside a Pdata 
object and committed in a subtransaction.  The Pdata objects are linked 
into a simple linear chain, much like a singly linked list connected with 
pointers in old-style C.


Try to understand the code. It might help to solve your problem. In 
general: Don't try to store large data in one block like a binary 
string. Use small, persistent chunks.
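The chunk-chain idea can be sketched in a few lines. `Chunk` here is a hypothetical stand-in for OFS.Image.Pdata (which additionally subclasses Persistent, so each chunk can live on disk independently):

```python
import io

class Chunk(object):
    """Hypothetical stand-in for Pdata: one small piece of a large payload."""
    def __init__(self, data):
        self.data = data   # this chunk's bytes
        self.next = None   # pointer to the following chunk, C-style

def chunked(stream, chunk_size=64 * 1024):
    # Read the stream piece by piece into a linked chain of Chunks;
    # return the head of the chain (None for an empty stream).
    head = tail = None
    while True:
        data = stream.read(chunk_size)
        if not data:
            return head
        node = Chunk(data)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node

def reassemble(head):
    # Walk the chain and join the pieces back together.
    parts = []
    while head is not None:
        parts.append(head.data)
        head = head.next
    return b"".join(parts)

payload = b"x" * (200 * 1024)
head = chunked(io.BytesIO(payload), chunk_size=1024)
assert reassemble(head) == payload
```

In the real Pdata scheme each chunk is committed in a subtransaction as it is read, so only the chunk being handled needs to be in RAM at any moment.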


Christian