RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Wednesday, 01.06.2005, 12:59 -0400, Chris McDonough wrote:
> How's Sat June 11? I'll just mark that on my calendar as a day to write
> some blob tests and I'll catch you (and whoever else might want to
> contribute) on irc.freenode.net between then and now to talk about what
> needs to be tested?

Sounds like a good plan to me.

--
gocept gmbh & co. kg - schalaunische str. 6 - 06366 koethen - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 3496 30 99 112 -
fax +49 3496 30 99 118 - zope and plone consulting and development

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Wed, 2005-06-01 at 12:04 +0200, Christian Theune wrote:
> On Tuesday, 31.05.2005, 16:02 -0400, Chris McDonough wrote:
> > Christian,
> >
> > Can we pick a day next week to have a sprint via IRC? Would you be
> > willing to help write some tests during that sprint?
>
> Does "next week" include Saturday/Sunday? We just acquired a new client
> and next week is very busy. The weekend would be possible for me. Maybe
> a one-hour planning chat would be possible during the week, so people
> could get started. OTOH doing it all together might be even better.
>
> Eventually my Friday afternoon (like 4pm over here) would mean the
> normal Friday to you, so we can leverage the time shift to get me more
> time. 8)
>
> And yes, I'm willing to write tests, if someone hints me at the test
> cases. :)

How's Sat June 11? I'll just mark that on my calendar as a day to write
some blob tests and I'll catch you (and whoever else might want to
contribute) on irc.freenode.net between then and now to talk about what
needs to be tested?

- C
RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Tuesday, 31.05.2005, 16:02 -0400, Chris McDonough wrote:
> Christian,
>
> Can we pick a day next week to have a sprint via IRC? Would you be
> willing to help write some tests during that sprint?

Does "next week" include Saturday/Sunday? We just acquired a new client
and next week is very busy. The weekend would be possible for me. Maybe
a one-hour planning chat would be possible during the week, so people
could get started. OTOH doing it all together might be even better.

Eventually my Friday afternoon (like 4pm over here) would mean the
normal Friday to you, so we can leverage the time shift to get me more
time. 8)

And yes, I'm willing to write tests, if someone hints me at the test
cases. :)

Christian
Re: [ZODB-Dev] ZODB memory problems
Tim Peters wrote:
> [Jeremy Hylton]
>
>>>It's really too bad that ZEO only allows a single outstanding request.
>>>Restructuring the protocol to allow multiple simultaneous requests
>>>was on the task list years ago, but the protocol implementation is so
>>>complex I doubt it will get done :-(. I can't help but think building
>>>on top of an existing message/RPC layer would be profitable. (What's
>>>twisted's RPC layer?) Or at least something less difficult to use than
>>>asyncore.
>
> [Shane Hathaway]
>
>>Do you think the RPC layer is the source of the problem?
>
> Probably depends on what "the problem" refers to? If the protocol allows
> for at most one outstanding request, then that's clearly _a_ bottleneck,
> right?

Yes. I meant to ask whether the RPC layer is currently the worst
bottleneck. Lately I've been dealing with problems that require a
minimum of 50% utilization of a gigabit network connection, and the
3.5 MB/s figure Andreas quoted made me cringe. ;-)

> I get the impression that Jim thinks the ZEO protocol is simple. I don't
> know -- because I haven't had to "fix bugs" in it recently, I know little
> about it. It sure isn't obvious from staring at 8000+ lines of ZEO code,
> and Jeremy, Guido and I spent weeks a few years ago "fixing bugs" then. I
> felt pretty lost the whole time, never sure how many threads there were,
> which code those threads may be executing, how exactly asyncore cooperated
> (or fought) with the threads, or even clear on which RPC calls were
> synchronous and which async. There's so much machinery of various kinds
> that it's hard to divine the _intent_ of it all. I remember that sometimes
> the letter "x" gets sent to a socket, and that's important.

Yes, it is mysterious. Poor readability seems to be common with code
that deals with both events (i.e. asyncore events) and many threads. I
used to mix them on a whim. Lately I've used both events and threads,
but not in the same program, and I think it's done a lot of good for
the maintainability of what I've written.

> It was my ignorant impression at the time that asyncore didn't make much
> sense here, because mixing threads with asyncore is always a nightmare in
> reality, and a ZEO server doesn't intend to service hundreds of clients
> simultaneously regardless.

Having stared at ZEO for a while, I've convinced myself that the ZEO
client code has no reason to use asyncore. A blocking socket and
makefile() seem like a much better fit.

I'm not sure whether the ZEO server should be event driven or threaded,
but being both is probably wrong. Since it's event driven now, the ZEO
server may be less susceptible to concurrency gremlins than it would be
with threads. However, last time I looked, the ZEO server used a few
threads for miscellaneous work.

>>I feel like the way ZODB uses references in pickles is the main thing
>>that slows it down. Even if you have a protocol that can request many
>>objects at once, the unpickling machinery only asks for one at a time.
>
> I'm unclear on what "the unpickling machinery" means. The most obvious
> meaning is cPickle, but that doesn't ask for anything. In general, no
> object's state gets unpickled before something in the application _needs_
> its state, so unpickling is currently driven by the application. Maybe
> you're suggesting some form of "eager" unpickling/state-materialization?

Yes. For each object, ZODB could store a list of referenced OIDs. When
ZODB is about to unpickle an object, it could read the list of
referenced OIDs and tell its storage that it will need the pickles for
those objects very shortly (except the objects already loaded). Then
the ZEO client code could make a single request for all of the
referenced objects that aren't already in the cache.

Shane
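The prefetch idea in this message can be sketched in a few lines. This is only an illustration under stated assumptions: `oids_to_prefetch`, `prefetch`, `FakeStorage` and its `load_many` method are hypothetical names invented here, and none of them is existing ZODB or ZEO API. The cache is modelled as a plain dict from OID to pickle.

```python
# Sketch of the prefetch idea: batch every referenced OID that is not
# already cached into a single storage request. All names here are
# hypothetical; none of this is existing ZODB or ZEO API.

def oids_to_prefetch(referenced_oids, cache):
    """Return the referenced OIDs missing from the cache, in first-seen order."""
    seen = set()
    missing = []
    for oid in referenced_oids:
        if oid not in cache and oid not in seen:
            seen.add(oid)
            missing.append(oid)
    return missing


class FakeStorage:
    """Stand-in for a storage that can answer one batched request."""

    def __init__(self, pickles):
        self.pickles = pickles
        self.round_trips = 0

    def load_many(self, oids):
        # One round trip, regardless of how many OIDs are requested.
        self.round_trips += 1
        return {oid: self.pickles[oid] for oid in oids}


def prefetch(referenced_oids, cache, storage):
    """Fill the cache with one batched request for the missing pickles."""
    missing = oids_to_prefetch(referenced_oids, cache)
    if missing:
        cache.update(storage.load_many(missing))
    return missing
```

The point of the sketch is only the shape of the request: the number of round trips depends on how many objects are unpickled, not on how many references each one holds.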
RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
Christian,

Can we pick a day next week to have a sprint via IRC? Would you be
willing to help write some tests during that sprint?

On Tue, 2005-05-31 at 11:21 -0400, Tim Peters wrote:
> [Paul Winkler]
> >> Cool! This is on ctheune-blobsupport branch, right? Did that ever get
> >> merged to trunk?
>
> [Christian Theune]
> > AFAIK not. It was a "maybe" for the 3.4 release but really didn't have
> > enough public exposure to make it in there.
>
> That's all correct. 3.4 is in beta now, so blob support will be a
> new-in-3.5 feature.
>
> > ChrisM wanted to write some more unit tests for it which would justify
> > including it on the main branch. I hope I can make him do that at a
> > sprint over here in September ... :)
>
> I hope there's action before September -- that would be half a year after
> the branch was created, and merging gets harder over time. Maybe we could
> do a virtual sprint via zodb-dev <0.9 wink>.
RE: [ZODB-Dev] ZODB memory problems
...
[Toby Dickenson]
> pb; perspective broker. I thought that suggestion was crazy when Jeremy
> first presented it in his blog, but increasing exposure to other twisted
> tools makes it seem more reasonable.

Just thought I'd guess at the probably-intended references:

http://www.python.org/~jeremy/weblog/030418.html
http://twistedmatrix.com/products/spread

I'm not sure that going from cutesy talk about pickles and jars to
cutesy talk about jelly and bananas would be a pure win on its own.

...
> Both problems seem interestingly non-trivial; a storage layer that will
> prefetch related pickles, and an rpc/storage layer that can express
> those prefetching requests with appropriate security, concurrency, and
> prioritization.

"Yup" to both (interesting, and non-trivial).
RE: [ZODB-Dev] ZODB memory problems
[Jeremy Hylton]
>> It's really too bad that ZEO only allows a single outstanding request.
>> Restructuring the protocol to allow multiple simultaneous requests
>> was on the task list years ago, but the protocol implementation is so
>> complex I doubt it will get done :-(. I can't help but think building
>> on top of an existing message/RPC layer would be profitable. (What's
>> twisted's RPC layer?) Or at least something less difficult to use than
>> asyncore.

[Shane Hathaway]
> Do you think the RPC layer is the source of the problem?

Probably depends on what "the problem" refers to? If the protocol allows
for at most one outstanding request, then that's clearly _a_ bottleneck,
right?

I get the impression that Jim thinks the ZEO protocol is simple. I don't
know -- because I haven't had to "fix bugs" in it recently, I know little
about it. It sure isn't obvious from staring at 8000+ lines of ZEO code,
and Jeremy, Guido and I spent weeks a few years ago "fixing bugs" then. I
felt pretty lost the whole time, never sure how many threads there were,
which code those threads may be executing, how exactly asyncore cooperated
(or fought) with the threads, or even clear on which RPC calls were
synchronous and which async. There's so much machinery of various kinds
that it's hard to divine the _intent_ of it all. I remember that sometimes
the letter "x" gets sent to a socket, and that's important.

It was my ignorant impression at the time that asyncore didn't make much
sense here, because mixing threads with asyncore is always a nightmare in
reality, and a ZEO server doesn't intend to service hundreds of clients
simultaneously regardless.

Anyway, to the extent that the RPC machinery is mysterious, that's "a
problem" of its own.

> I feel like the way ZODB uses references in pickles is the main thing
> that slows it down. Even if you have a protocol that can request many
> objects at once, the unpickling machinery only asks for one at a time.

I'm unclear on what "the unpickling machinery" means. The most obvious
meaning is cPickle, but that doesn't ask for anything. In general, no
object's state gets unpickled before something in the application _needs_
its state, so unpickling is currently driven by the application. Maybe
you're suggesting some form of "eager" unpickling/state-materialization?
Re: [ZODB-Dev] ZODB memory problems
On Tuesday 31 May 2005 18:43, Shane Hathaway wrote:
> Jeremy Hylton wrote:
> > It's really too bad that ZEO only allows a single outstanding request.
> > Restructuring the protocol to allow multiple simultaneous requests
> > was on the task list years ago, but the protocol implementation is so
> > complex I doubt it will get done :-(. I can't help but think building
> > on top of an existing message/RPC layer would be profitable. (What's
> > twisted's RPC layer?

pb; perspective broker. I thought that suggestion was crazy when Jeremy
first presented it in his blog, but increasing exposure to other twisted
tools makes it seem more reasonable. In short, I've not yet been
disappointed by anything coming from the twisted camp.

> > ) Or at least something less difficult to use
> > than asyncore.
>
> Do you think the RPC layer is the source of the problem? I feel like
> the way ZODB uses references in pickles is the main thing that slows it
> down. Even if you have a protocol that can request many objects at
> once, the unpickling machinery only asks for one at a time.

Both problems seem interestingly non-trivial; a storage layer that will
prefetch related pickles, and an rpc/storage layer that can express
those prefetching requests with appropriate security, concurrency, and
prioritization.

--
Toby Dickenson
Re: [ZODB-Dev] ZODB memory problems
Jeremy Hylton wrote:
> It's really too bad that ZEO only allows a single outstanding request.
> Restructuring the protocol to allow multiple simultaneous requests
> was on the task list years ago, but the protocol implementation is so
> complex I doubt it will get done :-(. I can't help but think building
> on top of an existing message/RPC layer would be profitable. (What's
> twisted's RPC layer?) Or at least something less difficult to use
> than asyncore.

Do you think the RPC layer is the source of the problem? I feel like
the way ZODB uses references in pickles is the main thing that slows it
down. Even if you have a protocol that can request many objects at
once, the unpickling machinery only asks for one at a time.

Shane
RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Tuesday, 31.05.2005, 11:21 -0400, Tim Peters wrote:
> I hope there's action before September -- that would be half a year after
> the branch was created, and merging gets harder over time. Maybe we could
> do a virtual sprint via zodb-dev <0.9 wink>.

Ok. There is not much to do except "find all the stuff that doesn't
work". I'm not well suited for that because I wrote all the stuff, so
I'm convinced it works anyway ...

One open issue: I introduced an inconsistency when doing the ZEO
support. The ZEO cache directory uses a different structure than the
ProxyBlobStorage, so the directory structures are incompatible. This
should be changed.

And if time permits (looking at the pretty full calendar) I could
participate in a virtual sprint. At least we should set an official
plan (time and actions) for how to integrate it into an upcoming 3.5.

Cheers,
Christian
RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
[Paul Winkler]
>> Cool! This is on ctheune-blobsupport branch, right? Did that ever get
>> merged to trunk?

[Christian Theune]
> AFAIK not. It was a "maybe" for the 3.4 release but really didn't have
> enough public exposure to make it in there.

That's all correct. 3.4 is in beta now, so blob support will be a
new-in-3.5 feature.

> ChrisM wanted to write some more unit tests for it which would justify
> including it on the main branch. I hope I can make him do that at a
> sprint over here in September ... :)

I hope there's action before September -- that would be half a year after
the branch was created, and merging gets harder over time. Maybe we could
do a virtual sprint via zodb-dev <0.9 wink>.
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Monday, 30.05.2005, 23:59 -0400, Paul Winkler wrote:
> On Mon, May 30, 2005 at 11:11:52AM +0200, Christian Theune wrote:
> > * Blocking during large streams is solved by a protocol extension for
> >   streaming blobs. Additionally you can make the ZEO-Clients share the
> >   Blob data via a networked file system.
> > * The pData cache problem is solved as well
>
> Cool! This is on ctheune-blobsupport branch, right?
> Did that ever get merged to trunk?

AFAIK not. It was a "maybe" for the 3.4 release but really didn't have
enough public exposure to make it in there.

ChrisM wanted to write some more unit tests for it which would justify
including it on the main branch. I hope I can make him do that at a
sprint over here in September ... :)

Cheers,
Christian
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Mon, May 30, 2005 at 11:11:52AM +0200, Christian Theune wrote:
> * Blocking during large streams is solved by a protocol extension for
>   streaming blobs. Additionally you can make the ZEO-Clients share the
>   Blob data via a networked file system.
> * The pData cache problem is solved as well

Cool! This is on ctheune-blobsupport branch, right?
Did that ever get merged to trunk?

--
Paul Winkler
http://www.slinkp.com
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
Jeremy Hylton wrote at 2005-5-29 21:04 -0400:
> ...
>What's
>twisted's RPC layer?) Or at least something less difficult to use
>than asyncore.

"medusa" (on top of "asyncore") obviously supports multi-threaded
request execution. But, probably, it is more difficult than
"asyncore"...

--
Dieter
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Sunday, 29.05.2005, 10:37 +0200, Tino Wildenhain wrote:
> > > Actually Pdata has some drawbacks. When the blobsupport branch gets
> > > declared stable (I think it's not gonna happen in 3.4, but nobody told
> > > me otherwise) we'll have really good blob support without this black
> > > magic.
>
> Especially the ZEO handling of blobs could be improved IIRC.

* Blocking during large streams is solved by a protocol extension for
  streaming blobs. Additionally you can make the ZEO-Clients share the
  Blob data via a networked file system.
* The pData cache problem is solved as well
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On 5/29/05, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> > Would a multi-threaded ZEO server improve anything here? Especially
> > with concurrent access?
>
> It's possible. Although ZEO talks over the network using async sockets,
> it reads files synchronously, so I suspect it will frequently sit around
> doing nothing for 10 ms, waiting for the disk to read data. If your ZEO
> server has a load of 1.0 or more but low CPU usage, this is likely
> happening. The easiest way to overcome this is to buy gigabytes of RAM
> for the ZEO server--ideally, enough gigabytes to hold your whole database.

A related problem is that the ZEO cache on the client is on disk, too.
You may end up waiting for a disk seek to get it off disk on the client.
If you've got it in memory on the server and if the ZEO protocol were
more efficient, that would be a drag.

> Also, the design of ZEO clients tends to serialize communication with
> the ZEO server, so the throughput between client and server is likely to
> be limited significantly by network latency. "ping" is a good tool for
> measuring latency; 1 ms is good and .1 ms is excellent. There are ways
> to tune the network. You can also reduce the effects of network latency
> by creating and load balancing a lot of ZEO clients.

It's really too bad that ZEO only allows a single outstanding request.
Restructuring the protocol to allow multiple simultaneous requests was
on the task list years ago, but the protocol implementation is so
complex I doubt it will get done :-(. I can't help but think building
on top of an existing message/RPC layer would be profitable. (What's
twisted's RPC layer?) Or at least something less difficult to use than
asyncore.

Jeremy
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
Tino Wildenhain wrote:
> On Sunday, 29.05.2005, 09:51 +0200, Andreas Jung wrote:
>>The Pdata approach in general is not bad. I have implemented a CVS-like file
>>repository lately where we store binary content using a Pdata-like
>>structure.
>>Our largest files are around 100 MB and the performance and efficiency is
>>not bad
>>although it could be better. The bottleneck is either the ZEO communication
>>or just the network.
>>I reach about 3.5 MB/second while reading such a large file from the ZEO
>>server.
>
> That's not bad, given that this might at least saturate most customers'
> downstream :)
> Would a multi-threaded ZEO server improve anything here? Especially
> with concurrent access?

It's possible. Although ZEO talks over the network using async sockets,
it reads files synchronously, so I suspect it will frequently sit around
doing nothing for 10 ms, waiting for the disk to read data. If your ZEO
server has a load of 1.0 or more but low CPU usage, this is likely
happening. The easiest way to overcome this is to buy gigabytes of RAM
for the ZEO server--ideally, enough gigabytes to hold your whole
database.

Also, the design of ZEO clients tends to serialize communication with
the ZEO server, so the throughput between client and server is likely to
be limited significantly by network latency. "ping" is a good tool for
measuring latency; 1 ms is good and .1 ms is excellent. There are ways
to tune the network. You can also reduce the effects of network latency
by creating and load balancing a lot of ZEO clients.

Shane
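The latency argument above can be put into numbers. When a client keeps only one request in flight, each request costs at least one network round trip, so throughput is bounded by bytes per request divided by round-trip time. The sketch below is back-of-the-envelope arithmetic only: the 4096-byte request size is an arbitrary assumption, not a measured ZEO value, and the function name is made up for illustration.

```python
# Upper bound on throughput with a single outstanding request:
#     throughput <= bytes_per_request / round_trip_time
# The request size used below is an assumption for illustration only.

def max_throughput_mb_per_s(bytes_per_request, round_trip_ms):
    """Return the bound in MB/s (1 MB = 1,000,000 bytes)."""
    requests_per_second = 1000.0 / round_trip_ms
    return bytes_per_request * requests_per_second / 1_000_000


if __name__ == "__main__":
    # Compare the two latencies mentioned in the message.
    for rtt_ms in (1.0, 0.1):
        bound = max_throughput_mb_per_s(4096, rtt_ms)
        print(f"rtt={rtt_ms} ms -> at most {bound:.3f} MB/s")
```

Under these assumptions, cutting latency by a factor of ten raises the bound by the same factor, which is why the message treats latency, not bandwidth, as the limit.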
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Sunday, 29.05.2005, 09:51 +0200, Andreas Jung wrote:
>
> --On 29 May 2005 11:29:06 +0200 Christian Theune <[EMAIL PROTECTED]> wrote:
>
> > On Saturday, 21.05.2005, 17:38 +0200, Christian Heimes wrote:
> >> Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
> >> OFS.Image.Image class (and also Zope3's implementation) is using a so
> >> called possible large data class (Pdata) that is a subclass of
> >> Persistent.
> >>
> >> Pdata is using a simple and ingenious approach to minimize the memory
> >> usage when storing large binary data in ZODB. The data is read from a
> >> [...]
> >
> > Actually Pdata has some drawbacks. When the blobsupport branch gets
> > declared stable (I think it's not gonna happen in 3.4, but nobody told
> > me otherwise) we'll have really good blob support without this black
> > magic.

Especially the ZEO handling of blobs could be improved IIRC.

> The Pdata approach in general is not bad. I have implemented a CVS-like file
> repository lately where we store binary content using a Pdata-like
> structure.
> Our largest files are around 100 MB and the performance and efficiency is
> not bad
> although it could be better. The bottleneck is either the ZEO communication
> or just the network.
> I reach about 3.5 MB/second while reading such a large file from the ZEO
> server.

That's not bad, given that this might at least saturate most customers'
downstream :)
Would a multi-threaded ZEO server improve anything here? Especially
with concurrent access?
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
--On 29 May 2005 11:29:06 +0200 Christian Theune <[EMAIL PROTECTED]> wrote:

> On Saturday, 21.05.2005, 17:38 +0200, Christian Heimes wrote:
>> Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
>> OFS.Image.Image class (and also Zope3's implementation) is using a so
>> called possible large data class (Pdata) that is a subclass of
>> Persistent.
>>
>> Pdata is using a simple and ingenious approach to minimize the memory
>> usage when storing large binary data in ZODB. The data is read from a
>> [...]
>
> Actually Pdata has some drawbacks. When the blobsupport branch gets
> declared stable (I think it's not gonna happen in 3.4, but nobody told
> me otherwise) we'll have really good blob support without this black
> magic.

The Pdata approach in general is not bad. I have implemented a CVS-like
file repository lately where we store binary content using a Pdata-like
structure. Our largest files are around 100 MB and the performance and
efficiency is not bad, although it could be better. The bottleneck is
either the ZEO communication or just the network. I reach about
3.5 MB/second while reading such a large file from the ZEO server.

-aj
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On Saturday, 21.05.2005, 17:38 +0200, Christian Heimes wrote:
> Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's
> OFS.Image.Image class (and also Zope3's implementation) is using a so
> called possible large data class (Pdata) that is a subclass of Persistent.
>
> Pdata is using a simple and ingenious approach to minimize the memory
> usage when storing large binary data in ZODB. The data is read from a
> [...]

Actually Pdata has some drawbacks. When the blobsupport branch gets
declared stable (I think it's not gonna happen in 3.4, but nobody told
me otherwise) we'll have really good blob support without this black
magic.

Cheers,
Christian
RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
[Jeremy Hylton]
> ...
> It looks like your application has a single persistent instance -- the
> root ExtendedTupleTable -- so there's no way for ZODB to manage the
> memory. That object and everything reachable from it must be in memory
> at all times.

Indeed, I tried running this program under ZODB 3.4b1 on Windows, and
about 4% of the way done it dies during one of the subtransaction
commits, with a StackError: the object is so sprawling that a megabyte
C stack is blown by recursion while trying to serialize it.
RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
[Jeremy Hylton]
> ...
> The ObjectInterning instance is another source of problem, because it's
> a dictionary that has an entry for every object you touch.

Some vital context was missing in this post. Originally, on c.l.py,
DJTB wasn't using ZODB at all. In effect, he had about 5000 lists each
containing about 5000 "not small" integers, so Python created about
5000**2 = 25 million integer objects to hold them all, consuming 100s
of megabytes of RAM. However, due to the semantics of the application,
there were only about 5000 _distinct_ integers. What became the
`ObjectInterning` class here started as a suggestion to keep a dict of
the distinct integers, effectively "intern"ing them. That cut the
memory use by a factor of thousands.

This has all gotten generalized and micro-optimized to the point that I
can't follow the code anymore. Regardless, the same basic trick won't
work with ZODB (or via any other way of storing the data to disk and
reading it up again later): if we write the same "not small" integer
object out 100 times, then read them all back in, Python will again
create 100 distinct integer objects to hold them. Object identity
doesn't survive for "second class" persistent objects, and interning
needs to be applied again _every_ time one is created.

[DJTB]
> ... The only thing I can't change is that ExtendedTuple inherits
> from tuple

Let me suggest that you may be jumping in at the deep ends of too many
pools at once here.

> class ExtendedTuple(tuple):
>
>     def __init__(self, els):
>         tuple.__init__(self, els)

That line doesn't accomplish anything: tuples are immutable, and by the
time __init__ is called the tuple contents are already set forever. You
should probably be overriding tuple.__new__ instead.

> ...
>     def __hash__(self):
>         return hash(tuple(self))

This method isn't needed. If you leave it out, the base class
tuple.__hash__ will get called directly, and will compute the same
result.
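Both points in this message can be shown in a short, ZODB-free sketch. It is an illustration, not DJTB's code: the sorting done in `__new__` is a made-up normalisation chosen only to show that contents must be fixed there, and `intern_value` with its module-level dict is a hypothetical stand-in for the `ObjectInterning` class.

```python
# Minimal sketch: customise a tuple subclass in __new__, and "intern"
# equal values so that only one object per distinct value is kept.
# The sorting step and intern_value() are illustrative assumptions.

class ExtendedTuple(tuple):
    def __new__(cls, els):
        # Tuples are immutable, so the contents must be fixed here;
        # by the time __init__ runs it is too late to change them.
        return super().__new__(cls, sorted(els))

    # No __hash__ override: tuple.__hash__ is inherited unchanged.


_interned = {}


def intern_value(value):
    """Return the canonical object equal to `value`, storing it on first use."""
    return _interned.setdefault(value, value)
```

As the message notes, a table like `_interned` only deduplicates objects created in the current process. Values read back from disk arrive as fresh objects and would have to be passed through `intern_value` again.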
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
On 5/21/05, DJTB <[EMAIL PROTECTED]> wrote:
> [posted to comp.lang.python, mailed to [EMAIL PROTECTED]]

[Following up to both places.]

> I'm having problems storing large amounts of objects in a ZODB.
> After committing changes to the database, elements are not cleared from
> memory. Since the number of objects I'd like to store in the ZODB is too
> large to fit in RAM, my program gets killed with signal 11 or signal 9...

The problem here is a common one with a first attempt at using ZODB.
ZODB manages memory at the granularity of first-class persistent
objects -- that is, instances of classes that inherit from Persistent.
ZODB can move such objects in and out of memory at transaction
boundaries, which allows your application to use many more objects than
it has physical memory for.

It looks like your application has a single persistent instance -- the
root ExtendedTupleTable -- so there's no way for ZODB to manage the
memory. That object and everything reachable from it must be in memory
at all times.

You need to re-structure the program so that it has more first-class
persistent objects. If, for example, the ExtendedTuple objects
inherited from Persistent, then they could reside on disk except when
you are manipulating them.

The ObjectInterning instance is another source of problem, because it's
a dictionary that has an entry for every object you touch. The various
other dictionaries in your program will also be memory hogs if they
have very many entries. The typical way to structure a ZODB program is
to use one of the BTrees implementation types instead of a dictionary,
because the BTree does not keep all its keys and values in memory at
one time. (Its internal organization is a large collection of
first-class persistent objects representing the BTree buckets and
internal tree nodes.)

You must use some care with BTrees, because the data structure
maintains a total ordering on the keys. (And a dictionary does not.)
The ZODB/ZEO programming guide has a good section on BTrees here:

http://www.zope.org/Wikis/ZODB/guide/node6.html

Jeremy
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
DJTB wrote at 2005-5-21 13:00 +0200:
> ...
>I'm having problems storing large amounts of objects in a ZODB.
>After committing changes to the database, elements are not cleared from
>memory.

You can control how many objects the ZODB cache may contain. Note, however, that objects are usually flushed from the cache only at transaction boundaries.

Furthermore, there are methods to flush individual objects from the cache ("obj._p_invalidate()"), perform a cache cleanup mid-transaction ("connection.cacheGC()") and perform a full flush ("connection.cacheMinimize()").

Note that an object can only be flushed from the cache when it was not modified in the current transaction. This is independent of the way you try to flush it ("_p_invalidate", "cacheGC" or "cacheMinimize").

-- Dieter
Re: [ZODB-Dev] ZODB memory problems (was: processing a Very Large file)
DJTB wrote:
> What should I do to make sure RAM is no longer a limiting factor?
> (in other words: The program should work with any (large) value of
> self.__range and self.__et_count.
> Because in my case, self.__et_count = 5000 is only a toy example...)
> I'm now working on a PC with 2.5 GB RAM and even that's not enough!

Grab the Zope2 sources and read lib/python/OFS/Image.py. Zope's OFS.Image.Image class (and also Zope3's implementation) uses a so-called "possibly large data" class (Pdata) that is a subclass of Persistent.

Pdata uses a simple and ingenious approach to minimize memory usage when storing large binary data in ZODB. The data is read from a temporary file chunk by chunk. Each chunk is stored inside a Pdata object and committed in a subtransaction. The Pdata objects are linked in a simple linear chain, just like a linked list connected with pointers in old-style C.

Try to understand the code. It might help you solve your problem. In general: Don't try to store large data in one block like a binary string. Use small, persistent chunks.

Christian