Re: [ZODB-Dev] Using zodb and blobs
Hello Tres, thanks for your detailed answers!

On 12.04.2010, 22:42, Tres Seaver tsea...@palladion.com wrote:

>> Additionally I made some quick performance tests. I committed 1 kB sized
>> objects and I can do about 40 transactions/s if one object is changed per
>> transaction. For 100 kB objects it's also around 40 transactions/s. Only
>> for object sizes bigger than that does the raw I/O throughput seem to
>> start to matter.
>
> 40 tps sounds low: are you pushing blob content over the wire somehow?

No, that test was with a plain file storage, using a plain Persistent object
with a differently sized string and an integer attribute. I did something like:

1) Create an object with attributes x (integer) and y (variably sized string).
2) for i in range(100): obj.x = i; transaction.commit()
3) Measure the time taken for step 2.

Still don't know the answers to these:

>> - Does it make sense to use ZODB in this scenario? My data is not well
>>   suited for an RDBMS.
>
> YMMV. I still default to using ZODB for anything at all, unless the
> problem smells very strongly relational.

Ok, the problem at hand certainly doesn't smell relational. It is more about
storing lots of different data than querying it extensively. It's a mixture
of digital asset management (the blobs are useful for this part) and projects
which reference the assets. The projects are shared between the clients and
will consist of a big tree of Persistent objects.

>> - Are there more complications to blobs other than a slightly different
>>   backup procedure?
>
> You need to think about how the blob data is shared between ZEO clients
> (your appserver) and the ZEO storage server: opinions vary here, but I
> would prefer to have the blobs living in a writable shared filesystem, in
> order to avoid the necessity of fetching their data over ZEO on the
> individual clients which were not the one pushing the blob into the
> database.

The ZEO server and clients will be in different physical locations, so I'd
probably have to employ some shared filesystem which can deal with that.

Speaking of locations of server and clients, is it a problem - as in ZEO
will perform very badly under these circumstances because it was not
designed for this - if they are not in the same location (typical latency
0-100 ms)?

>> - Are there any performance penalties in using very large invalidation
>>   queues (i.e. 300,000 objects) to reduce client cache verification time?
>
> At a minimum, RAM occupied by that queue might be better used elsewhere.
> I just don't use persistent caches, and tend to reboot appservers in
> rotation after the ZEO storage has been down for any significant period
> (almost never happens).

In my case the clients might be down for a couple of days (typically 1 or 2)
and they should not spend 30 minutes in cache verification each time they
reconnect. If these 300k objects take up 1 kB each, then they occupy 300 MB
of RAM, which I am fine with. From what I've read it only seems to consume
memory.

> Note that the ZEO storage server makes copies of that queue to avoid race
> conditions.

Ok, I can see how copying and storing 300k objects is slow and can take up
excessive amounts of memory.

Thanks,
-Matthias

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
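For reference, the commit rate measured in steps 1-3 above is dominated by durable-write overhead: on each commit, FileStorage appends the transaction record to its data file and fsyncs it. The loop can be approximated without ZODB at all, using only the stdlib (the payload size, iteration count, and file name below are illustrative, not taken from the original test script):

```python
# Stdlib-only approximation of the benchmark loop above: each "commit"
# appends a small record to a file and fsyncs it, mimicking the durable
# append a FileStorage commit performs. No ZODB required; payload size
# and iteration count are illustrative.
import os
import tempfile
import time

payload = b'x' * 100                 # ~100 byte record per "commit"
fd, path = tempfile.mkstemp()
f = os.fdopen(fd, 'wb')

commits = 100
start = time.time()
for i in range(commits):
    f.write(payload)                 # append the transaction record
    f.flush()
    os.fsync(f.fileno())             # force it to disk before "commit" returns
total = time.time() - start

f.close()
os.unlink(path)

tps = commits / total if total > 0 else float('inf')
print('%d commits in %.3f s (~%.0f tx/s)' % (commits, total, tps))
```

On hardware where fsync is cheap this reports thousands of tx/s; where each fsync costs tens of milliseconds, it lands in the same tens-of-tx/s range discussed above.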
Re: [ZODB-Dev] Using zodb and blobs
> 40 tps sounds low: are you pushing blob content over the wire somehow?
> I have seen the ZEO storage committing transactions at least an order of
> magnitude faster than that (e.g., when processing incoming newswire
> feeds). I would guess that there could have been some other latencies
> involved in your setup (e.g., that 0-100 ms lag you mention below).

See my attached test script. It outputs 45-55 transactions/s for a 100 byte
payload. Maybe there's a very fundamental flaw in the way the test is set
up. Note that I am testing on a regular desktop machine (Windows 7, WoW64,
4 GB RAM, 1 TB hard disk capable of transfer rates of 100 MB/s).

>> The ZEO server and clients will be in different physical locations, so
>> I'd probably have to employ some shared filesystem which can deal with
>> that. Speaking of locations of server and clients, is it a problem - as
>> in ZEO will perform very badly under these circumstances because it was
>> not designed for this - if they are not in the same location (typical
>> latency 0-100 ms)?
>
> That depends on the mix of reads and writes in your application. I have
> personally witnessed a case where the clients stayed up and serving pages
> over a whole weekend in a clusterfsck where both the ZEO server and the
> monitoring infrastructure went belly up. This was for a large corporate
> intranet, in case that helps: the problem surfaced mid-morning on Monday
> when the employee in charge of updating the lunch menu for the week
> couldn't save the changes.

Haha, I hope they solved this critical problem in time!

>> In my case the clients might be down for a couple of days (typically 1
>> or 2 days) and they should not spend 30 minutes in cache verification
>> time each time they reconnect. So if these 300k objects take up 1 kB
>> each, then they occupy 300 MB of RAM, which I am fine with.
>
> If the client is disconnected for any period of time, it is far more
> likely that just dumping the cache and starting over fresh will be a win.
> The 'invalidation_queue' is primarily to support clients which remain up
> while the storage server is down or unreachable.

Yes, taking the verification time hit is my plan for now. However, dumping
the whole client cache is something I'd like to avoid, since the app I am
working on will not run over a corporate intranet and thus the bandwidth
for transferring the blobs is limited (so transfers can take considerable
time). Maybe I am overestimating the whole client cache problem, though.

Thanks again for your valuable advice,
-Matthias

[Attachment: test.py]
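For reference, the invalidation queue discussed above is configured on the storage server side. In a zeo.conf it might look like the following sketch (the address and path are made up; `invalidation-queue-size` is the runzeo option name, and 300000 is the value from the discussion):

```
<zeo>
  address 8100
  # Number of recent invalidations the server keeps so that reconnecting
  # clients can catch up without full cache verification. Sized per the
  # 300k-object figure discussed above; the default is much smaller.
  invalidation-queue-size 300000
</zeo>

<filestorage 1>
  path /var/zodb/Data.fs
</filestorage>
```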
Re: [ZODB-Dev] Using zodb and blobs
Running your test script on my small Amazon EC2 instance on Linux takes
between 0.0 and 0.04 seconds (I had to remove the divide by total to avoid
a zero division error). 100 commits in 0.02 s is 5000 transactions/s.

Laurence

On 14 April 2010 00:25, Nitro ni...@dr-code.org wrote:

> See my attached test script. It outputs 45-55 transactions/s for a 100
> byte payload. Maybe there's a very fundamental flaw in the way the test
> is set up. Note that I am testing on a regular desktop machine (Windows
> 7, WoW64, 4 GB RAM, 1 TB hard disk capable of transfer rates of
> 100 MB/s).
> [...]
Re: [ZODB-Dev] Using zodb and blobs
On 14.04.2010, 04:08, Laurence Rowe l...@lrowe.co.uk wrote:

> Running your test script on my small Amazon EC2 instance on Linux takes
> between 0.0 and 0.04 seconds (I had to remove the divide by total to
> avoid a zero division error). 0.02 s is 5000/s.

Thanks for running the test. Intrigued by this extreme difference, I've done
a little run with cProfile, script attached. On my machine the 100 runs take
~2.65 seconds, and at least 2.55 of those seconds are spent in the nt.fsync
function, which is an alias for os.fsync on Windows. According to the Python
docs it calls the _commit C function on Windows; see
http://msdn.microsoft.com/en-us/library/17618685(VS.80).aspx . I wonder if
_commit is really *that* slow, or if there's another (faster) function which
could be called...

-Matthias
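The profiling run described above can be sketched with the stdlib alone: run an fsync-heavy write loop under cProfile and filter the pstats report down to fsync. Nothing here is ZODB-specific; the workload just mimics a commit loop.

```python
# Sketch of the profiling approach described above: run an fsync-heavy
# loop under cProfile and extract the entries attributed to fsync.
# The workload mimics a commit loop; sizes and counts are illustrative.
import cProfile
import io
import os
import pstats
import tempfile

def commit_loop(n=100):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        for i in range(n):
            f.write(b'x' * 100)
            f.flush()
            os.fsync(f.fileno())   # the call that dominated the trace above
    os.unlink(path)

profiler = cProfile.Profile()
profiler.enable()
commit_loop()
profiler.disable()

# Print only the stats lines matching 'fsync' (on Windows this shows up
# as nt.fsync, on POSIX as posix.fsync).
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats('fsync')
report = stream.getvalue()
print(report)
```

If fsync dominates, its cumulative time in this report will be close to the total runtime, which is exactly the pattern reported above (~2.55 s of ~2.65 s).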
Re: [ZODB-Dev] Using zodb and blobs
[Nitro]
> ... I wonder if _commit is really *that* slow

Six years ago I timed factor-of-100 speed differences due to using MS
_commit() on WinXP at the time:

https://mail.zope.org/pipermail/zodb-dev/2004-July/007720.html

> or if there's another (faster) function which can be called...

There is no MS function that does the same thing.
Re: [ZODB-Dev] Using zodb and blobs
For the record, with

import os
os.fsync = lambda fd: 0

at the top of the test app I get ~3700 tx/s.

-Matthias
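The patch above simply replaces os.fsync with a no-op, trading crash durability for commit speed, and is only suitable for benchmarking. A self-contained sketch of the same comparison, with the original function saved and restored (the write loop stands in for the test app and is illustrative):

```python
# Demonstrates the os.fsync no-op patch from the message above, measuring
# the same write loop with and without real fsync. The original function
# is saved and restored; skipping fsync sacrifices crash durability.
import os
import tempfile
import time

def timed_writes(n=100):
    """Time n append-flush-fsync cycles, standing in for n commits."""
    fd, path = tempfile.mkstemp()
    start = time.time()
    with os.fdopen(fd, 'wb') as f:
        for i in range(n):
            f.write(b'x' * 100)
            f.flush()
            os.fsync(f.fileno())
    os.unlink(path)
    return time.time() - start

with_sync = timed_writes()

original_fsync = os.fsync
os.fsync = lambda fd: 0          # the patch from the message above
try:
    without_sync = timed_writes()
finally:
    os.fsync = original_fsync    # always restore the real fsync

print('with fsync:  %.4f s' % with_sync)
print('fsync no-op: %.4f s' % without_sync)
```

On a disk where fsync is expensive, the gap between the two timings accounts for essentially all of the ~40 tx/s vs ~3700 tx/s difference reported in this thread.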