Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app
Hi,

Last week I started this thread to try and figure out the bottleneck in our web app. After trying out different approaches, we decided to profile the app using repoze.profile[1] in the middleware and hit the app using funkload[2]. This is what we saw for a test run (although my question below is not related to the specifics of the test run, details of the same can be provided if needed):

http://pastebin.com/pzwA74EN

As you can see, most of the time is being spent acquiring locks or in (validating?) cache code. After looking at ZODB3-3.10.2-py2.6-linux-x86_64.egg/ZODB/Connection.py line 864 (_setstate) (line 13 in that paste), I think most of those acquire()s are coming from line 900:

    self._inv_lock.acquire()

So, my questions here are:

a. Is the invalidation-queue-size just a start-up time optimization, or does it play a part in invalidation of the zeo client cache for every transaction? (We use the repoze.zodbconn middleware, with which every request to the webserver is treated as a transaction.)

b. Does drop_cache_rather_verify() play a part on a 'connected' zeo client -- i.e. would the client always drop the cache rather than verify on each transaction, or is this just a startup optimization?

cheers,
- steve

[...snip...]
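For reference, wiring repoze.profile around the WSGI app looks roughly like this (a minimal sketch; the argument values below are illustrative, not our exact settings):

    from repoze.profile.profiler import AccumulatingProfileMiddleware

    # Accumulates cProfile data across requests; the report is browsable
    # at the configured path while the app is running.
    app = AccumulatingProfileMiddleware(
        app,
        log_filename='/tmp/webapp.profile',
        discard_first_request=True,
        flush_at_shutdown=True,
        path='/__profile__',
    )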
Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app
Hi Alex,

On 01/24/2012 07:30 PM, Alex Clark wrote:
> On 1/24/12 8:50 AM, steve wrote:
>> Hi All,
>> ...snip...
>
> You might try New Relic

Thanks for your reply and sorry for the delayed response (been fighting fires). New Relic does seem interesting. Unfortunately, however, their python agent doesn't seem to log the zodb layer as db (they support an impressive number of other python modules, though[1]). That said, we shall try tracking the zodb layer by using custom instrumentation as described in http://newrelic.com/docs/python/adding-python-instrumentation .

Thanks for the suggestion,

cheers,
- steve

[1] http://newrelic.com/docs/python/instrumented-python-packages
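A rough sketch of what that custom instrumentation could look like (the use of wrap_function_trace and the choice of Connection.setstate as the traced method are assumptions based on the docs above, not something the agent does for ZODB out of the box):

    import newrelic.agent
    import ZODB.Connection

    # Record each object load performed by a ZODB Connection as a traced
    # segment, so time spent in the ZODB layer shows up in the New Relic
    # transaction breakdown.  The group name is just an arbitrary label.
    newrelic.agent.wrap_function_trace(
        ZODB.Connection, 'Connection.setstate', group='Python/ZODB')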
Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app
Hi Laurence,

Thanks for your reply and sorry for the delayed response (been fighting fires).

On 01/24/2012 09:18 PM, Laurence Rowe wrote:
> On 24 January 2012 13:50, steve st...@lonetwin.net wrote:
>> Hi All,
>> ...snip...
>> Any pointers/comments would be appreciated.
>
> (Following up only on zodb-dev as I'm not subscribed to the other
> lists.) I'm guessing, but I suspect your load tests may only be reading
> from the ZODB, so you rarely see any cache misses.

Yes, that's correct, we do test (mostly) read operations.

> The most important tuning parameters for ZODB with respect to memory
> usage are the number of threads and the connection_cache_size. The
> connection_cache_size controls the number of persistent objects kept
> live in the interpreter at a time. It's a per-connection setting, and
> each thread needs its own connection, so memory usage increases
> proportionally to connection_cache_size * number of threads.
>
> Most people use either one or two threads per process with the ZODB. I
> know plone.recipe.zope2instance defaults to two threads per process,
> though I think this is only to avoid locking up in the case of Plone
> being configured to load an RSS feed from itself. The Python Global
> Interpreter Lock prevents threads from running concurrently, so with
> ZEO, running so many threads per process is likely to be
> counter-productive.
>
> Try with one or two threads and perhaps up the connection_cache_size
> (though loading from the zeo cache is very quick, you must ensure your
> working set fits in the connection cache or else you'll be loading the
> same objects every request). If your CPU usage goes down a lot it means
> you're spending time waiting for objects to be loaded in the
> connection, and you might want to run a couple more processes than you
> have cpus if you are running one thread per process.

Thanks for that information. That is helpful. Yes, I shall try and experiment with the connection_cache_size.

> I also suggest you check that your mod_wsgi config is correctly
> specifying WSGIProcessGroup, see:
> http://www.martinaspeli.net/articles/update-repoze-under-mod-wsgi-is-not-slow
> and
> http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIDaemonProcess.

Yes, it is.

> To get more information about what is going on, try porting
> https://github.com/collective/collective.stats to work as WSGI
> middleware instead of hooking into Zope2's ZPublisher.

Hmm, this is interesting too. I'll try to see if I can port and fit it into our app.

Thanks for taking the time.

cheers,
- steve
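As a follow-up note on the connection cache: repoze.zodbconn builds the DB from the zodb_uri for us, but the knob Laurence describes corresponds to ZODB.DB's cache_size (objects per connection). A minimal sketch, with a made-up address and illustrative numbers:

    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    # The per-connection object cache holds this many live persistent
    # objects in memory; memory use scales roughly with
    #   cache_size * threads * processes * average object size,
    # which is why one or two threads per process is the usual advice.
    storage = ClientStorage(('zodb-server-ip', 8886))  # placeholder address
    db = DB(storage, cache_size=30000)                 # illustrative value
    conn = db.open()   # each thread would open its own connection like this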
Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app
Hi Aijun,

On 01/25/2012 10:36 AM, Aijun Jian wrote:
> You might try Nginx + uWSGI http://projects.unbit.it/uwsgi/ (+Green)

Yep, thanks for the suggestion and we shall consider it, but first I'd like to confirm that it is the webserver that's at fault rather than shoot in the dark, because changing the stack on production is a bit tedious for release reasons.

cheers,
- steve
Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app
On 1/24/12 8:50 AM, steve wrote:
> Hi All,
>
> I apologize for the cross-post, but by this mail I simply hope to get a
> few pointers on how to narrow down to the problem I am seeing. I shall
> post to the relevant list if I have further questions. So here is the
> issue:
>
> Short description: I've got a repoze.bfg application running on top of
> zeo/zodb across multiple servers, served using mod_wsgi, and it's
> showing bad resource usage (both high memory consumption and high CPU
> usage). Are there any steps I can take to localise whether this is an
> issue with the zeo/zodb/mod_wsgi configuration, and/or usage?
>
> Long description:
>
> * I have a repoze.bfg (version 1.3) based app, which uses zodb (over
>   zeo, version 3.10.2) as the backend and is served up using
>   apache+mod_wsgi, all running on minimal debian 6.0 based amazon
>   instances.
>
> * The architecture is 1 zodb server and 4 app instances running on
>   individual EC2 instances (all in the same availability zone). All of
>   the instances are behind an amazon Elastic Load Balancer.
>
> * At the web-server, we don't customize apache much (ie: we pretty much
>   use the stock debian apache config). We use mod_wsgi (version 3.3-2)
>   to serve the application in daemon mode, with the following
>   parameters:
>
>   WSGIDaemonProcess webapp user=appname threads=7 processes=4 maximum-requests=1 python-path=/path/to/virtualenv/eggs
>
> * The web app is the only thing that is served from these instances,
>   and we serve the static content using apache rather than the web app.
>
> * The zodb config on the db server looks like:
>
>   <zeo>
>     address 8886
>     read-only false
>     invalidation-queue-size 1000
>     pid-filename $INSTANCE/var/ZEO.pid
>     # monitor-address 8887
>     # transaction-timeout SECONDS
>   </zeo>
>
>   <blobstorage 1>
>     <filestorage>
>       path $INSTANCE/var/webapp.db
>     </filestorage>
>     blob-dir $INSTANCE/var/blobs
>   </blobstorage>
>
> * The zeo connection string (for repoze.zodbconn-0.11) is:
>
>   zodb_uri = zeo://<zodb server ip>:8886/?blob_dir=/path/to/var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true
>
>   (Note: the drop_cache_rather_verify=true is for faster startups)
>
> Now with this, on live we have typical load such as:
>
>   top - 13:34:54 up 1 day, 8:22, 2 users, load average: 11.87, 8.75, 6.37
>   Tasks: 85 total, 2 running, 82 sleeping, 0 stopped, 1 zombie
>   Cpu(s): 81.1%us, 6.7%sy, 0.0%ni, 11.8%id, 0.0%wa, 0.0%hi, 0.1%si, 0.2%st
>   Mem:  15736220k total, 7867340k used, 7868880k free, 283332k buffers
>   Swap: 0k total, 0k used, 0k free, 1840876k cached
>
>     PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
>    5079 appname  21  0 1587m 1.2g 6264 S   77  8.1 9:23.86 apache2
>    5065 appname  20  0 1545m 1.2g 6272 S   95  7.9 9:31.24 apache2
>    5144 appname  20  0 1480m 1.1g 6260 S   86  7.4 5:49.92 apache2
>    5127 appname  20  0 1443m 1.1g 6264 S   94  7.2 7:13.10 apache2
>
> As you can see, the load average is very high, and the apache processes
> spawned for mod_wsgi (identifiable because of the user whose context
> they run under) consume about 1.2G of resident memory each. With a
> constant load like this, the app response progressively degrades. We've
> tried to tweak the number of processes and the cache_size in the zeo
> connection string, but all to no avail.
>
> So, now rather than shoot in the dark, I would appreciate suggestions
> on how I might be able to isolate the bottleneck in the stack. One
> thing to note is that this high load and memory usage is only seen on
> the production instances. When we test the app using ab or funkload on
> a similar setup (2 app instances instead of 4), we do not see this
> problem.
>
> Any pointers/comments would be appreciated.
You might try New Relic.

Alex

> cheers,
> - steve

--
Alex Clark · http://pythonpackages.com
Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app
On 24 January 2012 13:50, steve st...@lonetwin.net wrote:
> Hi All,
> [...snip...]
> Any pointers/comments would be appreciated.
(Following up only on zodb-dev as I'm not subscribed to the other lists.)

I'm guessing, but I suspect your load tests may only be reading from the ZODB, so you rarely see any cache misses.

The most important tuning parameters for ZODB with respect to memory usage are the number of threads and the connection_cache_size. The connection_cache_size controls the number of persistent objects kept live in the interpreter at a time. It's a per-connection setting, and each thread needs its own connection, so memory usage increases proportionally to connection_cache_size * number of threads.

Most people use either one or two threads per process with the ZODB. I know plone.recipe.zope2instance defaults to two threads per process, though I think this is only to avoid locking up in the case of Plone being configured to load an RSS feed from itself. The Python Global Interpreter Lock prevents threads from running concurrently, so with ZEO, running so many threads per process is likely to be counter-productive.

Try with one or two threads and perhaps up the connection_cache_size (though loading from the zeo cache is very quick, you must ensure your working set fits in the connection cache or else you'll be loading the