Hi All,

I apologize for the cross-post, but with this mail I simply hope to get a few pointers on how to narrow down the problem I am seeing. I shall post to the relevant list if I have further questions.
So here is the issue:

Short description: I've got a repoze.bfg application running on top of zeo/zodb across multiple servers, served using mod_wsgi, and it is showing bad resource usage (both high memory consumption and high CPU usage). Are there any steps I can take to localise whether this is an issue with the zeo/zodb/mod_wsgi configuration and/or usage?

Long description:

* I have a repoze.bfg (version 1.3) based app, which uses zodb (over zeo, version 3.10.2) as the backend and is served up using apache+mod_wsgi. All of this runs on minimal debian 6.0 based amazon instances.

* The architecture is 1 zodb server and 4 app instances, each running on its own EC2 instance (all in the same availability zone). All of the instances are behind an amazon Elastic Load Balancer.

* At the web server, we don't customize apache much (i.e. we pretty much use the stock debian apache config). We use mod_wsgi (version 3.3-2) to serve the application in daemon mode, with the following parameters:

    WSGIDaemonProcess webapp user=appname threads=7 processes=4 maximum-requests=10000 python-path=/path/to/virtualenv/eggs

* The web app is the only thing that is served from these instances, and we serve the static content for the app using apache rather than the web app.
* The zodb config on the db server looks like:

    <zeo>
      address 8886
      read-only false
      invalidation-queue-size 1000
      pid-filename $INSTANCE/var/ZEO.pid
      # monitor-address 8887
      # transaction-timeout SECONDS
    </zeo>

    <blobstorage 1>
      <filestorage>
        path $INSTANCE/var/webapp.db
      </filestorage>
      blob-dir $INSTANCE/var/blobs
    </blobstorage>

* The zeo connection string (for repoze.zodbconn-0.11) is:

    zodb_uri = zeo://<zodb server ip>:8886/?blob_dir=/path/to/var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true

  (Note: drop_cache_rather_verify=true is for faster startups.)

Now with this, on live we have typical load such as:

    top - 13:34:54 up 1 day, 8:22, 2 users, load average: 11.87, 8.75, 6.37
    Tasks: 85 total, 2 running, 82 sleeping, 0 stopped, 1 zombie
    Cpu(s): 81.1%us, 6.7%sy, 0.0%ni, 11.8%id, 0.0%wa, 0.0%hi, 0.1%si, 0.2%st
    Mem: 15736220k total, 7867340k used, 7868880k free, 283332k buffers
    Swap: 0k total, 0k used, 0k free, 1840876k cached

     PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
    5079 appname  21  0 1587m 1.2g 6264 S   77  8.1 9:23.86 apache2
    5065 appname  20  0 1545m 1.2g 6272 S   95  7.9 9:31.24 apache2
    5144 appname  20  0 1480m 1.1g 6260 S   86  7.4 5:49.92 apache2
    5127 appname  20  0 1443m 1.1g 6264 S   94  7.2 7:13.10 apache2
    .... .... ....

As you can see, the load average is very high, and the apache processes spawned for mod_wsgi (identifiable by the user they run as) consume about 1.2 GB of resident memory each. Under a constant load like this, the app's response progressively degrades.

We've tried tweaking the number of processes and the cache_size in the zeo connection string, but to no avail. So now, rather than shooting in the dark, I would appreciate suggestions on how I might isolate the bottleneck in the stack.

One thing to note is that this high load and memory usage is only seen on the production instances. When we test the app
using ab or funkload on a similar setup (2 app instances instead of 4), we do not see this problem.

Any pointers/comments would be appreciated.

cheers,
- steve

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
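
As a rough cross-check of the figures quoted in this post, the per-instance numbers can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are copied from the WSGIDaemonProcess line and the top output above, and the function name summarize is made up for illustration. It does not measure anything on a live system.

```python
# Figures copied from the mod_wsgi directive and the top output in this post.
PROCESSES = 4                        # WSGIDaemonProcess processes=4
THREADS_PER_PROCESS = 7              # WSGIDaemonProcess threads=7
TOTAL_MEM_KB = 15736220              # "Mem: 15736220k total"
MEM_PERCENT = [8.1, 7.9, 7.4, 7.2]   # %MEM column of the four apache2 daemons


def summarize(processes, threads, total_mem_kb, mem_percent):
    """Return (request threads per instance, resident GiB held by the daemons).

    Hypothetical helper: it only redoes the arithmetic implied by the
    quoted figures.
    """
    total_threads = processes * threads
    resident_kb = total_mem_kb * sum(mem_percent) / 100.0
    resident_gib = resident_kb / (1024.0 * 1024.0)
    return total_threads, resident_gib


if __name__ == "__main__":
    threads, resident_gib = summarize(
        PROCESSES, THREADS_PER_PROCESS, TOTAL_MEM_KB, MEM_PERCENT
    )
    print("request threads per instance: %d" % threads)
    print("resident memory of the wsgi daemons: %.1f GiB" % resident_gib)
```

With the numbers above this gives 28 request threads per app instance and roughly 4.6 GiB resident across the four daemon processes, which agrees with the sum of the RES column. If each in-flight request holds one ZODB connection (an assumption, not something verified here), at most 7 connections per process would be in use at once, well below connection_pool_size=50.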