(resent to list as I realized I just did a Reply) Cool! This is great stuff. Look forward to seeing the branch.
I started working on a similar tool that takes the data collected from Tach and fetches it from Graphite to look at the performance issues (no changes to nova trunk required, since Tach is awesome). It's still only a shell of an idea, but the basics work:

https://github.com/ohthree/novaprof

But if there is something already existing, I'm happy to kill it off.

I don't doubt for a second the db is the culprit for many of our woes.

The thing I like about internal caching using established tools is that it works for db issues too without having to resort to custom tables. SQL query optimization, I'm sure, will go equally far.

Thanks again for the great feedback ... keep it comin'!

-S

On 03/22/2012 11:53 PM, Mark Washenberger wrote:
> Working on this independently, I created a branch with some simple
> performance logging around the nova-api, and individually around
> glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
> copy and it's on a different computer right now, and probably needs
> a rebase. I will rebase and publish it on GitHub tomorrow.)
>
> With this logging, I could get some simple profiling that I found
> very useful. Here is a GH project with the analysis code as well
> as some nova-api logs I was using as input.
>
> https://github.com/markwash/nova-perflog
>
> With these tools, you can get a wall-time profile for individual
> requests.
> For example, looking at one server create request (and
> you can run this directly from the checkout as the logs are saved
> there):
>
> markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
> key                                          count    avg
> nova.api.openstack.wsgi.POST                     1  0.657
> nova.db.api.instance_update                      1  0.191
> nova.image.show                                  1  0.179
> nova.db.api.instance_add_security_group          1  0.082
> nova.rpc.cast                                    1  0.059
> nova.db.api.instance_get_all_by_filters          1  0.034
> nova.db.api.security_group_get_by_name           2  0.029
> nova.db.api.instance_create                      1  0.011
> nova.db.api.quota_get_all_by_project             3  0.003
> nova.db.api.instance_data_get_for_project        1  0.003
>
> key                                          count  total
> nova.api.openstack.wsgi                          1  0.657
> nova.db.api                                     10  0.388
> nova.image                                       1  0.179
> nova.rpc                                         1  0.059
>
> All times are in seconds. The nova.rpc time is probably high
> since this was the first call since server restart, so the
> connection handshake is probably included. This is also probably
> 1.5 months stale.
>
> The conclusion I reached from this profiling is that we just plain
> overuse the db (and we might do the same in glance). For example,
> whenever we do updates, we actually re-retrieve the item from the
> database, update its dictionary, and save it. This is double the
> cost it needs to be. We also handle updates for data across tables
> inefficiently, where they could be handled in a single database
> round trip.
>
> In particular, in the case of server listings, extensions are just
> rough on performance. Most extensions hit the database again
> at least once. This isn't really so bad, but it clearly is an area
> where we should improve, since these are the most frequent api
> queries.
>
> I just see a ton of specific performance problems that are easier
> to address one by one, rather than diving into a general (albeit
> obvious) solution such as caching.
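The per-module totals in the second table of the quoted profile can be reproduced from the per-call rows in the first. What follows is a minimal sketch of that roll-up, not code taken from nova-perflog: the `ROWS` list is copied by hand from the quoted output, and the `rollup` helper and its grouping rule (everything before the last dot in the key, summing count * avg) are assumptions that happen to match the quoted numbers.

```python
from collections import defaultdict

# Per-call rows copied from the quoted profile: (key, count, avg seconds).
ROWS = [
    ("nova.api.openstack.wsgi.POST", 1, 0.657),
    ("nova.db.api.instance_update", 1, 0.191),
    ("nova.image.show", 1, 0.179),
    ("nova.db.api.instance_add_security_group", 1, 0.082),
    ("nova.rpc.cast", 1, 0.059),
    ("nova.db.api.instance_get_all_by_filters", 1, 0.034),
    ("nova.db.api.security_group_get_by_name", 2, 0.029),
    ("nova.db.api.instance_create", 1, 0.011),
    ("nova.db.api.quota_get_all_by_project", 3, 0.003),
    ("nova.db.api.instance_data_get_for_project", 1, 0.003),
]


def rollup(rows):
    """Group rows by module (hypothetical rule: the key minus its last
    dotted component) and return {module: (call count, total seconds)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for key, count, avg in rows:
        module = key.rsplit(".", 1)[0]
        totals[module][0] += count
        totals[module][1] += count * avg
    # Round to milliseconds, the precision of the quoted averages.
    return {m: (c, round(t, 3)) for m, (c, t) in totals.items()}


if __name__ == "__main__":
    by_total = sorted(rollup(ROWS).items(), key=lambda kv: -kv[1][1])
    for module, (count, total) in by_total:
        print(f"{module:<28}{count:>6}{total:>8.3f}")
```

Read this way, the ten nova.db.api calls account for 0.388 s of the 0.657 s request, which is the figure the "overuse the db" conclusion rests on.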
>
>
> "Sandy Walsh" <sandy.wa...@rackspace.com> said:
>
>> We're doing tests to find out where the bottlenecks are; caching is the
>> most obvious solution, but there may be others. Tools like memcache do a
>> really good job of sharing memory across servers so we don't have to
>> reinvent the wheel or hit the db at all.
>>
>> In addition to looking into caching technologies/approaches we're gluing
>> together some tools for finding those bottlenecks. Our first step will
>> be finding them, then squashing them ... however.
>>
>> -S
>>
>> On 03/22/2012 06:25 PM, Mark Washenberger wrote:
>>> What problems are caching strategies supposed to solve?
>>>
>>> On the nova compute side, it seems like streamlining db access and
>>> api-view tables would solve any performance problems caching would
>>> address, while keeping the stale data management problem small.
>>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to : email@example.com
>> Unsubscribe : https://launchpad.net/~openstack
>> More help : https://help.launchpad.net/ListHelp
>>