Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
Thanks ... that's good feedback and we were discussing cache
invalidation issues today.

Any tips or suggestions?

-S



On 03/22/2012 09:28 PM, Joshua Harlow wrote:
 Just from experience.
 
 They do a great job. But the killer thing about caching is how u do the
 cache invalidation.
 
 Just caching stuff is easy-peasy, making sure it is invalidated on all
 servers in all conditions, not so easy...
 
 On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:
 
 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.
 
 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.
 
 -S
 
 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
  What problems are caching strategies supposed to solve?
 
  On the nova compute side, it seems like streamlining db access and
  api-view tables would solve any performance problems caching would
  address, while keeping the stale data management problem small.
 
 


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
Yup, makes sense. Thanks for the feedback. I agree that the external
caches are troublesome and we'll likely be focusing on the internal
ones. Whether that manifests itself as a memcache-like implementation or
another db view is unknown.

The other thing about in-process caching I like is the ability to have
it in a common (nova-common?) library where we can easily compute
hit/miss ratios and adjust accordingly.
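
For concreteness, a minimal sketch of the kind of in-process cache with
hit/miss accounting being talked about here (the class and method names are
made up, not an existing nova-common API):

    import time

    class TTLCache(object):
        """Tiny in-process cache that also tracks hit/miss counts."""

        def __init__(self, ttl=60):
            self.ttl = ttl
            self._data = {}          # key -> (expires_at, value)
            self.hits = 0
            self.misses = 0

        def get(self, key, loader):
            now = time.time()
            entry = self._data.get(key)
            if entry and entry[0] > now:
                self.hits += 1
                return entry[1]
            self.misses += 1
            value = loader()         # e.g. a lambda wrapping a db.api call
            self._data[key] = (now + self.ttl, value)
            return value

        def hit_ratio(self):
            total = self.hits + self.misses
            return float(self.hits) / total if total else 0.0

Something central like that is what would let us watch the hit ratio per call
site and tune (or drop) the cache accordingly.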

-S


On 03/23/2012 12:02 AM, Mark Washenberger wrote:
 This is precisely my concern.
 
 It must be brought up that with Rackspace Cloud Servers, nearly
 all client codes routinely submit requests with a query parameter 
 cache-busting=some random string just to get around problems with
 cache invalidation. And woe to the client that does not.
 
 I get the feeling that once trust like this is lost, a project has
 a hard time regaining it. I'm not saying that we can avoid
 inconsistency entirely. Rather, I believe we will have to embrace
 some eventual-consistency models to enable the performance and
 scale we will ultimately attain. But I just get the feeling that
 generic caches are really only appropriate for write-once or at
 least write-rarely data. So personally I would rule out external
 caches entirely and try to be very judicious in selecting internal
 caches as well.
 
 Joshua Harlow harlo...@yahoo-inc.com said:
 
 Just from experience.

 They do a great job. But the killer thing about caching is how u do the cache
 invalidation.

 Just caching stuff is easy-peasy, making sure it is invalidated on all 
 servers in
 all conditions, not so easy...

 On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:

 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.

 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.

 -S

 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?

 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
(resent to list as I realized I just did a Reply)

Cool! This is great stuff. Look forward to seeing the branch.

I started working on a similar tool that takes the data collected from
Tach and fetches the data from Graphite to look at the performance
issues (no changes to nova trunk required, since Tach is awesome).

It's just a shell of an idea so far, but the basics work:
https://github.com/ohthree/novaprof

But if there is something already existing, I'm happy to kill it off.

I don't doubt for a second the db is the culprit for many of our woes.

The thing I like about internal caching using established tools is that
it works for db issues too without having to resort to custom tables.
SQL query optimization, I'm sure, will go equally far.

Thanks again for the great feedback ... keep it comin'!

-S


On 03/22/2012 11:53 PM, Mark Washenberger wrote:
 Working on this independently, I created a branch with some simple
 performance logging around the nova-api, and individually around 
 glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
 copy and its on a different computer right now, and probably needs
 a rebase. I will rebase and publish it on GitHub tomorrow.) 
 
 With this logging, I could get some simple profiling that I found
 very useful. Here is a GH project with the analysis code as well
 as some nova-api logs I was using as input. 
 
 https://github.com/markwash/nova-perflog
 
 With these tools, you can get a wall-time profile for individual
 requests. For example, looking at one server create request (and
 you can run this directly from the checkout as the logs are saved
 there):
 
 markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python 
 profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
 key                                          count    avg
 nova.api.openstack.wsgi.POST                     1  0.657
 nova.db.api.instance_update                      1  0.191
 nova.image.show                                  1  0.179
 nova.db.api.instance_add_security_group          1  0.082
 nova.rpc.cast                                    1  0.059
 nova.db.api.instance_get_all_by_filters          1  0.034
 nova.db.api.security_group_get_by_name           2  0.029
 nova.db.api.instance_create                      1  0.011
 nova.db.api.quota_get_all_by_project             3  0.003
 nova.db.api.instance_data_get_for_project        1  0.003

 key                        count  total
 nova.api.openstack.wsgi        1  0.657
 nova.db.api                   10  0.388
 nova.image                     1  0.179
 nova.rpc                       1  0.059
 
 All times are in seconds. The nova.rpc time is probably high
 since this was the first call since server restart, so the
 connection handshake is probably included. This is also probably
 1.5 months stale.
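
 A rough sketch of the sort of wall-time logging and per-request aggregation
 being described (purely illustrative -- the actual branch may differ):

    import functools
    import logging
    import time
    from collections import defaultdict

    LOG = logging.getLogger(__name__)

    def timed(name):
        """Decorator that logs the wall time of each call under 'name'."""
        def wrap(fn):
            @functools.wraps(fn)
            def inner(*args, **kwargs):
                start = time.time()
                try:
                    return fn(*args, **kwargs)
                finally:
                    LOG.info("PERF %s %.3f", name, time.time() - start)
            return inner
        return wrap

    def aggregate(log_lines):
        """Collapse 'PERF <key> <seconds>' log lines into count/avg per key."""
        totals = defaultdict(lambda: [0, 0.0])
        for line in log_lines:
            if "PERF" not in line:
                continue
            _, key, secs = line.rsplit(None, 2)
            totals[key][0] += 1
            totals[key][1] += float(secs)
        return dict((k, (n, t / n)) for k, (n, t) in totals.items())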
 
 The conclusion I reached from this profiling is that we just plain
 overuse the db (and we might do the same in glance). For example,
 whenever we do updates, we actually re-retrieve the item from the
 database, update its dictionary, and save it. This is double the
 cost it needs to be. We also handle updates for data across tables
 inefficiently, where they could be handled in a single database round
 trip.
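
 As an aside, the read-modify-write vs. single-statement difference looks
 roughly like this with SQLAlchemy (toy model and in-memory sqlite just to
 show the shape -- not nova's actual models):

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        vm_state = Column(String(32))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add(Instance(id=1, vm_state='building'))
    session.commit()

    # read-modify-write: a SELECT, then an UPDATE on commit (two round trips)
    inst = session.query(Instance).get(1)
    inst.vm_state = 'active'
    session.commit()

    # single round trip: issue the UPDATE directly, no prior SELECT
    session.query(Instance).filter_by(id=1).update({'vm_state': 'active'})
    session.commit()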
 
 In particular, in the case of server listings, extensions are just
 rough on performance. Most extensions hit the database again
 at least once. This isn't really so bad, but it clearly is an area
 where we should improve, since these are the most frequent api
 queries.
 
 I just see a ton of specific performance problems that are easier
 to address one by one, rather than diving into a general (albeit
 obvious) solution such as caching.
 
 
 Sandy Walsh sandy.wa...@rackspace.com said:
 
 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.

 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.

 -S

 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?

 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
Was reading up some more on cache invalidation schemes last night. The
best practice approach seems to be using a sequence ID in the key. When
you want to invalidate a large set of keys, just bump the sequence id.

This could easily be handled with a notifier that listens to instance
state changes.
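
A minimal sketch of the sequence-id (generational) key idea, assuming a
memcache-style client -- the key layout and helper names are made up:

    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def _generation(instance_uuid):
        gen = mc.get('gen:%s' % instance_uuid)
        if gen is None:
            gen = 0
            mc.set('gen:%s' % instance_uuid, gen)
        return gen

    def cache_key(instance_uuid, what):
        # e.g. 'inst:<uuid>:3:details'
        return 'inst:%s:%s:%s' % (instance_uuid,
                                  _generation(instance_uuid), what)

    def invalidate(instance_uuid):
        # Called from a notifier on instance state changes: bumping the
        # generation orphans every key built with the old generation, so the
        # whole set is invalidated with one memcached operation.
        mc.incr('gen:%s' % instance_uuid)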

Thoughts?


On 03/22/2012 09:28 PM, Joshua Harlow wrote:
 Just from experience.
 
 They do a great job. But the killer thing about caching is how u do the
 cache invalidation.
 
 Just caching stuff is easy-peasy, making sure it is invalidated on all
 servers in all conditions, not so easy...
 
 On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:
 
 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.
 
 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.
 
 -S
 
 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
  What problems are caching strategies supposed to solve?
 
  On the nova compute side, it seems like streamlining db access and
  api-view tables would solve any performance problems caching would
  address, while keeping the stale data management problem small.
 
 


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Gabe Westmaas
I'd prefer to just set a different expectation for the user.  Rather than
worrying about state change and invalidation, let's just set the expectation
that the system as a whole is eventually consistent.  I would love to prevent
any cache-busting strategies or expectations, as well as anything that requires
something other than time-based data refreshing.  We can all agree, I hope,
that there is some level of eventual consistency even without caching in our
current system.  The fact is that db updates are not instantaneous with other
changes in the system; see snapshotting, instance creation, etc.

What I'd like to see is additional fields included in the API response that show
how old this particular piece of data is.  This way the consumer can decide if
they need to be concerned about the fact that this state hasn't changed, and it
allows operators to tune their system to whatever their deployments can handle.
If we are exploring caching, I think that gives us the advantage of not a lot
of extra code that worries about invalidation, allows deployers to not use
caching at all if it's unneeded, and paves the way for view tables in large
deployments, which I think is important when we are thinking about this on a
large scale.
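
Something like the following, purely to illustrate the shape -- the staleness
field names here are invented:

    # hypothetical server view with an added staleness hint
    server_view = {
        "server": {
            "id": "...",
            "status": "ACTIVE",
            "data_collected_at": "2012-03-23T13:30:00Z",   # made-up field
            "data_age_seconds": 15,                        # made-up field
        }
    }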

Gabe

 -Original Message-
 From: openstack-
 bounces+gabe.westmaas=rackspace@lists.launchpad.net
 [mailto:openstack-
 bounces+gabe.westmaas=rackspace@lists.launchpad.net] On Behalf Of
 Sandy Walsh
 Sent: Friday, March 23, 2012 7:58 AM
 To: Joshua Harlow
 Cc: openstack
 Subject: Re: [Openstack] Caching strategies in Nova ...
 
 Was reading up some more on cache invalidation schemes last night. The
 best practice approach seems to be using a sequence ID in the key. When
 you want to invalidate a large set of keys, just bump the sequence id.
 
 This could easily be handled with a notifier that listens to instance state
 changes.
 
 Thoughts?
 
 
 On 03/22/2012 09:28 PM, Joshua Harlow wrote:
  Just from experience.
 
  They do a great job. But the killer thing about caching is how u do
  the cache invalidation.
 
  Just caching stuff is easy-peasy, making sure it is invalidated on all
  servers in all conditions, not so easy...
 
  On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:
 
  We're doing tests to find out where the bottlenecks are, caching is the
  most obvious solution, but there may be others. Tools like memcache do
 a
  really good job of sharing memory across servers so we don't have to
  reinvent the wheel or hit the db at all.
 
  In addition to looking into caching technologies/approaches we're gluing
  together some tools for finding those bottlenecks. Our first step will
  be finding them, then squashing them ... however.
 
  -S
 
  On 03/22/2012 06:25 PM, Mark Washenberger wrote:
   What problems are caching strategies supposed to solve?
  
   On the nova compute side, it seems like streamlining db access and
   api-view tables would solve any performance problems caching would
   address, while keeping the stale data management problem small.
  
 


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh


On 03/23/2012 09:44 AM, Gabe Westmaas wrote:
 I'd prefer to just set a different expectation for the user.  Rather than 
 worrying about state change and invalidation, lets just set the expectation 
 that the system as a whole is eventually consistent.  I would love to prevent 
 any cache busting strategies or expectations as well as anything that 
 requires something other than time based data refreshing.  We can all agree, 
 I hope, that there is some level of eventual consistency even without caching 
 in our current system.  The fact is that db updates are not instantaneous 
 with other changes in the system; see snapshotting, instance creation, etc.  

I think that's completely valid. The in-process caching schemes are
really just implementation techniques. The end-result (of view tables vs
key/value in-memory dicts vs whatever) is the same.


 What I'd like to see is additional fields included in the API response that 
 how old this particular piece of data is.  This way the consumer can decide 
 if they need to be concerned about the fact that this state hasn't changed, 
 and it allows operators to tune their system to whatever their deployments 
 can handle.  If we are exploring caching, I think that gives us the advantage 
 of not a lot of extra code that worries about invalidation, allowing 
 deployers to not use caching at all if its unneeded, and paves the way for 
 view tables in large deployments which I think is important when we are 
 thinking about this on a large scale.

My fear is that clients will simply start to poll the system until new data
magically appears. An alternative might be, rather than saying how old the
data is, to say how long until the cache expires?


 
 Gabe
 
 -Original Message-
 From: openstack-
 bounces+gabe.westmaas=rackspace@lists.launchpad.net
 [mailto:openstack-
 bounces+gabe.westmaas=rackspace@lists.launchpad.net] On Behalf Of
 Sandy Walsh
 Sent: Friday, March 23, 2012 7:58 AM
 To: Joshua Harlow
 Cc: openstack
 Subject: Re: [Openstack] Caching strategies in Nova ...

 Was reading up some more on cache invalidation schemes last night. The
 best practice approach seems to be using a sequence ID in the key. When
 you want to invalidate a large set of keys, just bump the sequence id.

 This could easily be handled with a notifier that listens to instance state
 changes.

 Thoughts?


 On 03/22/2012 09:28 PM, Joshua Harlow wrote:
 Just from experience.

 They do a great job. But the killer thing about caching is how u do
 the cache invalidation.

 Just caching stuff is easy-peasy, making sure it is invalidated on all
 servers in all conditions, not so easy...

 On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:

 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do
 a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.

 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.

 -S

 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
  What problems are caching strategies supposed to solve?
 
  On the nova compute side, it seems like streamlining db access and
  api-view tables would solve any performance problems caching would
  address, while keeping the stale data management problem small.
 



Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Gabe Westmaas


On 3/23/12 8:56 AM, Sandy Walsh sandy.wa...@rackspace.com wrote:



On 03/23/2012 09:44 AM, Gabe Westmaas wrote:
 I'd prefer to just set a different expectation for the user.  Rather
than worrying about state change and invalidation, lets just set the
expectation that the system as a whole is eventually consistent.  I
would love to prevent any cache busting strategies or expectations as
well as anything that requires something other than time based data
refreshing.  We can all agree, I hope, that there is some level of
eventual consistency even without caching in our current system.  The
fact is that db updates are not instantaneous with other changes in the
system; see snapshotting, instance creation, etc.

I think that's completely valid. The in-process caching schemes are
really just implementation techniques. The end-result (of view tables vs
key/value in-memory dicts vs whatever) is the same.

Agreed! As long as the interface doesn't imply one implementation over
another (see below).



 What I'd like to see is additional fields included in the API response
that how old this particular piece of data is.  This way the consumer
can decide if they need to be concerned about the fact that this state
hasn't changed, and it allows operators to tune their system to whatever
their deployments can handle.  If we are exploring caching, I think that
gives us the advantage of not a lot of extra code that worries about
invalidation, allowing deployers to not use caching at all if its
unneeded, and paves the way for view tables in large deployments which I
think is important when we are thinking about this on a large scale.

My fear is clients will simply start to poll the system until new data
magically appears. An alternative might be, rather than say how old the
data is, how long until the cache expires?

Definitely a valid concern.  However, I kind of expect that many users
will still poll even if they know they won't get new data until X time.
In addition, I think if we say how old the data is, it still implies too
much knowledge unless we go with a strict caching system.  I'd love for us
to leave the ability for us to update that data asynchronously, and
hopefully really quickly, except in the cases where the system is under
unexpected load.  Basically, if we give them that information, and we miss
it, that's a call in to support, not to say they won't call in if it takes
too long to update, of course.

Also, if it's hitting a cache or something optimized for GETs, hopefully we
can handle lots of polling by adding more API nodes.

Gabe



 
 Gabe
 
 -Original Message-
 From: openstack-
 bounces+gabe.westmaas=rackspace@lists.launchpad.net
 [mailto:openstack-
 bounces+gabe.westmaas=rackspace@lists.launchpad.net] On Behalf Of
 Sandy Walsh
 Sent: Friday, March 23, 2012 7:58 AM
 To: Joshua Harlow
 Cc: openstack
 Subject: Re: [Openstack] Caching strategies in Nova ...

 Was reading up some more on cache invalidation schemes last night. The
 best practice approach seems to be using a sequence ID in the key. When
 you want to invalidate a large set of keys, just bump the sequence id.

 This could easily be handled with a notifier that listens to instance state
 changes.

 Thoughts?


 On 03/22/2012 09:28 PM, Joshua Harlow wrote:
 Just from experience.

 They do a great job. But the killer thing about caching is how u do
 the cache invalidation.

 Just caching stuff is easy-peasy, making sure it is invalidated on all
 servers in all conditions, not so easy...

 On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:

  We're doing tests to find out where the bottlenecks are, caching is the
  most obvious solution, but there may be others. Tools like memcache do a
  really good job of sharing memory across servers so we don't have to
  reinvent the wheel or hit the db at all.

  In addition to looking into caching technologies/approaches we're gluing
  together some tools for finding those bottlenecks. Our first step will
  be finding them, then squashing them ... however.

  -S

  On 03/22/2012 06:25 PM, Mark Washenberger wrote:
   What problems are caching strategies supposed to solve?

   On the nova compute side, it seems like streamlining db access and
   api-view tables would solve any performance problems caching would
   address, while keeping the stale data management problem small.
 


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Mark Washenberger
Alas, I let my patch get too stale to rebase properly. However, the approach
I took is fairly dumb and can be demonstrated just from the patch. And in any
case, I think the approach you're taking, profiling based on Tach, is going
to be better in the long run and more shareable in the community.

+ 1 gazillion to getting good metrics!

Sandy Walsh sandy.wa...@rackspace.com said:

 (resent to list as I realized I just did a Reply)
 
 Cool! This is great stuff. Look forward to seeing the branch.
 
 I started working on a similar tool that takes the data collected from
 Tach and fetches the data from Graphite to look at the performance
 issues (no changes to nova trunk requires since Tach is awesome).
 
 It's a shell of an idea yet, but the basics work:
 https://github.com/ohthree/novaprof
 
 But if there is something already existing, I'm happy to kill it off.
 
 I don't doubt for a second the db is the culprit for many of our woes.
 
 The thing I like about internal caching using established tools is that
 it works for db issues too without having to resort to custom tables.
 SQL query optimization, I'm sure, will go equally far.
 
 Thanks again for the great feedback ... keep it comin'!
 
 -S
 
 
 On 03/22/2012 11:53 PM, Mark Washenberger wrote:
 Working on this independently, I created a branch with some simple
 performance logging around the nova-api, and individually around
 glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
 copy and its on a different computer right now, and probably needs
 a rebase. I will rebase and publish it on GitHub tomorrow.)

 With this logging, I could get some simple profiling that I found
 very useful. Here is a GH project with the analysis code as well
 as some nova-api logs I was using as input.

 https://github.com/markwash/nova-perflog

 With these tools, you can get a wall-time profile for individual
 requests. For example, looking at one server create request (and
 you can run this directly from the checkout as the logs are saved
 there):

 markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python
 profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
 key                                          count    avg
 nova.api.openstack.wsgi.POST                     1  0.657
 nova.db.api.instance_update                      1  0.191
 nova.image.show                                  1  0.179
 nova.db.api.instance_add_security_group          1  0.082
 nova.rpc.cast                                    1  0.059
 nova.db.api.instance_get_all_by_filters          1  0.034
 nova.db.api.security_group_get_by_name           2  0.029
 nova.db.api.instance_create                      1  0.011
 nova.db.api.quota_get_all_by_project             3  0.003
 nova.db.api.instance_data_get_for_project        1  0.003

 key                        count  total
 nova.api.openstack.wsgi        1  0.657
 nova.db.api                   10  0.388
 nova.image                     1  0.179
 nova.rpc                       1  0.059

 All times are in seconds. The nova.rpc time is probably high
 since this was the first call since server restart, so the
 connection handshake is probably included. This is also probably
 1.5 months stale.

 The conclusion I reached from this profiling is that we just plain
 overuse the db (and we might do the same in glance). For example,
 whenever we do updates, we actually re-retrieve the item from the
 database, update its dictionary, and save it. This is double the
 cost it needs to be. We also handle updates for data across tables
 inefficiently, where they could be handled in single database round
 trip.

 In particular, in the case of server listings, extensions are just
 rough on performance. Most extensions hit the database again
 at least once. This isn't really so bad, but it clearly is an area
 where we should improve, since these are the most frequent api
 queries.

 I just see a ton of specific performance problems that are easier
 to address one by one, rather than diving into a general (albeit
 obvious) solution such as caching.


 Sandy Walsh sandy.wa...@rackspace.com said:

 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.

 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.

 -S

 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?

 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.



Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Kevin L. Mitchell
On Fri, 2012-03-23 at 13:43 +, Gabe Westmaas wrote:
 However, I kind of expect that many users
 will still poll even if they know they won't get new data until X
 time. 

I wish there was some kind of way for us to issue push notifications to
the client, i.e., have the client register some sort of callback and
what piece of data / state change they're interested in, then nova would
call that callback when the condition occurred.  It probably wouldn't
stop polling, but we could ratchet down rate limits to encourage users
to use the callback mechanism.

Of course, then there's the problem of, what if the user is behind a
firewall or some sort of NAT... :/
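
Very roughly, the register-and-callback idea might look like this (entirely
hypothetical -- no such nova API exists; function names and payloads are
invented, and the firewall/NAT issue above is exactly why the receiver would
have to be publicly reachable):

    import json
    import urllib2

    def register_callback(subscriptions, instance_uuid, event, url):
        # client asks to be told when 'event' happens to this instance
        subscriptions.setdefault((instance_uuid, event), []).append(url)

    def notify(subscriptions, instance_uuid, event, payload):
        # server side: POST the payload to every registered callback
        for url in subscriptions.get((instance_uuid, event), []):
            req = urllib2.Request(url, json.dumps(payload),
                                  {'Content-Type': 'application/json'})
            try:
                urllib2.urlopen(req, timeout=5)
            except urllib2.URLError:
                pass  # unreachable callback (firewall/NAT); drop or retry later
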
-- 
Kevin L. Mitchell kevin.mitch...@rackspace.com




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Kevin L. Mitchell
On Fri, 2012-03-23 at 08:55 -0300, Sandy Walsh wrote:
 I don't doubt for a second the db is the culprit for many of our woes.
 
 The thing I like about internal caching using established tools is
 that
 it works for db issues too without having to resort to custom tables.
 SQL query optimization, I'm sure, will go equally far. 

For that matter, I wouldn't be surprised if there were things we could
do to nova's DB to speed things up.  For instance, what if we supported
non-SQL data stores?
-- 
Kevin L. Mitchell kevin.mitch...@rackspace.com




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Johannes Erdfelt
On Fri, Mar 23, 2012, Kevin L. Mitchell kevin.mitch...@rackspace.com wrote:
 On Fri, 2012-03-23 at 13:43 +, Gabe Westmaas wrote:
  However, I kind of expect that many users
  will still poll even if they know they won't get new data until X
  time. 
 
 I wish there was some kind of way for us to issue push notifications to
 the client, i.e., have the client register some sort of callback and
 what piece of data / state change they're interested in, then nova would
 call that callback when the condition occurred.  It probably wouldn't
 stop polling, but we could ratchet down rate limits to encourage users
 to use the callback mechanism.
 
 Of course, then there's the problem of, what if the user is behind a
 firewall or some sort of NAT... :/

Long polling is always an option.

JE




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Russell Bryant
On 03/23/2012 11:36 AM, Johannes Erdfelt wrote:
 On Fri, Mar 23, 2012, Kevin L. Mitchell kevin.mitch...@rackspace.com wrote:
 On Fri, 2012-03-23 at 13:43 +, Gabe Westmaas wrote:
 However, I kind of expect that many users
 will still poll even if they know they won't get new data until X
 time. 

 I wish there was some kind of way for us to issue push notifications to
 the client, i.e., have the client register some sort of callback and
 what piece of data / state change they're interested in, then nova would
 call that callback when the condition occurred.  It probably wouldn't
 stop polling, but we could ratchet down rate limits to encourage users
 to use the callback mechanism.

 Of course, then there's the problem of, what if the user is behind a
 firewall or some sort of NAT... :/
 
 Long polling is always an option.

Or WebSockets.

-- 
Russell Bryant



Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Brian Lamar

On Mar 23, 2012, at 11:22 AM, Kevin L. Mitchell wrote:

 On Fri, 2012-03-23 at 08:55 -0300, Sandy Walsh wrote:
 I don't doubt for a second the db is the culprit for many of our woes.
 
 The thing I like about internal caching using established tools is
 that
 it works for db issues too without having to resort to custom tables.
 SQL query optimization, I'm sure, will go equally far. 
 
 For that matter, I wouldn't be surprised if there were things we could
 do to nova's DB to speed things up.  For instance, what if we supported
 non-SQL data stores?

Any database is going to be slow if you're talking to it more than necessary. 
Even if we replaced MySQL with the latest and greatest web-scale noSQL database 
out there we'd still be slow. I'd love to see a combination effort of improving 
the flexibility of the DB layer as well as improvements surrounding the sheer 
number of calls to the database.

 -- 
 Kevin L. Mitchell kevin.mitch...@rackspace.com
 
 


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Joshua Harlow
Right,

Let's fix the problem, not add a patch that hides the problem.

U can't put lipstick on a pig, haha. It's still a pig...

On 3/22/12 8:02 PM, Mark Washenberger mark.washenber...@rackspace.com wrote:

This is precisely my concern.

It must be brought up that with Rackspace Cloud Servers, nearly
all client codes routinely submit requests with a query parameter
cache-busting=some random string just to get around problems with
cache invalidation. And woe to the client that does not.

I get the feeling that once trust like this is lost, a project has
a hard time regaining it. I'm not saying that we can avoid
inconsistency entirely. Rather, I believe we will have to embrace
some eventual-consistency models to enable the performance and
scale we will ultimately attain. But I just get the feeling that
generic caches are really only appropriate for write-once or at
least write-rarely data. So personally I would rule out external
caches entirely and try to be very judicious in selecting internal
caches as well.

Joshua Harlow harlo...@yahoo-inc.com said:

 Just from experience.

 They do a great job. But the killer thing about caching is how u do the cache
 invalidation.

 Just caching stuff is easy-peasy, making sure it is invalidated on all 
 servers in
 all conditions, not so easy...

 On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:

 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.

 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.

 -S

 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?

 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Joshua Harlow
+ 100

On 3/23/12 10:50 AM, Brian Lamar brian.la...@rackspace.com wrote:



On Mar 23, 2012, at 11:22 AM, Kevin L. Mitchell wrote:

 On Fri, 2012-03-23 at 08:55 -0300, Sandy Walsh wrote:
 I don't doubt for a second the db is the culprit for many of our woes.

 The thing I like about internal caching using established tools is
 that
 it works for db issues too without having to resort to custom tables.
 SQL query optimization, I'm sure, will go equally far.

 For that matter, I wouldn't be surprised if there were things we could
 do to nova's DB to speed things up.  For instance, what if we supported
 non-SQL data stores?

Any database is going to be slow if you're talking to it more than necessary. 
Even if we replaced MySQL with the latest and greatest web-scale noSQL database 
out there we'd still be slow. I'd love to see a combination effort of improving 
the flexibility of the DB layer as well as improvements surrounding the sheer 
number of calls to the database.

 --
 Kevin L. Mitchell kevin.mitch...@rackspace.com




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Debo Dutta (dedutta)
+1 to DBs being slow. But what if we used a combo of memcache and db (rough
cache-aside sketch below), or used couch/mongo?

Comparison:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Does anyone have experience with large deployments, to see the kind of db
traffic we need to optimize for?

Another thing could be to avoid joins and then do sharding.
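
A minimal cache-aside sketch of the memcache + db combination (the key scheme,
the TTL, and the db.instance_get stand-in are all made up):

    import json
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def get_instance(context, instance_uuid, db):
        key = 'instance:%s' % instance_uuid
        cached = mc.get(key)
        if cached is not None:
            return json.loads(cached)                   # cache hit
        row = db.instance_get(context, instance_uuid)   # miss: fall through to db
        mc.set(key, json.dumps(row), time=30)           # short TTL bounds staleness
        return row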

debo

-Original Message-
From: openstack-bounces+dedutta=cisco@lists.launchpad.net
[mailto:openstack-bounces+dedutta=cisco@lists.launchpad.net] On
Behalf Of Brian Lamar
Sent: Friday, March 23, 2012 10:51 AM
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Caching strategies in Nova ...


On Mar 23, 2012, at 11:22 AM, Kevin L. Mitchell wrote:

 On Fri, 2012-03-23 at 08:55 -0300, Sandy Walsh wrote:
 I don't doubt for a second the db is the culprit for many of our
woes.
 
 The thing I like about internal caching using established tools is
 that
 it works for db issues too without having to resort to custom tables.
 SQL query optimization, I'm sure, will go equally far. 
 
 For that matter, I wouldn't be surprised if there were things we could
 do to nova's DB to speed things up.  For instance, what if we supported
 non-SQL data stores?

Any database is going to be slow if you're talking to it more than
necessary. Even if we replaced MySQL with the latest and greatest
web-scale noSQL database out there we'd still be slow. I'd love to see a
combination effort of improving the flexibility of the DB layer as well
as improvements surrounding the sheer number of calls to the database.

 -- 
 Kevin L. Mitchell kevin.mitch...@rackspace.com
 
 


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Johannes Erdfelt
On Fri, Mar 23, 2012, Debo Dutta (dedutta) dedu...@cisco.com wrote:
 +1 to DBs being slow. But what if we used a combo of memcache and db. Or
 use couch/mongo. 
 
 Comparision:
 http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis 
 
 Anyone has experience on large deployments to see the kind of db traffic
 we need to optimize for?
 
 Another thing could be to avoid joins and then do sharding. 

Seems like that's the opposite of what we want.

MySQL isn't exactly slow and Nova doesn't have particularly large
tables. It looks like the slowness is coming from the network and how
many queries are being made.

Avoiding joins would mean even more queries, which looks like it would
slow it down even further.

JE




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
You can. The sanctioned approach is to use Yagi with a feed into
something like PubSubHubbub that lives on the public interweb.

It's just an optional component.

-S

On 03/23/2012 12:20 PM, Kevin L. Mitchell wrote:
 On Fri, 2012-03-23 at 13:43 +, Gabe Westmaas wrote:
 However, I kind of expect that many users
 will still poll even if they know they won't get new data until X
 time. 
 
 I wish there was some kind of way for us to issue push notifications to
 the client, i.e., have the client register some sort of callback and
 what piece of data / state change they're interested in, then nova would
 call that callback when the condition occurred.  It probably wouldn't
 stop polling, but we could ratchet down rate limits to encourage users
 to use the callback mechanism.
 
 Of course, then there's the problem of, what if the user is behind a
 firewall or some sort of NAT... :/



Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
Ugh (reply vs reply-all again)

On 03/23/2012 02:58 PM, Joshua Harlow wrote:
 Right,
 
 Lets fix the problem, not add a patch that hides the problem.
 
 U can’t put lipstick on a pig, haha. Its still a pig...

When stuff is expensive to compute, caching is the only option (yes?),
whether that lives in memcache, a db or in a dict. Tuning sql queries
will only get us so far. I think creating custom view tables is a
laborious and error-prone tack ... additionally you get developers that
start to depend on the view tables as gospel.

Or am I missing something here?

-S


 On 3/22/12 8:02 PM, Mark Washenberger
 mark.washenber...@rackspace.com wrote:
 
 This is precisely my concern.
 
 It must be brought up that with Rackspace Cloud Servers, nearly
 all client codes routinely submit requests with a query parameter
 cache-busting=some random string just to get around problems with
 cache invalidation. And woe to the client that does not.
 
 I get the feeling that once trust like this is lost, a project has
 a hard time regaining it. I'm not saying that we can avoid
 inconsistency entirely. Rather, I believe we will have to embrace
 some eventual-consistency models to enable the performance and
 scale we will ultimately attain. But I just get the feeling that
 generic caches are really only appropriate for write-once or at
 least write-rarely data. So personally I would rule out external
 caches entirely and try to be very judicious in selecting internal
 caches as well.



Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Mark Washenberger


Johannes Erdfelt johan...@erdfelt.com said:

 
 MySQL isn't exactly slow and Nova doesn't have particularly large
 tables. It looks like the slowness is coming from the network and how
 many queries are being made.
 
 Avoiding joins would mean even more queries, which looks like it would
 slow it down even further.
 

This is exactly what I saw in my profiling. More complex queries did
still seem to take longer than less complex ones, but it was a second
order effect compared to the overall volume of queries. 

I'm not sure that network was the culprit though, since my ping
roundtrip time was small relative to the wall time I measured for each
nova.db.api call.




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Monsyne Dragon

On Mar 23, 2012, at 10:20 AM, Kevin L. Mitchell wrote:

 On Fri, 2012-03-23 at 13:43 +, Gabe Westmaas wrote:
 However, I kind of expect that many users
 will still poll even if they know they won't get new data until X
 time. 
 
 I wish there was some kind of way for us to issue push notifications to
 the client, i.e., have the client register some sort of callback and
 what piece of data / state change they're interested in, then nova would
 call that callback when the condition occurred.  It probably wouldn't
 stop polling, but we could ratchet down rate limits to encourage users
 to use the callback mechanism.

Actually, that is (one) of the things the notifications system was designed to
accommodate.  If you attach a feed generator (like Yagi) to the
notification queues, plus a PubSubHubbub hub, folks can subscribe to events by
event type.  (Other pubsub strategies would work too, like XMPP pubsub.)


 Of course, then there's the problem of, what if the user is behind a
 firewall or some sort of NAT... :/

PSH pushes to a web callback supplied by the client. Presumably they could run
the callback receiver somewhere else, or go through some proxy.


 -- 
 Kevin L. Mitchell kevin.mitch...@rackspace.com
 
 

--
Monsyne M. Dragon
OpenStack/Nova 
cell 210-441-0965
work x 5014190




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
Was the db on a separate server or loopback?

On 03/23/2012 05:26 PM, Mark Washenberger wrote:
 
 
 Johannes Erdfelt johan...@erdfelt.com said:
 

 MySQL isn't exactly slow and Nova doesn't have particularly large
 tables. It looks like the slowness is coming from the network and how
 many queries are being made.

 Avoiding joins would mean even more queries, which looks like it would
 slow it down even further.

 
 This is exactly what I saw in my profiling. More complex queries did
 still seem to take longer than less complex ones, but it was a second
 order effect compared to the overall volume of queries. 
 
 I'm not sure that network was the culprit though, since my ping
 roundtrip time was small relative to the wall time I measured for each
 nova.db.api call.
 
 


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Rick Jones

On 03/23/2012 01:26 PM, Mark Washenberger wrote:



Johannes Erdfelt johan...@erdfelt.com said:



MySQL isn't exactly slow and Nova doesn't have particularly large
tables. It looks like the slowness is coming from the network and how
many queries are being made.

Avoiding joins would mean even more queries, which looks like it would
slow it down even further.



This is exactly what I saw in my profiling. More complex queries did
still seem to take longer than less complex ones, but it was a second
order effect compared to the overall volume of queries.

I'm not sure that network was the culprit though, since my ping
roundtrip time was small relative to the wall time I measured for each
nova.db.api call.


How much data would the queries return, and how long between queries? 
One networking thing that might come into play would be slow start 
after idle - if the query returns are > INITCWND (either 3 or 10 
segments depending on which kernel) and they are separated by at least 
one RTO (or is it RTT?) then they will hit slow start each time.  Now, 
the extent to which that matters is a function of how large the return 
is, and it is only adding RTTs so it wouldn't be minutes, but it could 
add up a bit I suppose.


rick jones



Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Justin Santa Barbara
This is great: hard numbers are exactly what we need.  I would love to see
a statement-by-statement SQL log with timings from someone that has a
performance issue.  I'm happy to look into any DB problems that
demonstrates.

The nova database is small enough that it should always be in-memory (if
you're running a million VMs, I don't think asking for one gigabyte of RAM
on your DB is unreasonable!)

If it isn't hitting disk, PostgreSQL or MySQL with InnoDB can serve 10k
'indexed' requests per second through SQL on a low-end ($1000) box.  With
tuning you can get 10x that.  Using one of the SQL bypass engines (e.g.
MySQL HandlerSocket) can supposedly give you 10x again.  Throwing money at
the problem in the form of multi-processor boxes (or disks if you're I/O
bound) can probably get you 10x again.

However, if you put a DB on a remote host, you'll have to wait for a
network round-trip per query.  If your ORM is doing a 1+N query, the total
read time will be slow.  If your DB is doing a sync on every write, writes
will be slow.  If the DB isn't tuned with a sensible amount of cache (at
least as big as the DB size), it will be slow(er).  Each of these has a
very simple fix for OpenStack.
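
For the 1+N case specifically, the fix is usually just eager loading, e.g. with
SQLAlchemy (hypothetical model names; assumes an existing session and an
Instance model with a security_groups relationship):

    from sqlalchemy.orm import joinedload

    # 1+N: one SELECT for the instances, then one more per instance the
    # first time .security_groups is touched (lazy load)
    for inst in session.query(Instance).all():
        print inst.security_groups

    # one round trip: pull the related rows in with the same query
    instances = (session.query(Instance)
                 .options(joinedload('security_groups'))
                 .all())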

Relational databases have very efficient caching mechanisms built in.  Any
out-of-process cache will have a hard time beating it.  Let's make sure the
bottleneck is the DB, and not (for example) RabbitMQ, before we go off on a
huge rearchitecture.

Justin



On Thu, Mar 22, 2012 at 7:53 PM, Mark Washenberger 
mark.washenber...@rackspace.com wrote:

 Working on this independently, I created a branch with some simple
 performance logging around the nova-api, and individually around
 glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
 copy and its on a different computer right now, and probably needs
 a rebase. I will rebase and publish it on GitHub tomorrow.)

 With this logging, I could get some simple profiling that I found
 very useful. Here is a GH project with the analysis code as well
 as some nova-api logs I was using as input.

 https://github.com/markwash/nova-perflog

 With these tools, you can get a wall-time profile for individual
 requests. For example, looking at one server create request (and
 you can run this directly from the checkout as the logs are saved
 there):

 markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python
 profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
 key                                          count    avg
 nova.api.openstack.wsgi.POST                     1  0.657
 nova.db.api.instance_update                      1  0.191
 nova.image.show                                  1  0.179
 nova.db.api.instance_add_security_group          1  0.082
 nova.rpc.cast                                    1  0.059
 nova.db.api.instance_get_all_by_filters          1  0.034
 nova.db.api.security_group_get_by_name           2  0.029
 nova.db.api.instance_create                      1  0.011
 nova.db.api.quota_get_all_by_project             3  0.003
 nova.db.api.instance_data_get_for_project        1  0.003

 key                        count  total
 nova.api.openstack.wsgi        1  0.657
 nova.db.api                   10  0.388
 nova.image                     1  0.179
 nova.rpc                       1  0.059

 All times are in seconds. The nova.rpc time is probably high
 since this was the first call since server restart, so the
 connection handshake is probably included. This is also probably
 1.5 months stale.

 The conclusion I reached from this profiling is that we just plain
 overuse the db (and we might do the same in glance). For example,
 whenever we do updates, we actually re-retrieve the item from the
 database, update its dictionary, and save it. This is double the
 cost it needs to be. We also handle updates for data across tables
 inefficiently, where they could be handled in single database round
 trip.

 In particular, in the case of server listings, extensions are just
 rough on performance. Most extensions hit the database again
 at least once. This isn't really so bad, but it clearly is an area
 where we should improve, since these are the most frequent api
 queries.

 I just see a ton of specific performance problems that are easier
 to address one by one, rather than diving into a general (albeit
 obvious) solution such as caching.


 Sandy Walsh sandy.wa...@rackspace.com said:

  We're doing tests to find out where the bottlenecks are, caching is the
  most obvious solution, but there may be others. Tools like memcache do a
  really good job of sharing memory across servers so we don't have to
  reinvent the wheel or hit the db at all.
 
  In addition to looking into caching technologies/approaches we're gluing
  together some tools for finding those bottlenecks. Our first step will
  be finding them, then squashing them ... however.
 
  -S
 
  On 03/22/2012 06:25 PM, Mark Washenberger wrote:
  What problems are caching strategies supposed to solve?
 
  On the nova compute side, it seems 

Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Yun Mao
Hi Mark,

what workload and what setup do you have while you are profiling? e.g.
how many compute nodes do you have, how many VMs do you have, are you
creating/destroying/migrating VMs, volumes, networks?

Thanks,

Yun

On Fri, Mar 23, 2012 at 4:26 PM, Mark Washenberger
mark.washenber...@rackspace.com wrote:


 Johannes Erdfelt johan...@erdfelt.com said:


 MySQL isn't exactly slow and Nova doesn't have particularly large
 tables. It looks like the slowness is coming from the network and how
 many queries are being made.

 Avoiding joins would mean even more queries, which looks like it would
 slow it down even further.


 This is exactly what I saw in my profiling. More complex queries did
 still seem to take longer than less complex ones, but it was a second
 order effect compared to the overall volume of queries.

 I'm not sure that network was the culprit though, since my ping
 roundtrip time was small relative to the wall time I measured for each
 nova.db.api call.




Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Joe Gordon
+1

Documenting these findings would be nice too.


best,
Joe

On Fri, Mar 23, 2012 at 2:15 PM, Justin Santa Barbara
jus...@fathomdb.comwrote:

 This is great: hard numbers are exactly what we need.  I would love to see
 a statement-by-statement SQL log with timings from someone that has a
 performance issue.  I'm happy to look into any DB problems that
 demonstrates.

 The nova database is small enough that it should always be in-memory (if
 you're running a million VMs, I don't think asking for one gigabyte of RAM
 on your DB is unreasonable!)

 If it isn't hitting disk, PostgreSQL or MySQL with InnoDB can serve 10k
 'indexed' requests per second through SQL on a low-end ($1000) box.  With
 tuning you can get 10x that.  Using one of the SQL bypass engines (e.g.
 MySQL HandlerSocket) can supposedly give you 10x again.  Throwing money at
 the problem in the form of multi-processor boxes (or disks if you're I/O
 bound) can probably get you 10x again.

 However, if you put a DB on a remote host, you'll have to wait for a
 network round-trip per query.  If your ORM is doing a 1+N query, the total
 read time will be slow.  If your DB is doing a sync on every write, writes
 will be slow.  If the DB isn't tuned with a sensible amount of cache (at
 least as big as the DB size), it will be slow(er).  Each of these has a
 very simple fix for OpenStack.
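
 To make the 1+N point concrete, here is a rough sketch with toy
 SQLAlchemy models (illustrative only, not Nova's real schema); with
 echo=True you can watch the extra round trips go by:

     from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
     from sqlalchemy.orm import (declarative_base, joinedload,
                                 relationship, sessionmaker)

     Base = declarative_base()

     class Instance(Base):
         __tablename__ = 'instances'
         id = Column(Integer, primary_key=True)
         security_groups = relationship('SecurityGroup')

     class SecurityGroup(Base):
         __tablename__ = 'security_groups'
         id = Column(Integer, primary_key=True)
         instance_id = Column(Integer, ForeignKey('instances.id'))
         name = Column(String(255))

     engine = create_engine('sqlite://', echo=True)   # echo prints each query
     Base.metadata.create_all(engine)
     session = sessionmaker(bind=engine)()
     session.add_all([Instance(id=1), Instance(id=2),
                      SecurityGroup(instance_id=1, name='default'),
                      SecurityGroup(instance_id=2, name='default')])
     session.commit()

     # 1+N: one SELECT for the instances, then one more per instance when
     # the lazy-loaded relationship is touched; each is a network round
     # trip when the DB is remote.
     for inst in session.query(Instance).all():
         list(inst.security_groups)

     # Eager load: the same data comes back in a single round trip.
     for inst in session.query(Instance).options(
             joinedload(Instance.security_groups)).all():
         list(inst.security_groups)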

 Relational databases have very efficient caching mechanisms built in.  Any
 out-of-process cache will have a hard time beating it.  Let's make sure the
 bottleneck is the DB, and not (for example) RabbitMQ, before we go off on a
 huge rearchitecture.

 Justin




 On Thu, Mar 22, 2012 at 7:53 PM, Mark Washenberger 
 mark.washenber...@rackspace.com wrote:

 Working on this independently, I created a branch with some simple
 performance logging around the nova-api, and individually around
 glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
 copy and its on a different computer right now, and probably needs
 a rebase. I will rebase and publish it on GitHub tomorrow.)

 With this logging, I could get some simple profiling that I found
 very useful. Here is a GH project with the analysis code as well
 as some nova-api logs I was using as input.

 https://github.com/markwash/nova-perflog

 With these tools, you can get a wall-time profile for individual
 requests. For example, looking at one server create request (and
 you can run this directly from the checkout as the logs are saved
 there):

 markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python
 profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
 key                                          count  avg
 nova.api.openstack.wsgi.POST                     1  0.657
 nova.db.api.instance_update                      1  0.191
 nova.image.show                                  1  0.179
 nova.db.api.instance_add_security_group          1  0.082
 nova.rpc.cast                                    1  0.059
 nova.db.api.instance_get_all_by_filters          1  0.034
 nova.db.api.security_group_get_by_name           2  0.029
 nova.db.api.instance_create                      1  0.011
 nova.db.api.quota_get_all_by_project             3  0.003
 nova.db.api.instance_data_get_for_project        1  0.003

 key  count  total
 nova.api.openstack.wsgi  1  0.657
 nova.db.api 10  0.388
 nova.image   1  0.179
 nova.rpc 1  0.059

 All times are in seconds. The nova.rpc time is probably high
 since this was the first call since server restart, so the
 connection handshake is probably included. This is also probably
 1.5 months stale.

 The conclusion I reached from this profiling is that we just plain
 overuse the db (and we might do the same in glance). For example,
 whenever we do updates, we actually re-retrieve the item from the
 database, update its dictionary, and save it. This is double the
 cost it needs to be. We also handle updates for data across tables
 inefficiently, where they could be handled in a single database round
 trip.

 In particular, in the case of server listings, extensions are just
 rough on performance. Most extensions hit the database again
 at least once. This isn't really so bad, but it clearly is an area
 where we should improve, since these are the most frequent api
 queries.

 I just see a ton of specific performance problems that are easier
 to address one by one, rather than diving into a general (albeit
 obvious) solution such as caching.


 Sandy Walsh sandy.wa...@rackspace.com said:

  We're doing tests to find out where the bottlenecks are, caching is the
  most obvious solution, but there may be others. Tools like memcache do a
  really good job of sharing memory across servers so we don't have to
  reinvent the wheel or hit the db at all.
 
  In addition to looking into caching technologies/approaches we're gluing
  together some tools for finding those bottlenecks. Our first step will
  be finding them, then squashing them 

Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Mark Washenberger
Hmm... it was definitely different Xen virtual machines, either on the
same hypervisor or on one adjacent to it in an L2 sense. In a similar
environment I have set up now, I notice that the ping time from one VM
to another on the same hypervisor is not noticeably less than the ping
time to a VM on a different hypervisor. Not sure why that is the case!
In any case it is trivial... ~3 ms for the first ping, ~0.3 ms for
subsequent pings. 

Sandy Walsh sandy.wa...@rackspace.com said:

 Was the db on a separate server or loopback?
 
 On 03/23/2012 05:26 PM, Mark Washenberger wrote:


 Johannes Erdfelt johan...@erdfelt.com said:


 MySQL isn't exactly slow and Nova doesn't have particularly large
 tables. It looks like the slowness is coming from the network and how
 many queries are being made.

 Avoiding joins would mean even more queries, which looks like it would
 slow it down even further.


 This is exactly what I saw in my profiling. More complex queries did
 still seem to take longer than less complex ones, but it was a second
 order effect compared to the overall volume of queries.

 I'm not sure that network was the culprit though, since my ping
 roundtrip time was small relative to the wall time I measured for each
 nova.db.api call.


 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Mark Washenberger
Yun,

I was working with a very small but fairly realistic setup. In this
case I had only 3 Xen hosts, no more than 10 nova vms up at a time.
And the environment was very nearly fresh so I believe the db 
tables were as small as they could be. I believe the utilization
across the board in my setup was very low, and indeed the numbers
were very consistent (I ran a large number of times, but didn't
save all of the data :-(). Also, there were only 2 compute nodes
running, but as the workflow only had rpc casts, I'm not sure that
really mattered very much.

The profile I gave was for vm creation. But I also ran tests for
deletion, listing, and showing vms in the OS API.

Networks were static throughout the process. Volumes were absent.

Yun Mao yun...@gmail.com said:

 Hi Mark,
 
 what workload and what setup do you have while you are profiling? e.g.
 how many compute nodes do you have, how many VMs do you have, are you
 creating/destroying/migrating VMs, volumes, networks?
 
 Thanks,
 
 Yun
 
 On Fri, Mar 23, 2012 at 4:26 PM, Mark Washenberger
 mark.washenber...@rackspace.com wrote:


 Johannes Erdfelt johan...@erdfelt.com said:


 MySQL isn't exactly slow and Nova doesn't have particularly large
 tables. It looks like the slowness is coming from the network and how
 many queries are being made.

 Avoiding joins would mean even more queries, which looks like it would
 slow it down even further.


 This is exactly what I saw in my profiling. More complex queries did
 still seem to take longer than less complex ones, but it was a second
 order effect compared to the overall volume of queries.

 I'm not sure that network was the culprit though, since my ping
 roundtrip time was small relative to the wall time I measured for each
 nova.db.api call.


 ___
 Mailing list: https://launchpad.net/~openstack
 Post to     : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Sandy Walsh
Great suggestions, guys ... we'll give some thought to how the community
can share and compare performance measurements in a consistent way.

-S

On 03/23/2012 07:26 PM, Joe Gordon wrote:
 +1
 
 Documenting these findings would be nice too.
 
 
 best,
 Joe
 
 On Fri, Mar 23, 2012 at 2:15 PM, Justin Santa Barbara
 jus...@fathomdb.com mailto:jus...@fathomdb.com wrote:
 
 This is great: hard numbers are exactly what we need.  I would love
 to see a statement-by-statement SQL log with timings from someone
 that has a performance issue.  I'm happy to look into any DB
 problems it demonstrates.
 
 The nova database is small enough that it should always be in-memory
 (if you're running a million VMs, I don't think asking for one
 gigabyte of RAM on your DB is unreasonable!)
 
 If it isn't hitting disk, PostgreSQL or MySQL with InnoDB can serve
 10k 'indexed' requests per second through SQL on a low-end ($1000)
 box.  With tuning you can get 10x that.  Using one of the SQL bypass
 engines (e.g. MySQL HandlerSocket) can supposedly give you 10x
 again.  Throwing money at the problem in the form of multi-processor
 boxes (or disks if you're I/O bound) can probably get you 10x again.
 
 However, if you put a DB on a remote host, you'll have to wait for a
 network round-trip per query.  If your ORM is doing a 1+N query, the
 total read time will be slow.  If your DB is doing a sync on every
 write, writes will be slow.  If the DB isn't tuned with a sensible
 amount of cache (at least as big as the DB size), it will be
 slow(er).  Each of these has a very simple fix for OpenStack.
 
 Relational databases have very efficient caching mechanisms built
 in.  Any out-of-process cache will have a hard time beating it.
  Let's make sure the bottleneck is the DB, and not (for example)
 RabbitMQ, before we go off on a huge rearchitecture.
 
 Justin
 
 
 
 
 On Thu, Mar 22, 2012 at 7:53 PM, Mark Washenberger
 mark.washenber...@rackspace.com
 mailto:mark.washenber...@rackspace.com wrote:
 
 Working on this independently, I created a branch with some simple
 performance logging around the nova-api, and individually around
 glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
 copy and its on a different computer right now, and probably needs
 a rebase. I will rebase and publish it on GitHub tomorrow.)
 
 With this logging, I could get some simple profiling that I found
 very useful. Here is a GH project with the analysis code as well
 as some nova-api logs I was using as input.
 
 https://github.com/markwash/nova-perflog
 
 With these tools, you can get a wall-time profile for individual
 requests. For example, looking at one server create request (and
 you can run this directly from the checkout as the logs are saved
 there):
 
 markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python
 profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
 key                                          count  avg
 nova.api.openstack.wsgi.POST                     1  0.657
 nova.db.api.instance_update                      1  0.191
 nova.image.show                                  1  0.179
 nova.db.api.instance_add_security_group          1  0.082
 nova.rpc.cast                                    1  0.059
 nova.db.api.instance_get_all_by_filters          1  0.034
 nova.db.api.security_group_get_by_name           2  0.029
 nova.db.api.instance_create                      1  0.011
 nova.db.api.quota_get_all_by_project             3  0.003
 nova.db.api.instance_data_get_for_project        1  0.003
 
 key  count  total
 nova.api.openstack.wsgi  1  0.657
 nova.db.api 10  0.388
 nova.image   1  0.179
 nova.rpc 1  0.059
 
 All times are in seconds. The nova.rpc time is probably high
 since this was the first call since server restart, so the
 connection handshake is probably included. This is also probably
 1.5 months stale.
 
 The conclusion I reached from this profiling is that we just plain
 overuse the db (and we might do the same in glance). For example,
 whenever we do updates, we actually re-retrieve the item from the
 database, update its dictionary, and save it. This is double the
 cost it needs to be. We also handle updates for data across tables
 inefficiently, where they could be handled in a single database round
 trip.
 
 In particular, in the case of server listings, extensions are just
 rough on performance. Most extensions hit the database again
 at least once. This isn't really so bad, but it clearly is 

Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Yun Mao
Got it. Thanks,

If I read your number correctly, there are 10 db api calls, with total
time 0.388 seconds.

This is certainly not lightning fast, but it's not really slow either,
given that the user already expects the VM to take more than 10 seconds
to create; 0.5 s of latency is tolerable. If most of that time is spent
on the network round trip to the db, then I'd say the latency won't
increase much as we scale up the number of compute nodes and VMs.

One thing to note is that right now the DB APIs are all blocking
calls. So it could be tricky to get the performance numbers right when
measuring multiple concurrent requests.
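
As a toy illustration of the measurement concern (the fake db call and
the numbers are made up):

    import threading
    import time

    def fake_db_call():
        # Stand-in for one blocking nova.db.api call (~40 ms of network + query).
        time.sleep(0.04)

    def timed_request(results):
        start = time.time()
        for _ in range(10):        # roughly the 10 db calls of a server create
            fake_db_call()
        results.append(time.time() - start)

    serial = []
    timed_request(serial)
    print('one request at a time: %.3f s' % serial[0])

    # Ten requests at once.  With native threads blocked in I/O the calls
    # overlap and each request still measures ~0.4 s; if the same calls were
    # serialized behind one blocking loop (as with unpatched blocking db
    # calls under eventlet), each request would measure closer to 4 s.
    concurrent = []
    threads = [threading.Thread(target=timed_request, args=(concurrent,))
               for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('ten concurrent, worst case: %.3f s' % max(concurrent))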

Yun

On Fri, Mar 23, 2012 at 6:47 PM, Mark Washenberger
mark.washenber...@rackspace.com wrote:
 Yun,

 I was working with a very small but fairly realistic setup. In this
 case I had only 3 Xen hosts, no more than 10 nova vms up at a time.
 And the environment was very nearly fresh so I believe the db
 tables were as small as they could be. I believe the utilization
 across the board in my setup was very low, and indeed the numbers
 were very consistent (I ran a large number of times, but didn't
 save all of the data :-(). Also, there were only 2 compute nodes
 running, but as the workflow only had rpc casts, I'm not sure that
 really mattered very much.

 The profile I gave was for vm creation. But I also ran tests for
 deletion, listing, and showing vms in the OS API.

 Networks were static throughout the process. Volumes were absent.

 Yun Mao yun...@gmail.com said:

 Hi Mark,

 what workload and what setup do you have while you are profiling? e.g.
 how many compute nodes do you have, how many VMs do you have, are you
 creating/destroying/migrating VMs, volumes, networks?

 Thanks,

 Yun

 On Fri, Mar 23, 2012 at 4:26 PM, Mark Washenberger
 mark.washenber...@rackspace.com wrote:


 Johannes Erdfelt johan...@erdfelt.com said:


 MySQL isn't exactly slow and Nova doesn't have particularly large
 tables. It looks like the slowness is coming from the network and how
 many queries are being made.

 Avoiding joins would mean even more queries, which looks like it would
 slow it down even further.


 This is exactly what I saw in my profiling. More complex queries did
 still seem to take longer than less complex ones, but it was a second
 order effect compared to the overall volume of queries.

 I'm not sure that network was the culprit though, since my ping
 roundtrip time was small relative to the wall time I measured for each
 nova.db.api call.


 ___
 Mailing list: https://launchpad.net/~openstack
 Post to     : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to     : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp




 ___
 Mailing list: https://launchpad.net/~openstack
 Post to     : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-22 Thread Vishvananda Ishaya

On Mar 22, 2012, at 8:06 AM, Sandy Walsh wrote:

 o/
 
 Vek and myself are looking into caching strategies in and around Nova.
 
 There are essentially two approaches: in-process and external (proxy).
 The in-process schemes sit in with the python code while the external
 ones basically proxy the HTTP requests.

We may need http caches as well in some cases, but we already use memcached in 
a few places, so I think we need internal caching as well.

 
 There are some obvious pros and cons to each approach. The external is
 easier for operations to manage, but in-process allows us greater
 control over the caching (for things like caching db calls and not just
 HTTP calls). But, in-memory also means more code, more memory usage on
 the servers, monolithic services, limited to python based solutions,
 etc. In-process also gives us access to tools like Tach
 https://github.com/ohthree/tach for profiling performance.
 
 I see Jesse recently landed a branch that touches on the in-process
 approach:
 https://github.com/openstack/nova/commit/1bcf5f5431d3c9620596f5329d7654872235c7ee#nova/common/memorycache.py
 
 I don't know if people think putting caching code inside nova is a good
 or bad idea. If we do continue down this road, it would be nice to make
 it a little more modular/plug-in-based (YAPI .. yet another plug-in).
 Perhaps a hybrid solution is required?

openstack-common is where Jesse was planning on putting memorycache.

 
 We're looking at tools like memcache, beaker, varnish, etc.
 

I kind of like keeping our caching simple: just talk to something that
replicates the python-memcached api, so that we can swap out an in-memory
cache, actual memcached, a db cache, etc...
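
Something with roughly this shape (a from-memory sketch of the idea, not
the actual nova memorycache code):

    import time as _time

    class FakeMemcache(object):
        """In-memory stand-in mimicking the bits of the python-memcached
        Client interface that callers tend to rely on."""

        def __init__(self):
            self._cache = {}

        def get(self, key):
            value, expires_at = self._cache.get(key, (None, 0))
            if expires_at and expires_at < _time.time():
                self._cache.pop(key, None)
                return None
            return value

        def set(self, key, value, time=0):
            # time=0 means "never expire", matching memcached semantics.
            self._cache[key] = (value, (_time.time() + time) if time else 0)
            return True

        def add(self, key, value, time=0):
            if self.get(key) is not None:
                return False
            return self.set(key, value, time)

        def delete(self, key):
            self._cache.pop(key, None)
            return True

Callers can then do something like
cache = memcache.Client(servers) if servers else FakeMemcache()
and never care which flavour they got.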


This has a bit of promise:

http://code.google.com/p/python-cache/

Vish

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-22 Thread Jesse Andrews
Agree that there are pros and cons to caching at different layers.

As for plugins, in most places where we support memcache we revert to
an in-memory cache if it isn't configured.

The work that was done during Essex was to make the metadata service
use either an external or an internal cache.  When a cloud-init
based image boots it makes dozens of calls to get data, which results
in dozens of RPC calls to nova-network to map IP to instance and then
to the DB to load data.
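
The shape of the fix is just cache-aside around that lookup. A rough
sketch (the class, the TTL and the lookup function are illustrative, not
the actual metadata code):

    class DictCache(object):
        # Minimal stand-in with a memcached-style get/set; ignores the ttl.
        def __init__(self):
            self.data = {}

        def get(self, key):
            return self.data.get(key)

        def set(self, key, value, time=0):
            self.data[key] = value

    class MetadataLookup(object):
        def __init__(self, cache, rpc_lookup, ttl=15):
            self.cache = cache            # memcache.Client or an in-memory fake
            self.rpc_lookup = rpc_lookup  # the expensive nova-network round trip
            self.ttl = ttl                # short, since IP->instance can change

        def instance_for_ip(self, address):
            key = 'metadata-ip-%s' % address
            instance_id = self.cache.get(key)
            if instance_id is None:
                instance_id = self.rpc_lookup(address)
                self.cache.set(key, instance_id, self.ttl)
            return instance_id

    lookup = MetadataLookup(DictCache(), lambda ip: 'instance-for-%s' % ip)
    lookup.instance_for_ip('10.0.0.2')   # pays the RPC once
    lookup.instance_for_ip('10.0.0.2')   # later calls during boot are hits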

-

Background info: The in-process cache I committed is actually the code
previously known as the fake memcache, moved from nova/tests to
nova/common.  The reason for the move is two-fold: 1) the code was
already in use as an in-memory cache in other locations in the code, and
2) nova/tests isn't included in most (all?) packagings.

Also Josh Harlow has been researching better alternatives for
in-memory caches for python (rather than re-inventing the wheel -
which I started here: https://github.com/cloudbuilders/millicache ...)

--

So ya, I think a summit proposal would be good.

Jesse

On Thu, Mar 22, 2012 at 8:06 AM, Sandy Walsh sandy.wa...@rackspace.com wrote:
 o/

 Vek and myself are looking into caching strategies in and around Nova.

 There are essentially two approaches: in-process and external (proxy).
 The in-process schemes sit in with the python code while the external
 ones basically proxy the HTTP requests.

 There are some obvious pros and cons to each approach. The external is
 easier for operations to manage, but in-process allows us greater
 control over the caching (for things like caching db calls and not just
 HTTP calls). But, in-memory also means more code, more memory usage on
 the servers, monolithic services, limited to python based solutions,
 etc. In-process also gives us access to tools like Tach
 https://github.com/ohthree/tach for profiling performance.

 I see Jesse recently landed a branch that touches on the in-process
 approach:
 https://github.com/openstack/nova/commit/1bcf5f5431d3c9620596f5329d7654872235c7ee#nova/common/memorycache.py

 I don't know if people think putting caching code inside nova is a good
 or bad idea. If we do continue down this road, it would be nice to make
 it a little more modular/plug-in-based (YAPI .. yet another plug-in).
 Perhaps a hybrid solution is required?

 We're looking at tools like memcache, beaker, varnish, etc.

 Has anyone started down this road already? Any insights to
 share? Opinions? (summit talk?)

 What are Glance, Swift, Keystone (lite?) doing?

 -S

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to     : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-22 Thread Mark Washenberger
What problems are caching strategies supposed to solve?

On the nova compute side, it seems like streamlining db access and
api-view tables would solve any performance problems caching would
address, while keeping the stale data management problem small.

Sandy Walsh sandy.wa...@rackspace.com said:

 o/
 
 Vek and myself are looking into caching strategies in and around Nova.
 
 There are essentially two approaches: in-process and external (proxy).
 The in-process schemes sit in with the python code while the external
 ones basically proxy the HTTP requests.
 
 There are some obvious pros and cons to each approach. The external is
 easier for operations to manage, but in-process allows us greater
 control over the caching (for things like caching db calls and not just
 HTTP calls). But, in-memory also means more code, more memory usage on
 the servers, monolithic services, limited to python based solutions,
 etc. In-process also gives us access to tools like Tach
 https://github.com/ohthree/tach for profiling performance.
 
 I see Jesse recently landed a branch that touches on the in-process
 approach:
 https://github.com/openstack/nova/commit/1bcf5f5431d3c9620596f5329d7654872235c7ee#nova/common/memorycache.py
 
 I don't know if people think putting caching code inside nova is a good
 or bad idea. If we do continue down this road, it would be nice to make
 it a little more modular/plug-in-based (YAPI .. yet another plug-in).
 Perhaps a hybrid solution is required?
 
 We're looking at tools like memcache, beaker, varnish, etc.
 
 Has anyone started down this road already? Any insights to
 share? Opinions? (summit talk?)
 
 What are Glance, Swift, Keystone (lite?) doing?
 
 -S
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-22 Thread Sandy Walsh
We're doing tests to find out where the bottlenecks are, caching is the
most obvious solution, but there may be others. Tools like memcache do a
really good job of sharing memory across servers so we don't have to
reinvent the wheel or hit the db at all.

In addition to looking into caching technologies/approaches we're gluing
together some tools for finding those bottlenecks. Our first step will
be finding them, then squashing them ... however.

-S

On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?
 
 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.
 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-22 Thread Joshua Harlow
Just from experience.

They do a great job. But the killer thing about caching is how u do the cache 
invalidation.

Just caching stuff is easy-peasy, making sure it is invalidated on all servers 
in all conditions, not so easy...
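
Even the boring delete-on-write discipline has sharp edges. A rough
sketch of what I mean (every name here is made up):

    def instance_get_cached(cache, context, instance_id, db_get):
        # Read side: cache-aside with a TTL as the backstop.
        key = 'instance-%s' % instance_id
        instance = cache.get(key)
        if instance is None:
            instance = db_get(context, instance_id)
            cache.set(key, instance, 30)
        return instance

    def instance_update_cached(cache, context, instance_id, values, db_update):
        # Write side: invalidate after the write commits.  Every writer on
        # every API node has to remember to do this, and a crash between the
        # two statements still leaves a stale entry until the TTL expires.
        instance = db_update(context, instance_id, values)
        cache.delete('instance-%s' % instance_id)
        return instance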

On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:

We're doing tests to find out where the bottlenecks are, caching is the
most obvious solution, but there may be others. Tools like memcache do a
really good job of sharing memory across servers so we don't have to
reinvent the wheel or hit the db at all.

In addition to looking into caching technologies/approaches we're gluing
together some tools for finding those bottlenecks. Our first step will
be finding them, then squashing them ... however.

-S

On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?

 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-22 Thread Mark Washenberger
Working on this independently, I created a branch with some simple
performance logging around the nova-api, and individually around 
glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
copy and it's on a different computer right now, and probably needs
a rebase. I will rebase and publish it on GitHub tomorrow.) 
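
The logging itself is nothing fancy, roughly a wrapper like this around
the calls of interest (a from-memory sketch, not the actual branch):

    import functools
    import logging
    import time

    LOG = logging.getLogger('nova.perflog')

    def timed(key):
        # Wrap a callable and log its wall time under the given key; the
        # request id is assumed to land on the line via the log formatter.
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.time()
                try:
                    return func(*args, **kwargs)
                finally:
                    LOG.info('%s %.3f', key, time.time() - start)
            return wrapper
        return decorator

    # e.g. wrapping a call site:
    # instance_update = timed('nova.db.api.instance_update')(instance_update)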

With this logging, I could get some simple profiling that I found
very useful. Here is a GH project with the analysis code as well
as some nova-api logs I was using as input. 

https://github.com/markwash/nova-perflog

With these tools, you can get a wall-time profile for individual
requests. For example, looking at one server create request (and
you can run this directly from the checkout as the logs are saved
there):

markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python 
profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
key                                          count  avg
nova.api.openstack.wsgi.POST                     1  0.657
nova.db.api.instance_update                      1  0.191
nova.image.show                                  1  0.179
nova.db.api.instance_add_security_group          1  0.082
nova.rpc.cast                                    1  0.059
nova.db.api.instance_get_all_by_filters          1  0.034
nova.db.api.security_group_get_by_name           2  0.029
nova.db.api.instance_create                      1  0.011
nova.db.api.quota_get_all_by_project             3  0.003
nova.db.api.instance_data_get_for_project        1  0.003

key  count  total
nova.api.openstack.wsgi  1  0.657
nova.db.api 10  0.388
nova.image   1  0.179
nova.rpc 1  0.059

All times are in seconds. The nova.rpc time is probably high
since this was the first call since server restart, so the
connection handshake is probably included. This is also probably
1.5 months stale.
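
For reference, the rollup the script does is only a few lines. A minimal
sketch of the idea, assuming a '<request-id> <key> <seconds>' line format
(which is not exactly what the real log lines look like):

    import re
    import sys
    from collections import defaultdict

    # Assumed line format: "<request-id> <key> <seconds>", one timing per line.
    LINE = re.compile(r'(?P<req>req-\S+) (?P<key>\S+) (?P<secs>\d+\.\d+)')

    def profile(stream, request_id):
        counts = defaultdict(int)
        totals = defaultdict(float)
        for line in stream:
            match = LINE.search(line)
            if not match or match.group('req') != request_id:
                continue
            counts[match.group('key')] += 1
            totals[match.group('key')] += float(match.group('secs'))
        for key in sorted(totals, key=totals.get, reverse=True):
            print('%-45s %5d %8.3f' % (key, counts[key],
                                       totals[key] / counts[key]))

    if __name__ == '__main__':
        profile(sys.stdin, sys.argv[1])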

The conclusion I reached from this profiling is that we just plain
overuse the db (and we might do the same in glance). For example,
whenever we do updates, we actually re-retrieve the item from the
database, update its dictionary, and save it. This is double the
cost it needs to be. We also handle updates for data across tables
inefficiently, where they could be handled in a single database round
trip.
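
To make the read-modify-write point concrete, here is a rough sketch
with a toy SQLAlchemy model (illustrative only, not Nova's actual
instance_update path):

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Instance(Base):           # toy model, not nova.db.sqlalchemy.models
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        vm_state = Column(String(36))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add(Instance(id=1, vm_state='building'))
    session.commit()

    def update_two_trips(instance_id, **values):
        # What we effectively do today: SELECT the row, mutate it, flush it.
        inst = session.query(Instance).filter_by(id=instance_id).one()
        for key, value in values.items():
            setattr(inst, key, value)
        session.commit()

    def update_one_trip(instance_id, **values):
        # The same change issued as a single UPDATE statement.
        session.query(Instance).filter_by(id=instance_id).update(values)
        session.commit()

    update_one_trip(1, vm_state='active')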

In particular, in the case of server listings, extensions are just
rough on performance. Most extensions hit the database again
at least once. This isn't really so bad, but it clearly is an area
where we should improve, since these are the most frequent api
queries.

I just see a ton of specific performance problems that are easier
to address one by one, rather than diving into a general (albeit
obvious) solution such as caching.


Sandy Walsh sandy.wa...@rackspace.com said:

 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.
 
 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.
 
 -S
 
 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?

 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.

 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Caching strategies in Nova ...

2012-03-22 Thread Mark Washenberger
This is precisely my concern.

It must be brought up that with Rackspace Cloud Servers, nearly
all client codes routinely submit requests with a query parameter 
cache-busting=some random string just to get around problems with
cache invalidation. And woe to the client that does not.

I get the feeling that once trust like this is lost, a project has
a hard time regaining it. I'm not saying that we can avoid
inconsistency entirely. Rather, I believe we will have to embrace
some eventual-consistency models to enable the performance and
scale we will ultimately attain. But I just get the feeling that
generic caches are really only appropriate for write-once or at
least write-rarely data. So personally I would rule out external
caches entirely and try to be very judicious in selecting internal
caches as well.

Joshua Harlow harlo...@yahoo-inc.com said:

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 Just from experience.
 
 They do a great job. But the killer thing about caching is how u do the cache
 invalidation.
 
 Just caching stuff is easy-peasy, making sure it is invalidated on all 
 servers in
 all conditions, not so easy...
 
 On 3/22/12 4:26 PM, Sandy Walsh sandy.wa...@rackspace.com wrote:
 
 We're doing tests to find out where the bottlenecks are, caching is the
 most obvious solution, but there may be others. Tools like memcache do a
 really good job of sharing memory across servers so we don't have to
 reinvent the wheel or hit the db at all.
 
 In addition to looking into caching technologies/approaches we're gluing
 together some tools for finding those bottlenecks. Our first step will
 be finding them, then squashing them ... however.
 
 -S
 
 On 03/22/2012 06:25 PM, Mark Washenberger wrote:
 What problems are caching strategies supposed to solve?

 On the nova compute side, it seems like streamlining db access and
 api-view tables would solve any performance problems caching would
 address, while keeping the stale data management problem small.

 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 
 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp