On Tue, Sep 1, 2015 at 10:45 PM Steven Hardy <sha...@redhat.com> wrote:
> On Fri, Aug 28, 2015 at 01:35:52AM +0000, Angus Salkeld wrote:
> > Hi
> >
> > I have been running some rally tests against convergence and our
> > existing implementation to compare. So far I have done the following:
> >
> > 1. defined a template with a resource group:
> >    https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
> > 2. the inner resource looks like this:
> >    https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
> >    (it uses TestResource to attempt to be a reasonable simulation of a
> >    server + volume + floating IP)
> > 3. defined a rally job:
> >    https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
> >    that creates X resources, then updates to X*2, then deletes.
> > 4. I then ran the above with and without convergence, and with 2, 4 and
> >    8 heat-engines.
> >
> > Here are the results compared:
> > https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing
> >
> > Some notes on the results so far:
> >
> > * convergence with only 2 engines does suffer from RPC overload (it
> >   gets message timeouts on larger templates). I wonder if this is the
> >   problem in our convergence gate...
> > * convergence does very well with a reasonable number of engines
> >   running.
> > * delete is slightly slower on convergence.
> >
> > Still to test:
> >
> > * the above, but measuring memory usage
> > * many small templates (run concurrently)
>
> So, I tried running my many-small-templates test here with convergence
> enabled:
>
> https://bugs.launchpad.net/heat/+bug/1489548
>
> In heat.conf I set:
>
>     max_resources_per_stack = -1
>     convergence_engine = true
>
> Most other settings (particularly RPC and DB settings) are defaults.
>
> Without convergence (but with max_resources_per_stack disabled) I see
> the time to create a ResourceGroup of 400 nested stacks (each containing
> one RandomString resource) is about 2.5 minutes (core i7 laptop w/SSD,
> 4 heat workers, i.e. the default for a 4-core machine).
>
> With convergence enabled, I see these errors from sqlalchemy:
>
>       File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 652, in _checkout
>         fairy = _ConnectionRecord.checkout(pool)
>       File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 444, in checkout
>         rec = pool._do_get()
>       File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 980, in _do_get
>         (self.size(), self.overflow(), self._timeout))
>     TimeoutError: QueuePool limit of size 5 overflow 10 reached,
>     connection timed out, timeout 30
>
> I assume this means we're loading the DB much more in the convergence
> case and overflowing the QueuePool?

Yeah, looks like it.

> This seems to happen when the RPC call from the ResourceGroup tries to
> create some of the 400 nested stacks.
>
> Interestingly, after this error the parent stack moves to CREATE_FAILED,
> but the engine remains (very) busy, to the point of being only partially
> responsive, so it looks like maybe the cancel-on-fail isn't working (I'm
> assuming it isn't error_wait_time, because the parent stack has been
> marked FAILED and I'm pretty sure it's been more than 240s).
>
> I'll dig a bit deeper when I get time, but for now you might like to try
> the stress test too.
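For anyone wanting to reproduce, the setup is roughly the following (a
minimal sketch rather than the exact templates from the bug report, so
treat the file names and sizes as illustrative):

    # parent.yaml: a ResourceGroup of N nested stacks, each of which
    # contains a single RandomString resource
    heat_template_version: 2014-10-16
    parameters:
      group_size:
        type: number
        default: 400
    resources:
      group:
        type: OS::Heat::ResourceGroup
        properties:
          count: {get_param: group_size}
          resource_def:
            # provider template shipped alongside the parent template
            type: random.yaml

    # random.yaml: the nested stack, one resource only
    heat_template_version: 2014-10-16
    resources:
      rand:
        type: OS::Heat::RandomString

and then something like "heat stack-create stress -f parent.yaml" and
watching the engines while it churns through the nested stacks.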
> It's a bit of a synthetic test, but it turns out to be a reasonable
> proxy for some performance issues we observed when creating large-ish
> TripleO deployments (which also create a large number of nested stacks
> concurrently).
>
> Steve

Thanks a lot for testing, Steve! I'll make two bugs for what you have
raised:

1. limit the number of resource actions running in parallel (maybe based
   on the number of cores)
2. the cancel-on-fail error

-Angus
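P.S. A possible interim workaround for the QueuePool timeouts would be to
raise the oslo.db connection pool limits in heat.conf. A sketch only; the
defaults correspond to the "size 5 overflow 10 ... timeout 30" in the
traceback above, and the larger values here are illustrative rather than
tuned recommendations:

    [database]
    # SQLAlchemy connection pool settings (standard oslo.db options)
    max_pool_size = 20
    max_overflow = 40
    pool_timeout = 60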