On 07/08/16 19:52, Clint Byrum wrote:
Excerpts from Steve Baker's message of 2016-08-08 10:11:29 +1200:
On 05/08/16 21:48, Ricardo Rocha wrote:
Hi.

Quick update: we're at 1000 nodes and 7 million reqs/sec :) - and the number
of requests should be higher, but we had some internal issues. We have
a submission for Barcelona that will provide a lot more details.

But a couple questions came during the exercise:

1. Do we really need a volume in the VMs? On large clusters this is a
burden; wouldn't local storage alone be enough?

2. We observe a significant delay (~10 min, which is half the total
time to deploy the cluster) in Heat while it seems to be crunching the
kube_minions nested stacks. Once that's done, it still adds new stacks
gradually, so it doesn't look like it precomputed all the info in advance.

Anyone tried to scale Heat to stacks this size? We end up with a stack
with:
* 1000 nested stacks (depth 2)
* 22000 resources
* 47008 events

We have already changed most of the timeout/retry values for RPC to get
this working.

This delay is already visible in clusters of 512 nodes, but spending 40% of
the deployment time on it at 1000 nodes seems like something we could
improve. Any hints on Heat configuration optimizations for large stacks are
very welcome.

Yes, we recommend you set the following in /etc/heat/heat.conf [DEFAULT]:
max_resources_per_stack = -1

Enforcing this limit on large stacks has a very high overhead; we make the
same change in the TripleO undercloud too.
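
For completeness, the sort of heat.conf tuning being discussed in this thread
looks roughly like the following sketch. The values are illustrative rather
than recommendations, and rpc_response_timeout is an oslo.messaging option
rather than a Heat-specific one:

[DEFAULT]
# Skip the per-stack resource count, which is expensive to enforce
# across a large tree of nested stacks
max_resources_per_stack = -1
# Give RPC calls on very large trees more time before they time out
rpc_response_timeout = 600
# Run more heat-engine workers so child stacks can be spread across them
num_engine_workers = 8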


Wouldn't this necessitate having a private Heat just for Magnum? Not
having a resource limit per stack would leave your Heat engines
vulnerable to being DoS'd by malicious users, since one can create many
thousands of resources, and thus Python objects, in just a couple
of cleverly crafted templates (which is why I added the setting).

Although when you added it, all of the resources in a tree of nested stacks were handled by a single engine, so sending a really big tree of nested stacks was an easy way to DoS Heat. That's no longer the case since Kilo: we farm the child stacks out over RPC, so the difficulty of carrying out a DoS increases in proportion to the number of cores you have running Heat, whereas before it was constant. (This is also the cause of the performance problem, since counting all the resources in the tree was easy when the entire thing was already loaded in memory.)
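
To make the difference concrete, here is a rough sketch (not the actual Heat code; the helper and client method names are made up). Before Kilo the whole tree lived in one engine process, so the check was a cheap in-memory walk; now each child stack sits behind an RPC boundary, so the same check implies at least one round trip per nested stack:

# Hypothetical pre-Kilo style: the whole tree is already loaded in
# this process, so counting is just a recursive walk over objects.
def count_resources_in_memory(stack):
    total = len(stack.resources)
    for child in stack.nested_stacks:
        total += count_resources_in_memory(child)
    return total

# Hypothetical post-Kilo style: each child stack is behind RPC, so the
# count costs (at least) one remote call per nested stack in the tree.
def count_resources_over_rpc(rpc_client, ctxt, stack_id):
    total = rpc_client.count_stack_resources(ctxt, stack_id)
    for child_id in rpc_client.list_child_stack_ids(ctxt, stack_id):
        total += count_resources_over_rpc(rpc_client, ctxt, child_id)
    return total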

Convergence splits it up even further, farming out each _resource_ as well as each stack over RPC.

I had the thought that having a per-tenant resource limit might be both more effective at protecting the limited resource and more efficient to calculate, since we could have the DB simply count the Resource rows for stacks in a given tenant instead of recursively loading all of the stacks in a tree and counting the resources in heat-engine. However, the tenant isn't stored directly in the Stack table, and people who know databases tell me the resulting joins would be fearsome.
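
Roughly the kind of query that would imply, sketched here with SQLAlchemy. The table and column names are assumptions for illustration only; in particular it pretends the stack table has a tenant column to filter on, which as noted it doesn't today, and working around that is exactly where the fearsome joins come in:

from sqlalchemy import create_engine, text

def count_tenant_resources(db_url, tenant_id):
    # Hypothetical: assumes a stack.tenant column exists; real Heat
    # would need extra joins to resolve the tenant for nested stacks.
    engine = create_engine(db_url)
    query = text(
        "SELECT COUNT(*) FROM resource r "
        "JOIN stack s ON r.stack_id = s.id "
        "WHERE s.tenant = :tenant"
    )
    with engine.connect() as conn:
        return conn.execute(query, {"tenant": tenant_id}).scalar()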

I'm still not convinced it'd be worse than what we have now, even after Steve did a lot of work to make it much, much better than it was at one point ;)

This makes perfect sense in the undercloud of TripleO, which is a
private, single-tenant OpenStack. But for Magnum... now you're talking
about the Heat that users have access to.

Indeed, and now that we're seeing other users of very large stacks (Sahara is another) I think we need to come up with a solution that is both efficient enough to use on a large/deep tree of nested stacks and can still be tuned to protect against DoS at whatever scale Heat is deployed.

cheers,
Zane.

