On Fri, Sep 4, 2015 at 12:48 AM Zane Bitter <[email protected]> wrote:
> On 03/09/15 02:56, Angus Salkeld wrote:
> > On Thu, Sep 3, 2015 at 3:53 AM Zane Bitter <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     On 02/09/15 04:55, Steven Hardy wrote:
> >      > On Wed, Sep 02, 2015 at 04:33:36PM +1200, Robert Collins wrote:
> >      >> On 2 September 2015 at 11:53, Angus Salkeld
> >      >> <[email protected] <mailto:[email protected]>> wrote:
> >      >>
> >      >>> 1. limit the number of resource actions in parallel (maybe base
> >      >>> on the number of cores)
> >      >>
> >      >> I'm having trouble mapping that back to 'and heat-engine is
> >      >> running on 3 separate servers'.
> >      >
> >      > I think Angus was responding to my test feedback, which was a
> >      > different setup, one 4-core laptop running heat-engine with 4
> >      > worker processes.
> >      >
> >      > In that environment, the level of additional concurrency becomes
> >      > a problem because all heat workers become so busy that creating a
> >      > large stack DoSes the Heat services, and in my case also the DB.
> >      >
> >      > If we had a configurable option, similar to num_engine_workers,
> >      > which enabled control of the number of resource actions in
> >      > parallel, I probably could have controlled that explosion in
> >      > activity to a more manageable series of tasks, e.g. I'd set
> >      > num_resource_actions to (num_engine_workers*2) or something.
> >
> >     I think that's actually the opposite of what we need.
> >
> >     The resource actions are just sent to the worker queue to get
> >     processed whenever. One day we will get to the point where we are
> >     overflowing the queue, but I guarantee that we are nowhere near that
> >     day. If we are DoSing ourselves, it can only be because we're
> >     pulling *everything* off the queue and starting it in separate
> >     greenthreads.
> >
> > worker does not use a greenthread per job like service.py does.
> > The issue is that if you have actions that are fast, you can hit the db
> > hard:
> >
> > QueuePool limit of size 5 overflow 10 reached, connection timed out,
> > timeout 30
> >
> > It seems like it's not very hard to hit this limit. It comes from simply
> > loading the resource in the worker:
> >
> > "/home/angus/work/heat/heat/engine/worker.py", line 276, in check_resource
> > "/home/angus/work/heat/heat/engine/worker.py", line 145, in _load_resource
> > "/home/angus/work/heat/heat/engine/resource.py", line 290, in load
> >     resource_objects.Resource.get_obj(context, resource_id)
>
> This is probably me being naive, but that sounds strange. I would have
> thought that there is no way to exhaust the connection pool by doing
> lots of actions in rapid succession. I'd have guessed that the only way
> to exhaust a connection pool would be to have lots of connections open
> simultaneously. That suggests to me that either we are failing to
> expeditiously close connections and return them to the pool, or that we
> are - explicitly or implicitly - processing a bunch of messages in
> parallel.

I suspect we are leaking sessions; I have updated this bug to make sure we
focus on figuring out the root cause of this before jumping to conclusions:
https://bugs.launchpad.net/heat/+bug/1491185

-A
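For context on why that error shows up so readily: the numbers in the message
are simply SQLAlchemy's QueuePool defaults (pool_size=5, max_overflow=10,
pool_timeout=30), so any 15 sessions held open at once, whether leaked or
genuinely concurrent, will make the 16th checkout time out. A minimal
standalone sketch of that failure mode (not Heat code; the sqlite URL and the
session count are only for illustration):

    # Standalone sketch, not Heat code: SQLAlchemy's QueuePool defaults are
    # pool_size=5, max_overflow=10, pool_timeout=30 (the exact figures in the
    # error above). A session that is never closed keeps one connection
    # checked out, so 15 of them exhaust the pool.
    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.pool import QueuePool

    engine = create_engine(
        "sqlite:///pool_demo.db",          # stand-in URL; Heat points at MySQL
        poolclass=QueuePool,
        pool_size=5, max_overflow=10, pool_timeout=30,
    )
    Session = sessionmaker(bind=engine)

    leaked = []
    for i in range(15):
        s = Session()
        s.execute(text("SELECT 1"))        # checks a connection out of the pool
        leaked.append(s)                   # never closed -> never returned

    # A 16th session would now block for pool_timeout seconds and then raise:
    #   "QueuePool limit of size 5 overflow 10 reached, connection timed out"
    # Calling s.close() (or using the session as a context manager) returns
    # the connection to the pool and avoids this.

If it really is a leak, raising oslo.db's [database] max_pool_size /
max_overflow in heat.conf would only postpone the error, which is why pinning
down the root cause in the bug above matters.
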
> >     In an ideal world, we might only ever pull one task off that queue
> >     at a time. Any time the task is sleeping, we would use that time for
> >     processing stuff off the engine queue (which needs a quick response,
> >     since it is serving the ReST API). The trouble is that you need a
> >     *huge* number of heat-engines to handle stuff in parallel. In the
> >     reductio-ad-absurdum case of a single engine only processing a
> >     single task at a time, we're back to creating resources serially. So
> >     we probably want a higher number than 1. (Phase 2 of convergence
> >     will make tasks much smaller, and may even get us down to the point
> >     where we can pull only a single task at a time.)
> >
> >     However, the fewer engines you have, the more greenthreads we'll
> >     have to allow to get some semblance of parallelism. To the extent
> >     that more cores means more engines (which assumes all running on one
> >     box, but still), the number of cores is negatively correlated with
> >     the number of tasks that we want to allow.
> >
> >     Note that all of the greenthreads run in a single CPU thread, so
> >     having more cores doesn't help us at all with processing more stuff
> >     in parallel.
> >
> > Except, as I said above, we are not creating greenthreads in worker.
>
> Well, maybe we'll need to in order to make things still work sanely with
> a low number of engines :) (Should be pretty easy to do with a semaphore.)
>
> I think what y'all are suggesting is limiting the number of jobs that go
> into the queue... that's quite wrong IMO. Apart from the fact it's
> impossible (resources put jobs into the queue entirely independently,
> and have no knowledge of the global state required to throttle inputs),
> we shouldn't implement an in-memory queue with long-running tasks
> containing state that can be lost if the process dies - the whole point
> of convergence is we have... a message queue for that. We need to limit
> the rate that stuff comes *out* of the queue. And, again, since we have
> no knowledge of global state, we can only control the rate at which an
> individual worker processes tasks. The way to avoid killing the DB is to
> put a constant ceiling on the workers * concurrent_tasks_per_worker
> product.
>
> cheers,
> Zane.
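To make that last point concrete: a per-worker semaphore around the task
entry point caps concurrent tasks per worker while the jobs themselves stay
on the durable message queue. A rough sketch, assuming eventlet (which
heat-engine already uses); check_resource and the limit value here are
placeholders, not the real worker API:

    # Rough sketch, assuming eventlet. The names (check_resource,
    # max_concurrent) are placeholders, not the real worker API: the point is
    # only that a semaphore taken as each message is processed puts a ceiling
    # on the number of concurrent tasks in one worker.
    import eventlet
    from eventlet import semaphore

    max_concurrent = 4                     # placeholder; would be a config opt
    task_limit = semaphore.Semaphore(max_concurrent)

    def check_resource(resource_id):
        # Stand-in for the real per-resource convergence task.
        eventlet.sleep(0.1)                # simulate DB/RPC work
        print("checked resource", resource_id)

    def handle_message(resource_id):
        # Called for each message taken off the RPC queue; blocks (yielding
        # to other greenthreads) once max_concurrent tasks are in flight.
        with task_limit:
            check_resource(resource_id)

    # Simulate a burst of incoming messages dispatched to greenthreads.
    threads = [eventlet.spawn(handle_message, i) for i in range(20)]
    for t in threads:
        t.wait()

The effective global ceiling is then num_workers * max_concurrent, whatever
the size of the stack being created.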
