Re: [openstack-dev] [Mistral] Refine engine <-> executor protocol
On 13 Jun 2014, at 07:03, W Chan wrote:

> Design proposal for blueprint
> https://blueprints.launchpad.net/mistral/+spec/mistral-engine-executor-protocol
>
> Rename Executor to Worker.

I’d be OK with Worker, but would prefer ActionRunner since it reflects the purpose a little better, although it is more verbose.

> Continue to use RPC server-client model in oslo.messaging for Engine and
> Worker protocol.

Sure.

> Use asynchronous call (cast) between Worker and Engine where appropriate.

I would even emphasize: only async calls make sense.

> Remove any DB access from Worker. DB IO will only be done by Engine.

I still have doubts that it’s actually possible. This is part of the issue I mentioned in the previous email. I’ll post a more detailed email on that separately.

> Worker updates Engine that it's going to start running action now. If
> execution is not RUNNING and task is not IDLE, Engine tells Worker to halt at
> this point. Worker cannot assume execution state is RUNNING and task state
> is IDLE because the handle_task message could have been sitting in the
> message queue for awhile. This call between Worker and Engine is
> synchronous, meaning Worker will wait for a response from the Engine.
> Currently, Executor checks state and updates task state directly to the DB
> before running the action.

Yes, that’s how it works now. First of all, as I said before, we can’t afford to make any sync calls between engine and executor because that would lead to problems with scalability and fault tolerance. For that reason we make DB calls directly, to make sure that the execution and the task itself are in a suitable state. This only works reliably with READ COMMITTED transactions used by both engine and executor, which I believe currently isn’t the case since we use SQLite (it doesn’t seem to support that isolation level, right?). With MySQL it should be fine. So the whole initial idea was to use the DB whenever we need to make sure that something is in the right state.
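To illustrate the engine-side approach described above, here is a minimal Python sketch of an atomic "check state, then start" step. All names here (FakeDb, start_task_if_ready) are hypothetical, not Mistral code, and a threading.Lock stands in for a READ COMMITTED DB transaction; the point is only that the state check and the state transition happen as one atomic unit, leaving no concurrency window for a stale handle_task message to slip through.

```python
import threading

# Hypothetical in-memory stand-in for the Mistral DB. The lock plays the
# role of a DB transaction with READ COMMITTED isolation: within one
# "transaction" the state check and the state transition are atomic, so
# there is no window between confirming the states and acting on them.
class FakeDb:
    def __init__(self):
        self._lock = threading.Lock()
        self.execution_state = 'RUNNING'
        self.task_state = 'IDLE'

    def start_task_if_ready(self):
        """Atomically check the states and move the task to RUNNING.

        Returns True if the action may run, False if the executor should
        halt (e.g. the execution was stopped while the handle_task
        message sat in the queue, or another executor already took it).
        """
        with self._lock:
            if self.execution_state != 'RUNNING' or self.task_state != 'IDLE':
                return False
            self.task_state = 'RUNNING'
            return True

db = FakeDb()
print(db.start_task_if_ready())   # first delivery may run
print(db.start_task_if_ready())   # duplicate delivery halts
```

A real implementation would do the same thing with a `SELECT ... FOR UPDATE` (or equivalent) inside one DB transaction; the sketch only shows the shape of the atomicity argument.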
That’s why all the reads should see only committed data, and we use the queue just to notify executors about new tasks. Basically, we could even have skipped the queue and used DB polling instead, but with the queue it looked more elegant. It’s all part of one problem. Let’s try to figure out options to simplify the protocol and make it more reliable.

> Worker communicates result (success or failure) to Engine. Currently,
> Executor is inconsistent and calls Engine.convey_task_result on success and
> writes directly to DB on failure.

Yes, that probably needs to be fixed.

> Sequence
> 1. Engine -> Worker.handle_task
> 2. Worker converts action spec to Action instance

Yes, it uses the action spec in case it’s an ad-hoc action. If not, it just gets the action class from the factory and instantiates it.

> 3. Worker -> Engine.confirm_task_execution. Engine returns an exception if
> execution state is not RUNNING or task state is not IDLE.

Maybe I don’t entirely follow your thought, but I don’t think this is going to work. After the engine confirms that everything is OK, we have a concurrency window again, after which we would have to confirm the states once more. That’s why I was talking about READ COMMITTED DB transactions: we need to eliminate concurrency windows.

> 4. Worker runs action
> 5. Worker -> Engine.convey_task_result

That looks fine (it’s as it is now). Maybe the only thing we need to pay attention to is how we communicate errors back to the engine. It seems logical that convey_task_result() could also be used to pass information about errors, so that an error is considered a special case of a regular result. Need to think it over, though...

Renat Akhmerov
@ Mirantis Inc.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Mistral] Refine engine <-> executor protocol
Hi Winson,

Sorry I haven’t responded so far; I was on vacation. So, getting back to you now... It’s my fault that the notes in the BP are not very clear.

1. By “worker parallelism” we meant that one worker (which is called “executor” now) can poll and handle more than one task from the task queue (it’s not abstracted away from the notion of a queue, but anyway). It would be a nice feature because it would allow tuning the system performance much more accurately.

2. What “engine-executor parallelism” means I honestly don’t remember :) I guess this is a note made by Dmitri, so he may know better. Dmitri?

As far as engine <-> executor interaction goes, we now have an issue with it that we need to fix, but it’s not related to parallelism. The protocol itself is not 100% complete in terms of reliability.

Thanks

Renat Akhmerov
@ Mirantis Inc.

On 06 Jun 2014, at 23:12, W Chan wrote:

> Regarding blueprint
> https://blueprints.launchpad.net/mistral/+spec/mistral-engine-executor-protocol,
> can you clarify what it means by worker parallelism and engine-executor
> parallelism? Currently, the engine and executor are launched with the
> eventlet driver in oslo.messaging. Once a message arrives over transport, a
> new green thread is spawned and passed to the dispatcher. In the case of
> executor, the function being dispatched to is handle_task. I'm unclear what
> additional parallelism this blueprint is referring to. The context isn't
> clear from the summit notes.
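As a rough illustration of the "worker parallelism" described above, the sketch below has a single worker process run several tasks concurrently through a pool whose size is a tuning knob. This is an assumption-laden toy, not Mistral code: in the real system tasks arrive as handle_task messages via oslo.messaging and are dispatched onto green threads, whereas here OS threads and a hardcoded task list stand in for both.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for the executor's task handler; a real
# executor would build and run the task's Action instance here.
def handle_task(task_id):
    return 'result-of-%s' % task_id

# One worker process handles several tasks at once; max_workers is the
# parallelism knob the email says would let you tune performance.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_task, ['t1', 't2', 't3']))

print(results)
```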
Re: [openstack-dev] [Mistral] Refine engine <-> executor protocol
Design proposal for blueprint
https://blueprints.launchpad.net/mistral/+spec/mistral-engine-executor-protocol

- Rename Executor to Worker.
- Continue to use the RPC server-client model in oslo.messaging for the Engine and Worker protocol.
- Use asynchronous calls (cast) between Worker and Engine where appropriate.
- Remove any DB access from Worker. DB IO will only be done by Engine.
- Worker updates Engine that it's going to start running the action now. If the execution is not RUNNING or the task is not IDLE, Engine tells Worker to halt at this point. Worker cannot assume the execution state is RUNNING and the task state is IDLE because the handle_task message could have been sitting in the message queue for a while. This call between Worker and Engine is synchronous, meaning Worker will wait for a response from the Engine. Currently, Executor checks state and updates the task state directly in the DB before running the action.
- Worker communicates the result (success or failure) to Engine. Currently, Executor is inconsistent: it calls Engine.convey_task_result on success and writes directly to the DB on failure.

Sequence
1. Engine -> Worker.handle_task
2. Worker converts action spec to Action instance
3. Worker -> Engine.confirm_task_execution. Engine returns an exception if execution state is not RUNNING or task state is not IDLE.
4. Worker runs action
5. Worker -> Engine.convey_task_result

Please provide feedback. Thanks.

Winson

On Fri, Jun 6, 2014 at 9:12 AM, W Chan wrote:

> Renat,
>
> Regarding blueprint
> https://blueprints.launchpad.net/mistral/+spec/mistral-engine-executor-protocol,
> can you clarify what it means by worker parallelism and engine-executor
> parallelism?
> Currently, the engine and executor are launched with the eventlet driver
> in oslo.messaging. Once a message arrives over transport, a new green
> thread is spawned and passed to the dispatcher. In the case of executor,
> the function being dispatched to is handle_task.
> I'm unclear what additional parallelism this blueprint is referring to.
> The context isn't clear from the summit notes.
>
> Thanks.
> Winson
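The five-step sequence proposed above can be sketched in plain Python, with direct method calls standing in for oslo.messaging casts. This is an illustrative toy under stated assumptions, not Mistral code: the Engine tracks state in dicts instead of a DB, and convey_task_result carries both success and error results, treating an error as a special case of a regular result (one of the options discussed in this thread).

```python
# Toy model of the proposed Worker <-> Engine protocol. Method names
# follow the thread (handle_task, confirm_task_execution,
# convey_task_result); everything else is made up for illustration.

class Engine:
    def __init__(self):
        self.execution_state = 'RUNNING'
        self.tasks = {}          # task_id -> state
        self.results = {}        # task_id -> (state, result)

    def confirm_task_execution(self, task_id):
        # Step 3: the synchronous check the proposal describes.
        if self.execution_state != 'RUNNING' or self.tasks.get(task_id) != 'IDLE':
            raise RuntimeError('halt: stale handle_task message')
        self.tasks[task_id] = 'RUNNING'

    def convey_task_result(self, task_id, state, result):
        # Step 5: one call covers both SUCCESS and ERROR results.
        self.results[task_id] = (state, result)
        self.tasks[task_id] = state

class Worker:
    def __init__(self, engine):
        self.engine = engine

    def handle_task(self, task_id, action):
        # Steps 1-2: receive the task and build the Action instance
        # (here, any callable stands in for an Action).
        try:
            self.engine.confirm_task_execution(task_id)      # step 3
        except RuntimeError:
            return                                           # halt
        try:
            result = action()                                # step 4
            self.engine.convey_task_result(task_id, 'SUCCESS', result)
        except Exception as e:
            self.engine.convey_task_result(task_id, 'ERROR', str(e))

engine = Engine()
engine.tasks['t1'] = 'IDLE'
Worker(engine).handle_task('t1', lambda: 42)
print(engine.results['t1'])   # ('SUCCESS', 42)
```

Note that the concurrency objection raised elsewhere in the thread still applies to this shape: between confirm_task_execution returning and the action starting, the states could change again, which is why the reply argues for doing the check atomically in the DB instead.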
[openstack-dev] [Mistral] Refine engine <-> executor protocol
Renat,

Regarding blueprint
https://blueprints.launchpad.net/mistral/+spec/mistral-engine-executor-protocol,
can you clarify what it means by worker parallelism and engine-executor parallelism?

Currently, the engine and executor are launched with the eventlet driver in oslo.messaging. Once a message arrives over transport, a new green thread is spawned and passed to the dispatcher. In the case of the executor, the function being dispatched to is handle_task. I'm unclear what additional parallelism this blueprint is referring to. The context isn't clear from the summit notes.

Thanks.
Winson