Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

Zane Bitter Thu, 13 Nov 2014 05:36:50 -0800

On 13/11/14 06:52, Angus Salkeld wrote:

On Thu, Nov 13, 2014 at 6:29 PM, Murugan, Visnusaran
<[email protected]> wrote:


    Hi all,

    Convergence-POC distributes stack operations by sending resource
    actions over RPC for any heat-engine to execute. The entire stack
    lifecycle will be controlled by worker/observer notifications. This
    distributed model has its own advantages and disadvantages.

    Any stack operation has a timeout, and a single engine is
    responsible for it. If that engine goes down, the timeout is lost
    along with it. The traditional approach is for other engines to
    recreate the timeout from scratch. Also, a missed resource action
    notification will be detected only when the stack operation timeout
    fires.

    To overcome this, we will need the following capabilities:

    1. Resource timeout (can be used for retry)

We will shortly have a worker job. Can't we have a job that just sleeps,
started in parallel with the job that is doing the work?
When it gets to the end of the sleep, it runs a check.

What if that worker dies too? There's no guarantee that it'd even be a
different worker. In fact, there's not even a guarantee that we'd have
multiple workers.
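
To illustrate the concern, here is a minimal sketch of one alternative to a sleeping job: record the absolute deadline somewhere durable, so that whichever process is alive can recompute the remaining time. Nothing here is Heat code; `DeadlineStore` and its methods are hypothetical names, and the in-memory dict only stands in for a shared database table.

```python
import time


class DeadlineStore:
    """Hypothetical stand-in for a shared table of stack deadlines."""

    def __init__(self):
        # stack_id -> absolute deadline (seconds since the epoch)
        self._deadlines = {}

    def start(self, stack_id, timeout_secs, now=None):
        now = time.time() if now is None else now
        # Store the absolute deadline rather than a relative sleep, so the
        # value stays meaningful after the process that wrote it has died.
        self._deadlines[stack_id] = now + timeout_secs

    def remaining(self, stack_id, now=None):
        now = time.time() if now is None else now
        return max(0.0, self._deadlines[stack_id] - now)

    def expired(self, now=None):
        now = time.time() if now is None else now
        return sorted(s for s, d in self._deadlines.items() if d <= now)


store = DeadlineStore()
store.start("stack-1", 3600, now=1000.0)
# A different process picking this up later sees only the time left.
print(store.remaining("stack-1", now=1600.0))
print(store.expired(now=4600.0))
```

The point of the design is that no process has to stay alive for the whole timeout; a sleeping job, by contrast, takes its timer with it when it dies.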

BTW Steve Hardy's suggestion, which I have more or less come around to,
is that the engines themselves should be the workers in convergence, to
save operators deploying two types of processes. (The observers will
still be a separate process though, in phase 2.)

    2. Recover from engine failure (loss of stack timeout, resource
    action notification)


My suggestion above could catch failures as long as it was run in a
different process.

-Angus

    Suggestion:

    1. Use a task queue like Celery to host timeouts for both stack and
    resource.

    2. Poll the database for engine failures and restart timers/
    retrigger resource retry (IMHO: this would be the traditional
    approach and is heavyweight)

    3. Migrate Heat to use TaskFlow. (Too many code changes)

    I am not suggesting we use TaskFlow. Using Celery would require only
    minimal code changes (decorating the appropriate functions).

    Your thoughts?

    -Vishnu
    IRC: ckmvishnu


    _______________________________________________
    OpenStack-dev mailing list
    [email protected]
    <mailto:[email protected]>
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




