On 24-Sep-14 00:25, Joshua Harlow wrote:
> I believe Heat has its own dependency graph implementation, but if that were 
> switched to networkx [1], that library has a bunch of nice read/write 
> capabilities.
> 
> See: https://github.com/networkx/networkx/tree/master/networkx/readwrite
> 
> And one made for sqlalchemy @ https://pypi.python.org/pypi/graph-alchemy/
> 
> Networkx has worked out pretty well for taskflow (and I believe mistral is 
> also using it).
> 
> [1] https://networkx.github.io/
> 
> Something to think about...
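> 
> For instance, the JSON round trip would look roughly like this (an
> untested sketch; the graph contents are invented):
> 
>     import json
> 
>     import networkx as nx
>     from networkx.readwrite import json_graph
> 
>     graph = nx.DiGraph()
>     # Direction here is illustrative: requirer -> requirement.
>     graph.add_edge('server_1', 'volume_1')
>     graph.add_edge('server_1', 'network_1')
> 
>     # Serialize to a JSON-friendly dict; this could live in a text
>     # column in the DB.
>     blob = json.dumps(json_graph.node_link_data(graph))
> 
>     # ...and rebuild it later, e.g. in another engine process.
>     restored = json_graph.node_link_graph(json.loads(blob))
>     assert set(restored.edges()) == set(graph.edges())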
> 
> On Sep 23, 2014, at 11:32 AM, Zane Bitter <zbit...@redhat.com> wrote:
> 
>> On 23/09/14 09:44, Anant Patil wrote:
>>> On 23-Sep-14 09:42, Clint Byrum wrote:
>>>> Excerpts from Angus Salkeld's message of 2014-09-22 20:15:43 -0700:
>>>>> On Tue, Sep 23, 2014 at 1:09 AM, Anant Patil <anant.pa...@hp.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> One of the steps in the direction of convergence is to enable the Heat
>>>>>> engine to handle concurrent stack operations; the main convergence spec
>>>>>> talks about this. Resource versioning would be needed to support that
>>>>>> concurrency.
>>>>>>
>>>>>> As of now, while updating a stack, a backup stack is created with a new
>>>>>> ID, and only one update runs at a time. If we instead keep each
>>>>>> raw_template linked to its previous completed template, i.e. back up the
>>>>>> template rather than the stack, we avoid creating a backup stack at all.
>>>>>>
>>>>>> Since there would be no backup stack and only a single stack ID to deal
>>>>>> with, the resources and their versions can be queried for a stack using
>>>>>> that one ID. The idea is to identify the resources of a stack by stack
>>>>>> ID plus version. Please let me know your thoughts.
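>>>>>>
>>>>>> As a rough illustration of the lookup I mean (hypothetical field
>>>>>> names, not a real schema):
>>>>>>
>>>>>>     resources = [
>>>>>>         {'name': 'server_1', 'stack_id': 'stack-uuid', 'version': 2},
>>>>>>         {'name': 'volume_1', 'stack_id': 'stack-uuid', 'version': 1},
>>>>>>     ]
>>>>>>
>>>>>>     def resources_for(stack_id, version):
>>>>>>         # One stack ID, no backup stack: the version selects the
>>>>>>         # set of resources belonging to a given template.
>>>>>>         return [r for r in resources
>>>>>>                 if r['stack_id'] == stack_id
>>>>>>                 and r['version'] == version]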
>>>>>>
>>>>>
>>>>> Hi Anant,
>>>>>
>>>>> This seems more complex than it needs to be.
>>>>>
>>>>> I could be wrong, but I thought the aim was simply to update the goal
>>>>> state. The backup stack is just the last working stack, so if you update
>>>>> while there is already an update in progress, you don't need to touch the
>>>>> backup stack.
>>>>>
>>>>> Anyone else that was at the meetup want to fill us in?
>>>>>
>>>>
>>>> The backup stack is a device used to collect items to operate on after
>>>> the current action is complete. It is entirely an implementation detail.
>>>>
>>>> Resources that can be updated in place will have their resource record
>>>> superseded, but retain their physical resource ID.
>>>>
>>>> This is one area where the resource plugin API is particularly sticky,
>>>> as resources are allowed to raise the "replace me" exception if an
>>>> in-place update fails. That is OK though: at that point we just comply by
>>>> creating a replacement resource, as if we had never tried the in-place
>>>> update.
>>>>
>>>> In order to facilitate this, we must expand the resource data model to
>>>> include a version. Replacement resources will be marked as "current" and
>>>> to-be-removed resources marked for deletion. We can also keep all current
>>>> - 1 resources around to facilitate rollback until the stack reaches a
>>>> "complete" state again. Once that is done, we can remove the backup stack.
>>>>
>>>
>>> The backup stack is a good way to take care of rollbacks or cleanups after
>>> the stack action is complete. By cleanup I mean the deletion of resources
>>> that are no longer needed after the update. It works very well when a
>>> single engine is processing the stack request and the stacks are in
>>> memory.
>>
>> It's actually a fairly terrible hack (I wrote it ;)
>>
>> It doesn't work very well because, in practice, during an update there are
>> dependencies that cross between the real stack and the backup stack (due to
>> some resources remaining the same or being updated in place, while others
>> are moved to the backup stack ready for replacement). So if there's a
>> failure that we don't completely roll back on the spot, we lose some
>> dependency information.
>>
>>> As a step towards distributing the processing of stack requests and making
>>> it fault-tolerant, we need to persist the dependency task graph. The
>>> backup stack could also be persisted along with the new graph, but then
>>> the engine would have to traverse both graphs to proceed with the
>>> operation, and later identify the resources to be cleaned up or rolled
>>> back using the stack ID. There would be many resources for the same stack
>>> but with different stack IDs.
>>
>> Right, yeah this would be a mistake because in reality there is only one 
>> graph, so that's how we need to model it internally.
>>
>>> In contrast, when we store the current dependency task graph (from the
>>> latest request) in the DB and version the resources, we can identify the
>>> resources that need to be rolled back or cleaned up after the stack
>>> operation is done by comparing their versions. With versioning of
>>> resources and templates, we can avoid creating a deep chain of backup
>>> stacks. The processing of a stack operation can then happen from multiple
>>> engines, and IMHO it is simpler when all the engines see just one stack
>>> and versions of its resources, instead of many stacks each with many
>>> resources.
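>>>
>>> For example (again with hypothetical fields):
>>>
>>>     def to_clean_up(resources, current_version):
>>>         # Superseded more than one update ago: safe to delete once
>>>         # the operation completes.
>>>         return [r for r in resources
>>>                 if r['version'] < current_version - 1]
>>>
>>>     def to_roll_back(resources, current_version):
>>>         # The immediately previous version: restore these if the
>>>         # current operation fails.
>>>         return [r for r in resources
>>>                 if r['version'] == current_version - 1]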
>>
>> Bingo.
>>
>> I think all you need to do is record in the resource the particular template 
>> and set of parameters it was tied to (maybe just generate a UUID for each 
>> update... or perhaps a SHA hash of the actual data for better rollbacks?). 
>> Then any resource that isn't part of the latest template should get deleted 
>> during the cleanup phase of the dependency graph traversal.
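>>
>> Something like this, perhaps (a sketch only; the helper and field names
>> are invented):
>>
>>     import hashlib
>>     import json
>>
>>     def template_version_id(template, params):
>>         # Canonical JSON, so identical template + parameters always
>>         # hash the same; rolling back to a previous version becomes
>>         # an equality check rather than a diff.
>>         blob = json.dumps({'template': template, 'params': params},
>>                           sort_keys=True)
>>         return hashlib.sha256(blob.encode('utf-8')).hexdigest()
>>
>>     def cleanup_candidates(resources, latest_id):
>>         # Anything not tied to the latest template gets deleted in
>>         # the cleanup phase of the traversal.
>>         return [r for r in resources
>>                 if r['template_id'] != latest_id]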
>>
>> As you mentioned above, we'll also need to store the dependency graph of the 
>> stack in the database somewhere. Right now we generate it afresh from the 
>> template by assuming that each resource name corresponds to one entry in the 
>> DB. Since that will no longer be true, we'll need it to be a graph of 
>> resource IDs that we store.
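>>
>> A sketch of what storing that might look like (edge list and IDs are
>> illustrative):
>>
>>     import json
>>
>>     # Edges between resource IDs, not names, so that two versions of
>>     # the "same" named resource stay distinct.
>>     edges = [('res-id-volume', 'res-id-server'),
>>              ('res-id-net', 'res-id-server')]
>>
>>     blob = json.dumps(edges)   # persisted alongside the stack row
>>
>>     def load_graph(blob):
>>         # Rebuild the adjacency (requirer -> requirements) from the
>>         # stored edge list.
>>         deps = {}
>>         for required, requirer in json.loads(blob):
>>             deps.setdefault(requirer, []).append(required)
>>         return deps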
>>
>> cheers,
>> Zane.
>>

Thanks Zane for your thoughts.

I have submitted a spec to implement this, as a subsidiary spec under the
main Convergence specification, and I would like everyone to review it.
Structuring it this way lets us submit patches against it, and makes it
clear which parts of convergence each patch addresses.

- Anant
