Hi all,

I was working on spec[1] and prototype[2] to make Cinder to be able to resume 
workflows in case of server or service failure. Problem of requests lost and 
resources left in unresolved states in case of failure was signaled at the 
Paris Summit[3].

What I was able to prototype was to resume running tasks locally after service 
restart using persistence API provided by TaskFlow. However core team agreed 
that we should aim at resuming workflows globally even by other service 
instances (which I think is a good decision).

There are few major problems blocking this approach:

1. Need of distributed lock to avoid same task being resumed by two instances 
of a service. Do we need tooz to do that or is there any other solution?
2. Are we going to step out from using TaskFlow? Such idea came up at the 
mid-cycle meetup, what's the status of it? Without TaskFlow's persistence 
implementing task resumptions would be a lot more difficult.
3. In case of cinder-api service we're unable to monitor it's state using 
servicegroup API. Do we have alternatives here to make decision if particular 
workflow being processed by cinder-api is abandoned?

As this topic is deferred to Liberty release I want to start discussion here to 
be continued at the summit.

[1] https://review.openstack.org/#/c/147879/
[2] https://review.openstack.org/#/c/152200/
[3] https://etherpad.openstack.org/p/kilo-crossproject-ha-integration

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Reply via email to