Hi all,

I have been working on a spec and a prototype to make Cinder able to resume workflows after a server or service failure. The problem of lost requests and resources left in unresolved states after a failure was raised at the Paris Summit.
What I was able to prototype is resuming running tasks locally after a service restart, using the persistence API provided by TaskFlow. However, the core team agreed that we should aim at resuming workflows globally, even by other service instances (which I think is a good decision). There are a few major problems blocking this approach:

1. We need a distributed lock to prevent the same task from being resumed by two instances of a service. Do we need tooz for that, or is there another solution?
2. Are we going to move away from TaskFlow? The idea came up at the mid-cycle meetup; what is its status? Without TaskFlow's persistence, implementing task resumption would be a lot more difficult.
3. For the cinder-api service, we are unable to monitor its state using the servicegroup API. Do we have alternatives here for deciding whether a particular workflow being processed by cinder-api has been abandoned?

As this topic is deferred to the Liberty release, I want to start the discussion here so it can be continued at the summit.

https://review.openstack.org/#/c/147879/
https://review.openstack.org/#/c/152200/
https://etherpad.openstack.org/p/kilo-crossproject-ha-integration
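
To make problem 1 more concrete, here is a minimal sketch of one possible alternative to a dedicated lock service: claiming a workflow with an atomic conditional UPDATE in the database, so that only one instance can win. This is not how Cinder or TaskFlow do it today, and it is not tooz. The `workflows` table, its `owner` column, the `try_claim` helper and the instance names are all hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def try_claim(conn, workflow_id, instance_id):
    """Atomically claim an unowned workflow; return True if this caller won."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            # The "owner IS NULL" guard makes this a compare-and-swap:
            # the row is only updated if nobody has claimed it yet.
            "UPDATE workflows SET owner = ? WHERE id = ? AND owner IS NULL",
            (instance_id, workflow_id),
        )
    return cur.rowcount == 1


def demo():
    # Hypothetical schema: one row per resumable workflow.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE workflows (id TEXT PRIMARY KEY, owner TEXT)")
    conn.execute("INSERT INTO workflows (id, owner) VALUES ('wf-1', NULL)")
    conn.commit()

    # Two service instances try to resume the same workflow.
    first = try_claim(conn, "wf-1", "cinder-volume-a")
    second = try_claim(conn, "wf-1", "cinder-volume-b")
    owner = conn.execute(
        "SELECT owner FROM workflows WHERE id = 'wf-1'"
    ).fetchone()[0]
    return first, second, owner
```

With this approach the second caller simply sees that no row was updated and skips the workflow. The sketch does not cover releasing a claim when the owner dies, which is the abandoned-workflow question from problem 3.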