An RPC-message TTL mechanism sounds like a good solution.
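To make the idea a bit more concrete, here is a rough sketch of a consumer-side TTL check. Everything in it is hypothetical: the `Message` class, the `ttl_seconds` field and the `handle` function are made up for illustration and are not oslo.messaging or nova APIs. It also assumes sender and receiver clocks are reasonably in sync.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical RPC envelope carrying a TTL; not a real oslo.messaging type."""
    method: str
    ttl_seconds: float
    # Wall-clock send time; assumes clocks on both hosts are roughly synchronized.
    sent_at: float = field(default_factory=time.time)

    def expired(self, now=None):
        """Return True once more than ttl_seconds have passed since sending."""
        now = time.time() if now is None else now
        return now - self.sent_at > self.ttl_seconds


def handle(msg, dispatch, on_expired, now=None):
    """Dispatch the call if it is still within its TTL.

    If the TTL has passed, call on_expired instead (for example, to reset
    the task_state) and skip the dispatch. Returns True if dispatched.
    """
    if msg.expired(now):
        on_expired(msg)
        return False
    dispatch(msg)
    return True
```

The point of the `on_expired` hook is that a stale message is never silently dropped: whoever consumes it gets a chance to revert whatever state was set when the call was sent.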
I've opened a Launchpad bug so we can move the discussion there and see
if we can come up with more ideas to solve this:
https://bugs.launchpad.net/nova/+bug/1571175
Also, please see this previous bug on the same issue:
>> I have wanted to make a change for a while that involves a TTL on
>> messages, along with a deadline record so that we can know when to retry
>> or revert things that were in flight. This requires a lot of machinery
>> to accomplish, and is probably interwoven with the task concept we've
>> had
Hi,
Just a few questions out of curiosity (inline).
On Thu, Apr 14, 2016 at 12:34 AM, Dan Smith wrote:
>> * nova-api should receive an acknowledgement from nova-compute. It is
>> unclear to me why today it uses a non-reply mechanism - probably to
>> free the worker as fast
> * nova-api should receive an acknowledgement from nova-compute. It is
> unclear to me why today it uses a non-reply mechanism - probably to
> free the worker as fast as it can.
Yes, wherever possible, we want the API to return immediately and let
the action complete later. Making a
Hi all,
There are some cases in which a communication failure between the different
nova services might leave the system in a bad state.
For example, when "shelving" a VM, nova-api sets the VM's task_state to
"shelving" and sends an RPC to nova-compute, which shelves the VM and resets
its task_state