Re: [openstack-dev] [Nova] RPC Communication Errors Might Lead to a Bad State

2016-04-16 Thread Shoham Peller
An RPC-message TTL mechanism sounds like a good solution. I've opened a Launchpad bug so we can move the discussion there and see whether we can come up with more ideas to solve this: https://bugs.launchpad.net/nova/+bug/1571175 Also, please see this previous bug on the same issue:

Re: [openstack-dev] [Nova] RPC Communication Errors Might Lead to a Bad State

2016-04-14 Thread Dan Smith
>> I have wanted to make a change for a while that involves a TTL on messages, along with a deadline record so that we can know when to retry or revert things that were in flight. This requires a lot of machinery to accomplish, and is probably interwoven with the task concept we've had
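To make the quoted idea concrete, here is a minimal sketch of a message TTL paired with a sender-side deadline record. It is not Nova or oslo.messaging code: `Message`, `DeadlineTracker`, and every field and method name below are hypothetical, chosen only to illustrate how an expired message could be dropped by the receiver while the sender learns which in-flight operations need a retry or a revert.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical RPC message carrying its own time-to-live."""
    method: str
    instance_id: str
    sent_at: float  # seconds, e.g. from time.time()
    ttl: float      # seconds the message stays valid

    def expired(self, now: float) -> bool:
        # A receiver would drop the message instead of acting on it.
        return now - self.sent_at > self.ttl


@dataclass
class DeadlineTracker:
    """Hypothetical sender-side record of in-flight operations."""
    deadlines: dict = field(default_factory=dict)

    def record(self, msg: Message) -> None:
        # Remember when this operation must have completed.
        self.deadlines[msg.instance_id] = msg.sent_at + msg.ttl

    def complete(self, instance_id: str) -> None:
        # Called when the operation is known to have finished.
        self.deadlines.pop(instance_id, None)

    def overdue(self, now: float) -> list:
        # Operations past their deadline: candidates for retry or revert.
        return sorted(i for i, d in self.deadlines.items() if now > d)


tracker = DeadlineTracker()
msg = Message("shelve_instance", "vm-1", sent_at=100.0, ttl=30.0)
tracker.record(msg)
```

Because the receiver and the sender apply the same deadline, a message that arrives late is ignored on one side at the same moment it becomes eligible for cleanup on the other. Whether the sender then retries or reverts is a policy decision the sketch leaves open.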

Re: [openstack-dev] [Nova] RPC Communication Errors Might Lead to a Bad State

2016-04-13 Thread Shinobu Kinjo
Hi, Just asking out of curiosity (comments inline).
On Thu, Apr 14, 2016 at 12:34 AM, Dan Smith wrote:
>> * nova-api should receive an acknowledgement from nova-compute. It is unclear to me why today it uses a non-reply mechanism - probably to free the worker as fast

Re: [openstack-dev] [Nova] RPC Communication Errors Might Lead to a Bad State

2016-04-13 Thread Dan Smith
> * nova-api should receive an acknowledgement from nova-compute. It is unclear to me why today it uses a non-reply mechanism - probably to free the worker as fast as it can.
Yes, wherever possible, we want the API to return immediately and let the action complete later. Making a

[openstack-dev] [Nova] RPC Communication Errors Might Lead to a Bad State

2016-04-13 Thread Shoham Peller
Hi all, There are some cases in which a communication failure between the different nova services might cause a bad state in the system. For example, when "shelving" a VM, nova-api sets the VM's task_state to "shelving" and sends an RPC to nova-compute, which shelves the VM and resets its task_state
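The failure mode described here can be reproduced with a toy model. The sketch below is not Nova code: the dict-based instance, the function names, and the two cast helpers are hypothetical, and since the excerpt is cut off before saying what task_state is reset to, `None` is used purely as a stand-in.

```python
def api_shelve(instance, cast):
    """Stand-in for nova-api: mark the VM, then fire and forget."""
    instance["task_state"] = "shelving"
    cast("shelve_instance", instance)  # no reply is awaited


def compute_shelve(instance):
    """Stand-in for nova-compute: shelve, then reset the task state."""
    instance["task_state"] = None  # assumed reset value, see lead-in


def reliable_cast(method, instance):
    # The message is delivered and handled by the compute side.
    compute_shelve(instance)


def lossy_cast(method, instance):
    # The message is lost in transit; nothing ever handles it.
    pass


ok = {"uuid": "vm-ok", "task_state": None}
api_shelve(ok, reliable_cast)

stuck = {"uuid": "vm-stuck", "task_state": None}
api_shelve(stuck, lossy_cast)
```

After the lossy cast, `stuck` keeps task_state "shelving" indefinitely, because the only code that would reset it lives on the side that never received the message.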