Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 09:24 PM, Chris Friesen wrote: The problem is that the RPC code in Havana doesn't handle connection error exceptions. The oslo.messaging code used in Icehouse does. If we ported "https://github.com/openstack/oslo.messaging/commit/0400cbf4f83cf8d58076c7e65e08a156ec3508a8"; to Hava

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 07:45 PM, Chris Behrens wrote: Do you have some sort of network device like a firewall between your compute and rabbit or you failed from one rabbit over to another? There are two controllers (active/standby) and two computes all hooked up to the same switch. We definitely did

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread John Dewey
Jay had responded to a similar issue [1] some time ago (I swear I saw talk of this last week but can’t find the newer thread). Since the posting referenced we also found rabbit 3.2.x with esl erlang helped a ton. tl;dr It is a client issue. See the thread for further details. [1] http://list

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Behrens
Do you have some sort of network device like a firewall between your compute and rabbit or you failed from one rabbit over to another? The only cases where I've seen this happen is when the compute side OS doesn't detect a closed connection for various reasons. I'm on my phone and didn't check

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 11:31 AM, Chris Friesen wrote: It looks like we're raising RecoverableConnectionError: connection already closed down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but nothing handles it. It looks like the most likely place that should be handling it is nova.op

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 10:59 AM, Dan Smith wrote: Any ideas on what might be going on would be appreciated. This looks like something that should be filed as a bug. I don't have any ideas off hand, bit I will note that the reconnection logic works fine for us in the upstream upgrade tests. That scenario

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 10:41 AM, Chris Friesen wrote: We've been stress-testing our system doing controlled switchover of the controller. Normally this works okay, but we've run into a situation that seems to show a flaw in the reconnection logic. On the compute node, nova-compute has managed to get in

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Dan Smith
> Any ideas on what might be going on would be appreciated. This looks like something that should be filed as a bug. I don't have any ideas off hand, bit I will note that the reconnection logic works fine for us in the upstream upgrade tests. That scenario includes starting up a full stack, then t

[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
We've been stress-testing our system doing controlled switchover of the controller. Normally this works okay, but we've run into a situation that seems to show a flaw in the reconnection logic. On the compute node, nova-compute has managed to get into a state where it shows as "down" in "nova