On 03/24/2014 09:24 PM, Chris Friesen wrote:
The problem is that the RPC code in Havana doesn't handle connection
error exceptions. The oslo.messaging code used in Icehouse does.
If we ported
"https://github.com/openstack/oslo.messaging/commit/0400cbf4f83cf8d58076c7e65e08a156ec3508a8";
to Hava
On 03/24/2014 07:45 PM, Chris Behrens wrote:
Do you have some sort of network device like a firewall between your
compute and rabbit or you failed from one rabbit over to another?
There are two controllers (active/standby) and two computes all hooked
up to the same switch.
We definitely did
Jay had responded to a similar issue [1] some time ago (I swear I saw talk of
this last week but can’t find the newer thread). Since the posting referenced
we also found rabbit 3.2.x with esl erlang helped a ton.
tl;dr It is a client issue. See the thread for further details.
[1] http://list
Do you have some sort of network device like a firewall between your compute
and rabbit or you failed from one rabbit over to another? The only cases where
I've seen this happen is when the compute side OS doesn't detect a closed
connection for various reasons. I'm on my phone and didn't check
On 03/24/2014 11:31 AM, Chris Friesen wrote:
It looks like we're raising
RecoverableConnectionError: connection already closed
down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but
nothing handles it.
It looks like the most likely place that should be handling it is
nova.op
On 03/24/2014 10:59 AM, Dan Smith wrote:
Any ideas on what might be going on would be appreciated.
This looks like something that should be filed as a bug. I don't have
any ideas off hand, bit I will note that the reconnection logic works
fine for us in the upstream upgrade tests. That scenario
On 03/24/2014 10:41 AM, Chris Friesen wrote:
We've been stress-testing our system doing controlled switchover of
the controller. Normally this works okay, but we've run into a
situation that seems to show a flaw in the reconnection logic.
On the compute node, nova-compute has managed to get in
> Any ideas on what might be going on would be appreciated.
This looks like something that should be filed as a bug. I don't have
any ideas off hand, bit I will note that the reconnection logic works
fine for us in the upstream upgrade tests. That scenario includes
starting up a full stack, then t
We've been stress-testing our system doing controlled switchover of the
controller. Normally this works okay, but we've run into a situation that
seems to show a flaw in the reconnection logic.
On the compute node, nova-compute has managed to get into a state where it
shows as "down" in "nova