Here is the bug I've been tracking related to this for a while.  I haven't
kept up with it lately, so I don't know its current status.

https://bugs.launchpad.net/nova/+bug/856764


From: Kris Lindgren <klindg...@godaddy.com>
Date: Thursday, January 15, 2015 at 12:10 PM
To: Gustavo Randich <gustavo.rand...@gmail.com>, OpenStack Operators <openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq connectivity

During the Atlanta ops meeting this topic came up and I specifically suggested
adding a "no-op" or healthcheck ping to the rabbitmq layer in both nova and
neutron.  The devs in the room looked at me like I was crazy, but the point was
to catch exactly the kind of issues you describe.  I am also interested if
anyone knows of a lightweight call that could be used to verify/confirm
rabbitmq connectivity.  I haven't been able to devote time to dig into it,
mainly because when one client is having issues you will usually notice other
clients having similar (or silent) errors, and restarting all the things is the
easiest fix, for us at least.
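
For what it's worth, the cheapest check I can think of is just opening a fresh
AMQP connection from the compute node and making sure the broker answers.  That
only proves the broker is reachable from that host, not that the long-lived
connection inside nova-compute is still healthy, but it rules a few things out.
A rough sketch using kombu (which oslo already depends on); the URL and
credentials below are placeholders:

    #!/usr/bin/env python
    # Rough sketch: check that the rabbitmq broker answers a fresh AMQP
    # connection from this host.  URL/credentials below are placeholders.
    import sys

    from kombu import Connection

    def rabbit_reachable(url="amqp://guest:guest@rabbit-host:5672//", timeout=5):
        try:
            with Connection(url, connect_timeout=timeout) as conn:
                # raises if the broker cannot be reached within the timeout
                conn.ensure_connection(max_retries=1)
            return True
        except Exception:
            return False

    if __name__ == "__main__":
        sys.exit(0 if rabbit_reachable() else 1)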
____________________________________________

Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.


From: Gustavo Randich <gustavo.rand...@gmail.com>
Date: Thursday, January 15, 2015 at 11:53 AM
To: "openstack-operators@lists.openstack.org" <openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq connectivity

Just to add one more background scenario: we also had similar problems trying
to load balance rabbitmq via an F5 BIG-IP LTM, and for that reason we no longer
use it.  Our installation is now a single rabbitmq instance with no
intermediaries (apart from network switches).  We run Folsom and Icehouse, and
the problem is noticed more on the Icehouse nodes.

We are already monitoring message queue sizes, but we would like to pinpoint,
in near real time, the specific hosts/racks/network paths experiencing the
"stale connection" before a user complains about an operation being stuck, and
even to catch hosts that have no pending operations but are already
"disconnected".  That way we could also diagnose possible network causes and
avoid mass service restarts.
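
One rough way to spot hosts that are already "disconnected" without touching
nova at all might be the rabbitmq management API (assuming the management
plugin is enabled): list the queues and flag any compute.<hostname> queue that
has no consumer attached.  A sketch; the host, credentials and queue-name
prefix below are placeholders:

    # Rough sketch: list nova-compute queues that currently have no consumer,
    # using the rabbitmq management API.  Host, credentials and the "compute."
    # queue prefix are placeholders for whatever your deployment uses.
    import requests

    MGMT_URL = "http://rabbit-host:15672/api/queues/%2F"  # %2F = default vhost "/"
    AUTH = ("guest", "guest")

    def queues_without_consumers(prefix="compute."):
        queues = requests.get(MGMT_URL, auth=AUTH, timeout=10).json()
        return [q["name"] for q in queues
                if q["name"].startswith(prefix) and q.get("consumers", 0) == 0]

    if __name__ == "__main__":
        for name in queues_without_consumers():
            print("no consumer on %s" % name)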

So, for now, if someone knows of a cheap and quick OpenStack operation that
triggers a message exchange between rabbitmq and nova-compute, plus a way of
checking the result, that would be great.
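
For what it's worth, one candidate might be "nova hypervisor-uptime": as far
as I can tell (at least on Icehouse) it does a synchronous RPC round-trip to
the target nova-compute, so a stale connection shows up as the call hanging
and then failing with a messaging timeout.  A rough wrapper, assuming the nova
CLI is installed and the usual OS_* credentials are exported:

    # Rough sketch: check whether a compute host answers a synchronous RPC call
    # by shelling out to "nova hypervisor-uptime".  Assumes the nova CLI and
    # OS_* credentials are available in the environment.
    import subprocess

    def compute_rpc_alive(hypervisor_id):
        """True if the RPC round-trip to the compute host succeeded."""
        try:
            subprocess.check_output(["nova", "hypervisor-uptime", str(hypervisor_id)])
            return True
        except subprocess.CalledProcessError:
            # nova exits non-zero when the RPC call times out or errors out
            return False

    if __name__ == "__main__":
        print(compute_rpc_alive(1))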




On Thu, Jan 15, 2015 at 1:45 PM, Kris G. Lindgren <klindg...@godaddy.com> wrote:
We did have an issue using Celery on an internal application that we wrote,
but I believe it was fixed after much failover testing and code changes.  We
also use Logstash via rabbitmq and haven't noticed any issues there either.

So this seems to be just openstack/oslo related.

We have tried a number of different configurations - all of them had their
issues.  We started out listing all the members in the cluster on the
rabbit_hosts line.  This worked most of the time without issue, until we would
restart one of the servers; then it seemed like the clients wouldn't figure out
they were disconnected and reconnect to the next host.
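
For reference, that first setup amounts to something like the following in
nova.conf (the hostnames and tuning values here are placeholders, and on
Icehouse these options sit in [DEFAULT]):

    [DEFAULT]
    rabbit_hosts = rabbit01:5672,rabbit02:5672,rabbit03:5672
    rabbit_ha_queues = true
    rabbit_retry_interval = 1
    rabbit_retry_backoff = 2
    kombu_reconnect_delay = 1.0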

In an attempt to solve that we moved to using haproxy to present a VIP that we
configured in the rabbit_hosts line.  This created issues with long-lived
connections being disconnected, plus a bunch of other problems.  In our
production environment we moved to load-balanced rabbitmq, but behind a real
load balancer, and we don't see the weird disconnect issues there.  However,
any time we reboot or take down a rabbitmq host, pull a member from the
cluster, or hit a network disruption, we have issues.

I'm thinking the best course of action is to move rabbitmq onto its own box
and leave it alone.

Does anyone have a rabbitmq setup that works well and doesn’t have random 
issues when pulling nodes for maintenance?
____________________________________________

Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.


From: Joe Topjian <j...@topjian.net>
Date: Thursday, January 15, 2015 at 9:29 AM
To: "Kris G. Lindgren" <klindg...@godaddy.com>
Cc: "openstack-operators@lists.openstack.org" <openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq connectivity

Hi Kris,

> Our experience is pretty much the same on anything that is using rabbitmq -
> not just nova-compute.

Just to clarify: have you experienced this outside of OpenStack (or Oslo)?

We've seen similar issues with rabbitmq and OpenStack.  We used to run rabbit
through haproxy and tried a myriad of options, like setting no timeouts, very
long timeouts, etc., but we would always eventually see issues similar to the
ones described.

Last month, we reconfigured all OpenStack components to use the `rabbit_hosts` 
option with all nodes in our cluster listed. So far this has worked well, 
though I probably just jinxed myself. :)

We still have other services (like Sensu) using the same rabbitmq cluster and 
accessing it through haproxy. We've never had any issues there.

What's also strange is that I have another OpenStack deployment (from Folsom to 
Icehouse) with just a single rabbitmq server installed directly on the cloud 
controller (meaning: no nova-compute). I never have any rabbit issues in that 
cloud.

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

