+1 on this. In general, RabbitMQ connectivity and failover are pretty terrible. Services appear to be connected to RabbitMQ when in reality they aren't, so monitoring the server for an established connection to RabbitMQ isn't enough. Our experience is much the same with anything that uses RabbitMQ, not just nova-compute. The issue seems to be that a service can still send messages but doesn't actually pull messages from the queue. Also, when we restart a rabbit node in the cluster, connections typically have trouble re-establishing, and we need to restart most services to fix it.

____________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.

From: Gustavo Randich <[email protected]>
Date: Thursday, January 15, 2015 at 8:34 AM
To: "[email protected]" <[email protected]>
Subject: [Openstack-operators] Way to check compute <-> rabbitmq connectivity

Hi,

I'm experiencing some issues with nova-compute services not responding to rabbitmq messages, despite the service reporting OK state via periodic tasks. Apparently the TCP connection is open but in a stale or unresponsive state. This happens sporadically when there is some not yet understood network problem. Restarting nova-compute solves the problem.

Is there any way, preferably via openstack API, to probe service responsiveness, i.e., that it consumes messages, so we can program an alert?

Thanks in advance!
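For what it's worth, one rough way to approximate the probe asked about above is to look at the broker side rather than the service side. The sketch below is not an OpenStack API; it assumes the RabbitMQ management plugin is enabled and reachable, and it relies on the `name`, `consumers` and `messages` fields that the plugin's `/api/queues` endpoint reports. The function names, the `compute.` queue prefix and the backlog threshold are all hypothetical and would need adjusting to the deployment. A stale consumer may still be listed by the broker for a while, so a growing backlog is likely the more useful of the two signals.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password):
    """Return the parsed JSON list from the management plugin's /api/queues.

    base_url is an assumption, e.g. "http://rabbit-host:15672".
    """
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def find_suspect_queues(queues, prefix="compute.", max_backlog=10):
    """Pick out queues that look like nobody is consuming from them.

    A queue is flagged when it has no consumers at all, or when more than
    max_backlog messages are waiting. Both the prefix and the threshold are
    illustrative defaults, not values taken from nova or RabbitMQ.
    """
    suspects = []
    for queue in queues:
        name = queue.get("name", "")
        if not name.startswith(prefix):
            continue
        consumers = queue.get("consumers", 0)
        backlog = queue.get("messages", 0)
        if consumers == 0 or backlog > max_backlog:
            suspects.append((name, consumers, backlog))
    return suspects
```

An alerting job could call `find_suspect_queues(fetch_queues(url, user, password))` on a schedule and raise an alarm whenever the returned list is non-empty for several consecutive runs, which filters out short-lived backlogs.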
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
