Do you have RabbitMQ/oslo.messaging heartbeats enabled? If you aren't using heartbeats, it will take a long time for the nova-compute agent to figure out that it's actually no longer attached to anything. The heartbeat periodically checks the connection to RabbitMQ, so it will catch this state and reconnect.
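To make the idea concrete, here is a minimal Python sketch of a heartbeat watchdog. It is not oslo.messaging's implementation: `HeartbeatMonitor`, `FakeClock` and `detect_outage` are hypothetical names, and the `threshold`/`rate` parameters are only assumed to correspond loosely to the `heartbeat_timeout_threshold` and `heartbeat_rate` options of oslo.messaging's RabbitMQ driver. The point it shows is that a client which checks for broker traffic on a fixed schedule notices a dead broker within a bounded time, instead of waiting for some later RPC call to fail.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Hypothetical watchdog that declares a broker dead after a silent window.

    threshold: seconds without any traffic before the broker is considered down
               (assumed loosely analogous to heartbeat_timeout_threshold).
    rate:      number of checks spread across one threshold window
               (assumed loosely analogous to heartbeat_rate).
    """

    def __init__(self, threshold: float, rate: int = 2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if threshold <= 0 or rate <= 0:
            raise ValueError("threshold and rate must be positive")
        self.threshold = threshold
        self.rate = rate
        self._clock = clock
        self.last_seen = clock()

    @property
    def check_interval(self) -> float:
        # Checks are evenly spaced inside the threshold window.
        return self.threshold / self.rate

    def record_traffic(self) -> None:
        # Any frame received from the broker, heartbeat or not, proves it is alive.
        self.last_seen = self._clock()

    def broker_is_dead(self) -> bool:
        # Dead once a full threshold window has passed with no traffic.
        return self._clock() - self.last_seen >= self.threshold


class FakeClock:
    """Manually advanced clock so the simulation is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def detect_outage(threshold: float, rate: int, broker_dies_at: float,
                  horizon: float) -> Optional[float]:
    """Return the simulated time at which the outage is detected, or None."""
    clock = FakeClock()
    monitor = HeartbeatMonitor(threshold, rate, clock=clock)
    while clock.now < horizon:
        clock.now += monitor.check_interval
        if clock.now < broker_dies_at:
            monitor.record_traffic()  # broker still answers heartbeats
        elif monitor.broker_is_dead():
            # A real client would drop the socket and reconnect at this point.
            return clock.now
    return None


if __name__ == "__main__":
    print(detect_outage(threshold=60, rate=2, broker_dies_at=100, horizon=1000))
```

Run as a script, this prints 150.0: the broker goes silent at t=100 s, the last traffic was seen at t=90 s, and the check at t=150 s finds a full 60 s of silence. In this sketch the worst-case detection delay is roughly the threshold plus one check interval; without a heartbeat, a client may only notice the dead connection when an RPC call times out, as in the MessagingTimeout trace in the quoted message.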
___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <[email protected]>
Date: Thursday, April 21, 2016 at 11:43 AM
To: "[email protected]" <[email protected]>
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Hi,

I am seeing on Kilo that if I bring down one controller node, some computes sometimes report down forever. I need to restart the compute service on the compute node to recover. It looks like oslo is not reconnecting in nova-compute.

Here is the trace from nova-compute:

2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db result = self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db message = self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 'to message ID %s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID e064b5f6c8244818afdc5e91fff8ebf1

Any thoughts? I am on stable/kilo for oslo.

Ajay
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
