** Description changed:

  In the logs, the first traceback that appears is this:

  [-] Unexpected exception occurred 1 time(s)... retrying.
  Traceback (most recent call last):
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 62, in inner_func
      return infunc(*args, **kwargs)
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 741, in _consumer_thread
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 732, in consume
      @excutils.forever_retry_uncaught_exceptions
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 660, in iterconsume
      try:
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 590, in ensure
      def close(self):
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 531, in reconnect
      # to return an error not covered by its transport
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 513, in _connect
      Will retry up to self.max_retries number of times.
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 150, in reconnect
      use the callback passed during __init__()
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/kombu/entity.py", line 508, in declare
      self.queue_bind(nowait)
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/kombu/entity.py", line 541, in queue_bind
      self.binding_arguments, nowait=nowait)
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/kombu/entity.py", line 551, in bind_to
      nowait=nowait)
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/amqp/channel.py", line 1003, in queue_bind
      (50, 21), # Channel.queue_bind_ok
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 68, in wait
      return self.dispatch_method(method_sig, args, content)
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 86, in dispatch_method
      return amqp_method(self, args)
    File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/amqp/channel.py", line 241, in _close
      reply_code, reply_text, (class_id, method_id), ChannelError,
  NotFound: Queue.bind: (404) NOT_FOUND - no exchange 'reply_8f19344531b448c89d412ee97ff11e79' in vhost '/'

  Then an RPC Timeout is raised every second in all the agents:

  ERROR neutron.agent.l3_agent [-] Failed synchronizing routers
  TRACE neutron.agent.l3_agent Traceback (most recent call last):
  TRACE neutron.agent.l3_agent File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/agent/l3_agent.py", line 702, in _rpc_loop
  TRACE neutron.agent.l3_agent self.context, router_ids)
  TRACE neutron.agent.l3_agent File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/agent/l3_agent.py", line 79, in get_routers
  TRACE neutron.agent.l3_agent topic=self.topic)
  TRACE neutron.agent.l3_agent File "/opt/cloudbau/neutron-virtualenv/lib/python2.7/site-packages/neutron/openstack/common/rpc/proxy.py", line 130, in call
  TRACE neutron.agent.l3_agent exc.info, real_topic, msg.get('method'))
  TRACE neutron.agent.l3_agent Timeout: Timeout while waiting on RPC response - topic: "q-l3-plugin", RPC method: "sync_routers" info: "<unknown>"

  This makes the agents useless until they are all restarted.

  An analysis of what's going on is coming soon :)
+
+
+ ---------------------------
+
+ [Impact]
+
+ This patch addresses an issue that occurs when a RabbitMQ cluster node
+ goes down. OpenStack services reconnect to another RabbitMQ node and
+ re-create everything from scratch. Because the 'auto-delete' flag is
+ set, the re-creation races with the deletion of exchanges, queues, and
+ bindings, which leaves nova-compute and the neutron agents down.
+
+ [Test Case]
+
+ Note: these steps are for trusty-icehouse, including the latest
+ oslo.messaging library (1.3.0-0ubuntu1.2 at the time of this writing).
+
+ Deploy an OpenStack cloud with multiple RabbitMQ nodes and then abruptly
+ kill one of them (e.g. sudo service rabbitmq-server stop). Observe that
+ the nova services and neutron agents detect that the node went down and
+ report that they have reconnected, but messages are still reported as
+ timed out, and nova service-list/neutron agent-list still report the
+ compute services and agents as down.
+
+ [Regression Potential]
+
+ None.
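
To make the race described under [Impact] concrete, here is a minimal toy model in plain Python. It is only a sketch under assumptions: `ToyBroker`, `simulate`, and the queue and exchange names are hypothetical and are not part of RabbitMQ, kombu, or oslo.messaging. The exact interleaving in the real failure is also assumed; the model only shows how an auto-delete exchange can disappear between being re-declared and being bound.

```python
class NotFound(Exception):
    """Raised when binding to an exchange that does not exist."""


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (illustration only)."""

    def __init__(self):
        # exchange name -> set of queue names currently bound to it
        self.exchanges = {}

    def declare_exchange(self, name):
        # Declaring an existing exchange is a no-op.
        self.exchanges.setdefault(name, set())

    def bind(self, queue, exchange):
        if exchange not in self.exchanges:
            raise NotFound(
                "Queue.bind: (404) NOT_FOUND - no exchange %r" % exchange)
        self.exchanges[exchange].add(queue)

    def unbind(self, queue, exchange):
        bound = self.exchanges.get(exchange)
        if bound is None:
            return
        bound.discard(queue)
        if not bound:
            # Auto-delete: the exchange goes away with its last binding.
            del self.exchanges[exchange]


def simulate(cleanup_before_declare):
    """Return the error message if the re-bind fails, otherwise None."""
    broker = ToyBroker()
    exchange = "reply_example"  # hypothetical exchange name
    # State left behind by the connection to the failed node.
    broker.declare_exchange(exchange)
    broker.bind("old_queue", exchange)
    try:
        if cleanup_before_declare:
            broker.unbind("old_queue", exchange)  # exchange auto-deleted
            broker.declare_exchange(exchange)     # re-created from scratch
        else:
            broker.declare_exchange(exchange)     # no-op, still exists
            broker.unbind("old_queue", exchange)  # late cleanup deletes it
        broker.bind("new_queue", exchange)
    except NotFound as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for order in (True, False):
        print(order, simulate(cleanup_before_declare=order))
```

In this model, the re-bind succeeds when the stale binding is cleaned up before the exchange is re-declared, and fails with a NOT_FOUND error when the cleanup lands between the declare and the bind.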
--
https://bugs.launchpad.net/bugs/1318721

Title:
  RPC timeout in all neutron agents