** Description changed:

Under an HA deployment, neutron-openvswitch-agent can get stuck when it receives a close command on a fanout queue the agent is not subscribed to.

It then stops responding to any other messages, so it effectively stops working altogether.

2014-11-11 10:27:33.092 3027 INFO neutron.common.config [-] Logging enabled!
2014-11-11 10:27:34.285 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:27:34.370 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:27:35.348 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent initialized successfully, now running...
2014-11-11 10:27:35.351 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent out of sync with plugin!
2014-11-11 10:27:35.401 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent tunnel out of sync with plugin!
2014-11-11 10:27:35.414 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:32:33.143 3027 INFO neutron.agent.securitygroups_rpc [req-22c7fa11-882d-4278-9f83-6dd56ab95ba4 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 10:58:11.916 3027 INFO neutron.agent.securitygroups_rpc [req-484fd71f-8f61-496c-aa8a-2d3abf8de365 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 10:59:43.954 3027 INFO neutron.agent.securitygroups_rpc [req-2c0bc777-04ed-470a-aec5-927a59100b89 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 11:00:22.500 3027 INFO neutron.agent.securitygroups_rpc [req-df447d01-d132-40f2-8528-1c1c4d57c0f5 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-12 01:27:35.662 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     chanmap, None, timeout=timeout,
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return self.method_reader.read_method()
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     raise m
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common
2014-11-12 01:27:35.695 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
2014-11-12 01:27:35.722 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:22.682 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     chanmap, None, timeout=timeout,
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return self.method_reader.read_method()
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     raise m
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common
2014-11-12 02:00:22.683 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:23.017 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:23.021 3027 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2014-11-12 02:00:23.021 3027 TRACE root Traceback (most recent call last):
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-11-12 02:00:23.021 3027 TRACE root     return infunc(*args, **kwargs)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread
2014-11-12 02:00:23.021 3027 TRACE root     self.consume()
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 737, in consume
2014-11-12 02:00:23.021 3027 TRACE root     six.next(it)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 664, in iterconsume
2014-11-12 02:00:23.021 3027 TRACE root     yield self.ensure(_error_callback, _consume)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 02:00:23.021 3027 TRACE root     return method(*args, **kwargs)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 657, in _consume
2014-11-12 02:00:23.021 3027 TRACE root     queues_tail.consume(nowait=False)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 190, in consume
2014-11-12 02:00:23.021 3027 TRACE root     self.queue.consume(*args, callback=_callback, **options)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 598, in consume
2014-11-12 02:00:23.021 3027 TRACE root     nowait=nowait)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1769, in basic_consume
2014-11-12 02:00:23.021 3027 TRACE root     (60, 21), # Channel.basic_consume_ok
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 71, in wait
2014-11-12 02:00:23.021 3027 TRACE root     return self.dispatch_method(method_sig, args, content)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 88, in dispatch_method
2014-11-12 02:00:23.021 3027 TRACE root     return amqp_method(self, args)
2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 224, in _close
2014-11-12 02:00:23.021 3027 TRACE root     raise ChannelError(reply_code, reply_text, (class_id, method_id))
2014-11-12 02:00:23.021 3027 TRACE root ChannelError: 404: (NOT_FOUND - no queue 'q-agent-notifier-port-update_fanout_cc21f47607704321860757b7e6a1194a' in vhost '/', (60, 20), None)
2014-11-12 02:00:23.021 3027 TRACE root
2014-11-12 02:01:24.268 3027 ERROR root [-] Unexpected exception occurred 61 time(s)... retrying.
2014-11-12 02:01:24.268 3027 TRACE root Traceback (most recent call last):
2014-11-12 02:01:24.268 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-11-12 02:01:24.268 3027 TRACE root     return infunc(*args, **kwargs)
2014-11-12 02:01:24.268 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread

---------------------------

[Impact]

This patch addresses an issue in RabbitMQ HA deployments where neutron-openvswitch-agent gets stuck on a "no queue 'q-agent-notifier-port-update_fanout_xx'" error when one of the RabbitMQ cluster nodes goes down. In a deployment with more than 100 nova compute nodes, this takes down all of the neutron agents. Restarting neutron-openvswitch-agent clears the error, but restarting every agent on every compute node is not a practical remedy, and the failure defeats the purpose of HA.

[Test Case]

Note: these steps are for trusty-icehouse, including neutron package 1:2014.1.5-0ubuntu1.

Deploy an OpenStack cloud with multiple rabbit nodes, then abruptly kill one of the rabbit nodes (e.g. sudo service rabbitmq-server stop). Observe that the neutron agents stop consuming messages and keep throwing the "no queue 'q-agent-notifier-port-update_fanout..'" exception.

[Regression Potential]

- None.
+ The regression potential is low. The fix is fairly minimal and is
+ limited to the code path where a 404 error occurs.

[Other Info]

The Oslo library already has this fix, but because Neutron in Icehouse uses kombu rather than the Oslo library, it still suffers from this issue.
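For illustration only, the kind of handling described under [Regression Potential] (acting only on the 404 code path) might look roughly like the sketch below. It is not the actual patch: ChannelError here is a simplified stand-in for the amqp exception seen in the traceback, and FakeQueue and consume_with_redeclare are hypothetical names made up for this example. A real AMQP channel is closed by the broker after a 404, so real code would also need a fresh channel before re-declaring; that step is left out here.

```python
class ChannelError(Exception):
    """Simplified stand-in for the amqp ChannelError in the traceback."""

    def __init__(self, reply_code, reply_text):
        super().__init__(reply_code, reply_text)
        self.reply_code = reply_code
        self.reply_text = reply_text


class FakeQueue:
    """Hypothetical queue that is missing on the broker until declared."""

    def __init__(self, name):
        self.name = name
        self.exists = False      # queue was lost when the rabbit node died
        self.consuming = False

    def declare(self):
        self.exists = True

    def consume(self):
        if not self.exists:
            raise ChannelError(
                404, "NOT_FOUND - no queue '%s' in vhost '/'" % self.name)
        self.consuming = True


def consume_with_redeclare(queue):
    """Start consuming; on a 404 only, re-declare the queue and retry once."""
    try:
        queue.consume()
    except ChannelError as exc:
        if exc.reply_code != 404:
            raise                # any other channel error is left untouched
        queue.declare()
        queue.consume()


# Hypothetical queue name, shortened from the one in the log.
queue = FakeQueue("q-agent-notifier-port-update_fanout_example")
consume_with_redeclare(queue)
```

Because only reply code 404 is intercepted, every other error keeps its existing behaviour, which is the property the regression note relies on.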
--
https://bugs.launchpad.net/bugs/1393391

Title:
  neutron-openvswitch-agent stuck on no queue 'q-agent-notifier-port-update_fanout..
