Reviewed:  https://review.openstack.org/482427
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=38d058c2cf0746e2452a0c2c704c914c836de9e7
Submitter: Jenkins
Branch:    master

commit 38d058c2cf0746e2452a0c2c704c914c836de9e7
Author: Dongcan Ye <[email protected]>
Date:   Tue Jul 11 15:15:23 2017 +0800

    Fix generation of thousands of DHCP tap interfaces

    As reported in the bug, there may be a case where an empty
    namespace file exists in /run/netns but the namespace does not
    actually exist. In such a case the DHCP agent throws an error
    when plugging the interface into the DHCP namespace. This may
    also result in many tap interfaces being generated in the OVS
    bridge or Linux bridge.

    This patch fixes the above bug by unplugging the tap device from
    the bridge if an exception occurs, which prevents the tap
    interfaces from accumulating.

    Co-Authored-By: Brian Haley <[email protected]>

    Change-Id: I4a197edd180887ad36317ddb2f0c0e7bd2e34e30
    Closes-Bug: #1561695

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1561695

Title:
  neutron-dhcp-agent generates thousands of interfaces on a failure

Status in neutron:
  Fix Released

Bug description:
  I ran into slowness on a new deploy of mitaka-rc1 code with neutron.
  I had ~13,000 tap devices that were created by dhcp-agent. The
  neutron database does not have these ports. As far as I can tell,
  neutron is no longer aware of, or no longer cares about, those ports,
  but they remain on the node (and in Open vSwitch, so a reboot
  wouldn't clear them).

  I do not know how the initial failure happened, but to reproduce this
  you can do the following:

  1. Stop the DHCP agent (and anything using the network namespace).
  2. ip netns del qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02
  3. touch /run/netns/qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02
  4. Start the DHCP agent and watch it continually try to create (and
     then fail to clean up) tap interfaces.

  Over the course of ~4 hours this issue generated 13,000 interfaces
  and 4 GB of logs (debug was turned on).

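As an illustration of the fix described in the commit message above (unplug the tap device from the bridge when plugging fails), here is a minimal, self-contained Python sketch. It is not the neutron code: FakeBridge, move_to_namespace and plug are hypothetical names invented for this example, and the namespace failure is simulated in memory rather than produced by a real `ip link set ... netns` call.

```python
class FakeBridge:
    """Hypothetical in-memory stand-in for an OVS or Linux bridge."""

    def __init__(self):
        self.ports = set()

    def add_port(self, name):
        self.ports.add(name)

    def del_port(self, name):
        self.ports.discard(name)


def move_to_namespace(device, namespace, existing_namespaces):
    """Simulate moving a device into a namespace.

    Raises RuntimeError when the namespace does not really exist,
    mimicking the 'RTNETLINK answers: Invalid argument' failure.
    """
    if namespace not in existing_namespaces:
        raise RuntimeError("RTNETLINK answers: Invalid argument")


def plug(bridge, device, namespace, existing_namespaces):
    """Attach device to the bridge, then move it into the namespace.

    If the move fails, remove the device from the bridge again before
    re-raising, so a retry loop cannot pile up leftover ports.
    """
    bridge.add_port(device)
    try:
        move_to_namespace(device, namespace, existing_namespaces)
    except RuntimeError:
        bridge.del_port(device)
        raise


if __name__ == "__main__":
    bridge = FakeBridge()
    try:
        # The namespace is absent, as after 'ip netns del' plus 'touch'.
        plug(bridge, "tap42983a07-e0", "qdhcp-stale", existing_namespaces=set())
    except RuntimeError as exc:
        print("plug failed:", exc)
    print("ports left on bridge:", sorted(bridge.ports))
```

Without the except branch, every retry by the agent would leave one more port on the bridge, which matches the growth described in the report.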
  How the initial issue came about I do not know, but it did happen in
  normal usage. I believe the proper fix here would be to _always_
  clean up tap devices, even on failures, but I am not familiar enough
  with the neutron code to fix this.

  The output of `ip netns` when it has an invalid namespace looks like
  this:

  # ip netns
  RTNETLINK answers: Invalid argument
  RTNETLINK answers: Invalid argument
  qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02

  The stack trace in neutron-dhcp-agent is:

  2016-03-24 18:42:12.165 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--columns=ofport', 'list', 'Interface', 'tap42983a07-e0'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.275 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.276 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'address', 'fa:16:3e:79:1b:0a'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.384 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.385 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'mtu', '9000'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.495 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.496 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', '-o', 'netns', 'list'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.604 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.605 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'netns', 'qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.709 1 ERROR neutron.agent.linux.utils [-] Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp [-] Unable to plug DHCP port for network 8e5d7a66-df5d-4e36-8446-3c2148e53f02. Releasing port.
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp Traceback (most recent call last):
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1234, in setup
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     mtu=network.get('mtu'))
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 248, in plug
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     bridge, namespace, prefix, mtu)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 346, in plug_new
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     namespace_obj.add_device_to_namespace(ns_dev)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 216, in add_device_to_namespace
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     device.link.set_netns(self.namespace)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 514, in set_netns
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     self._as_root([], ('set', self.name, 'netns', namespace))
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 365, in _as_root
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     use_root_namespace=use_root_namespace)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 95, in _as_root
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     log_fail_as_error=self.log_fail_as_error)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 104, in _execute
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     log_fail_as_error=log_fail_as_error)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     raise RuntimeError(msg)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp
  2016-03-24 18:42:12.711 1 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL msg_id: 559dc40172904849a6cda4efebd85c38 exchange 'neutron' topic 'q-plugin' _send /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:454
  2016-03-24 18:42:12.858 1 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 559dc40172904849a6cda4efebd85c38 __call__ /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:302
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 8e5d7a66-df5d-4e36-8446-3c2148e53f02.
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 112, in call_driver
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 208, in enable
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     interface_name = self.device_manager.setup(self.network)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1240, in setup
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     self.plugin.release_dhcp_port(network.id, port.device_id)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     self.force_reraise()
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     six.reraise(self.type_, self.value, self.tb)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1234, in setup
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     mtu=network.get('mtu'))
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 248, in plug
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     bridge, namespace, prefix, mtu)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 346, in plug_new
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     namespace_obj.add_device_to_namespace(ns_dev)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 216, in add_device_to_namespace
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     device.link.set_netns(self.namespace)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 514, in set_netns
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     self._as_root([], ('set', self.name, 'netns', namespace))
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 365, in _as_root
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     use_root_namespace=use_root_namespace)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 95, in _as_root
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     log_fail_as_error=self.log_fail_as_error)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 104, in _execute
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     log_fail_as_error=log_fail_as_error)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     raise RuntimeError(msg)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent
  2016-03-24 18:42:12.859 1 INFO neutron.agent.dhcp.agent [-] Finished network 8e5d7a66-df5d-4e36-8446-3c2148e53f02 dhcp configuration
  2016-03-24 18:42:12.859 1 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
  2016-03-24 18:42:12.859 1 DEBUG oslo_concurrency.lockutils [-] Lock "dhcp-agent" released by "neutron.agent.dhcp.agent.sync_state" :: held 1.626s inner /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1561695/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

