Public bug reported:
The following introduces a 60s timeout to DHCP agent startups: https://github.com/openstack/neutron/commit/157e09e6af758b7669fbe5a8cdb0b1969f04661a #diff-3fcbcfeebb7de79a1cb36faed9b8b091 The value is not adjustable from conf. When there's enough network elements (ie. ~1200 DHCP enabled subnets in our case), nearly all DHCP startups fail with: 2019-10-09 13:21:27.826 694156 ERROR neutron.agent.linux.dhcp [-] Failed to start DHCP process for network 8b4b5496-8b35-482e-a2a3-7c352f1e343a: WaitTimeout: Timed out after 60 seconds Timeout happens due to operations happening in sequence with 100-300ms between each operation, and too many agents tried at the same time: 2019-10-09 12:24:17.836 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-13648243-5659-4094-9dd7-cee58e4d46ac', 'ip', '-o', 'link', 'show', 'tap3af1ee05-2f'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.075 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1407a320-8fca-4dfd-a011-96a2ad41779f', 'ip', '-o', 'link', 'show', 'tap21c443e1-89'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.266 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1412976d-4cd0-452a-91b7-7f8c3003c722', 'ip', '-o', 'link', 'show', 'tap1cb0995e-bd'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.541 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1433d8a6-fa06-4544-994a-d38b01302490', 'ip', '-o', 'link', 'show', 'tap00fcd6f1-b1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.735 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1447a8cf-e94c-4250-b9a8-2b13c0cf60c6', 'ip', '-o', 'link', 'show', 'tap91076dbc-83'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.930 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-14b396e1-d561-4087-990d-9b993cc08619', 'ip', '-o', 'link', 'show', 'tapdf767a28-af'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 The following allows the agents to start: - common_utils.wait_until_true(self._enable) + common_utils.wait_until_true(self._enable, timeout=300) Few ways to solve this issue: - Increase default timeout from 60s to a bigger number - Make the timeout dhcp conf adjustable - Figure out more optimal batch sizes of DHCP agents to be started at a time, and increase the startup performance ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1849676 Title: DHCP agents time out during startup at 60s when there is enough agents Status in neutron: New Bug description: The following introduces a 60s timeout to DHCP agent startups: https://github.com/openstack/neutron/commit/157e09e6af758b7669fbe5a8cdb0b1969f04661a #diff-3fcbcfeebb7de79a1cb36faed9b8b091 The value is not adjustable from conf. When there's enough network elements (ie. ~1200 DHCP enabled subnets in our case), nearly all DHCP startups fail with: 2019-10-09 13:21:27.826 694156 ERROR neutron.agent.linux.dhcp [-] Failed to start DHCP process for network 8b4b5496-8b35-482e- a2a3-7c352f1e343a: WaitTimeout: Timed out after 60 seconds Timeout happens due to operations happening in sequence with 100-300ms between each operation, and too many agents tried at the same time: 2019-10-09 12:24:17.836 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-13648243-5659-4094-9dd7-cee58e4d46ac', 'ip', '-o', 'link', 'show', 'tap3af1ee05-2f'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.075 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1407a320-8fca-4dfd-a011-96a2ad41779f', 'ip', '-o', 'link', 'show', 'tap21c443e1-89'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.266 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1412976d-4cd0-452a-91b7-7f8c3003c722', 'ip', '-o', 'link', 'show', 'tap1cb0995e-bd'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.541 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1433d8a6-fa06-4544-994a-d38b01302490', 'ip', '-o', 'link', 'show', 'tap00fcd6f1-b1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.735 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1447a8cf-e94c-4250-b9a8-2b13c0cf60c6', 'ip', '-o', 'link', 'show', 'tap91076dbc-83'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 2019-10-09 12:24:18.930 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-14b396e1-d561-4087-990d-9b993cc08619', 'ip', '-o', 'link', 'show', 'tapdf767a28-af'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103 The following allows the agents to start: - common_utils.wait_until_true(self._enable) + common_utils.wait_until_true(self._enable, timeout=300) Few ways to solve this issue: - Increase default timeout from 60s to a bigger number - Make the timeout dhcp conf adjustable - Figure out more optimal batch sizes of DHCP agents to be started at a time, and increase the startup performance To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1849676/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

