Public bug reported:

We have seen regular issues with the neutron-bgp-dragent service when one or more network nodes fail or are undergoing maintenance.

In the most problematic case, we have a deployment with four network nodes. Each of these runs a neutron-bgp-dragent process, and each is associated with the same BGP speaker. When one of these network nodes goes down, a cleanup process runs a short time later, but rather than removing the speaker association only from the absent network node, it removes the associations from all but one of the nodes.

During this process, the running neutron-bgp-dragent processes report errors such as the following (observed using the latest neutron-dynamic-routing code from stable/victoria):

"Unable to sync BGP speaker state.: RuntimeError: dictionary changed size during iteration"

or

Sep 15 13:36:26 neutron-bgp-dragent[1308396]: 2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server [req-094dad10-b4da-4c50-8e32-f7814d446705 - - - - -] Exception during message handling: TypeError: unhashable type: 'dict'
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 185, in bgp_speaker_create_end
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     self.add_bgp_speaker_helper(bgp_speaker_id)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 249, in add_bgp_speaker_helper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     self.add_bgp_speaker_on_dragent(bgp_speaker)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 344, in add_bgp_speaker_on_dragent
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     self.cache.put_bgp_speaker(bgp_speaker)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 582, in put_bgp_speaker
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     self.remove_bgp_speaker_by_id(self.cache[bgp_speaker['id']])
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server   File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 600, in remove_bgp_speaker_by_id
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server     if bgp_speaker_id in self.cache:
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server TypeError: unhashable type: 'dict'
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server

This issue appears to match a comment here:
https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/780675/3#message-c409a4fb83a44216e03a041921c7067f44eb70d0

We will test out https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/780675 as in our case the automatic behaviour appears mostly unnecessary, but a fix for the underlying issue would still be appreciated.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1943725

Title:
  Automatic cleanup of BGP speakers is too aggressive

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1943725/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
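
The last two frames of the traceback in this report show put_bgp_speaker passing self.cache[bgp_speaker['id']] (the cached value) to remove_bgp_speaker_by_id, which then tests it with "in self.cache". The following is a minimal Python sketch of how that pattern can produce both reported errors. It is an illustration under assumptions, not the neutron-dynamic-routing code: SpeakerCache, the put_bgp_speaker_buggy / put_bgp_speaker_fixed variants, prune_unsafe and prune_safe are hypothetical names, and the shape of the cached entry is assumed to be a dict. Where the RuntimeError actually arises in the agent is not established by the report; the sketch only shows the general Python behaviour.

```python
class SpeakerCache:
    """Hypothetical stand-in for the agent's speaker cache (not the real class)."""

    def __init__(self):
        # Maps speaker id -> cached entry; the entry is assumed to be a dict.
        self.cache = {}

    def remove_bgp_speaker_by_id(self, bgp_speaker_id):
        # A dict argument fails here: membership tests hash the key.
        if bgp_speaker_id in self.cache:
            del self.cache[bgp_speaker_id]

    def put_bgp_speaker_buggy(self, bgp_speaker):
        if bgp_speaker['id'] in self.cache:
            # Passes the cached entry (a dict) where an id is expected.
            self.remove_bgp_speaker_by_id(self.cache[bgp_speaker['id']])
        self.cache[bgp_speaker['id']] = {'bgp_speaker': bgp_speaker}

    def put_bgp_speaker_fixed(self, bgp_speaker):
        if bgp_speaker['id'] in self.cache:
            # Passes the id itself, so the membership test is hashable.
            self.remove_bgp_speaker_by_id(bgp_speaker['id'])
        self.cache[bgp_speaker['id']] = {'bgp_speaker': bgp_speaker}


def prune_unsafe(cache, stale_ids):
    # Deleting while iterating the same dict raises RuntimeError.
    for speaker_id in cache:
        if speaker_id in stale_ids:
            del cache[speaker_id]


def prune_safe(cache, stale_ids):
    # Iterating over a snapshot of the keys makes deletion safe.
    for speaker_id in list(cache):
        if speaker_id in stale_ids:
            del cache[speaker_id]


if __name__ == "__main__":
    speaker = {'id': 'speaker-1'}  # illustrative data only

    buggy = SpeakerCache()
    buggy.put_bgp_speaker_buggy(speaker)      # first insert succeeds
    try:
        buggy.put_bgp_speaker_buggy(speaker)  # re-adding the same speaker fails
    except TypeError as exc:
        print("TypeError:", exc)

    try:
        prune_unsafe({'a': 1, 'b': 2}, {'a'})
    except RuntimeError as exc:
        print("RuntimeError:", exc)
```

In this sketch the first failure only appears when a speaker that is already cached is added again, which would be consistent with it surfacing during rescheduling rather than at initial startup.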

