** Also affects: cloud-archive
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1623664

Title:
  Race between L3 agent and neutron-ns-cleanup

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Invalid

Bug description:
  I suspect a race between the neutron L3 agent and the
  neutron-netns-cleanup script, which runs as a cron job on Ubuntu.
  Here is a stack trace from the router delete code path:

  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager [-] Error during notification for neutron.agent.metadata.driver.before_router_removed router, before_delete
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager Traceback (most recent call last):
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager   File "/usr/lib/python2.7/dist-packages/neutron/callbacks/manager.py", line 141, in _notify_loop
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager     callback(resource, event, trigger, **kwargs)
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/driver.py", line 176, in before_router_removed
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager     router.iptables_manager.apply()
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 423, in apply
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager     return self._apply()
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 431, in _apply
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager     return self._apply_synchronized()
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 457, in _apply_synchronized
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager     save_output = self.execute(args, run_as_root=True)
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 159, in execute
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager     raise RuntimeError(m)
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager RuntimeError:
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-69ef3d5c-1ad1-42fb-8a1e-8d949837bbf8', 'iptables-save']
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager Exit code: 1
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager Stdin:
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager Stdout:
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager Stderr: Cannot open network namespace "qrouter-69ef3d5c-1ad1-42fb-8a1e-8d949837bbf8": No such file or directory
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager
  2016-08-03 03:30:03.392 2595 ERROR neutron.callbacks.manager
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent [-] Error while deleting router 69ef3d5c-1ad1-42fb-8a1e-8d949837bbf8
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent Traceback (most recent call last):
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 344, in _safe_router_removed
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 360, in _router_removed
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent     self, router=ri)
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/callbacks/registry.py", line 44, in notify
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent     _get_callback_manager().notify(resource, event, trigger, **kwargs)
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/callbacks/manager.py", line 123, in notify
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent     raise exceptions.CallbackFailure(errors=errors)
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent CallbackFailure: Callback neutron.agent.metadata.driver.before_router_removed failed with "
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-69ef3d5c-1ad1-42fb-8a1e-8d949837bbf8', 'iptables-save']
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent Exit code: 1
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent Stdin:
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent Stdout:
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent Stderr: Cannot open network namespace "qrouter-69ef3d5c-1ad1-42fb-8a1e-8d949837bbf8": No such file or directory
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent "
  2016-08-03 03:30:03.393 2595 ERROR neutron.agent.l3.agent

  In this case, the cleanup script first deleted the qrouter namespace,
  which it found to be empty (containing no netdevs other than lo). The
  router delete flow attempts to remove iptables rules within the
  namespace before deleting the namespace itself. If the namespace has
  already been deleted, the iptables-save command fails because the
  namespace no longer exists. The resulting exception prevents the
  router delete flow from succeeding, and the L3 agent gets stuck in a
  failure loop.
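
To illustrate the ordering problem, here is a toy model of the two code paths. It is only a sketch and not Neutron code: FakeHost, router_delete_flow and netns_cleanup are hypothetical names, and the namespace set stands in for the real kernel state.

```python
class FakeHost:
    """Toy stand-in for a host's network namespaces (not Neutron code)."""

    def __init__(self):
        self.namespaces = set()

    def iptables_save(self, ns):
        # Mirrors the failure in the log: the command fails once the
        # namespace is gone.
        if ns not in self.namespaces:
            raise RuntimeError(
                'Cannot open network namespace "%s": '
                'No such file or directory' % ns)
        return "# iptables-save output for %s" % ns

    def delete_namespace(self, ns):
        self.namespaces.discard(ns)


def router_delete_flow(host, router_id):
    """Touch iptables inside the namespace, then delete the namespace."""
    ns = "qrouter-" + router_id
    host.iptables_save(ns)      # step 1: requires the namespace to exist
    host.delete_namespace(ns)   # step 2: only reached if step 1 succeeds


def netns_cleanup(host, router_id):
    """Stand-in for the cron cleanup removing a namespace it sees as empty."""
    host.delete_namespace("qrouter-" + router_id)


host = FakeHost()
router_id = "69ef3d5c-1ad1-42fb-8a1e-8d949837bbf8"
host.namespaces.add("qrouter-" + router_id)

netns_cleanup(host, router_id)  # the cleanup wins the race
try:
    router_delete_flow(host, router_id)
    outcome = "deleted"
except RuntimeError as exc:
    outcome = "failed: %s" % exc
print(outcome)
```

If the two calls run in the opposite order, step 1 succeeds and the later cleanup has nothing left to remove, which matches the claim that the failure depends on which side deletes the namespace first.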

  Can somebody confirm whether this is a known issue, or whether I've
  misunderstood the problem? Assuming my analysis is correct, would the
  following fix work?

  diff --git a/neutron/agent/l3/agent.py b/neutron/agent/l3/agent.py
  index b096091..8d3e8ae 100644
  --- a/neutron/agent/l3/agent.py
  +++ b/neutron/agent/l3/agent.py
  @@ -358,8 +358,16 @@ class L3NATAgent(firewall_l3_agent.FWaaSL3AgentRpcCallback,
               self.namespaces_manager.ensure_router_cleanup(router_id)
               return
   
  -        registry.notify(resources.ROUTER, events.BEFORE_DELETE,
  -                        self, router=ri)
  +        try:
  +            registry.notify(resources.ROUTER, events.BEFORE_DELETE,
  +                            self, router=ri)
  +        except Exception as e:
  +            ns_err = 'Cannot open network namespace "qrouter-' + router_id
  +            if ns_err not in str(e):
  +                raise
  +            else:
  +                LOG.warn(_LW("Namespace for router %s already deleted"),
  +                         router_id)
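
The namespace-missing check can also be written as a standalone helper that matches on the exception's message text, where the namespace name appears in double quotes as in the Stderr line of the log. This is a minimal sketch under that assumption: is_missing_namespace_error, remove_router and notify_before_delete are hypothetical names for illustration, not Neutron APIs.

```python
def is_missing_namespace_error(exc, router_id):
    """Return True if exc's message says the router namespace is gone."""
    # `ip netns exec` quotes the namespace name in its error text, so the
    # opening quote is part of the substring being searched for.
    needle = 'Cannot open network namespace "qrouter-%s"' % router_id
    return needle in str(exc)


def remove_router(router_id, notify_before_delete):
    """Run the before-delete hook, tolerating an already-deleted namespace.

    notify_before_delete is a hypothetical callable standing in for the
    registry.notify(...) call in the diff.
    """
    try:
        notify_before_delete(router_id)
    except Exception as exc:
        if not is_missing_namespace_error(exc, router_id):
            raise  # unrelated failure: keep the original behaviour
        return "namespace already deleted"
    return "callbacks ran"
```

Matching on message text is brittle, since it depends on the exact wording emitted by the ip command; it is shown here only because it is the approach the diff takes.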
