[Expired for neutron because there has been no activity for 60 days.]
** Changed in: neutron
Status: Incomplete => Expired
--
https://bugs.launchpad.net/bugs/1223369
Title:
Metadata ns proxy didn't start - pid already exist. Daemon already
running?
Status in neutron:
Expired
Bug description:
This failure happened just once. The environment is Ubuntu Raring 13.04 with
Grizzly Quantum packages at 1:2013.1.2-0ubuntu1.
I noticed the metadata namespace proxy hadn't started after the network node
was booted. The l3-agent.log (logging was only at INFO level) has:
2013-09-04 15:53:16 INFO [quantum.openstack.common.rpc.common] Connected
to AMQP server on 10.0.10.10:5672
2013-09-04 15:53:16 INFO [quantum.agent.l3_agent] L3 agent started
2013-09-04 15:53:28 ERROR [quantum.agent.l3_agent] Failed synchronizing
routers
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line
638, in _sync_routers_task
self._process_routers(routers, all_routers=True)
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line
618, in _process_routers
self._router_added(r['id'], r)
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line
236, in _router_added
self._spawn_metadata_proxy(ri)
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line
270, in _spawn_metadata_proxy
pm.enable(callback)
File
"/usr/lib/python2.7/dist-packages/quantum/agent/linux/external_process.py",
line 55, in enable
ip_wrapper.netns.execute(cmd)
File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/ip_lib.py", line
414, in execute
check_exit_code=check_exit_code)
File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/utils.py", line
61, in execute
raise RuntimeError(m)
RuntimeError:
Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ip',
'netns', 'exec', 'qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7',
'quantum-ns-metadata-proxy',
'--pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid',
'--router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7',
'--state_path=/var/lib/quantum', '--metadata_port=9697', '--verbose',
'--log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log',
'--log-dir=/var/log/quantum']
Exit code: 1
Stdout: ''
Stderr: '2013-09-04 15:53:28 INFO [quantum.common.config] Logging
enabled!\n2013-09-04 15:53:28 ERROR [quantum.agent.linux.daemon] Pidfile
/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already
exist. Daemon already running?\n'
And quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log has:
2013-08-29 19:04:04 INFO [quantum.common.config] Logging enabled!
2013-09-04 15:53:28 INFO [quantum.common.config] Logging enabled!
2013-09-04 15:53:28 ERROR [quantum.agent.linux.daemon] Pidfile
/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already
exist. Daemon already running?
It is the same error message as
https://bugs.launchpad.net/neutron/+bug/1177416 - but the patch from that bug
was applied.
The file
/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid
contained 2045, but no process with pid 2045 was running when I checked:
/proc/2045/ did not exist. The pid file was stale, as its date was that of
the previous launch.
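The manual check described above (read the pid from the file, then look for
/proc/<pid>) can be sketched as follows. This is only an illustration, not code
from quantum: the function names and the proc_root parameter are hypothetical,
and an unreadable pid file is treated as stale by assumption.

```python
import os


def read_pid(pid_file):
    """Return the integer pid stored in pid_file, or None if unreadable."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_file_is_stale(pid_file, proc_root="/proc"):
    """Return True when pid_file names a pid with no <proc_root>/<pid> entry.

    A missing or malformed pid file is also reported as stale (an assumption
    made for this sketch).
    """
    pid = read_pid(pid_file)
    if pid is None:
        return True
    return not os.path.isdir(os.path.join(proc_root, str(pid)))
```

Note that this only tells you whether some process currently owns the pid. It
says nothing about whether that process is the daemon that wrote the file.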
The process call chain in short-hand is like this:
l3-agent --> sudo rootwrap... --> python rootwrap ip netns exec qrouter-uuid
quantum-ns-metadata-proxy router_id=uuid... --> python
quantum-ns-metadata-proxy router_id=uuid...
Now the code in external_process.py either didn't find a
/proc/2045/cmdline, or if it did, that file did not contain the
strings 'python' and 'fa2ec96d-d1f9-4af2-a022-cac171646aa7'. But the
code in daemon.py must have found a /proc/2045/cmdline, and it must
have contained those strings. The only explanation I can give for this is
that the python rootwrap process started by sudo just happened to get
pid 2045 that time, and this is what daemon.py is_running() found. Its
full command line would have looked like:
/usr/bin/python /usr/bin/quantum-rootwrap /etc/quantum/rootwrap.conf
ip netns exec qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7
quantum-ns-metadata-proxy
--pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid
--router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7
--state_path=/var/lib/quantum --metadata_port=9697 --verbose
--log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log
--log-dir=/var/log/quantum
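To make the suspected collision concrete, here is a small sketch that encodes
that argv the way /proc/<pid>/cmdline stores it (each argument followed by a
NUL byte) and runs a substring check against it. naive_match is a hypothetical
stand-in for the kind of check described in this report, not the actual
daemon.py code.

```python
ROUTER_ID = "fa2ec96d-d1f9-4af2-a022-cac171646aa7"

# argv of the rootwrap helper, as reconstructed in this report.
ROOTWRAP_ARGV = [
    "/usr/bin/python",
    "/usr/bin/quantum-rootwrap",
    "/etc/quantum/rootwrap.conf",
    "ip", "netns", "exec", "qrouter-" + ROUTER_ID,
    "quantum-ns-metadata-proxy",
    "--pid_file=/var/lib/quantum/external/pids/" + ROUTER_ID + ".pid",
    "--router_id=" + ROUTER_ID,
    "--state_path=/var/lib/quantum",
    "--metadata_port=9697",
    "--verbose",
    "--log-file=quantum-ns-metadata-proxy" + ROUTER_ID + ".log",
    "--log-dir=/var/log/quantum",
]


def to_proc_cmdline(argv):
    """Encode argv as /proc/<pid>/cmdline does: NUL after every argument."""
    return "".join(arg + "\x00" for arg in argv)


def naive_match(cmdline, uuid):
    """Hypothetical check: both 'python' and the uuid appear in cmdline."""
    return "python" in cmdline and uuid in cmdline


cmdline = to_proc_cmdline(ROOTWRAP_ARGV)
print(naive_match(cmdline, ROUTER_ID))
```

Under these assumptions the wrapper's command line satisfies the check even
though it is not the metadata proxy daemon itself.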
It has the strings 'python' and the router's uuid, so it would have
matched. If my theory is right, then a possible fix would be to change
the checks to not report cmdlines with 'ip\x00netns\x00exec' as a
running daemon.
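A minimal sketch of that possible fix is below. It is an assumption about how
such a check could look, not a patch against daemon.py or
external_process.py: is_daemon_cmdline and WRAPPER_MARKER are hypothetical
names, and the example daemon command line (including the
/usr/bin/quantum-ns-metadata-proxy path) is made up for illustration.

```python
# NUL-separated form of "ip netns exec" as it appears in /proc/<pid>/cmdline.
WRAPPER_MARKER = "ip\x00netns\x00exec"


def is_daemon_cmdline(cmdline, uuid):
    """Return True only for a python process that carries the uuid and is
    not the 'ip netns exec' wrapper that launches the daemon."""
    if WRAPPER_MARKER in cmdline:
        # This is the rootwrap/ip launcher, not the daemon itself.
        return False
    return "python" in cmdline and uuid in cmdline


uuid = "fa2ec96d-d1f9-4af2-a022-cac171646aa7"

# Launcher: rootwrap running "ip netns exec ... quantum-ns-metadata-proxy".
wrapper = "\x00".join([
    "/usr/bin/python", "/usr/bin/quantum-rootwrap",
    "/etc/quantum/rootwrap.conf", "ip", "netns", "exec",
    "qrouter-" + uuid, "quantum-ns-metadata-proxy",
    "--router_id=" + uuid,
]) + "\x00"

# Daemon: illustrative command line only (path is an assumption).
daemon = "\x00".join([
    "/usr/bin/python", "/usr/bin/quantum-ns-metadata-proxy",
    "--router_id=" + uuid,
]) + "\x00"

print(is_daemon_cmdline(wrapper, uuid), is_daemon_cmdline(daemon, uuid))
```

Matching on the NUL-separated marker rather than the plain string
"ip netns exec" keeps the test aligned with how the kernel exposes argv.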