[Expired for neutron because there has been no activity for 60 days.]
** Changed in: neutron
Status: Incomplete => Expired
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1339045
Title:
Cannot open network namespace
Status in neutron:
Expired
Bug description:
While booting tripleo undercloud have occassionally (about 8 in 10
attempts) seen errors in neutron-dhcp-agent.log of:
2014-07-08 10:12:07.374 3905 ERROR neutron.agent.linux.utils [-]
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf',
'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip',
'link', 'set', 'tap4ac4d6ab-d7', 'up']
Exit code: 1
Stdout: ''
Stderr: 'Cannot open network namespace
"qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'
2014-07-08 10:12:07.419 3905 ERROR neutron.agent.linux.utils [-]
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf',
'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip',
'-o', 'link', 'show', 'tap4ac4d6ab-d7']
Exit code: 1
Stdout: ''
Stderr: 'Cannot open network namespace
"qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'
In all cases these ERRORs are preceeded by the ERROR:
2014-07-08 10:12:07.151 3905 ERROR neutron.agent.dhcp_agent
[req-523b2309-18c9-4a2d-ad65-7586db10f956 None] Failed reporting state!
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent Traceback (most
recent call last):
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/dhcp_agent.py",
line 576, in _report_state
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent
self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/rpc.py",
line 70, in report_state
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent return
self.call(context, msg, topic=self.topic)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py",
line 161, in call
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent context, msg,
rpc_method='call', **kwargs)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py",
line 185, in __call_rpc_method
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent return
func(context, msg['method'], **msg['args'])
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py",
line 150, in call
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent
wait_for_reply=True, timeout=timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/transport.py",
line 90, in _send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent
timeout=timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py",
line 412, in send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent return
self._send(target, ctxt, message, wait_for_reply, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py",
line 403, in _send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent result =
self._waiter.wait(msg_id, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py",
line 267, in wait
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent reply, ending
= self._poll_connection(msg_id, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File
"/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py",
line 217, in _poll_connection
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent % msg_id)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent MessagingTimeout:
Timed out waiting for a reply to message ID 21fd392ebb3a43228447f380334d05b0
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent
2014-07-08 10:12:07.157 3905 WARNING neutron.openstack.common.loopingcall [-]
task run outlasted interval by 30.119771 sec
If I then wait for the undercloud to fail and login to the seed host, I can
manually run the command that failed previously:
root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap
/etc/neutron/rootwrap.conf ip netns exec
'qdhcp-e967d587-5999-4333-8405-76b32398b121' ip link set tap4ac4d6ab-d7 up
Replicate the ERROR message by attempting to exec a command for a non
existent namespace..
------------------------------------------------------------------------------------------------------------------------------------
root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap
/etc/neutron/rootwrap.conf ip netns exec
'qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' ip link set tap4ac4d6ab-d7 up
Cannot open network namespace "qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx":
No such file or directory
The code section that returns the error (neutron/agent/linux/ip_lib.py - see
below) has a comment regarding the return of a RuntimeError if the device does
not exist - so this could be seen as expected behaviour
def ensure_device_is_ready(device_name, root_helper=None, namespace=None):
dev = IPDevice(device_name, root_helper, namespace)
try:
# Ensure the device is up, even if it is already up. If the device
# doesn't exist, a RuntimeError will be raised.
dev.link.set_up()
except RuntimeError:
return False
return True
- why does the device not exist? (probably a timing issue)
- why not wait until the device exists before attempting the operation?
what check could be put in place - prior to calling this function (or as
part of this function) that could check on the state of the device
would a retry be appropriate?
- Is the name of the function correct? - does not seem to
'ensure_device_is_ready' - more like "set_device_interface_up"?
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1339045/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp