Public bug reported:

In our environment, we have some large compute nodes with a large number
of VIFs. The problem occurs when the agent calls update_device_list at
startup:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L842

This call takes a very long time, because the server side appears to
loop over each port, contacting Nova and doing much more work per port.
The default RPC timeout of 60 seconds is not enough, and the call fails
on a server with around 120 VIFs. When we raise the timeout to 120
seconds, it appears to work without problems.
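
The figures above give a rough per-port budget. A minimal sketch, assuming the server-side cost of update_device_list grows roughly linearly with the number of ports (the report describes per-port work but does not measure it). The helper names are hypothetical, not neutron functions. The timeout being raised is most likely oslo.messaging's rpc_response_timeout option, whose default is 60 seconds, although the report does not name the option.

```python
import math


def per_port_cost_bounds(failing_timeout_s, working_timeout_s, num_ports):
    """Bound the average server-side cost per port of one update_device_list
    call. Hypothetical helper; assumes cost grows linearly with port count.

    A call that times out at failing_timeout_s takes longer than that, and a
    call that succeeds at working_timeout_s takes at most that long.
    """
    lower = failing_timeout_s / num_ports
    upper = working_timeout_s / num_ports
    return lower, upper


def max_ports_per_call(timeout_s, per_port_s):
    """Largest number of ports one call can carry before reaching timeout_s."""
    return math.floor(timeout_s / per_port_s)


if __name__ == "__main__":
    # Figures from this report: fails at 60 s, works at 120 s, ~120 VIFs.
    low, high = per_port_cost_bounds(60, 120, 120)
    print("per-port cost between %.2f s and %.2f s" % (low, high))
    # -> per-port cost between 0.50 s and 1.00 s
    # With an assumed (not measured) cost of 0.75 s per port:
    print("ports per call at 60 s:", max_ports_per_call(60, 0.75))
    # -> ports per call at 60 s: 80
```

Under this linear assumption, the default timeout caps a single call at a fixed number of ports, so any node with more VIFs than that cap fails at startup no matter how idle neutron-server is.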

2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-1e6cc46d-eb52-4d99-bd77-bf2e8424a1ea - - - - -] Error while processing VIF ports
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1752, in rpc_loop
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     ovs_restarted)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1507, in process_network_ports
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self._bind_devices(need_binding_devices)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 847, in _bind_devices
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.conf.host)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 179, in update_device_list
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     agent_id=agent_id, host=host)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=self.retry)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     timeout=timeout, retry=retry)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=retry)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = self._waiter.wait(msg_id, timeout)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     message = self.waiters.get(msg_id, timeout=timeout)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     'to message ID %s' % msg_id)
2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID c42c1ffc801b41ca89aa4472696bbf1a
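
The traceback ends with the agent blocked in the AMQP driver's waiter (self.waiters.get(msg_id, timeout=timeout)) until either a reply for that message ID arrives or the timeout expires. A simplified, hypothetical model of that caller-side behaviour, using only the standard library; ReplyWaiter, fake_update_device_list, call, and this MessagingTimeout class are illustrative stand-ins, not oslo.messaging's real classes or signatures.

```python
import queue
import threading
import time


class MessagingTimeout(Exception):
    """Stand-in for the error in the traceback (not the oslo.messaging class)."""


class ReplyWaiter:
    """Hypothetical caller-side model: one reply queue per message ID."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _queue_for(self, msg_id):
        with self._lock:
            return self._queues.setdefault(msg_id, queue.Queue())

    def put(self, msg_id, reply):
        self._queue_for(msg_id).put(reply)

    def get(self, msg_id, timeout):
        # Block until a reply for msg_id arrives or the timeout expires.
        try:
            return self._queue_for(msg_id).get(timeout=timeout)
        except queue.Empty:
            raise MessagingTimeout(
                "Timed out waiting for a reply to message ID %s" % msg_id
            ) from None


def fake_update_device_list(waiter, msg_id, devices, per_port_s):
    # Server side: process every device in turn, then send a single reply.
    for _ in devices:
        time.sleep(per_port_s)
    waiter.put(msg_id, {"devices_up": list(devices)})


def call(waiter, msg_id, devices, per_port_s, timeout):
    # Hand the request to a "server" thread, then wait for its reply.
    threading.Thread(
        target=fake_update_device_list,
        args=(waiter, msg_id, devices, per_port_s),
        daemon=True,
    ).start()
    return waiter.get(msg_id, timeout)


if __name__ == "__main__":
    devices = ["tap%d" % i for i in range(12)]
    waiter = ReplyWaiter()
    try:
        # About 0.12 s of server work against a 0.06 s timeout.
        call(waiter, "msg-1", devices, per_port_s=0.01, timeout=0.06)
    except MessagingTimeout as exc:
        print("timed out:", exc)
    reply = call(waiter, "msg-2", devices, per_port_s=0.01, timeout=2.0)
    print(len(reply["devices_up"]), "devices reported up")
```

In this model the server thread keeps working after the caller gives up and eventually posts a reply that nobody reads, so a caller-side timeout does not reduce the server-side work; raising the timeout only gives that work more time to finish.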

I don't think an RPC call should ever take that long. The neutron-server
is not under load, and adding more neutron-server instances doesn't seem
to help, because a single RPC responder answers this call.
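
Because one responder handles the whole device list, extra servers cannot shorten a single call. A rough latency model under stated assumptions: each call is processed start to finish by one responder, per-port cost is constant, and the caller may send several smaller calls concurrently. The helpers chunked and modelled_latency are hypothetical, and the 0.75 s per-port cost is an assumed figure, not a measurement from this report.

```python
import math


def chunked(devices, size):
    """Split a device list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [devices[i:i + size] for i in range(0, len(devices), size)]


def modelled_latency(num_devices, per_port_s, batch_size, responders):
    """Rough upper bound on wall-clock time to update num_devices ports.

    Assumes each call is handled entirely by one RPC responder, a constant
    per-port cost, and batches sent concurrently so they can spread over
    `responders` servers. The last batch may be smaller, so this over-estimates.
    """
    batches = math.ceil(num_devices / batch_size)
    rounds = math.ceil(batches / responders)
    return rounds * min(batch_size, num_devices) * per_port_s


if __name__ == "__main__":
    devices = ["tap%d" % i for i in range(120)]
    # One call carrying all 120 ports: adding servers changes nothing.
    print(modelled_latency(len(devices), 0.75, batch_size=120, responders=1))  # 90.0
    print(modelled_latency(len(devices), 0.75, batch_size=120, responders=4))  # 90.0
    # Four batches of 30 spread over four responders.
    batches = chunked(devices, 30)
    print(len(batches), "batches")  # 4 batches
    print(modelled_latency(len(devices), 0.75, batch_size=30, responders=4))  # 22.5
```

If the batches are sent one after another instead of concurrently, total time in this model stays near 90 s, but each individual call (about 22.5 s here) fits well inside a 60 s timeout.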

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1528895

Title:
  Timeouts in update_device_list (too slow with large # of VIFs)

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1528895/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp