I've been able to track down what I believe is the root problem.

If ovsdb-server (run by the openvswitch-switch service) restarts, the
neutron-openvswitch-agent loses its connection and needs to be manually
restarted in order to reconnect.

I have seen this triggered by ovsdb-server segfaulting, by it being
killed with kill -9, and by a graceful restart with "service
openvswitch-switch restart".

The errors recorded in /var/log/upstart/neutron-openvswitch-agent.log
vary depending on why ovsdb-server went away:

2014-03-23 20:10:01.883 20375 ERROR neutron.agent.linux.ovsdb_monitor [req-a776b981-b86b-4437-ab65-0c6be6070094 None] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
2014-03-24 01:40:17.617 20375 ERROR neutron.agent.linux.ovsdb_monitor [req-a776b981-b86b-4437-ab65-0c6be6070094 None] Error received from ovsdb monitor: 2014-03-24T01:40:17Z|00001|fatal_signal|WARN|terminating with signal 15 (Terminated)
2014-03-24 04:08:59.718 8455 ERROR neutron.agent.linux.ovsdb_monitor [req-d2c2cbd5-a77a-4455-84ac-0a8ec69b41e8 None] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
2014-03-24 22:44:22.174 8455 ERROR neutron.agent.linux.ovsdb_monitor [req-d2c2cbd5-a77a-4455-84ac-0a8ec69b41e8 None] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
2014-03-24 22:44:52.220 8455 ERROR neutron.agent.linux.ovsdb_monitor [req-d2c2cbd5-a77a-4455-84ac-0a8ec69b41e8 None] Error received from ovsdb monitor: ovsdb-client: failed to connect to "unix:/var/run/openvswitch/db.sock" (Connection refused)
2014-03-24 22:45:22.266 8455 ERROR neutron.agent.linux.ovsdb_monitor [req-d2c2cbd5-a77a-4455-84ac-0a8ec69b41e8 None] Error received from ovsdb monitor: ovsdb-client: failed to connect to "unix:/var/run/openvswitch/db.sock" (Connection refused)
2014-03-24 22:45:52.310 8455 ERROR neutron.agent.linux.ovsdb_monitor [req-d2c2cbd5-a77a-4455-84ac-0a8ec69b41e8 None] Error received from ovsdb monitor: ovsdb-client: failed to connect to "unix:/var/run/openvswitch/db.sock" (Connection refused)
2014-03-24 22:46:22.355 8455 ERROR neutron.agent.linux.ovsdb_monitor [req-d2c2cbd5-a77a-4455-84ac-0a8ec69b41e8 None] Error received from ovsdb monitor: ovsdb-client: failed to connect to "unix:/var/run/openvswitch/db.sock" (Connection refused)
2014-03-24 22:49:27.179 8455 ERROR neutron.agent.linux.ovsdb_monitor [req-d2c2cbd5-a77a-4455-84ac-0a8ec69b41e8 None] Error received from ovsdb monitor: 2014-03-24T22:49:27Z|00001|fatal_signal|WARN|terminating with signal 15 (Terminated)
2014-03-24 22:55:45.441 16033 ERROR neutron.agent.linux.ovsdb_monitor [req-5fe682ce-138e-46d6-aa7e-f0d43ab576ee None] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
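
To tell these cases apart on a busy host, something like the following
could tally the three error kinds seen above. This is only a sketch: the
function name classify_ovsdb_errors is made up for illustration, and it
assumes each log entry sits on a single line in the real log file (the
entries are only wrapped in this mail).

```shell
#!/bin/sh
# Hypothetical helper (not part of neutron or Open vSwitch): count the
# ovsdb_monitor error kinds quoted in this report.
# Assumes one log entry per line in the file passed as $1.
classify_ovsdb_errors() {
    log_file=$1
    # grep -c prints the match count but exits non-zero when it is 0,
    # so "|| true" keeps the function usable under "set -e".
    eof=$(grep -cF 'receive failed (End of file)' "$log_file" || true)
    refused=$(grep -cF '(Connection refused)' "$log_file" || true)
    term=$(grep -cF 'terminating with signal' "$log_file" || true)
    printf 'end_of_file=%s\n' "$eof"
    printf 'connection_refused=%s\n' "$refused"
    printf 'terminated=%s\n' "$term"
}
```

It would be called as
classify_ovsdb_errors /var/log/upstart/neutron-openvswitch-agent.log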

In all cases, the result is the same: until neutron-openvswitch-agent is
restarted, no traffic is passed to the tapXXXXX interface inside the
dhcp-XXXXX netns.
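
Until the agent reconnects on its own, one possible stopgap is a small
check, run periodically (e.g. from cron), that restarts the agent when
ovsdb-server's PID changes. This is a sketch of a workaround, not a fix,
and I have not tested it in production. STATE_FILE, PID_CMD, RESTART_CMD
and the function name are illustrative names of my own; only the service
and process names come from this report.

```shell
#!/bin/sh
# Hypothetical workaround sketch: restart neutron-openvswitch-agent when
# ovsdb-server's PID differs from the one recorded on the previous run.
# All three variables are illustrative and can be overridden.
STATE_FILE=${STATE_FILE:-/var/run/ovsdb-server.lastpid}
PID_CMD=${PID_CMD:-"pidof ovsdb-server"}
RESTART_CMD=${RESTART_CMD:-"service neutron-openvswitch-agent restart"}

check_ovsdb_and_restart_agent() {
    # If ovsdb-server is not running, do nothing and retry on the next run.
    current=$($PID_CMD) || return 0
    previous=$(cat "$STATE_FILE" 2>/dev/null || true)
    printf '%s\n' "$current" > "$STATE_FILE"
    # The first run (nothing recorded yet) only records the PID; a changed
    # PID on a later run means ovsdb-server was restarted in between.
    if [ -n "$previous" ] && [ "$previous" != "$current" ]; then
        $RESTART_CMD
    fi
}
```

Comparing PIDs avoids parsing the agent log, at the cost of missing the
unlikely case where ovsdb-server comes back with the same PID.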

** Also affects: neutron
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1290486

Title:
  dhcp agent not serving responses

Status in OpenStack Neutron (virtual network service):
  New
Status in tripleo - openstack on openstack:
  In Progress

Bug description:
  The DHCP requests were not being responded to after they were seen on
  the undercloud network interface.  The neutron services were restarted
  in an attempt to ensure they had the newest configuration and knew
  they were supposed to respond to the requests.

  Rather than using the heat stack create (called in
  devtest_overcloud.sh) to test, it was simpler to use the following to
  directly boot a baremetal node.

      nova boot --flavor $(nova flavor-list | grep "|[[:space:]]*baremetal[[:space:]]*|" | awk '{print $2}') \
            --image $(nova image-list | grep "|[[:space:]]*overcloud-control[[:space:]]*|" | awk '{print $2}') \
            bm-test1

  Whilst the baremetal node was attempting to PXE boot, a restart of the
  neutron services was performed.  This allowed the baremetal node to
  boot.

  It has been observed that a neutron restart was needed for each
  subsequent reboot of the baremetal nodes to succeed.

