Reviewed:  https://review.openstack.org/469231
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=73701bf75b964509c7d7e8b62dba97f7cbe9c87a
Submitter: Jenkins
Branch:    master
commit 73701bf75b964509c7d7e8b62dba97f7cbe9c87a
Author: Ihar Hrachyshka
Date:   Tue May 30 19:42:16 2017 +0000

    ovs: bubble up failures into main thread in native ofctl mode

    When the native ofctl interface is used (the default), the agent
    main() runs in a separate gevent thread. Unless we explicitly ask
    ryu to raise errors that happen in the agent app, it ignores them
    (only logging a warning message). This may interfere with service
    management software like systemd, which may use the return code to
    decide whether to restart a dead service.

    This patch makes ryu raise any uncaught errors that happen inside the
    agent. It also makes the agent 'wrapper' helper function stop
    swallowing raised exceptions after logging the error. Together, these
    two changes make the agent exit with rc=1 if an exception happens
    inside the main() function in native mode.

    This patch doesn't include any unit tests because those would be very
    silly (like checking that we indeed pass the needed arguments to ryu).

    Change-Id: Ic86b5eeae25a916c3c51f21e6820f5b0212dd5f8
    Closes-Bug: #1694505

** Changed in: neutron
       Status: In Progress => Fix Released

--
https://bugs.launchpad.net/bugs/1694505

Title:
  neutron-ovs-agent dies with return code 0 when neutron-server is down

Status in neutron:
  Fix Released

Bug description:
  Environment description:

  - Deployment using RDO Trunk repo from master.
  - Neutron based on commit c430e9b

  If neutron-ovs-agent is started before neutron-server starts, it exits
  with return code 0, which systemd does not treat as a failure, so the
  service is not restarted.
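The failure mode the commit message describes, and why both changes are needed, can be sketched in plain Python. This is a minimal sketch, not the neutron or ryu code: `spawn`, `wrapper`, `agent_main` and `run_agent` are hypothetical names, a standard `threading.Thread` stands in for the green thread the agent actually runs in, and the `raise_error` and `reraise` flags model the two changes (ryu raising uncaught errors, and the wrapper no longer swallowing them).

```python
import logging
import threading

LOG = logging.getLogger(__name__)


def spawn(func, raise_error=False):
    """Hypothetical stand-in for a framework's thread-spawn helper.

    Runs func in a worker thread and returns a wait() callable. With
    raise_error=False an uncaught exception is only logged (the behaviour
    described in the bug); with raise_error=True it is re-raised in
    whichever thread calls wait().
    """
    errors = []

    def runner():
        try:
            func()
        except Exception as exc:
            LOG.warning("thread died of an exception: %s", exc)
            if raise_error:
                errors.append(exc)

    worker = threading.Thread(target=runner)
    worker.start()

    def wait():
        worker.join()
        if errors:
            raise errors[0]

    return wait


def wrapper(func, reraise=True):
    """Hypothetical agent wrapper: log the error, then optionally re-raise."""
    try:
        func()
    except Exception:
        LOG.exception("Agent main thread died of an exception")
        if reraise:
            raise  # the fix: do not swallow the exception after logging it


def agent_main():
    # Simulates the RPC call that times out while neutron-server is down.
    raise TimeoutError("Timed out waiting for a reply")


def run_agent(raise_error, reraise):
    """Return the exit code the process would end up with."""
    wait = spawn(lambda: wrapper(agent_main, reraise=reraise),
                 raise_error=raise_error)
    try:
        wait()
    except Exception:
        return 1
    return 0
```

Only `run_agent(raise_error=True, reraise=True)` returns 1; if either layer swallows the exception, the result is 0, just like the `status=0/SUCCESS` in the report. A real entry point would pass that value to `sys.exit()`.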
  The following errors appear in /var/log/neutron/openvswitch-agent.log:

  2017-05-30 17:38:48.692 29042 DEBUG neutron.api.rpc.handlers.resources_rpc [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] neutron.api.rpc.handlers.resources_rpc.ResourcesPullRpcApi method bulk_pull called with arguments (<neutron_lib.context.Context object at 0x75ff950>, 'Port') {} wrapper /usr/lib/python2.7/site-packages/oslo_log/helpers.py:47
  2017-05-30 17:38:49.298 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
  ....
  2017-05-30 17:40:26.506 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
  2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] Agent main thread died of an exception
  ...
  2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 'to message ID %s' % msg_id)
  2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp MessagingTimeout: Timed out waiting for a reply to message ID 3874905892f543e0be9984e6504644bb
  2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
  2017-05-30 17:40:27.624 29042 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=29502

  On the systemd side, the following status is reported:

  [root@weirdo1 neutron]# systemctl status neutron-openvswitch-agent
  ● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
     Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
     Active: inactive (dead) since Tue 2017-05-30 17:40:27 UTC; 5min ago
   Main PID: 29042 (code=exited, status=0/SUCCESS)

  May 30 17:38:44 weirdo1 systemd[1]: Starting OpenStack Neutron Open vSwitch Agent...
  May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-arptables = 1
  May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-iptables = 1
  May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-ip6tables = 1
  May 30 17:38:44 weirdo1 systemd[1]: Started OpenStack Neutron Open vSwitch Agent.
  May 30 17:38:45 weirdo1 neutron-openvswitch-agent[29042]: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be reg...te reports.
  May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
  May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Could not load neutron.openstack.common.notifier.rpc_notifier

  Note the (code=exited, status=0/SUCCESS).

  An easy way to reproduce this is:

  1. Stop neutron-server.
  2. Start neutron-openvswitch-agent manually:

     # /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent
     Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
     Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
     Could not load neutron.openstack.common.notifier.rpc_notifier
     [root@weirdo1 neutron]# echo $?
     0

  Note that the return code is 0.

  I'd say this is a bug in the ovs agent, which should exit with rc != 0
  so that systemd restarts it under the current "Restart=on-failure"
  policy. Otherwise, we should change the systemd restart policy.
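To show why the exit status decides whether the agent comes back, the `echo $?` check from the reproduction steps and the `Restart=on-failure` decision can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: `exit_status`, `restart_on_failure` and `python_exit` are hypothetical helpers, `python_exit` stands in for the real `neutron-openvswitch-agent` command line, and the model covers only exit codes and termination signals, not systemd's timeout, watchdog or `SuccessExitStatus=` handling.

```python
import signal
import subprocess
import sys

# Signals systemd treats as a clean exit; Restart=on-failure does not
# restart a service terminated by one of these.
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


def exit_status(argv):
    """Run argv to completion and return its status (like `echo $?`).

    subprocess reports death by signal N as -N.
    """
    return subprocess.run(argv).returncode


def restart_on_failure(returncode):
    """Simplified model of the Restart=on-failure decision."""
    if returncode < 0:
        return -returncode not in CLEAN_SIGNALS
    return returncode != 0


def python_exit(code):
    """Hypothetical stand-in for the agent: a process that exits with code."""
    return [sys.executable, "-c", "raise SystemExit(%d)" % code]
```

With the behaviour from the bug report, the agent exits like `python_exit(0)` and `restart_on_failure` returns `False`, so systemd leaves the service dead; after the fix it exits like `python_exit(1)` and the model predicts a restart.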

