*** This bug is a duplicate of bug 1439472 ***
https://bugs.launchpad.net/bugs/1439472
** This bug has been marked a duplicate of bug 1439472
OVS doesn't restart properly when Exception occurred
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1531210
Title:
Ovs agent loses OpenFlow rules if OVS gets restarted while Neutron is
disconnected from SQL
Status in neutron:
Confirmed
Bug description:
Steps to reproduce in Juno:
1. Node X has neutron-ovs-agent running
2. Neutron-server, running the ML2 plugin, loses its connection to the SQL
server. At this point neutron-ovs-agent is not aware of this, since it doesn't
query device properties.
3. OVS is restarted in the background of the neutron-ovs-agent.
4. The neutron-ovs-agent realizes that OVS was restarted since the CANARY
VALUE it placed in OpenFlow table 23 is missing.
5. The agent sets a local flag, ovs_restarted, and puts the CANARY value back
to signal that it took care of the OVS restart in this iteration.
6. It runs through the OVS restart flow, which erases the OpenFlow rules
(again, this is Juno).
7. When the agent accesses the Neutron server in process_network_ports(), it
gets the following SQL error, which aborts this iteration:
########################################################################################
2015-12-28 08:49:07,075.075 35862 ERROR
neutron.plugins.openvswitch.agent.ovs_neutron_agent
[req-bea668e9-3c52-4535-a4f9-71a63dc538c4 None] process_network_ports -
iteration:41940 - failure while retrieving port details from server
2015-12-28 08:49:07,075.075 35862 TRACE
neutron.plugins.openvswitch.agent.ovs_neutron_agent Traceback (most recent call
last):
2015-12-28 08:49:07,075.075 35862 TRACE
neutron.plugins.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py",
line 1230, in process_network_ports
2015-12-28 08:49:07,075.075 35862 TRACE
neutron.plugins.openvswitch.agent.ovs_neutron_agent devices_added_updated,
ovs_restarted)
2015-12-28 08:49:07,075.075 35862 TRACE
neutron.plugins.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py",
line 1103, in treat_devices_added_or_updated
2015-12-28 08:49:07,075.075 35862 TRACE
neutron.plugins.openvswitch.agent.ovs_neutron_agent raise
DeviceListRetrievalError(devices=devices, error=e)
2015-12-28 08:49:07,075.075 35862 TRACE
neutron.plugins.openvswitch.agent.ovs_neutron_agent DeviceListRetrievalError:
Unable to retrieve port details for devices:
set([u'918890c7-cbfd-4a3f-bb2c-030e0f5ded5b',
u'9c8c6b21-4baa-4c7e-b2ac-9772a7653da9',
u'dded408d-65e7-4adf-8490-3ba78e1496b0',
u'9aec4ec1-5921-40f5-8db7-fec3635511ce',
u'545ba077-e2ab-434b-a696-bf0bc8874dcb',
u'9a03a23c-2ae9-422c-a8da-2578134001bb',
u'b62aa4db-819c-4941-a457-8c19a9897e66',
u'a47ff11b-0c57-435e-ac5e-4348dccd6f0f',
u'55defa8f-016f-46b1-b240-5825bc282571']) because of error: Remote error:
OperationalError (_mysql_exceptions.OperationalError) (1047, 'WSREP has not yet
prepared node for application use')
2015-12-28 08:49:07,075.075 35862 TRACE
neutron.plugins.openvswitch.agent.ovs_neutron_agent [u'Traceback (most recent
call last):\n', u' File
"/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 134,
in _dispatch_and_reply\n incoming.message))\n', u' File
"/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 177,
in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u'
File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line
123, in _do_dispatch\n result = getattr(endpoint, method)(ctxt,
**new_args)\n', u' File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 115, in
get_devices_details_list\n for device in kwargs.pop(\'devices\', [])\n', u'
File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 92, in
get_device_details\n host)\n', u' File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1127, in
update_port_status\n
updated = True\n', u' File "/usr/lib64/python2.7/contextlib.py", line 24, in
__exit__\n self.gen.next()\n', u' File
"/usr/lib64/python2.7/contextlib.py", line 121, in nested\n if
exit(*exc):\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 502, in
__exit__\n self.rollback()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 63,
in __exit__\n compat.reraise(type_, value, traceback)\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 502, in
__exit__\n self.rollback()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 423, in
rollback\n transaction._rollback_impl()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 461, in
_rollback_impl\n t[1].rollback()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1563, in
rollback\n self._do_rollback()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1601, in
_do_rollback\n self.connection._rollback_impl()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 670, in
_rollback_impl\n self._handle_dbapi_exception(e, None, None, None, None)\n',
u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line
1334, in _handle_dbapi_exception\n self._autorollback()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 791, in
_autorollback\n self._root._rollback_impl()\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 670, in
_rollback_impl\n self._handle_dbapi_exception(e, None, None, None, None)\n',
u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line
1266, in _handle_dbapi_exception\n exc_info\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in
raise_from_cause\n reraise(type(exception), exception, tb=exc_tb)\n', u' File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 668, in
_rollback_impl\n self.engine.dialect.do_rollback(self.connection)\n', u'
File "/usr/lib64/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py",
line 2524, in do_rollback\n dbapi_connection.rollback()\n',
u"OperationalError: (_mysql_exceptions.OperationalError) (1047, 'WSREP has not
yet prepared node for application use')\n"].
########################################################################################
8. The error described in step 7 recurs on every iteration until
Neutron-server restores its connection to the SQL server.
9. Once SQL is restored, the agent manages to get the ports data from the
server in the next iteration, but it has lost the ovs_restarted flag, which
was local to a previous iteration. Therefore it skips provision_local_vlan()
and the OpenFlow rules are never restored.
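
The sequence above can be reproduced with a small stand-alone model. This is
only a sketch of the failure mode, not the Neutron code: FakeBridge,
ServerUnavailable and run_iterations are hypothetical names made up for the
illustration, and the real agent does far more per iteration.

```python
# Toy model of the agent loop described in steps 3-9.
# All names here are hypothetical; this is not the actual Neutron code.


class ServerUnavailable(Exception):
    """Stands in for DeviceListRetrievalError caused by the SQL outage."""


class FakeBridge:
    """Toy stand-in for the OVS integration bridge."""

    def __init__(self):
        self.canary_present = True
        self.flows = {"port-flows"}

    def restart(self):
        # An OVS restart wipes every flow, including the canary.
        self.canary_present = False
        self.flows.clear()


def run_iterations(bridge, server_up_per_iteration):
    """Run one loop iteration per entry; return the flows left on the bridge."""
    for server_up in server_up_per_iteration:
        ovs_restarted = not bridge.canary_present  # local to this iteration
        if ovs_restarted:
            bridge.canary_present = True  # canary put back immediately
            bridge.flows.clear()          # restart flow erases the rules
        try:
            if not server_up:
                raise ServerUnavailable()
            if ovs_restarted:
                bridge.flows.add("port-flows")  # re-provisioning step
        except ServerUnavailable:
            continue  # iteration aborted; ovs_restarted is forgotten
    return bridge.flows


bridge = FakeBridge()
bridge.restart()
# First iteration: server cannot reach SQL. Second iteration: SQL is back.
remaining = run_iterations(bridge, [False, True])
print(remaining)  # set(): the flows are never re-installed
```

In this model the second iteration sees the canary in place, so it never
learns that a restart happened and never re-provisions the flows.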
A possible solution is to put the CANARY value back only at the end of the
iteration that found it missing, so that an iteration which fails midway
detects the restart again on the next pass.
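
A minimal sketch of that idea, under the same simplifying assumptions: the
names FakeBridge, ServerUnavailable and run_iterations_fixed are hypothetical
and only model the ordering of the canary write, not the real agent.

```python
# Toy model of the proposed fix: restore the canary only after the restart
# has been handled successfully. Hypothetical names, not the Neutron code.


class ServerUnavailable(Exception):
    """Stands in for DeviceListRetrievalError caused by the SQL outage."""


class FakeBridge:
    """Toy stand-in for the OVS integration bridge."""

    def __init__(self):
        self.canary_present = True
        self.flows = {"port-flows"}

    def restart(self):
        # An OVS restart wipes every flow, including the canary.
        self.canary_present = False
        self.flows.clear()


def run_iterations_fixed(bridge, server_up_per_iteration):
    """Run one loop iteration per entry; return the flows left on the bridge."""
    for server_up in server_up_per_iteration:
        ovs_restarted = not bridge.canary_present
        if ovs_restarted:
            bridge.flows.clear()  # restart flow erases the rules
        try:
            if not server_up:
                raise ServerUnavailable()
            if ovs_restarted:
                bridge.flows.add("port-flows")  # re-provisioning step
        except ServerUnavailable:
            continue  # canary still missing, so the next iteration retries
        # The iteration completed, so the restart is fully handled.
        bridge.canary_present = True
    return bridge.flows


fixed_bridge = FakeBridge()
fixed_bridge.restart()
# First iteration: server cannot reach SQL. Second iteration: SQL is back.
restored = run_iterations_fixed(fixed_bridge, [False, True])
print(restored)  # {'port-flows'}
```

Because the canary stays missing while iterations keep failing, the first
successful iteration still sees ovs_restarted as true and re-installs the
flows.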