Public bug reported: For ml2/OVN live-migration doesn't work. After spending some time debugging this issue I found that its potentially more complicated and not related to OVN intself.
Here is the full story behind not working live-migration while using OVN in latest u/s master. To speedup live-migration double-binding was introduced in neutron [1] and nova [2]. It implements this blueprint [3]. In short words it creates double binding (ACTIVE and INACTIVE) to verify if network bind is possible to be done on destination host and then starts live-migration (to not waste time in case of rollback). This mechanism started to be default in Stein [4]. So before actual qemu live-migration neutron should send 'network-vif-plugged' to nova and then migration is being run. While using OVN this mechanism doesn't work. Notification 'network-vif- plugged' is not being send so live-migration is stuck at the beginning. Lets check how those notifications are send. On every change of 'status' field (sqlalchemy event) in neutron.ports row [5] function [6] is executed and it is responsible for sending 'network-vif-unplugged' and 'network-vif-plugged' notifications. During pre_live_migration tasks two bindings and bindings levels are created. At the end of this process I found that commit_port_binding() is executed [7]. At this time neutron port status in the db is DOWN. I found that at the end of commit_port_binding() [8] after neutron_lib.callbacks.registry notification is send the port status moves to UP. For ml2/OVN it stays DOWN. This is the first difference that I found between ml2/ovs and ml2/ovn. After a bit digging I figured out how 'network-vif-plugged' is triggered in ml2/ovs. Lets see how this is done. 1. On list of registered callbacks in ml2/ovs [8] we have configured callback from class ovo_rpc._ObjectChangeHandler [9] and at the end of commit_port_binding() this callback is used. ------------------------------------------------------------- neutron.plugins.ml2.ovo_rpc._ObjectChangeHandler.handle_event ------------------------------------------------------------- 2. It is responsible for pushing new port object revisions to agents, like: ---------------------------------------------------------------------------- Jun 24 10:01:01 test-migrate-1 neutron-server[3685]: DEBUG neutron.api.rpc.handlers.resources_rpc [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Pushing event updated for resources: {'Port': ['ID=3704a567-ef4c-4f6d-9557-a1191de07c4a,revision_number=10']} {{(pid=3697) push /opt/stack/neutron/neutron/api/rpc/handlers/resources_rpc.py:243}} ---------------------------------------------------------------------------- 3. OVS agent consumes it and sends back RPC to the neutron server that port is actually UP (on source node!): ------------------------------------------------------------------------------------------------------------ Jun 24 10:01:01 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.agent.resource_cache [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Resource Port 3704a567-ef4c-4f6d-9557-a1191de07c4a updated (revision_number 8->10). Old fields: {'status': u'ACTIVE', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='INACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal')], 'binding_levels': [PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59), PortBindingLevel(driver='openvswitch',host='test-migrate-2',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} New fields: {'status': u'DOWN', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='INACTIVE',vif_details=None,vif_type='unbound',vnic_type='normal')], 'binding_levels': [PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} {{(pi Jun 24 10:01:01 test-migrate-1 neutron-openvswitch-agent[18660]: d=18660) record_resource_update /opt/stack/neutron/neutron/agent/resource_cache.py:186}} ... Jun 24 10:01:02 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9daaf112-57f4-49bb-8390-4b65a5c5e674 None None] Setting status for 3704a567-ef4c-4f6d-9557-a1191de07c4a to UP {{(pid=18660) _bind_devices /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1088}} ------------------------------------------------------------------------------------------------------------ 4. Neutron server consumes it: ------------------------------------------------------------------------------------------------------------ Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.plugins.ml2.rpc [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Device 3704a567-ef4c-4f6d-9557-a1191de07c4a up at agent ovs-agent-test-migrate-1 {{(pid=3698) update_device_up /opt/stack/neutron/neutron/plugins/ml2/rpc.py:269}} ... Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning for port 3704a567-ef4c-4f6d-9557-a1191de07c4a completed by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:133}} ... Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning complete for port 3704a567-ef4c-4f6d-9557-a1191de07c4a triggered by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:140}} ------------------------------------------------------------------------------------------------------------ and then generates internal event "PROVISIONING_COMPLETE" [10]. This event is consumed by [11] and port_provisioned() updates port status in the DB to UP [12]. At the end it emits notification 'network-vif- plugged' and nova continues migration. In ml2/ovn we don't have agents, so we don't use ovo_rpc. That's why migration for ml2/ovn doesn't work. It looks like general bug somewhere between nova and neutron. Neutron shouldn't send notification 'network-vif-plug' during configuration of double binding from source host like it is now (paragraph 3.) Maybe we could consider using some more sophisticated names, like 'neutron-vif-inactive-binding-set'? Maybe nova could watch for inactive binding being created [13] and then start live-migration instead waiting for neutron notification? Thanks, Maciej [1] https://review.opendev.org/#/q/topic:bp/live-migration-portbinding+(status:open+OR+status:merged) [2] https://review.opendev.org/#/c/558001/ [3] https://blueprints.launchpad.net/nova/+spec/neutron-new-port-binding-api [4] https://review.opendev.org/#/c/635360/ [5] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/db_base_plugin_v2.py#L173 [6] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/notifiers/nova.py#L182 [7] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L505 [8] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L713 [9] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/ovo_rpc.py#L51 [10] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/provisioning_blocks.py#L140 [11] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L285 [12] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L316 [13] https://specs.openstack.org/openstack/neutron-specs/specs/backlog/pike/portbinding_information_for_nova.html#list-bindings ** Affects: networking-ovn Importance: Undecided Status: New ** Affects: nova Importance: Undecided Status: New ** Affects: neutron (Ubuntu) Importance: Undecided Status: New ** Also affects: nova Importance: Undecided Status: New ** Also affects: networking-ovn Importance: Undecided Status: New -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1834045 Title: Live-migration double binding doesn't work with OVN To manage notifications about this bug go to: https://bugs.launchpad.net/networking-ovn/+bug/1834045/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs