** Description changed:

  [Impact]

  Currently, if an OVN controller node goes down and the node is in the
  list of the LRP gateways, it is not removed from the list. This means
  OVN will still try to use this gateway, fail, and move on to the
- next one if this was at the top of the priority list.
+ next one if this was at the top of the priority list. Furthermore, we
+ could lose all the OVN controllers, which is a valid but unlikely
+ scenario; there would then be no gateways to use, and hence no
+ networking would be available across the platform.

  What the back-port does is remove the --restart flag from the
  ovn-controller systemd unit. This then removes the gateway from the
  list, so OVN will not try to use this gateway port.

  On top of this, removing --restart can affect upgrades of the ovn
  package; to avoid this we add the d/ovn-host.preinst that prevents
- the runtime state of OVN at upgrade
+ the runtime state of OVN at upgrade. If traffic is flowing, and a
+ restart is initiated then some of the traffic could pause for a bit,
+ which would cause
+
+ The change was originally introduced in Lunar via the Debian package
+ 22.12.0-3, which eventually became 23.03.1-1ubuntu0.23.04.1 in Lunar. So
+ all versions after Lunar already have this particular change.
+
+
  [Test Plan]

  * Install OpenStack using the default OpenStack bundles, ensuring there
  are at least 2 or 3 nova-compute units.
  * Find the node that has the northbound database as well as the
  southbound database, and log in to both of these. It could be that they
  are both on the same node.
  * Then run the following commands, with sample outputs

  ```
  root@juju-71c67f-3-lxd-2:~# ovn-nbctl lr-list
  8588bef1-b3fc-4097-87e7-2c34e26f4c69 (neutron-727e3681-8eb4-4025-aa54-046588a2c3ab)

  root@juju-71c67f-3-lxd-2:~# ovn-nbctl show 8588bef1-b3fc-4097-87e7-2c34e26f4c69
  router 8588bef1-b3fc-4097-87e7-2c34e26f4c69 (neutron-727e3681-8eb4-4025-aa54-046588a2c3ab) (aka provider-router)
      port lrp-3e157281-4075-456d-bc17-61c6fcb5ff1d
          mac: "fa:16:3e:9b:be:95"
          networks: ["10.0.22.1/24"]
      port lrp-35c4d398-9a43-4fbf-9781-bc4a6d8d4a38
          mac: "fa:16:3e:d0:25:3b"
          networks: ["192.168.21.1/24"]
      port lrp-7c725d10-8056-455d-a2d5-b84f33c16045
          mac: "fa:16:3e:81:04:03"
          networks: ["192.168.1.44/24"]
          gateway chassis: [as3-maas-node-03.maas as2-maas-node-04.maas as4-maas-node-03.maas as4-maas-node-04.maas as3-maas-node-06.maas]
      nat 3121ce08-9899-495b-b385-5ecf17be0c1b
          external ip: "192.168.1.44"
          logical ip: "192.168.21.0/24"
          type: "snat"
      nat 8323629e-dbcc-4a45-84a4-a90aa3859cd7
          external ip: "192.168.1.44"
          logical ip: "10.0.22.0/24"
          type: "snat"
      nat 8e54bb28-3418-4052-9dc8-3c4178225322
          external ip: "192.168.1.42"
          logical ip: "10.0.22.187"
          type: "dnat_and_snat"

  root@juju-71c67f-3-lxd-2:~# ovn-nbctl lrp-get-gateway-chassis lrp-7c725d10-8056-455d-a2d5-b84f33c16045
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as3-maas-node-03.maas     5
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as3-maas-node-06.maas     4
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-04.maas     3
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-03.maas     2
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as2-maas-node-04.maas     1
  ```

  This shows that node as3-maas-node-03.maas has the highest priority.

  * Now stop the ovn-controller service on the highest priority node,
  i.e. as3-maas-node-03.maas in this case.
  * Now the output of the last command will still be the same, although
  we would expect as3-maas-node-03.maas to have been removed.
  * Now install the new package and run through the same process. We
  should see that the node is removed, with output similar to the one
  below.

  ```
  root@juju-71c67f-3-lxd-2:~# ovn-nbctl lrp-get-gateway-chassis lrp-7c725d10-8056-455d-a2d5-b84f33c16045
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as3-maas-node-06.maas     5
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-04.maas     4
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-03.maas     3
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as2-maas-node-04.maas     2
  lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as1-maas-node-06.maas     1
  ```

+ On the upgrade scenario:
+
+ * We will install the old package, and create multiple ports in OpenStack.
+ * We will get a dump of all the flows from the ovn-central and ovn-chassis.
+ * We will then upgrade the relevant packages.
+ * We will then double-check that the same flows are intact, and that the
+ amount of time the flows have been up is longer than the time elapsed
+ since the package was upgraded.
+
+
  [Where problems could occur]

  The --restart flag was originally added in the commit [1] & (LP:
  #1940043) to ensure that upgrades don't cause issues. So we could
  potentially have that issue.

  [1]
  https://git.launchpad.net/ubuntu/+source/ovn/commit/?h=import/21.09.0_git20210806.d08f89e21-0ubuntu1.1&id=d73df64c24f97b6133448b57cae8d82af51df1fe

+ With respect to having any regression on the bug above, the key thing
+ here is that when the package is upgraded, as mentioned in the test
+ scenario, we will double-check the flows pre and post upgrade, and
+ ensure that the same flows exist, and that the timestamps of these
+ flows suggest that they have not been recreated.
+
+ An overall regression would mean that the flows are reinitialised, and
+ we would get a data plane outage, which we should expect with the update
+ we are handling here.
+
+
  [Other Info]

  Related LP: 1940043
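
The gateway check in the test plan can be automated along the following lines. This is only a minimal sketch, not part of the proposed change: the names `parse_gateway_chassis`, `chassis_removed`, `BEFORE` and `AFTER` are hypothetical, and the parser assumes each line of `ovn-nbctl lrp-get-gateway-chassis` output has the form `<lrp-name>_<chassis-name> <priority>`, as in the sample outputs from the test plan, with no underscore in the LRP name.

```python
from typing import List, Tuple

# Sample outputs copied from the test plan (old and new package).
BEFORE = """\
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as3-maas-node-03.maas     5
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as3-maas-node-06.maas     4
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-04.maas     3
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-03.maas     2
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as2-maas-node-04.maas     1
"""

AFTER = """\
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as3-maas-node-06.maas     5
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-04.maas     4
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as4-maas-node-03.maas     3
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as2-maas-node-04.maas     2
lrp-7c725d10-8056-455d-a2d5-b84f33c16045_as1-maas-node-06.maas     1
"""


def parse_gateway_chassis(output: str) -> List[Tuple[str, int]]:
    """Return (chassis, priority) pairs, highest priority first."""
    entries = []
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not "<name> <priority>".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        # Assumption: the chassis name follows the first underscore,
        # because the lrp-<uuid> prefix contains none.
        _, _, chassis = parts[0].partition("_")
        entries.append((chassis, int(parts[1])))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def chassis_removed(before: str, after: str, stopped: str) -> bool:
    """True if `stopped` was a gateway before and is no longer listed."""
    listed_before = {chassis for chassis, _ in parse_gateway_chassis(before)}
    listed_after = {chassis for chassis, _ in parse_gateway_chassis(after)}
    return stopped in listed_before and stopped not in listed_after
```

With the old package, the output captured after stopping ovn-controller is identical to `BEFORE`, so `chassis_removed(BEFORE, BEFORE, "as3-maas-node-03.maas")` is false; with the new package, `chassis_removed(BEFORE, AFTER, "as3-maas-node-03.maas")` is true.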
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2150130

Title:
  --restart in ovn-controller leaves gateway ports active
