On Thu, Apr 14, 2022 at 4:54 PM Ihar Hrachyshka <[email protected]> wrote: > > On 4/13/22 11:36 AM, Numan Siddique wrote: > > On Tue, Apr 12, 2022 at 8:06 PM Ihar Hrachyshka <[email protected]> wrote: > >> There's a discussion to have on the intended behavior of a port with 2+ > >> chassis when the switch is attached to localnet. > >> > >> In general, the patch series makes all chassis switch to tunneling when > >> multiple chassis are set. (See patch 11/15 "Clone packets to both port > >> chassis" of the series.) This is done to guarantee delivery and to clone > >> packets between multiple locations using smart flow rules. But because > >> interfaces used to transfer tunneled packets may have different MTU > >> properties from localnet, patch 14/15 "Allow to disable tunneling > >> enforcement for multi-chassis port" introduces an option to enforce > >> using localnet port to communicate between chassis. > >> > >> The question arises of the intended behavior when localnet communication > >> is enforced. The problem is that each port binding location will > >> generate ARP replies (gratuitous or otherwise) for the same MAC address, > >> that will then be analyzed by upstream switches to update their ARP > >> tables. Because of this, upstream switches will flip from one location > >> to another. This is not optimal / wrong. > >> > >> We could designate one chassis as "main" (first in requested-chassis > >> list) and generate gratuitous ARPs on this chassis only. But any packet > >> generated by any other chassis hosting the same port binding will still > >> make the upstream switch update its ARP tables, still making the port flip. > >> > >> With that in mind, 1) if there is a way to avoid flipping problems while > >> hosting the same MAC address in two locations that I am not aware of? > >> and 2) if not, do we really want to support localnet connectivity for 2+ > >> chassis port bindings [as implemented in this patch]? > >> > >> There is also an option to not support 2+ chassis for localnet attached > >> logical switches. This would be unfortunate because in environments > >> where MTU discrepancy is not an issue the usual approach of switching to > >> tunneling works fine. > >> > >> Thoughts? > > OVN supports the option - ovn-chassis-mac-mappings (which allows to > > configure a unique mac for each chassis) > > and this option is used to support routing between provider logical > > switches. > > > > Can we make use of this here ? i.e when the VM sends out the ARP/GARP > > ovn-controller will > > modify the source mac to the chassis unique mac before sending out via > > the patch port. > > I'm not sure if it's feasible or not. Probably you can explore and see. > > > I may misunderstand the suggestion, please confirm if my understanding > is right. > > AFAIU the ovn-chassis-mac-mappings option is used to rewrite the MAC > address before sending a packet through localnet. It is used for > distributed gw ports where only one chassis is active at any given point > in time. In case of multi-chassis binding, 2+ chassis are actively > hosting the same binding (with the same MAC address). Even if they > rewrite to unique MAC addresses, they will announce the same IP address > as located on 2+ different upstream switch' ports. This will make it flip.
Ok. I guess when an arp request is sent to the ip, the requester will receive 2 arp replies with different macs. Ideally the physical switches should not see L3 right? Or they do ? > > > > Meanwhile I'd suggest not supporting 2+ chassis for localnet attached > > logical switches to start with > > and document it. > > > We could either ignore additional chassis set in requested-chassis, or > we could still make an effort to make it work by enforcing tunneling for > localnet attached LSs, while documenting the MTU requirements demanded > when using 2+ requested-chassis for localnet attached switches. > > In case of live migration (the primary use case I'm trying to tackle > with the series), the ability to simultaneously stand up the same > binding on 2 chassis allows to shorten the time of port migration / > network downtime of a migrated VM. Plus untangle the destination port > configuration from CMS -> database persistence -> southbound conversion > latency. Not leaving a way to support 2 chassis+ binding for > localnet-attached LS means that live migration behavior will be > inconsistent, even for environments that don't have the MTU discrepancy > between localnet and tunneling interfaces. Ok. It seems fine to me then to support localnet switches using tunnelling. Thanks Numan > > On the other hand, if CMS wants to have the current behavior, it may > simply continue setting requested-chassis to a single chassis name. In > this context, maybe the knob that is added by 14/15 patch is not > required - CMS may just avoid using the new feature if MTU is a risk. > > Thoughts? > > > > Thanks > > Numan > > > > > > > >> Ihar > >> > >> On 3/29/22 8:47 PM, Ihar Hrachyshka wrote: > >>> When chassis-mirroring-enabled is set, controller will not enforce > >>> tunneling for localnet-attached switches. This may be useful when the > >>> network used to deliver tunnel packets doesn't have MTU headway to > >>> redirect all packets originally sent through localnet port. > >>> > >>> Signed-off-by: Ihar Hrachyshka <[email protected]> > >>> --- > >>> controller/physical.c | 12 ++++++-- > >>> ovn-nb.xml | 11 ++++++++ > >>> ovn-sb.xml | 11 ++++++++ > >>> tests/ovn.at | 66 ++++++++++++++++++++++++++++++++++++++----- > >>> 4 files changed, 90 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/controller/physical.c b/controller/physical.c > >>> index 02b771f3b..df5fe8cb3 100644 > >>> --- a/controller/physical.c > >>> +++ b/controller/physical.c > >>> @@ -1257,6 +1257,9 @@ consider_port_binding(struct ovsdb_idl_index > >>> *sbrec_port_binding_by_name, > >>> } > >>> } > >>> > >>> + bool mirroring_enabled = smap_get_bool( > >>> + &binding->options, "chassis-mirroring-enabled", true); > >>> + > >>> bool is_ha_remote = false; > >>> const struct chassis_tunnel *tun = NULL; > >>> const struct sbrec_port_binding *localnet_port = > >>> @@ -1266,7 +1269,8 @@ consider_port_binding(struct ovsdb_idl_index > >>> *sbrec_port_binding_by_name, > >>> is_remote = true; > >>> /* Enforce tunneling while we clone packets to additional > >>> chassis b/c > >>> * otherwise upstream switch won't flood the packet to both > >>> chassis. */ > >>> - if (localnet_port && !binding->additional_chassis) { > >>> + if (localnet_port && > >>> + (!binding->additional_chassis || !mirroring_enabled)) { > >>> ofport = u16_to_ofp(simap_get(patch_ofports, > >>> > >>> localnet_port->logical_port)); > >>> if (!ofport) { > >>> @@ -1316,8 +1320,10 @@ consider_port_binding(struct ovsdb_idl_index > >>> *sbrec_port_binding_by_name, > >>> } > >>> > >>> /* Clone packets to additional chassis if needed. */ > >>> - additional_tuns = get_additional_tunnels(binding, chassis, > >>> - chassis_tunnels); > >>> + if (mirroring_enabled) { > >>> + additional_tuns = get_additional_tunnels(binding, chassis, > >>> + chassis_tunnels); > >>> + } > >>> > >>> if (!is_remote) { > >>> /* Packets that arrive from a vif can belong to a VM or > >>> diff --git a/ovn-nb.xml b/ovn-nb.xml > >>> index 3fd6c5140..c3ca7ba36 100644 > >>> --- a/ovn-nb.xml > >>> +++ b/ovn-nb.xml > >>> @@ -1022,6 +1022,17 @@ > >>> additional chassis that are allowed to bind the same port. > >>> </column> > >>> > >>> + <column name="options" key="chassis-mirroring-enabled"> > >>> + When multiple chassis are set for the port, and the logical > >>> switch is > >>> + connected to an external network through a > >>> <code>localnet</code> > >>> + port, tunneling is enforced for the port to guarantee delivery > >>> of > >>> + packets directed to the port to all its locations. This has MTU > >>> + implications because the network used for tunneling must have > >>> headway > >>> + for additional tunnel headers attached to packets. If the > >>> network is > >>> + not capable to deliver packets with additional tunnel headers, > >>> then > >>> + tunneling enforcement can be disabled using this option. > >>> + </column> > >>> + > >>> <column name="options" key="activation-strategy"> > >>> If set and used with <ref > >>> column="requested-additional-chassis"/>, > >>> specifies an activation strategy for the additional chassis. > >>> By > >>> diff --git a/ovn-sb.xml b/ovn-sb.xml > >>> index e8f91f351..dc276916a 100644 > >>> --- a/ovn-sb.xml > >>> +++ b/ovn-sb.xml > >>> @@ -3285,6 +3285,17 @@ tcp.flags = RST; > >>> chassis that are allowed to bind the same port. > >>> </column> > >>> > >>> + <column name="options" key="chassis-mirroring-enabled"> > >>> + When multiple chassis are set for the port, and the logical > >>> switch is > >>> + connected to an external network through a <code>localnet</code> > >>> + port, tunneling is enforced for the port to guarantee delivery of > >>> + packets directed to the port to all its locations. This has MTU > >>> + implications because the network used for tunneling must have > >>> headway > >>> + for additional tunnel headers attached to packets. If the > >>> network is > >>> + not capable to deliver packets with additional tunnel headers, > >>> then > >>> + tunneling enforcement can be disabled using this option. > >>> + </column> > >>> + > >>> <column name="options" key="activation-strategy"> > >>> If used with multiple chassis set in <ref > >>> column="requested-chassis"/>, > >>> specifies an activation strategy for all additional chassis. By > >>> diff --git a/tests/ovn.at b/tests/ovn.at > >>> index c8992608e..760a4fd58 100644 > >>> --- a/tests/ovn.at > >>> +++ b/tests/ovn.at > >>> @@ -14227,10 +14227,12 @@ check ovn-nbctl lsp-add ls0 first > >>> check ovn-nbctl lsp-add ls0 second > >>> check ovn-nbctl lsp-add ls0 third > >>> check ovn-nbctl lsp-add ls0 migrator > >>> +check ovn-nbctl lsp-add ls0 migrator-no-cloning > >>> check ovn-nbctl lsp-set-addresses first "00:00:00:00:00:01 10.0.0.1" > >>> check ovn-nbctl lsp-set-addresses second "00:00:00:00:00:02 10.0.0.2" > >>> check ovn-nbctl lsp-set-addresses third "00:00:00:00:00:03 10.0.0.3" > >>> check ovn-nbctl lsp-set-addresses migrator "00:00:00:00:00:ff > >>> 10.0.0.100" > >>> +check ovn-nbctl lsp-set-addresses migrator-no-cloning "00:00:00:00:01:ff > >>> 10.0.1.100" > >>> > >>> check ovn-nbctl lsp-add ls0 public > >>> check ovn-nbctl lsp-set-type public localnet > >>> @@ -14242,6 +14244,9 @@ check ovn-nbctl lsp-set-options public > >>> network_name=phys > >>> # chassis locations. Connectivity will be checked for resources > >>> located at hv1 > >>> # (First) and hv2 (Second) as well as for hv3 (Third) that does not > >>> take part > >>> # in port migration. > >>> +# > >>> +# Also, check that a port with chassis-mirroring-enabled=false doesn't > >>> clone > >>> +# packets through an enforced tunnel; instead localnet port still used > >>> check ovn-nbctl lsp-set-options first requested-chassis=hv1 > >>> check ovn-nbctl lsp-set-options second requested-chassis=hv2 > >>> check ovn-nbctl lsp-set-options third requested-chassis=hv3 > >>> @@ -14265,6 +14270,10 @@ for hv in hv1 hv2; do > >>> set Interface migrator external-ids:iface-id=migrator \ > >>> options:tx_pcap=$hv/migrator-tx.pcap \ > >>> options:rxq_pcap=$hv/migrator-rx.pcap > >>> + as $hv check ovs-vsctl -- add-port br-int migrator-no-cloning -- \ > >>> + set Interface migrator-no-cloning > >>> external-ids:iface-id=migrator-no-cloning \ > >>> + options:tx_pcap=$hv/migrator-no-cloning-tx.pcap \ > >>> + options:rxq_pcap=$hv/migrator-no-cloning-rx.pcap > >>> done > >>> > >>> send_arp() { > >>> @@ -14298,8 +14307,12 @@ reset_env() { > >>> reset_pcap_file hv3 third hv3/third > >>> reset_pcap_file hv1 migrator hv1/migrator > >>> reset_pcap_file hv2 migrator hv2/migrator > >>> + reset_pcap_file hv1 migrator-no-cloning hv1/migrator-no-cloning > >>> + reset_pcap_file hv2 migrator-no-cloning hv2/migrator-no-cloning > >>> > >>> - for port in hv1/migrator hv2/migrator hv1/first hv2/second > >>> hv3/third; do > >>> + for port in hv1/migrator hv2/migrator \ > >>> + hv1/migrator-no-cloning hv2/migrator-no-cloning \ > >>> + hv1/first hv2/second hv3/third; do > >>> : > $port.expected > >>> done > >>> } > >>> @@ -14309,12 +14322,15 @@ check_packets() { > >>> # attachment, hence using CONTAIN instead of strict matching > >>> OVN_CHECK_PACKETS_CONTAIN([hv1/migrator-tx.pcap], > >>> [hv1/migrator.expected]) > >>> OVN_CHECK_PACKETS_CONTAIN([hv2/migrator-tx.pcap], > >>> [hv2/migrator.expected]) > >>> + OVN_CHECK_PACKETS_CONTAIN([hv1/migrator-no-cloning-tx.pcap], > >>> [hv1/migrator-no-cloning.expected]) > >>> + OVN_CHECK_PACKETS_CONTAIN([hv2/migrator-no-cloning-tx.pcap], > >>> [hv2/migrator-no-cloning.expected]) > >>> OVN_CHECK_PACKETS_CONTAIN([hv1/first-tx.pcap], > >>> [hv1/first.expected]) > >>> OVN_CHECK_PACKETS_CONTAIN([hv2/second-tx.pcap], > >>> [hv2/second.expected]) > >>> OVN_CHECK_PACKETS_CONTAIN([hv3/third-tx.pcap], > >>> [hv3/third.expected]) > >>> } > >>> > >>> migrator_tpa=$(ip_to_hex 10 0 0 100) > >>> +nocloning_tpa=$(ip_to_hex 10 0 1 100) > >>> first_spa=$(ip_to_hex 10 0 0 1) > >>> second_spa=$(ip_to_hex 10 0 0 2) > >>> third_spa=$(ip_to_hex 10 0 0 3) > >>> @@ -14336,7 +14352,8 @@ wait_column "" Port_Binding > >>> requested_additional_chassis logical_port=migrator > >>> wait_for_ports_up > >>> > >>> # advertise location of ports through localnet port > >>> -send_garp hv1 migrator 0000000000ff ffffffffffff $migrator_spa > >>> $migrator_tpa > >>> +send_garp hv1 migrator 0000000000ff ffffffffffff $migrator_tpa > >>> $migrator_tpa > >>> +send_garp hv1 migrator-no-cloning 0000000001ff ffffffffffff > >>> $nocloning_tpa $nocloning_tpa > >>> send_garp hv1 first 000000000001 ffffffffffff $first_spa $first_tpa > >>> send_garp hv2 second 000000000002 ffffffffffff $second_spa $second_tpa > >>> send_garp hv3 third 000000000003 ffffffffffff $third_spa $third_tpa > >>> @@ -14404,13 +14421,19 @@ request=$(send_arp hv2 migrator 0000000000ff > >>> ffffffffffff $migrator_tpa $first_s > >>> check_packets > >>> reset_env > >>> > >>> -# Start port migration hv1 -> hv2: both hypervisors are now bound > >>> +# Start ports migration hv1 -> hv2: both hypervisors are now bound > >>> check ovn-nbctl lsp-set-options migrator requested-chassis=hv1,hv2 > >>> +check ovn-nbctl lsp-set-options migrator-no-cloning > >>> requested-chassis=hv1,hv2 \ > >>> + > >>> chassis-mirroring-enabled=false > >>> wait_for_ports_up > >>> -wait_column "$hv1_uuid" Port_Binding chassis logical_port=migrator > >>> -wait_column "$hv1_uuid" Port_Binding requested_chassis > >>> logical_port=migrator > >>> -wait_column "$hv2_uuid" Port_Binding additional_chassis > >>> logical_port=migrator > >>> -wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > >>> logical_port=migrator > >>> +for port in migrator migrator-no-cloning; do > >>> + wait_column "$hv1_uuid" Port_Binding chassis logical_port=$port > >>> + wait_column "$hv1_uuid" Port_Binding requested_chassis > >>> logical_port=$port > >>> + wait_column "$hv2_uuid" Port_Binding additional_chassis > >>> logical_port=$port > >>> + wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > >>> logical_port=$port > >>> +done > >>> + > >>> +send_garp hv1 migrator-no-cloning 0000000001ff ffffffffffff > >>> $nocloning_tpa $nocloning_tpa > >>> > >>> # check that... > >>> # unicast from First arrives to hv1:Migrator > >>> @@ -14419,11 +14442,20 @@ request=$(send_arp hv1 first 000000000001 > >>> 0000000000ff $first_spa $migrator_tpa) > >>> echo $request >> hv1/migrator.expected > >>> echo $request >> hv2/migrator.expected > >>> > >>> +# unicast from First arrives to hv1:Migrator-no-cloning > >>> +# unicast from First doesn't arrive to hv2:Migrator-no-cloning > >>> +request=$(send_arp hv1 first 000000000001 0000000001ff $first_spa > >>> $nocloning_tpa) > >>> +echo $request >> hv1/migrator-no-cloning.expected > >>> + > >>> # mcast from First arrives to hv1:Migrator > >>> # mcast from First arrives to hv2:Migrator > >>> +# mcast from First arrives to hv1:Migrator-no-cloning > >>> +# mcast from First arrives to hv2:Migrator-no-cloning > >>> request=$(send_arp hv1 first 000000000001 ffffffffffff $first_spa > >>> $migrator_tpa) > >>> echo $request >> hv1/migrator.expected > >>> echo $request >> hv2/migrator.expected > >>> +echo $request >> hv1/migrator-no-cloning.expected > >>> +echo $request >> hv2/migrator-no-cloning.expected > >>> echo $request >> hv3/third.expected > >>> echo $request >> hv2/second.expected > >>> > >>> @@ -14433,11 +14465,20 @@ request=$(send_arp hv2 second 000000000002 > >>> 0000000000ff $second_spa $migrator_tp > >>> echo $request >> hv1/migrator.expected > >>> echo $request >> hv2/migrator.expected > >>> > >>> +# unicast from Second doesn't arrive to hv1:Migrator-no-cloning > >>> +# unicast from Second arrives to hv2:Migrator-no-cloning > >>> +request=$(send_arp hv2 second 000000000002 0000000001ff $second_spa > >>> $nocloning_tpa) > >>> +echo $request >> hv2/migrator-no-cloning.expected > >>> + > >>> # mcast from Second arrives to hv1:Migrator > >>> # mcast from Second arrives to hv2:Migrator > >>> +# mcast from Second arrives to hv1:Migrator-no-cloning > >>> +# mcast from Second arrives to hv2:Migrator-no-cloning > >>> request=$(send_arp hv2 second 000000000002 ffffffffffff $second_spa > >>> $migrator_tpa) > >>> echo $request >> hv1/migrator.expected > >>> echo $request >> hv2/migrator.expected > >>> +echo $request >> hv1/migrator-no-cloning.expected > >>> +echo $request >> hv2/migrator-no-cloning.expected > >>> echo $request >> hv3/third.expected > >>> echo $request >> hv1/first.expected > >>> > >>> @@ -14447,11 +14488,20 @@ request=$(send_arp hv3 third 000000000003 > >>> 0000000000ff $third_spa $migrator_tpa) > >>> echo $request >> hv1/migrator.expected > >>> echo $request >> hv2/migrator.expected > >>> > >>> +# unicast from Third arrives to hv1:Migrator-no-cloning > >>> +# unicast from Third doesn't arrive to hv2:Migrator-no-cloning > >>> +request=$(send_arp hv3 third 000000000003 0000000001ff $third_spa > >>> $nocloning_tpa) > >>> +echo $request >> hv1/migrator-no-cloning.expected > >>> + > >>> # mcast from Third arrives to hv1:Migrator > >>> # mcast from Third arrives to hv2:Migrator > >>> +# mcast from Third arrives to hv1:Migrator-no-cloning > >>> +# mcast from Third arrives to hv2:Migrator-no-cloning > >>> request=$(send_arp hv3 third 000000000003 ffffffffffff $third_spa > >>> $migrator_tpa) > >>> echo $request >> hv1/migrator.expected > >>> echo $request >> hv2/migrator.expected > >>> +echo $request >> hv1/migrator-no-cloning.expected > >>> +echo $request >> hv2/migrator-no-cloning.expected > >>> echo $request >> hv1/first.expected > >>> echo $request >> hv2/second.expected > >>> > >>> @@ -14476,12 +14526,14 @@ request=$(send_arp hv1 migrator 0000000000ff > >>> ffffffffffff $migrator_tpa $first_s > >>> echo $request >> hv1/first.expected > >>> echo $request >> hv2/second.expected > >>> echo $request >> hv3/third.expected > >>> +echo $request >> hv1/migrator-no-cloning.expected > >>> > >>> # mcast from hv2:Migrator arrives to First, Second, and Third > >>> request=$(send_arp hv2 migrator 0000000000ff ffffffffffff > >>> $migrator_tpa $first_spa) > >>> echo $request >> hv1/first.expected > >>> echo $request >> hv2/second.expected > >>> echo $request >> hv3/third.expected > >>> +echo $request >> hv1/migrator-no-cloning.expected > >>> > >>> check_packets > >>> > >> _______________________________________________ > >> dev mailing list > >> [email protected] > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
