On 4/13/22 11:36 AM, Numan Siddique wrote:
On Tue, Apr 12, 2022 at 8:06 PM Ihar Hrachyshka <[email protected]> wrote:
There's a discussion to have on the intended behavior of a port with 2+
chassis when the switch is attached to localnet.

In general, the patch series makes all chassis switch to tunneling when
multiple chassis are set. (See patch 11/15 "Clone packets to both port
chassis" of the series.) This is done to guarantee delivery and to clone
packets between multiple locations using smart flow rules. But because
interfaces used to transfer tunneled packets may have different MTU
properties from localnet, patch 14/15 "Allow to disable tunneling
enforcement for multi-chassis port" introduces an option to enforce
using localnet port to communicate between chassis.

The question arises of the intended behavior when localnet communication
is enforced. The problem is that each port binding location will
generate ARP replies (gratuitous or otherwise) for the same MAC address,
that will then be analyzed by upstream switches to update their ARP
tables. Because of this, upstream switches will flip from one location
to another. This is not optimal / wrong.

We could designate one chassis as "main" (first in requested-chassis
list) and generate gratuitous ARPs on this chassis only. But any packet
generated by any other chassis hosting the same port binding will still
make the upstream switch update its ARP tables, still making the port flip.

With that in mind, 1) if there is a way to avoid flipping problems while
hosting the same MAC address in two locations that I am not aware of?
and 2) if not, do we really want to support localnet connectivity for 2+
chassis port bindings [as implemented in this patch]?

There is also an option to not support 2+ chassis for localnet attached
logical switches. This would be unfortunate because in environments
where MTU discrepancy is not an issue the usual approach of switching to
tunneling works fine.

Thoughts?
OVN supports the option - ovn-chassis-mac-mappings (which allows to
configure a unique mac for each chassis)
and this option is used to support routing between provider logical switches.

Can we make use of this here ?  i.e when the VM sends out the ARP/GARP
ovn-controller will
modify the source mac to the chassis unique mac before sending out via
the patch port.
I'm not sure if it's feasible or not.  Probably you can explore and see.


I may misunderstand the suggestion, please confirm if my understanding is right.

AFAIU the ovn-chassis-mac-mappings option is used to rewrite the MAC address before sending a packet through localnet. It is used for distributed gw ports where only one chassis is active at any given point in time. In case of multi-chassis binding, 2+ chassis are actively hosting the same binding (with the same MAC address). Even if they rewrite to unique MAC addresses, they will announce the same IP address as located on 2+ different upstream switch' ports. This will make it flip.


Meanwhile  I'd suggest not supporting 2+ chassis for localnet attached
logical switches to start with
and document it.


We could either ignore additional chassis set in requested-chassis, or we could still make an effort to make it work by enforcing tunneling for localnet attached LSs, while documenting the MTU requirements demanded when using 2+ requested-chassis for localnet attached switches.

In case of live migration (the primary use case I'm trying to tackle with the series), the ability to simultaneously stand up the same binding on 2 chassis allows to shorten the time of port migration / network downtime of a migrated VM. Plus untangle the destination port configuration from CMS -> database persistence -> southbound conversion latency. Not leaving a way to support 2 chassis+ binding for localnet-attached LS means that live migration behavior will be inconsistent, even for environments that don't have the MTU discrepancy between localnet and tunneling interfaces.

On the other hand, if CMS wants to have the current behavior, it may simply continue setting requested-chassis to a single chassis name. In this context, maybe the knob that is added by 14/15 patch is not required - CMS may just avoid using the new feature if MTU is a risk.

Thoughts?


Thanks
Numan



Ihar

On 3/29/22 8:47 PM, Ihar Hrachyshka wrote:
When chassis-mirroring-enabled is set, controller will not enforce
tunneling for localnet-attached switches. This may be useful when the
network used to deliver tunnel packets doesn't have MTU headway to
redirect all packets originally sent through localnet port.

Signed-off-by: Ihar Hrachyshka <[email protected]>
---
   controller/physical.c | 12 ++++++--
   ovn-nb.xml            | 11 ++++++++
   ovn-sb.xml            | 11 ++++++++
   tests/ovn.at          | 66 ++++++++++++++++++++++++++++++++++++++-----
   4 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/controller/physical.c b/controller/physical.c
index 02b771f3b..df5fe8cb3 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -1257,6 +1257,9 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
           }
       }

+    bool mirroring_enabled = smap_get_bool(
+        &binding->options, "chassis-mirroring-enabled", true);
+
       bool is_ha_remote = false;
       const struct chassis_tunnel *tun = NULL;
       const struct sbrec_port_binding *localnet_port =
@@ -1266,7 +1269,8 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
           is_remote = true;
           /* Enforce tunneling while we clone packets to additional chassis b/c
            * otherwise upstream switch won't flood the packet to both chassis. 
*/
-        if (localnet_port && !binding->additional_chassis) {
+        if (localnet_port &&
+                (!binding->additional_chassis || !mirroring_enabled)) {
               ofport = u16_to_ofp(simap_get(patch_ofports,
                                             localnet_port->logical_port));
               if (!ofport) {
@@ -1316,8 +1320,10 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
       }

       /* Clone packets to additional chassis if needed. */
-    additional_tuns = get_additional_tunnels(binding, chassis,
-                                             chassis_tunnels);
+    if (mirroring_enabled) {
+        additional_tuns = get_additional_tunnels(binding, chassis,
+                                                 chassis_tunnels);
+    }

       if (!is_remote) {
           /* Packets that arrive from a vif can belong to a VM or
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 3fd6c5140..c3ca7ba36 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -1022,6 +1022,17 @@
             additional chassis that are allowed to bind the same port.
           </column>

+        <column name="options" key="chassis-mirroring-enabled">
+          When multiple chassis are set for the port, and the logical switch is
+          connected to an external network through a <code>localnet</code>
+          port, tunneling is enforced for the port to guarantee delivery of
+          packets directed to the port to all its locations. This has MTU
+          implications because the network used for tunneling must have headway
+          for additional tunnel headers attached to packets. If the network is
+          not capable to deliver packets with additional tunnel headers, then
+          tunneling enforcement can be disabled using this option.
+        </column>
+
           <column name="options" key="activation-strategy">
             If set and used with <ref column="requested-additional-chassis"/>,
             specifies an activation strategy for the additional chassis. By
diff --git a/ovn-sb.xml b/ovn-sb.xml
index e8f91f351..dc276916a 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -3285,6 +3285,17 @@ tcp.flags = RST;
           chassis that are allowed to bind the same port.
         </column>

+      <column name="options" key="chassis-mirroring-enabled">
+        When multiple chassis are set for the port, and the logical switch is
+        connected to an external network through a <code>localnet</code>
+        port, tunneling is enforced for the port to guarantee delivery of
+        packets directed to the port to all its locations. This has MTU
+        implications because the network used for tunneling must have headway
+        for additional tunnel headers attached to packets. If the network is
+        not capable to deliver packets with additional tunnel headers, then
+        tunneling enforcement can be disabled using this option.
+      </column>
+
         <column name="options" key="activation-strategy">
           If used with multiple chassis set in <ref 
column="requested-chassis"/>,
           specifies an activation strategy for all additional chassis. By
diff --git a/tests/ovn.at b/tests/ovn.at
index c8992608e..760a4fd58 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -14227,10 +14227,12 @@ check ovn-nbctl lsp-add ls0 first
   check ovn-nbctl lsp-add ls0 second
   check ovn-nbctl lsp-add ls0 third
   check ovn-nbctl lsp-add ls0 migrator
+check ovn-nbctl lsp-add ls0 migrator-no-cloning
   check ovn-nbctl lsp-set-addresses first "00:00:00:00:00:01 10.0.0.1"
   check ovn-nbctl lsp-set-addresses second "00:00:00:00:00:02 10.0.0.2"
   check ovn-nbctl lsp-set-addresses third "00:00:00:00:00:03 10.0.0.3"
   check ovn-nbctl lsp-set-addresses migrator "00:00:00:00:00:ff 10.0.0.100"
+check ovn-nbctl lsp-set-addresses migrator-no-cloning "00:00:00:00:01:ff 
10.0.1.100"

   check ovn-nbctl lsp-add ls0 public
   check ovn-nbctl lsp-set-type public localnet
@@ -14242,6 +14244,9 @@ check ovn-nbctl lsp-set-options public network_name=phys
   # chassis locations. Connectivity will be checked for resources located at 
hv1
   # (First) and hv2 (Second) as well as for hv3 (Third) that does not take part
   # in port migration.
+#
+# Also, check that a port with chassis-mirroring-enabled=false doesn't clone
+# packets through an enforced tunnel; instead localnet port still used
   check ovn-nbctl lsp-set-options first requested-chassis=hv1
   check ovn-nbctl lsp-set-options second requested-chassis=hv2
   check ovn-nbctl lsp-set-options third requested-chassis=hv3
@@ -14265,6 +14270,10 @@ for hv in hv1 hv2; do
           set Interface migrator external-ids:iface-id=migrator \
           options:tx_pcap=$hv/migrator-tx.pcap \
           options:rxq_pcap=$hv/migrator-rx.pcap
+    as $hv check ovs-vsctl -- add-port br-int migrator-no-cloning -- \
+        set Interface migrator-no-cloning 
external-ids:iface-id=migrator-no-cloning \
+        options:tx_pcap=$hv/migrator-no-cloning-tx.pcap \
+        options:rxq_pcap=$hv/migrator-no-cloning-rx.pcap
   done

   send_arp() {
@@ -14298,8 +14307,12 @@ reset_env() {
       reset_pcap_file hv3 third hv3/third
       reset_pcap_file hv1 migrator hv1/migrator
       reset_pcap_file hv2 migrator hv2/migrator
+    reset_pcap_file hv1 migrator-no-cloning hv1/migrator-no-cloning
+    reset_pcap_file hv2 migrator-no-cloning hv2/migrator-no-cloning

-    for port in hv1/migrator hv2/migrator hv1/first hv2/second hv3/third; do
+    for port in hv1/migrator hv2/migrator \
+                hv1/migrator-no-cloning hv2/migrator-no-cloning \
+                hv1/first hv2/second hv3/third; do
           : > $port.expected
       done
   }
@@ -14309,12 +14322,15 @@ check_packets() {
       # attachment, hence using CONTAIN instead of strict matching
       OVN_CHECK_PACKETS_CONTAIN([hv1/migrator-tx.pcap], 
[hv1/migrator.expected])
       OVN_CHECK_PACKETS_CONTAIN([hv2/migrator-tx.pcap], 
[hv2/migrator.expected])
+    OVN_CHECK_PACKETS_CONTAIN([hv1/migrator-no-cloning-tx.pcap], 
[hv1/migrator-no-cloning.expected])
+    OVN_CHECK_PACKETS_CONTAIN([hv2/migrator-no-cloning-tx.pcap], 
[hv2/migrator-no-cloning.expected])
       OVN_CHECK_PACKETS_CONTAIN([hv1/first-tx.pcap], [hv1/first.expected])
       OVN_CHECK_PACKETS_CONTAIN([hv2/second-tx.pcap], [hv2/second.expected])
       OVN_CHECK_PACKETS_CONTAIN([hv3/third-tx.pcap], [hv3/third.expected])
   }

   migrator_tpa=$(ip_to_hex 10 0 0 100)
+nocloning_tpa=$(ip_to_hex 10 0 1 100)
   first_spa=$(ip_to_hex 10 0 0 1)
   second_spa=$(ip_to_hex 10 0 0 2)
   third_spa=$(ip_to_hex 10 0 0 3)
@@ -14336,7 +14352,8 @@ wait_column "" Port_Binding 
requested_additional_chassis logical_port=migrator
   wait_for_ports_up

   # advertise location of ports through localnet port
-send_garp hv1 migrator 0000000000ff ffffffffffff $migrator_spa $migrator_tpa
+send_garp hv1 migrator 0000000000ff ffffffffffff $migrator_tpa $migrator_tpa
+send_garp hv1 migrator-no-cloning 0000000001ff ffffffffffff $nocloning_tpa 
$nocloning_tpa
   send_garp hv1 first 000000000001 ffffffffffff $first_spa $first_tpa
   send_garp hv2 second 000000000002 ffffffffffff $second_spa $second_tpa
   send_garp hv3 third 000000000003 ffffffffffff $third_spa $third_tpa
@@ -14404,13 +14421,19 @@ request=$(send_arp hv2 migrator 0000000000ff 
ffffffffffff $migrator_tpa $first_s
   check_packets
   reset_env

-# Start port migration hv1 -> hv2: both hypervisors are now bound
+# Start ports migration hv1 -> hv2: both hypervisors are now bound
   check ovn-nbctl lsp-set-options migrator requested-chassis=hv1,hv2
+check ovn-nbctl lsp-set-options migrator-no-cloning requested-chassis=hv1,hv2 \
+                                                    
chassis-mirroring-enabled=false
   wait_for_ports_up
-wait_column "$hv1_uuid" Port_Binding chassis logical_port=migrator
-wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=migrator
-wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=migrator
-wait_column "$hv2_uuid" Port_Binding requested_additional_chassis 
logical_port=migrator
+for port in migrator migrator-no-cloning; do
+    wait_column "$hv1_uuid" Port_Binding chassis logical_port=$port
+    wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=$port
+    wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=$port
+    wait_column "$hv2_uuid" Port_Binding requested_additional_chassis 
logical_port=$port
+done
+
+send_garp hv1 migrator-no-cloning 0000000001ff ffffffffffff $nocloning_tpa 
$nocloning_tpa

   # check that...
   # unicast from First arrives to hv1:Migrator
@@ -14419,11 +14442,20 @@ request=$(send_arp hv1 first 000000000001 
0000000000ff $first_spa $migrator_tpa)
   echo $request >> hv1/migrator.expected
   echo $request >> hv2/migrator.expected

+# unicast from First arrives to hv1:Migrator-no-cloning
+# unicast from First doesn't arrive to hv2:Migrator-no-cloning
+request=$(send_arp hv1 first 000000000001 0000000001ff $first_spa 
$nocloning_tpa)
+echo $request >> hv1/migrator-no-cloning.expected
+
   # mcast from First arrives to hv1:Migrator
   # mcast from First arrives to hv2:Migrator
+# mcast from First arrives to hv1:Migrator-no-cloning
+# mcast from First arrives to hv2:Migrator-no-cloning
   request=$(send_arp hv1 first 000000000001 ffffffffffff $first_spa 
$migrator_tpa)
   echo $request >> hv1/migrator.expected
   echo $request >> hv2/migrator.expected
+echo $request >> hv1/migrator-no-cloning.expected
+echo $request >> hv2/migrator-no-cloning.expected
   echo $request >> hv3/third.expected
   echo $request >> hv2/second.expected

@@ -14433,11 +14465,20 @@ request=$(send_arp hv2 second 000000000002 
0000000000ff $second_spa $migrator_tp
   echo $request >> hv1/migrator.expected
   echo $request >> hv2/migrator.expected

+# unicast from Second doesn't arrive to hv1:Migrator-no-cloning
+# unicast from Second arrives to hv2:Migrator-no-cloning
+request=$(send_arp hv2 second 000000000002 0000000001ff $second_spa 
$nocloning_tpa)
+echo $request >> hv2/migrator-no-cloning.expected
+
   # mcast from Second arrives to hv1:Migrator
   # mcast from Second arrives to hv2:Migrator
+# mcast from Second arrives to hv1:Migrator-no-cloning
+# mcast from Second arrives to hv2:Migrator-no-cloning
   request=$(send_arp hv2 second 000000000002 ffffffffffff $second_spa 
$migrator_tpa)
   echo $request >> hv1/migrator.expected
   echo $request >> hv2/migrator.expected
+echo $request >> hv1/migrator-no-cloning.expected
+echo $request >> hv2/migrator-no-cloning.expected
   echo $request >> hv3/third.expected
   echo $request >> hv1/first.expected

@@ -14447,11 +14488,20 @@ request=$(send_arp hv3 third 000000000003 
0000000000ff $third_spa $migrator_tpa)
   echo $request >> hv1/migrator.expected
   echo $request >> hv2/migrator.expected

+# unicast from Third arrives to hv1:Migrator-no-cloning
+# unicast from Third doesn't arrive to hv2:Migrator-no-cloning
+request=$(send_arp hv3 third 000000000003 0000000001ff $third_spa 
$nocloning_tpa)
+echo $request >> hv1/migrator-no-cloning.expected
+
   # mcast from Third arrives to hv1:Migrator
   # mcast from Third arrives to hv2:Migrator
+# mcast from Third arrives to hv1:Migrator-no-cloning
+# mcast from Third arrives to hv2:Migrator-no-cloning
   request=$(send_arp hv3 third 000000000003 ffffffffffff $third_spa 
$migrator_tpa)
   echo $request >> hv1/migrator.expected
   echo $request >> hv2/migrator.expected
+echo $request >> hv1/migrator-no-cloning.expected
+echo $request >> hv2/migrator-no-cloning.expected
   echo $request >> hv1/first.expected
   echo $request >> hv2/second.expected

@@ -14476,12 +14526,14 @@ request=$(send_arp hv1 migrator 0000000000ff 
ffffffffffff $migrator_tpa $first_s
   echo $request >> hv1/first.expected
   echo $request >> hv2/second.expected
   echo $request >> hv3/third.expected
+echo $request >> hv1/migrator-no-cloning.expected

   # mcast from hv2:Migrator arrives to First, Second, and Third
   request=$(send_arp hv2 migrator 0000000000ff ffffffffffff $migrator_tpa 
$first_spa)
   echo $request >> hv1/first.expected
   echo $request >> hv2/second.expected
   echo $request >> hv3/third.expected
+echo $request >> hv1/migrator-no-cloning.expected

   check_packets

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to