On Thu, Apr 14, 2022 at 4:54 PM Ihar Hrachyshka <[email protected]> wrote:
>
> On 4/13/22 11:36 AM, Numan Siddique wrote:
> > On Tue, Apr 12, 2022 at 8:06 PM Ihar Hrachyshka <[email protected]> wrote:
> >> There's a discussion to have on the intended behavior of a port with 2+
> >> chassis when the switch is attached to localnet.
> >>
> >> In general, the patch series makes all chassis switch to tunneling when
> >> multiple chassis are set. (See patch 11/15 "Clone packets to both port
> >> chassis" of the series.) This is done to guarantee delivery and to clone
> >> packets between multiple locations using smart flow rules. But because
> >> interfaces used to transfer tunneled packets may have different MTU
> >> properties from localnet, patch 14/15 "Allow to disable tunneling
> >> enforcement for multi-chassis port" introduces an option to enforce
> >> using localnet port to communicate between chassis.
> >>
> >> The question arises of the intended behavior when localnet communication
> >> is enforced. The problem is that each port binding location will
> >> generate ARP replies (gratuitous or otherwise) for the same MAC address,
> >> that will then be analyzed by upstream switches to update their ARP
> >> tables. Because of this, upstream switches will flip from one location
> >> to another. This is not optimal / wrong.
> >>
> >> We could designate one chassis as "main" (first in requested-chassis
> >> list) and generate gratuitous ARPs on this chassis only. But any packet
> >> generated by any other chassis hosting the same port binding will still
> >> make the upstream switch update its ARP tables, still making the port flip.
> >>
> >> With that in mind, 1) if there is a way to avoid flipping problems while
> >> hosting the same MAC address in two locations that I am not aware of?
> >> and 2) if not, do we really want to support localnet connectivity for 2+
> >> chassis port bindings [as implemented in this patch]?
> >>
> >> There is also an option to not support 2+ chassis for localnet attached
> >> logical switches. This would be unfortunate because in environments
> >> where MTU discrepancy is not an issue the usual approach of switching to
> >> tunneling works fine.
> >>
> >> Thoughts?
> > OVN supports the option - ovn-chassis-mac-mappings (which allows to
> > configure a unique mac for each chassis)
> > and this option is used to support routing between provider logical 
> > switches.
> >
> > Can we make use of this here ?  i.e when the VM sends out the ARP/GARP
> > ovn-controller will
> > modify the source mac to the chassis unique mac before sending out via
> > the patch port.
> > I'm not sure if it's feasible or not.  Probably you can explore and see.
>
>
> I may misunderstand the suggestion, please confirm if my understanding
> is right.
>
> AFAIU the ovn-chassis-mac-mappings option is used to rewrite the MAC
> address before sending a packet through localnet. It is used for
> distributed gw ports where only one chassis is active at any given point
> in time. In case of multi-chassis binding, 2+ chassis are actively
> hosting the same binding (with the same MAC address). Even if they
> rewrite to unique MAC addresses, they will announce the same IP address
> as located on 2+ different upstream switch' ports. This will make it flip.

Ok.  I guess when an arp request is sent to the ip,  the requester
will receive 2 arp replies with different macs.
Ideally the physical switches should not see L3 right? Or they do ?

>
>
> > Meanwhile  I'd suggest not supporting 2+ chassis for localnet attached
> > logical switches to start with
> > and document it.
>
>
> We could either ignore additional chassis set in requested-chassis, or
> we could still make an effort to make it work by enforcing tunneling for
> localnet attached LSs, while documenting the MTU requirements demanded
> when using 2+ requested-chassis for localnet attached switches.
>
> In case of live migration (the primary use case I'm trying to tackle
> with the series), the ability to simultaneously stand up the same
> binding on 2 chassis allows to shorten the time of port migration /
> network downtime of a migrated VM. Plus untangle the destination port
> configuration from CMS -> database persistence -> southbound conversion
> latency. Not leaving a way to support 2 chassis+ binding for
> localnet-attached LS means that live migration behavior will be
> inconsistent, even for environments that don't have the MTU discrepancy
> between localnet and tunneling interfaces.

Ok.  It seems fine to me then to support localnet switches using tunnelling.

Thanks
Numan


>
> On the other hand, if CMS wants to have the current behavior, it may
> simply continue setting requested-chassis to a single chassis name. In
> this context, maybe the knob that is added by 14/15 patch is not
> required - CMS may just avoid using the new feature if MTU is a risk.
>
> Thoughts?
>
>
> > Thanks
> > Numan
> >
> >
> >
> >> Ihar
> >>
> >> On 3/29/22 8:47 PM, Ihar Hrachyshka wrote:
> >>> When chassis-mirroring-enabled is set, controller will not enforce
> >>> tunneling for localnet-attached switches. This may be useful when the
> >>> network used to deliver tunnel packets doesn't have MTU headway to
> >>> redirect all packets originally sent through localnet port.
> >>>
> >>> Signed-off-by: Ihar Hrachyshka <[email protected]>
> >>> ---
> >>>    controller/physical.c | 12 ++++++--
> >>>    ovn-nb.xml            | 11 ++++++++
> >>>    ovn-sb.xml            | 11 ++++++++
> >>>    tests/ovn.at          | 66 ++++++++++++++++++++++++++++++++++++++-----
> >>>    4 files changed, 90 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/controller/physical.c b/controller/physical.c
> >>> index 02b771f3b..df5fe8cb3 100644
> >>> --- a/controller/physical.c
> >>> +++ b/controller/physical.c
> >>> @@ -1257,6 +1257,9 @@ consider_port_binding(struct ovsdb_idl_index 
> >>> *sbrec_port_binding_by_name,
> >>>            }
> >>>        }
> >>>
> >>> +    bool mirroring_enabled = smap_get_bool(
> >>> +        &binding->options, "chassis-mirroring-enabled", true);
> >>> +
> >>>        bool is_ha_remote = false;
> >>>        const struct chassis_tunnel *tun = NULL;
> >>>        const struct sbrec_port_binding *localnet_port =
> >>> @@ -1266,7 +1269,8 @@ consider_port_binding(struct ovsdb_idl_index 
> >>> *sbrec_port_binding_by_name,
> >>>            is_remote = true;
> >>>            /* Enforce tunneling while we clone packets to additional 
> >>> chassis b/c
> >>>             * otherwise upstream switch won't flood the packet to both 
> >>> chassis. */
> >>> -        if (localnet_port && !binding->additional_chassis) {
> >>> +        if (localnet_port &&
> >>> +                (!binding->additional_chassis || !mirroring_enabled)) {
> >>>                ofport = u16_to_ofp(simap_get(patch_ofports,
> >>>                                              
> >>> localnet_port->logical_port));
> >>>                if (!ofport) {
> >>> @@ -1316,8 +1320,10 @@ consider_port_binding(struct ovsdb_idl_index 
> >>> *sbrec_port_binding_by_name,
> >>>        }
> >>>
> >>>        /* Clone packets to additional chassis if needed. */
> >>> -    additional_tuns = get_additional_tunnels(binding, chassis,
> >>> -                                             chassis_tunnels);
> >>> +    if (mirroring_enabled) {
> >>> +        additional_tuns = get_additional_tunnels(binding, chassis,
> >>> +                                                 chassis_tunnels);
> >>> +    }
> >>>
> >>>        if (!is_remote) {
> >>>            /* Packets that arrive from a vif can belong to a VM or
> >>> diff --git a/ovn-nb.xml b/ovn-nb.xml
> >>> index 3fd6c5140..c3ca7ba36 100644
> >>> --- a/ovn-nb.xml
> >>> +++ b/ovn-nb.xml
> >>> @@ -1022,6 +1022,17 @@
> >>>              additional chassis that are allowed to bind the same port.
> >>>            </column>
> >>>
> >>> +        <column name="options" key="chassis-mirroring-enabled">
> >>> +          When multiple chassis are set for the port, and the logical 
> >>> switch is
> >>> +          connected to an external network through a 
> >>> <code>localnet</code>
> >>> +          port, tunneling is enforced for the port to guarantee delivery 
> >>> of
> >>> +          packets directed to the port to all its locations. This has MTU
> >>> +          implications because the network used for tunneling must have 
> >>> headway
> >>> +          for additional tunnel headers attached to packets. If the 
> >>> network is
> >>> +          not capable to deliver packets with additional tunnel headers, 
> >>> then
> >>> +          tunneling enforcement can be disabled using this option.
> >>> +        </column>
> >>> +
> >>>            <column name="options" key="activation-strategy">
> >>>              If set and used with <ref 
> >>> column="requested-additional-chassis"/>,
> >>>              specifies an activation strategy for the additional chassis. 
> >>> By
> >>> diff --git a/ovn-sb.xml b/ovn-sb.xml
> >>> index e8f91f351..dc276916a 100644
> >>> --- a/ovn-sb.xml
> >>> +++ b/ovn-sb.xml
> >>> @@ -3285,6 +3285,17 @@ tcp.flags = RST;
> >>>            chassis that are allowed to bind the same port.
> >>>          </column>
> >>>
> >>> +      <column name="options" key="chassis-mirroring-enabled">
> >>> +        When multiple chassis are set for the port, and the logical 
> >>> switch is
> >>> +        connected to an external network through a <code>localnet</code>
> >>> +        port, tunneling is enforced for the port to guarantee delivery of
> >>> +        packets directed to the port to all its locations. This has MTU
> >>> +        implications because the network used for tunneling must have 
> >>> headway
> >>> +        for additional tunnel headers attached to packets. If the 
> >>> network is
> >>> +        not capable to deliver packets with additional tunnel headers, 
> >>> then
> >>> +        tunneling enforcement can be disabled using this option.
> >>> +      </column>
> >>> +
> >>>          <column name="options" key="activation-strategy">
> >>>            If used with multiple chassis set in <ref 
> >>> column="requested-chassis"/>,
> >>>            specifies an activation strategy for all additional chassis. By
> >>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>> index c8992608e..760a4fd58 100644
> >>> --- a/tests/ovn.at
> >>> +++ b/tests/ovn.at
> >>> @@ -14227,10 +14227,12 @@ check ovn-nbctl lsp-add ls0 first
> >>>    check ovn-nbctl lsp-add ls0 second
> >>>    check ovn-nbctl lsp-add ls0 third
> >>>    check ovn-nbctl lsp-add ls0 migrator
> >>> +check ovn-nbctl lsp-add ls0 migrator-no-cloning
> >>>    check ovn-nbctl lsp-set-addresses first "00:00:00:00:00:01 10.0.0.1"
> >>>    check ovn-nbctl lsp-set-addresses second "00:00:00:00:00:02 10.0.0.2"
> >>>    check ovn-nbctl lsp-set-addresses third "00:00:00:00:00:03 10.0.0.3"
> >>>    check ovn-nbctl lsp-set-addresses migrator "00:00:00:00:00:ff 
> >>> 10.0.0.100"
> >>> +check ovn-nbctl lsp-set-addresses migrator-no-cloning "00:00:00:00:01:ff 
> >>> 10.0.1.100"
> >>>
> >>>    check ovn-nbctl lsp-add ls0 public
> >>>    check ovn-nbctl lsp-set-type public localnet
> >>> @@ -14242,6 +14244,9 @@ check ovn-nbctl lsp-set-options public 
> >>> network_name=phys
> >>>    # chassis locations. Connectivity will be checked for resources 
> >>> located at hv1
> >>>    # (First) and hv2 (Second) as well as for hv3 (Third) that does not 
> >>> take part
> >>>    # in port migration.
> >>> +#
> >>> +# Also, check that a port with chassis-mirroring-enabled=false doesn't 
> >>> clone
> >>> +# packets through an enforced tunnel; instead localnet port still used
> >>>    check ovn-nbctl lsp-set-options first requested-chassis=hv1
> >>>    check ovn-nbctl lsp-set-options second requested-chassis=hv2
> >>>    check ovn-nbctl lsp-set-options third requested-chassis=hv3
> >>> @@ -14265,6 +14270,10 @@ for hv in hv1 hv2; do
> >>>            set Interface migrator external-ids:iface-id=migrator \
> >>>            options:tx_pcap=$hv/migrator-tx.pcap \
> >>>            options:rxq_pcap=$hv/migrator-rx.pcap
> >>> +    as $hv check ovs-vsctl -- add-port br-int migrator-no-cloning -- \
> >>> +        set Interface migrator-no-cloning 
> >>> external-ids:iface-id=migrator-no-cloning \
> >>> +        options:tx_pcap=$hv/migrator-no-cloning-tx.pcap \
> >>> +        options:rxq_pcap=$hv/migrator-no-cloning-rx.pcap
> >>>    done
> >>>
> >>>    send_arp() {
> >>> @@ -14298,8 +14307,12 @@ reset_env() {
> >>>        reset_pcap_file hv3 third hv3/third
> >>>        reset_pcap_file hv1 migrator hv1/migrator
> >>>        reset_pcap_file hv2 migrator hv2/migrator
> >>> +    reset_pcap_file hv1 migrator-no-cloning hv1/migrator-no-cloning
> >>> +    reset_pcap_file hv2 migrator-no-cloning hv2/migrator-no-cloning
> >>>
> >>> -    for port in hv1/migrator hv2/migrator hv1/first hv2/second 
> >>> hv3/third; do
> >>> +    for port in hv1/migrator hv2/migrator \
> >>> +                hv1/migrator-no-cloning hv2/migrator-no-cloning \
> >>> +                hv1/first hv2/second hv3/third; do
> >>>            : > $port.expected
> >>>        done
> >>>    }
> >>> @@ -14309,12 +14322,15 @@ check_packets() {
> >>>        # attachment, hence using CONTAIN instead of strict matching
> >>>        OVN_CHECK_PACKETS_CONTAIN([hv1/migrator-tx.pcap], 
> >>> [hv1/migrator.expected])
> >>>        OVN_CHECK_PACKETS_CONTAIN([hv2/migrator-tx.pcap], 
> >>> [hv2/migrator.expected])
> >>> +    OVN_CHECK_PACKETS_CONTAIN([hv1/migrator-no-cloning-tx.pcap], 
> >>> [hv1/migrator-no-cloning.expected])
> >>> +    OVN_CHECK_PACKETS_CONTAIN([hv2/migrator-no-cloning-tx.pcap], 
> >>> [hv2/migrator-no-cloning.expected])
> >>>        OVN_CHECK_PACKETS_CONTAIN([hv1/first-tx.pcap], 
> >>> [hv1/first.expected])
> >>>        OVN_CHECK_PACKETS_CONTAIN([hv2/second-tx.pcap], 
> >>> [hv2/second.expected])
> >>>        OVN_CHECK_PACKETS_CONTAIN([hv3/third-tx.pcap], 
> >>> [hv3/third.expected])
> >>>    }
> >>>
> >>>    migrator_tpa=$(ip_to_hex 10 0 0 100)
> >>> +nocloning_tpa=$(ip_to_hex 10 0 1 100)
> >>>    first_spa=$(ip_to_hex 10 0 0 1)
> >>>    second_spa=$(ip_to_hex 10 0 0 2)
> >>>    third_spa=$(ip_to_hex 10 0 0 3)
> >>> @@ -14336,7 +14352,8 @@ wait_column "" Port_Binding 
> >>> requested_additional_chassis logical_port=migrator
> >>>    wait_for_ports_up
> >>>
> >>>    # advertise location of ports through localnet port
> >>> -send_garp hv1 migrator 0000000000ff ffffffffffff $migrator_spa 
> >>> $migrator_tpa
> >>> +send_garp hv1 migrator 0000000000ff ffffffffffff $migrator_tpa 
> >>> $migrator_tpa
> >>> +send_garp hv1 migrator-no-cloning 0000000001ff ffffffffffff 
> >>> $nocloning_tpa $nocloning_tpa
> >>>    send_garp hv1 first 000000000001 ffffffffffff $first_spa $first_tpa
> >>>    send_garp hv2 second 000000000002 ffffffffffff $second_spa $second_tpa
> >>>    send_garp hv3 third 000000000003 ffffffffffff $third_spa $third_tpa
> >>> @@ -14404,13 +14421,19 @@ request=$(send_arp hv2 migrator 0000000000ff 
> >>> ffffffffffff $migrator_tpa $first_s
> >>>    check_packets
> >>>    reset_env
> >>>
> >>> -# Start port migration hv1 -> hv2: both hypervisors are now bound
> >>> +# Start ports migration hv1 -> hv2: both hypervisors are now bound
> >>>    check ovn-nbctl lsp-set-options migrator requested-chassis=hv1,hv2
> >>> +check ovn-nbctl lsp-set-options migrator-no-cloning 
> >>> requested-chassis=hv1,hv2 \
> >>> +                                                    
> >>> chassis-mirroring-enabled=false
> >>>    wait_for_ports_up
> >>> -wait_column "$hv1_uuid" Port_Binding chassis logical_port=migrator
> >>> -wait_column "$hv1_uuid" Port_Binding requested_chassis 
> >>> logical_port=migrator
> >>> -wait_column "$hv2_uuid" Port_Binding additional_chassis 
> >>> logical_port=migrator
> >>> -wait_column "$hv2_uuid" Port_Binding requested_additional_chassis 
> >>> logical_port=migrator
> >>> +for port in migrator migrator-no-cloning; do
> >>> +    wait_column "$hv1_uuid" Port_Binding chassis logical_port=$port
> >>> +    wait_column "$hv1_uuid" Port_Binding requested_chassis 
> >>> logical_port=$port
> >>> +    wait_column "$hv2_uuid" Port_Binding additional_chassis 
> >>> logical_port=$port
> >>> +    wait_column "$hv2_uuid" Port_Binding requested_additional_chassis 
> >>> logical_port=$port
> >>> +done
> >>> +
> >>> +send_garp hv1 migrator-no-cloning 0000000001ff ffffffffffff 
> >>> $nocloning_tpa $nocloning_tpa
> >>>
> >>>    # check that...
> >>>    # unicast from First arrives to hv1:Migrator
> >>> @@ -14419,11 +14442,20 @@ request=$(send_arp hv1 first 000000000001 
> >>> 0000000000ff $first_spa $migrator_tpa)
> >>>    echo $request >> hv1/migrator.expected
> >>>    echo $request >> hv2/migrator.expected
> >>>
> >>> +# unicast from First arrives to hv1:Migrator-no-cloning
> >>> +# unicast from First doesn't arrive to hv2:Migrator-no-cloning
> >>> +request=$(send_arp hv1 first 000000000001 0000000001ff $first_spa 
> >>> $nocloning_tpa)
> >>> +echo $request >> hv1/migrator-no-cloning.expected
> >>> +
> >>>    # mcast from First arrives to hv1:Migrator
> >>>    # mcast from First arrives to hv2:Migrator
> >>> +# mcast from First arrives to hv1:Migrator-no-cloning
> >>> +# mcast from First arrives to hv2:Migrator-no-cloning
> >>>    request=$(send_arp hv1 first 000000000001 ffffffffffff $first_spa 
> >>> $migrator_tpa)
> >>>    echo $request >> hv1/migrator.expected
> >>>    echo $request >> hv2/migrator.expected
> >>> +echo $request >> hv1/migrator-no-cloning.expected
> >>> +echo $request >> hv2/migrator-no-cloning.expected
> >>>    echo $request >> hv3/third.expected
> >>>    echo $request >> hv2/second.expected
> >>>
> >>> @@ -14433,11 +14465,20 @@ request=$(send_arp hv2 second 000000000002 
> >>> 0000000000ff $second_spa $migrator_tp
> >>>    echo $request >> hv1/migrator.expected
> >>>    echo $request >> hv2/migrator.expected
> >>>
> >>> +# unicast from Second doesn't arrive to hv1:Migrator-no-cloning
> >>> +# unicast from Second arrives to hv2:Migrator-no-cloning
> >>> +request=$(send_arp hv2 second 000000000002 0000000001ff $second_spa 
> >>> $nocloning_tpa)
> >>> +echo $request >> hv2/migrator-no-cloning.expected
> >>> +
> >>>    # mcast from Second arrives to hv1:Migrator
> >>>    # mcast from Second arrives to hv2:Migrator
> >>> +# mcast from Second arrives to hv1:Migrator-no-cloning
> >>> +# mcast from Second arrives to hv2:Migrator-no-cloning
> >>>    request=$(send_arp hv2 second 000000000002 ffffffffffff $second_spa 
> >>> $migrator_tpa)
> >>>    echo $request >> hv1/migrator.expected
> >>>    echo $request >> hv2/migrator.expected
> >>> +echo $request >> hv1/migrator-no-cloning.expected
> >>> +echo $request >> hv2/migrator-no-cloning.expected
> >>>    echo $request >> hv3/third.expected
> >>>    echo $request >> hv1/first.expected
> >>>
> >>> @@ -14447,11 +14488,20 @@ request=$(send_arp hv3 third 000000000003 
> >>> 0000000000ff $third_spa $migrator_tpa)
> >>>    echo $request >> hv1/migrator.expected
> >>>    echo $request >> hv2/migrator.expected
> >>>
> >>> +# unicast from Third arrives to hv1:Migrator-no-cloning
> >>> +# unicast from Third doesn't arrive to hv2:Migrator-no-cloning
> >>> +request=$(send_arp hv3 third 000000000003 0000000001ff $third_spa 
> >>> $nocloning_tpa)
> >>> +echo $request >> hv1/migrator-no-cloning.expected
> >>> +
> >>>    # mcast from Third arrives to hv1:Migrator
> >>>    # mcast from Third arrives to hv2:Migrator
> >>> +# mcast from Third arrives to hv1:Migrator-no-cloning
> >>> +# mcast from Third arrives to hv2:Migrator-no-cloning
> >>>    request=$(send_arp hv3 third 000000000003 ffffffffffff $third_spa 
> >>> $migrator_tpa)
> >>>    echo $request >> hv1/migrator.expected
> >>>    echo $request >> hv2/migrator.expected
> >>> +echo $request >> hv1/migrator-no-cloning.expected
> >>> +echo $request >> hv2/migrator-no-cloning.expected
> >>>    echo $request >> hv1/first.expected
> >>>    echo $request >> hv2/second.expected
> >>>
> >>> @@ -14476,12 +14526,14 @@ request=$(send_arp hv1 migrator 0000000000ff 
> >>> ffffffffffff $migrator_tpa $first_s
> >>>    echo $request >> hv1/first.expected
> >>>    echo $request >> hv2/second.expected
> >>>    echo $request >> hv3/third.expected
> >>> +echo $request >> hv1/migrator-no-cloning.expected
> >>>
> >>>    # mcast from hv2:Migrator arrives to First, Second, and Third
> >>>    request=$(send_arp hv2 migrator 0000000000ff ffffffffffff 
> >>> $migrator_tpa $first_spa)
> >>>    echo $request >> hv1/first.expected
> >>>    echo $request >> hv2/second.expected
> >>>    echo $request >> hv3/third.expected
> >>> +echo $request >> hv1/migrator-no-cloning.expected
> >>>
> >>>    check_packets
> >>>
> >> _______________________________________________
> >> dev mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to