On 8 Oct 2023, at 7:26, Mike Pattrick wrote:

> Currently a bond will not always revalidate when an active member
> changes. This can result in counter-intuitive behaviors like the fact
> that using ovs-appctl bond/set-active-member will cause the bond to
> revalidate but changing other_config:bond-primary will not trigger a
> revalidate in the bond.
>
> When revalidation is not set but the active member changes in an
> unbalanced bond, OVS may send traffic out of previously active member
> instead of the new active member.
>
> This change will always mark unbalanced bonds for revalidation if the
> active member changes.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
> Signed-off-by: Mike Pattrick <m...@redhat.com>

Hi Mike,

Thanks for the patch. Some comments below, and the subject needs to end with a 
dot.

//Eelco


> ---
> v2: Added a test
> v3: Made the test more reliable
> ---
>  ofproto/bond.c          |  8 +++++--
>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index cfdf44f85..fb108d30a 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond 
> *, bool force)
>  static bool bond_is_falling_back_to_ab(const struct bond *);
>  static void bond_add_lb_output_buckets(const struct bond *);
>  static void bond_del_lb_output_buckets(const struct bond *);
> +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
>
>
>  /* Attempts to parse 's' as the name of a bond balancing mode.  If 
> successful,
> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const 
> struct eth_addr mac)
>
>  static void
>  bond_active_member_changed(struct bond *bond)
> +    OVS_REQ_WRLOCK(rwlock)
>  {
>      if (bond->active_member) {
>          struct eth_addr mac;
>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>          bond->active_member_mac = mac;
> +        if (!bond_is_balanced(bond)) {
> +            bond->bond_revalidate = true;
> +        }
>      } else {
>          bond->active_member_mac = eth_addr_zero;
>      }
> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, 
> uint32_t *recirc_id,
>  /* Rebalancing. */
>
>  static bool
> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)

Any reason why the annotation was removed? clang would be smart enough to see 
that a write lock is fine for a readlock. It seems to compile fine here. Or 
were you trying to clean this up and move it to the next line :)

> +bond_is_balanced(const struct bond *bond)
>  {
>      return bond->rebalance_interval
>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn 
> *conn,
>      }
>
>      if (bond->active_member != member) {
> -        bond->bond_revalidate = true;
>          bond->active_member = member;
>          VLOG_INFO("bond %s: active member is now %s",
>                    bond->name, member->name);
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 945037ec0..7075c35ea 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 
> 2 10.1.1.2 | FORMAT_PING
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([datapath - ping over active-backup bond])

This does not really represent the test case. What about ‘datapath - bond 
active-backup failover’

> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "priority=1 actions=drop"])
> +AT_CHECK([ovs-ofctl add-flow br1 "priority=1 actions=drop"])
> +AT_CHECK([ovs-ofctl add-flow br0 "ip,icmp actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br1 "ip,icmp actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br0 "arp actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br1 "arp actions=normal"])

Do we need this ‘complex’ set of rules, will the following not work?

AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])

> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
> +on_exit 'ip link del link0a'
> +on_exit 'ip link del link0b'
> +AT_CHECK([ip link add link0a type veth peer name link1a])
> +AT_CHECK([ip link add link0b type veth peer name link1b])
> +
> +AT_CHECK([ip link set dev link0a up])
> +AT_CHECK([ip link set dev link1a up])
> +AT_CHECK([ip link set dev link0b up])
> +AT_CHECK([ip link set dev link1b up])
> +
> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b 
> bond_mode=active-backup])
> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b 
> bond_mode=active-backup])
> +
> +dnl Set primary

dnl Set the primary bond member.


> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
> +                    set port bond1 other_config:bond-primary=link1a])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], 
> [0], [dnl
> +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Change primary

dnl Change the primary bond member.

> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
> +                    set port bond1 other_config:bond-primary=link1b])
> +
> +sleep 0

This short sleep makes it work, however, it could result in a flaky test. Is 
there something else we could wait for to be 100% sure we are ready?

> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.5 -w 6 10.1.1.2 | FORMAT_PING], 
> [0], [dnl
> +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> +])

Should we not add some code to verify traffic goes over the correct link?

> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([datapath - ping over vxlan tunnel])
>  OVS_CHECK_TUNNEL_TSO()
>  OVS_CHECK_VXLAN()


Can you also check this test on all datapaths? It’s failing for me once out of 
ten on userspace;


@@ -1,2 +1,2 @@
-6 packets transmitted, 6 received, 0% packet loss, time 0ms
+7 packets transmitted, 6 received, 14.2857% packet loss, time 0ms

sudo make check-kernel TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  
sudo make check-offloads TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  
sudo make check-system-userspace TESTSUITEFLAGS="-k 'ping over active-backup 
bond'"

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to