On Thu, Oct 19, 2023 at 9:00 AM Eelco Chaudron <[email protected]> wrote: > > > > On 19 Oct 2023, at 4:37, Mike Pattrick wrote: > > > Currently a bond will not always revalidate when an active member > > changes. This can result in counter-intuitive behaviors like the fact > > that using ovs-appctl bond/set-active-member will cause the bond to > > revalidate but changing other_config:bond-primary will not trigger a > > revalidate in the bond. > > > > When revalidation is not set but the active member changes in an > > unbalanced bond, OVS may send traffic out of previously active member > > instead of the new active member. > > > > This change will always mark unbalanced bonds for revalidation if the > > active member changes. > > Thanks for fixing my comments on V3, some more comments on the tests, and the > removed annotation. > > //Eelco > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979 > > Signed-off-by: Mike Pattrick <[email protected]> > > --- > > v2: Added a test > > v3: Made the test more reliable > > v4: Made test much more reliable > > --- > > ofproto/bond.c | 8 +++++-- > > tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 56 insertions(+), 2 deletions(-) > > > > diff --git a/ofproto/bond.c b/ofproto/bond.c > > index cfdf44f85..fb108d30a 100644 > > --- a/ofproto/bond.c > > +++ b/ofproto/bond.c > > @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond > > *, bool force) > > static bool bond_is_falling_back_to_ab(const struct bond *); > > static void bond_add_lb_output_buckets(const struct bond *); > > static void bond_del_lb_output_buckets(const struct bond *); > > +static bool bond_is_balanced(const struct bond *bond) > > OVS_REQ_RDLOCK(rwlock); > > > > > > /* Attempts to parse 's' as the name of a bond balancing mode. If > > successful, > > @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, > > const struct eth_addr mac) > > > > static void > > bond_active_member_changed(struct bond *bond) > > + OVS_REQ_WRLOCK(rwlock) > > { > > if (bond->active_member) { > > struct eth_addr mac; > > netdev_get_etheraddr(bond->active_member->netdev, &mac); > > bond->active_member_mac = mac; > > + if (!bond_is_balanced(bond)) { > > + bond->bond_revalidate = true; > > + } > > } else { > > bond->active_member_mac = eth_addr_zero; > > } > > @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, > > uint32_t *recirc_id, > > /* Rebalancing. */ > > > > static bool > > -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock) > > +bond_is_balanced(const struct bond *bond) > > See the other email, but I think we should re-add the annotation as there > might be other (new) callers of this function that need protection from > calling this without the readlock. > > > { > > return bond->rebalance_interval > > && (bond->balance == BM_SLB || bond->balance == BM_TCP) > > @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn > > *conn, > > } > > > > if (bond->active_member != member) { > > - bond->bond_revalidate = true; > > bond->active_member = member; > > VLOG_INFO("bond %s: active member is now %s", > > bond->name, member->name); > > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > > index 945037ec0..52c233be9 100644 > > --- a/tests/system-traffic.at > > +++ b/tests/system-traffic.at > > @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 > > -w 2 10.1.1.2 | FORMAT_PING > > OVS_TRAFFIC_VSWITCHD_STOP > > AT_CLEANUP > > > > +AT_SETUP([datapath - bond active-backup failover]) > > +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])]) > > + > > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) > > +AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"]) > > + > > +ADD_NAMESPACES(at_ns0, at_ns1) > > + > > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") > > +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24") > > +on_exit 'ip link del link0a' > > +on_exit 'ip link del link0b' > > +AT_CHECK([ip link add link0a type veth peer name link1a]) > > +AT_CHECK([ip link add link0b type veth peer name link1b]) > > + > > +AT_CHECK([ip link set dev link0a up]) > > +AT_CHECK([ip link set dev link1a up]) > > +AT_CHECK([ip link set dev link0b up]) > > +AT_CHECK([ip link set dev link1b up]) > > + > > +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b > > bond_mode=active-backup]) > > +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b > > bond_mode=active-backup]) > > + > > +for i in `seq 1 3`; do > > Guess this is a leftover of your testing? > > > +dnl Set primary bond member. > > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \ > > + set port bond1 other_config:bond-primary=link1a]) > > + > > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2]) > > + > > +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv > > "100% packet loss"], [0]) > > Here you are fine with some packets being replied to, and below you want all > 12. Is this intended, and if so why? > > Also, 12 pings is quite some time, would 4 pings be enough? It cuts test time > from 16 seconds to 6 seconds.
The last few revisions have been due to the test not being reliable enough, so I wanted something that would be very reliable. But I probably went a bit too far with this change. The first ping isn't impacted by the failover so I didn't think it was important to measure packet loss there. -M > > > +dnl Check correct port is used. > > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq > > "actions:link[[01]]a"], [0]) > > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv > > "actions:link[[01]]b"], [0]) > > + > > +dnl Change primary bond member. > > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \ > > + set port bond1 other_config:bond-primary=link1b]) > > + > > +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 > > received"], [0]) > > + > > +dnl Check correct port is used. > > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv > > "actions:link[[01]]a"], [0]) > > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq > > "actions:link[[01]]b"], [0]) > > +done > > + > > +OVS_TRAFFIC_VSWITCHD_STOP > > +AT_CLEANUP > > FYI, the modified test with 4 pings, and removal of the for loop passed 50 > runs without any failure on my system. > > > AT_SETUP([datapath - ping over vxlan tunnel]) > > OVS_CHECK_TUNNEL_TSO() > > OVS_CHECK_VXLAN() > > -- > > 2.39.3 > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
