On 19 Oct 2023, at 15:00, Eelco Chaudron wrote:

> On 19 Oct 2023, at 4:37, Mike Pattrick wrote:
>
>> Currently a bond will not always revalidate when an active member
>> changes. This can result in counter-intuitive behaviors like the fact
>> that using ovs-appctl bond/set-active-member will cause the bond to
>> revalidate but changing other_config:bond-primary will not trigger a
>> revalidate in the bond.
>>
>> When revalidation is not set but the active member changes in an
>> unbalanced bond, OVS may send traffic out of previously active member
>> instead of the new active member.
>>
>> This change will always mark unbalanced bonds for revalidation if the
>> active member changes.
>
> Thanks for fixing my comments on V3, some more comments on the tests, and the 
> removed annotation.
>
> //Eelco
>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
>> Signed-off-by: Mike Pattrick <[email protected]>
>> ---
>> v2: Added a test
>> v3: Made the test more reliable
>> v4: Made test much more reliable
>> ---
>>  ofproto/bond.c          |  8 +++++--
>>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 56 insertions(+), 2 deletions(-)
>>
>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>> index cfdf44f85..fb108d30a 100644
>> --- a/ofproto/bond.c
>> +++ b/ofproto/bond.c
>> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond 
>> *, bool force)
>>  static bool bond_is_falling_back_to_ab(const struct bond *);
>>  static void bond_add_lb_output_buckets(const struct bond *);
>>  static void bond_del_lb_output_buckets(const struct bond *);
>> +static bool bond_is_balanced(const struct bond *bond) 
>> OVS_REQ_RDLOCK(rwlock);
>>
>>
>>  /* Attempts to parse 's' as the name of a bond balancing mode.  If 
>> successful,
>> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const 
>> struct eth_addr mac)
>>
>>  static void
>>  bond_active_member_changed(struct bond *bond)
>> +    OVS_REQ_WRLOCK(rwlock)
>>  {
>>      if (bond->active_member) {
>>          struct eth_addr mac;
>>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>>          bond->active_member_mac = mac;
>> +        if (!bond_is_balanced(bond)) {
>> +            bond->bond_revalidate = true;
>> +        }
>>      } else {
>>          bond->active_member_mac = eth_addr_zero;
>>      }
>> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, 
>> uint32_t *recirc_id,
>>  /* Rebalancing. */
>>
>>  static bool
>> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
>> +bond_is_balanced(const struct bond *bond)
>
> See the other email, but I think we should re-add the annotation as there 
> might be other (new) callers of this function that need protection from 
> calling this without the readlock.

Ignore this comment, you are right anotating it at declaration is enough. I 
thought it failed for me in the past :(

//Eelco


>>  {
>>      return bond->rebalance_interval
>>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
>> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn 
>> *conn,
>>      }
>>
>>      if (bond->active_member != member) {
>> -        bond->bond_revalidate = true;
>>          bond->active_member = member;
>>          VLOG_INFO("bond %s: active member is now %s",
>>                    bond->name, member->name);
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 945037ec0..52c233be9 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 
>> 2 10.1.1.2 | FORMAT_PING
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>
>> +AT_SETUP([datapath - bond active-backup failover])
>> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>> +AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
>> +on_exit 'ip link del link0a'
>> +on_exit 'ip link del link0b'
>> +AT_CHECK([ip link add link0a type veth peer name link1a])
>> +AT_CHECK([ip link add link0b type veth peer name link1b])
>> +
>> +AT_CHECK([ip link set dev link0a up])
>> +AT_CHECK([ip link set dev link1a up])
>> +AT_CHECK([ip link set dev link0b up])
>> +AT_CHECK([ip link set dev link1b up])
>> +
>> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b 
>> bond_mode=active-backup])
>> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b 
>> bond_mode=active-backup])
>> +
>> +for i in `seq 1 3`; do
>
> Guess this is a leftover of your testing?
>
>> +dnl Set primary bond member.
>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
>> +                    set port bond1 other_config:bond-primary=link1a])
>> +
>> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
>> +
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv 
>> "100% packet loss"], [0])
>
> Here you are fine with some packets being replied to, and below you want all 
> 12. Is this intended, and if so why?
>
> Also, 12 pings is quite some time, would 4 pings be enough? It cuts test time 
> from 16 seconds to 6 seconds.
>
>> +dnl Check correct port is used.
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]a"], 
>> [0])
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv 
>> "actions:link[[01]]b"], [0])
>> +
>> +dnl Change primary bond member.
>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
>> +                    set port bond1 other_config:bond-primary=link1b])
>> +
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 
>> received"], [0])
>> +
>> +dnl Check correct port is used.
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv 
>> "actions:link[[01]]a"], [0])
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]b"], 
>> [0])
>> +done
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>
> FYI, the modified test with 4 pings, and removal of the for loop passed 50 
> runs without any failure on my system.
>
>>  AT_SETUP([datapath - ping over vxlan tunnel])
>>  OVS_CHECK_TUNNEL_TSO()
>>  OVS_CHECK_VXLAN()
>> -- 
>> 2.39.3
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to