Ilya Maximets <i.maxim...@ovn.org> writes: > On 8/1/25 4:26 AM, chenyongch...@cestc.cn wrote: >> Hi all: >> Thank you for your attention to this issue. >> >> Initially, we noticed that bond_shift_load triggered >> bond->bond_revalidate = true, which then led to bond_reconfigure >> -> bond_entry_reset being executed. This ultimately caused >> USERSPACE_INVALID_PORT_DROP to occur. >> We believe that no packet loss should happen during traffic load >> balancing. >> >> Later, we discovered that changes to the >> bond-rebalance-interval setting could also lead to the same issue, >> which is why we reported the problem. >> >> In fact, we are more concerned about the packet drops caused >> by bond_shift_load. If the packet loss due to >> bond-rebalance-interval changes is truly unavoidable, we can accept >> that. >> Apologies for not describing the issue clearly enough earlier. > > No worries. Thanks for the clarification! > >> >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> chenyongch...@cestc.cn >> >> >> *发件人:* Ilya Maximets <mailto:i.maxim...@ovn.org> >> *发送时间:* 2025-08-01 04:48 >> *收件人:* Aaron Conole <mailto:acon...@redhat.com> >> *抄送:* i.maximets <mailto:i.maxim...@ovn.org>; dev >> <mailto:d...@openvswitch.org>; mhou <mailto:m...@redhat.com>; >> chenyongchang <mailto:chenyongch...@cestc.cn>; Adrian Moreno >> <mailto:amore...@redhat.com> >> *主题:* Re: [ovs-dev] [PATCH] bond: Do not flag a rebalance >> when adjusting time.【请注意,邮件由i.maximets....@gmail.com代发】 >> On 7/31/25 8:41 PM, Aaron Conole wrote: >> > Ilya Maximets <i.maxim...@ovn.org> writes: >> > >> >> On 7/30/25 4:47 PM, Aaron Conole via dev wrote: >> >>> When adjusting bond parameters, any adjustment is considered >> sufficient >> >>> for triggering a rebalance. This is a very simplistic config update >> >>> scheme that triggers complete rebalancing even if the time adjustment >> >>> would move the next expiration out beyond the last calculated >> expiration. >> >>> >> >>> For the interval parameter only, we can simply recalculate the expiry >> >>> deadline and let the next bond_run() event do the rebalance if >> needed. >> >>> Even if the recalculation would cause the deadline to have occurred >> in >> >>> the past, it should execute on the next bond_run() anyway. This is >> >>> still okay, as the rebalance interval timeout may not result in a >> >>> full rebalance anyway. >> >>> >> >>> Reported-at: >> https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg10409.html >> >>> Signed-off-by: Aaron Conole <acon...@redhat.com> >> >>> --- >> >>> ofproto/bond.c | 8 +++++++- >> >>> 1 file changed, 7 insertions(+), 1 deletion(-) >> >>> >> >>> diff --git a/ofproto/bond.c b/ofproto/bond.c >> >>> index 3859ddca08..86e21607e5 100644 >> >>> --- a/ofproto/bond.c >> >>> +++ b/ofproto/bond.c >> >>> @@ -459,8 +459,14 @@ bond_reconfigure(struct bond *bond, const >> struct bond_settings *s) >> >>> } >> >>> >> >>> if (bond->rebalance_interval != s->rebalance_interval) { >> >>> + /* Recompute the next rebalance interval by moving the >> next_rebalance >> >>> + * to be offset by the new interval. Then let the >> rebalance code >> >>> + * trigger a rebalance based on the new details. >> >> >> >> This sentence doesn't seem necessary. >> > >> > Removed. >> > >> >>> In this case, if >> >>> + * all that was updated is the rebalance interval, we can >> skip >> >>> + * triggering the rest of the port reconfigure mechanism. */ >> >> >> >> s/reconfigure mechanism/reconfiguration/ >> > >> > Ack. >> > >> >>> + int old_start_time = bond->next_rebalance - >> bond->rebalance_interval; >> >>> bond->rebalance_interval = s->rebalance_interval; >> >>> - revalidate = true; >> >>> + bond->next_rebalance = old_start_time + >> bond->rebalance_interval; >> >>> } >> >> >> >> We should likely still revalidate when the value goes to or from zero, >> >> because that impacts if the bond considered balanced or not. Even if >> >> that doesn't cause any problems today, it will inevitably be forgotten >> >> in the future. >> > >> > That makes sense to me - I'll add a case for it. >> > >> >> Also, the subtraction may technically overflow if the next_rebalance >> is >> >> not initialized. E.g. if it is not initialized yet when the bond is >> >> going from active-backup to balance-tcp. Which may be fine, given >> those >> >> are signed integers, but still doesn't feel right. >> > >> > I'm trying to envision a scenario that would make this happen. For >> > example, it gets reset when going from ab -> balance-{tcp,slb}. And >> > that will be reconfigured later in this very code path as well (since >> we >> > will call the bond_entry_reset if the config got changed for any config >> > option other than the interval). I can put a guard on !next_rebalance >> > and tell it to trigger revalidation then (since the very first >> execution >> > or something should be initialized to '0' and obviously that could >> > probably find a trigger). But still struggling to see a place where >> > changing bond mode won't also force the whole thing to be reinitialized >> > in bond_entry_reset() since revalidate will be 'true' via other means. >> >> It doesn't cause any issues, I agree, but it just didn't feel right. :) >> Feel free to keep this code as-is. >> >> > >> >> And this patch needs a Fixes tag. >> > >> > I didn't put one because I'm not sure if it's really a bugfix or an >> > improvement. If you consider it one, I'll add the fixes tag. >> >> Looking closer to the code, there are two separate issues here: >> >> 1. Traffic is re-shuffled among bond members on the rebalance interval >> change. >> >> 2. Traffic is being dropped for some time after the bond is reconfigured >> until the next bond_run(). This happens if other config options >> are changed as well. >> >> And, I think, the latter is what the original reporter was concerned >> about. >> This is a behavior introduced in: >> >> 586adfd047cb ("bond: Avoid deadlock while updating post recirculation >> rules.") >> >> Before that change we would not update post-recirc rules while the >> configuration is cleared and not re-constructed yet. But after the >> 586adfd047cb fix, we're now calling the update after we zeroed out the >> configuration, effectively just removing all the post recirculation rules >> or the lb-output mappings. >> >> Saying that, maybe the real fix is to use bond_update_post_recirc_rules__ >> with force instead of direct call to update_recirc_rules(). That will >> ensure that the configuration is not blank and we actually have real >> post-recirc rules / lb-output mapping at all times. I do not remember >> why I went with the direct call in that fix though. >> >> What do you think?
It looks reasonable to me, but I will submit that as a different change with a proper fixes tag. >> Not re-setting the config on timeout change is still needed, as that >> avoids re-shuffling the traffic among bond members (issue 1). >> >> The behavior in point 1 was always this way, AFAIU, so that can be >> treated >> as just an enhancement, I suppose. >> >> Best regards, Ilya Maximets. >> >> >> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev