On 3/5/20 9:50 PM, Ben Pfaff wrote: > On Tue, Mar 03, 2020 at 01:47:48PM +0100, Dumitru Ceara wrote: >> CC-ing Andy and Ben. >> >> On 3/3/20 7:43 AM, Han Zhou wrote: >>> >>> >>> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dce...@redhat.com >>> <mailto:dce...@redhat.com>> wrote: >>>> >>>> On 2/29/20 12:00 AM, Dumitru Ceara wrote: >>>>> If the ovsdb-server reply to "monitor_cond_since" requests has >>>>> "found" == false then ovsdb_idl_db_parse_monitor_reply() calls >>>>> ovsdb_idl_db_clear() which iterates through all tables and >>>>> unconditionally sets table->cond_changed to false. >>>>> >>>>> However, if the client had already set a new condition for some of the >>>>> tables, this new condition request will never be sent to ovsdb-server >>>>> until the condition is reset to a different value. This is due to the >>>>> check in ovsdb_idl_db_set_condition(). >>>>> >>>>> In order to fix this we now also call ovsdb_idl_condition_clear() for >>>>> each table condition in ovsdb_idl_db_clear(). This ensures that >>>>> resetting the condition to the same value as before the >>>>> "monitor_cond_since" reply will trigger a new request. >>>>> >>>>> One way to replicate the issue is described in the bugzilla reporting >>>>> the bug, when ovn-controller is configured to use "ovn-monitor-all": >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 >>>>> >>>>> Reported-by: Dan Williams <d...@redhat.com <mailto:d...@redhat.com>> >>>>> Reported-at: https://bugzilla.redhat.com/1808125 >>>>> CC: Han Zhou <hz...@ovn.org <mailto:hz...@ovn.org>> >>>>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when >>> connection reset.") >>>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com >>> <mailto:dce...@redhat.com>> >>>> >>>> Hi Han, >>>> >>>> I was wondering if you could review this change because ovn-kubernetes >>>> is trying to enable ovn-monitor-all for scalability reasons but their CI >>>> fails due to the bug fixed by the patch below. >>>> >>>> Thanks, >>>> Dumitru >>>> >>>>> --- >>>>> lib/ovsdb-idl.c | 6 +++++- >>>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >>>>> index 190143f..64721ca 100644 >>>>> --- a/lib/ovsdb-idl.c >>>>> +++ b/lib/ovsdb-idl.c >>>>> @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) >>>>> struct ovsdb_idl_table *table = &db->tables[i]; >>>>> struct ovsdb_idl_row *row, *next_row; >>>>> >>>>> - table->cond_changed = false; >>>>> + if (table->cond_changed) { >>>>> + table->cond_changed = false; >>>>> + ovsdb_idl_condition_clear(&table->condition); >>>>> + } >>>>> + >>>>> if (hmap_is_empty(&table->rows)) { >>>>> continue; >>>>> } >>>>> >>>> >>> >>> Hi Dumitru> >>> If I understand the problem, it is when the client updated monitor >>> condition and at the same time it received monitor_cond_since reply from >>> server for it's previous conditional monitor request, and the handling >>> of the reply overwrite the flag table->cond_changed from true to false, >>> and because of this, the new condition is not sent to server. >>> >> Hi Han, >> >> Yes, that's what's happening. >> >>> However, I don't understand the fix. It seems that the fix should be >>> "don't overwrite the flag", instead of clearing the conditions as well. >>> Also, by simply looking at this code I don't understand why the flag >>> needs to be set to false when clearing data in IDL. What would be the >>> problem if we keep it as what it is? If you understand the details, >>> could you explain more about the fix? >>> >>> In addition, I didn't change this logic when implementing >>> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server >>> when connection reset."). In that commit it only reduced the chance of >>> calling ovsdb_idl_db_clear() as an optimization. Before that commit, >>> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and >>> it would result in same problem. >> You're right, my bad, I messed up the "git blame", sorry about that. It >> was actually introduced by: >> >> https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a >> >> I'll fix up the commit message in V2 once we decide on the best approach. >> >> Hi Andy, Ben, >> >> As far as I understand, whenever ovsdb_idl_db_clear() is called during >> reconnect, any buffered but unsent conditions should be cleared because >> they will be anyway resent with the following monitoring condition >> update message after reconnect has completed. >> >> However, clearing the "table->cond_changed" flag is not enough because >> when the IDL client (ovn-controller in my case) will try to set the same >> condition as before (e.g., "true" if ovn-monitor-all=true) >> ovsdb_idl_db_set_condition() will return early: >> >> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 >> >> Therefore in the cases when ovsdb_idl_db_clear() is called, we should >> also clear the existing conditions on the DB tables. >> >> As Han suggested, we could also revert the changes from 5351980b047f >> ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") >> and we wouldn't hit the issue anymore. I'm not sure however if this has >> any other major implications. >> >> What do you think? > > I honestly don't understand what's going on here well enough at the > moment. There's a lot of context. If necessary, I can try to untangle > it, but I'd rather leave it to the two of you if you can figure it out. >
Hi Han, I just sent a v2 of the patch. The code changes are the same as in v1, just that I updated the "Fixes" tag to reflect the correct commit that introduced the issue: https://patchwork.ozlabs.org/patch/1252278/ It would be great if you could help review this new version of the patch. Thanks, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev