On Tue, Mar 03, 2020 at 01:47:48PM +0100, Dumitru Ceara wrote: > CC-ing Andy and Ben. > > On 3/3/20 7:43 AM, Han Zhou wrote: > > > > > > On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dce...@redhat.com > > <mailto:dce...@redhat.com>> wrote: > >> > >> On 2/29/20 12:00 AM, Dumitru Ceara wrote: > >> > If the ovsdb-server reply to "monitor_cond_since" requests has > >> > "found" == false then ovsdb_idl_db_parse_monitor_reply() calls > >> > ovsdb_idl_db_clear() which iterates through all tables and > >> > unconditionally sets table->cond_changed to false. > >> > > >> > However, if the client had already set a new condition for some of the > >> > tables, this new condition request will never be sent to ovsdb-server > >> > until the condition is reset to a different value. This is due to the > >> > check in ovsdb_idl_db_set_condition(). > >> > > >> > In order to fix this we now also call ovsdb_idl_condition_clear() for > >> > each table condition in ovsdb_idl_db_clear(). This ensures that > >> > resetting the condition to the same value as before the > >> > "monitor_cond_since" reply will trigger a new request. > >> > > >> > One way to replicate the issue is described in the bugzilla reporting > >> > the bug, when ovn-controller is configured to use "ovn-monitor-all": > >> > https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 > >> > > >> > Reported-by: Dan Williams <d...@redhat.com <mailto:d...@redhat.com>> > >> > Reported-at: https://bugzilla.redhat.com/1808125 > >> > CC: Han Zhou <hz...@ovn.org <mailto:hz...@ovn.org>> > >> > Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when > > connection reset.") > >> > Signed-off-by: Dumitru Ceara <dce...@redhat.com > > <mailto:dce...@redhat.com>> > >> > >> Hi Han, > >> > >> I was wondering if you could review this change because ovn-kubernetes > >> is trying to enable ovn-monitor-all for scalability reasons but their CI > >> fails due to the bug fixed by the patch below. > >> > >> Thanks, > >> Dumitru > >> > >> > --- > >> > lib/ovsdb-idl.c | 6 +++++- > >> > 1 file changed, 5 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > >> > index 190143f..64721ca 100644 > >> > --- a/lib/ovsdb-idl.c > >> > +++ b/lib/ovsdb-idl.c > >> > @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) > >> > struct ovsdb_idl_table *table = &db->tables[i]; > >> > struct ovsdb_idl_row *row, *next_row; > >> > > >> > - table->cond_changed = false; > >> > + if (table->cond_changed) { > >> > + table->cond_changed = false; > >> > + ovsdb_idl_condition_clear(&table->condition); > >> > + } > >> > + > >> > if (hmap_is_empty(&table->rows)) { > >> > continue; > >> > } > >> > > >> > > > > Hi Dumitru> > > If I understand the problem, it is when the client updated monitor > > condition and at the same time it received monitor_cond_since reply from > > server for it's previous conditional monitor request, and the handling > > of the reply overwrite the flag table->cond_changed from true to false, > > and because of this, the new condition is not sent to server. > > > Hi Han, > > Yes, that's what's happening. > > > However, I don't understand the fix. It seems that the fix should be > > "don't overwrite the flag", instead of clearing the conditions as well. > > Also, by simply looking at this code I don't understand why the flag > > needs to be set to false when clearing data in IDL. What would be the > > problem if we keep it as what it is? If you understand the details, > > could you explain more about the fix? > > > > In addition, I didn't change this logic when implementing > > monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server > > when connection reset."). In that commit it only reduced the chance of > > calling ovsdb_idl_db_clear() as an optimization. Before that commit, > > ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and > > it would result in same problem. > You're right, my bad, I messed up the "git blame", sorry about that. It > was actually introduced by: > > https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a > > I'll fix up the commit message in V2 once we decide on the best approach. > > Hi Andy, Ben, > > As far as I understand, whenever ovsdb_idl_db_clear() is called during > reconnect, any buffered but unsent conditions should be cleared because > they will be anyway resent with the following monitoring condition > update message after reconnect has completed. > > However, clearing the "table->cond_changed" flag is not enough because > when the IDL client (ovn-controller in my case) will try to set the same > condition as before (e.g., "true" if ovn-monitor-all=true) > ovsdb_idl_db_set_condition() will return early: > > https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 > > Therefore in the cases when ovsdb_idl_db_clear() is called, we should > also clear the existing conditions on the DB tables. > > As Han suggested, we could also revert the changes from 5351980b047f > ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") > and we wouldn't hit the issue anymore. I'm not sure however if this has > any other major implications. > > What do you think?
I honestly don't understand what's going on here well enough at the moment. There's a lot of context. If necessary, I can try to untangle it, but I'd rather leave it to the two of you if you can figure it out. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev