On 3/25/20 7:21 PM, Ilya Maximets wrote: > On 3/3/20 1:47 PM, Dumitru Ceara wrote: >> CC-ing Andy and Ben. >> >> On 3/3/20 7:43 AM, Han Zhou wrote: >>> >>> >>> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dce...@redhat.com >>> <mailto:dce...@redhat.com>> wrote: >>>> >>>> On 2/29/20 12:00 AM, Dumitru Ceara wrote: >>>>> If the ovsdb-server reply to "monitor_cond_since" requests has >>>>> "found" == false then ovsdb_idl_db_parse_monitor_reply() calls >>>>> ovsdb_idl_db_clear() which iterates through all tables and >>>>> unconditionally sets table->cond_changed to false. >>>>> >>>>> However, if the client had already set a new condition for some of the >>>>> tables, this new condition request will never be sent to ovsdb-server >>>>> until the condition is reset to a different value. This is due to the >>>>> check in ovsdb_idl_db_set_condition(). >>>>> >>>>> In order to fix this we now also call ovsdb_idl_condition_clear() for >>>>> each table condition in ovsdb_idl_db_clear(). This ensures that >>>>> resetting the condition to the same value as before the >>>>> "monitor_cond_since" reply will trigger a new request. >>>>> >>>>> One way to replicate the issue is described in the bugzilla reporting >>>>> the bug, when ovn-controller is configured to use "ovn-monitor-all": >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 >>>>> >>>>> Reported-by: Dan Williams <d...@redhat.com <mailto:d...@redhat.com>> >>>>> Reported-at: https://bugzilla.redhat.com/1808125 >>>>> CC: Han Zhou <hz...@ovn.org <mailto:hz...@ovn.org>> >>>>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when >>> connection reset.") >>>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com >>> <mailto:dce...@redhat.com>> >>>> >>>> Hi Han, >>>> >>>> I was wondering if you could review this change because ovn-kubernetes >>>> is trying to enable ovn-monitor-all for scalability reasons but their CI >>>> fails due to the bug fixed by the patch below. >>>> >>>> Thanks, >>>> Dumitru >>>> >>>>> --- >>>>> lib/ovsdb-idl.c | 6 +++++- >>>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >>>>> index 190143f..64721ca 100644 >>>>> --- a/lib/ovsdb-idl.c >>>>> +++ b/lib/ovsdb-idl.c >>>>> @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) >>>>> struct ovsdb_idl_table *table = &db->tables[i]; >>>>> struct ovsdb_idl_row *row, *next_row; >>>>> >>>>> - table->cond_changed = false; >>>>> + if (table->cond_changed) { >>>>> + table->cond_changed = false; >>>>> + ovsdb_idl_condition_clear(&table->condition); >>>>> + } >>>>> + >>>>> if (hmap_is_empty(&table->rows)) { >>>>> continue; >>>>> } >>>>> >>>> >>> >>> Hi Dumitru> >>> If I understand the problem, it is when the client updated monitor >>> condition and at the same time it received monitor_cond_since reply from >>> server for it's previous conditional monitor request, and the handling >>> of the reply overwrite the flag table->cond_changed from true to false, >>> and because of this, the new condition is not sent to server. >>> >> Hi Han, >> >> Yes, that's what's happening. >> >>> However, I don't understand the fix. It seems that the fix should be >>> "don't overwrite the flag", instead of clearing the conditions as well. >>> Also, by simply looking at this code I don't understand why the flag >>> needs to be set to false when clearing data in IDL. What would be the >>> problem if we keep it as what it is? If you understand the details, >>> could you explain more about the fix? >>> >>> In addition, I didn't change this logic when implementing >>> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server >>> when connection reset."). In that commit it only reduced the chance of >>> calling ovsdb_idl_db_clear() as an optimization. Before that commit, >>> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and >>> it would result in same problem. >> You're right, my bad, I messed up the "git blame", sorry about that. It >> was actually introduced by: >> >> https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a >> >> I'll fix up the commit message in V2 once we decide on the best approach. >> >> Hi Andy, Ben, >> >> As far as I understand, whenever ovsdb_idl_db_clear() is called during >> reconnect, any buffered but unsent conditions should be cleared because >> they will be anyway resent with the following monitoring condition >> update message after reconnect has completed. >> >> However, clearing the "table->cond_changed" flag is not enough because >> when the IDL client (ovn-controller in my case) will try to set the same >> condition as before (e.g., "true" if ovn-monitor-all=true) >> ovsdb_idl_db_set_condition() will return early: >> >> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 >> >> Therefore in the cases when ovsdb_idl_db_clear() is called, we should >> also clear the existing conditions on the DB tables. >> >> As Han suggested, we could also revert the changes from 5351980b047f >> ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") >> and we wouldn't hit the issue anymore. I'm not sure however if this has >> any other major implications. >
Hi Ilya, Thanks for the review. > If we're clearing existing conditions, we will send empty conditions in > initial monitor_cond after reconnect, AFAIU, i.e. application will stop > receiving any updates. I think that breaks the logic that was behind commit > 5351980b047f. The application that uses idl library will have to set up > conditions from scratch in order to avoid logic changes. I think that > reconnection process should not affect application anyhow and we can't force > application to track all the reconnections and reconfigure conditions. > Ok, makes sense. I was looking at ovn-controller as the standard application that uses the idl library and ovn-controller tracks reconnects and also reconfigures conditions at every iteration (also when reconnects happen). But you're right, this change would force all idl user apps to do the same thing. > So, I think that it's better to revert commit 5351980b047f instead, because > otherwise we will break its logic anyway. And, in the end, this was just a > performance optimization. > Also, after commit 403a6a0cb003 ("ovsdb-idl: Fast resync from server when > connection reset."), ovsdb_idl_db_parse_monitor_reply() doesn't call > ovsdb_idl_db_clear() in most cases, i.e. the performance optimization of > sending same conditions twice might be less important. > Ack, I'll send a v3 that reverts 5351980b047f. Thanks, Dumitru > Best regards, Ilya Maximets. > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev