On 3/25/20 7:21 PM, Ilya Maximets wrote:
> On 3/3/20 1:47 PM, Dumitru Ceara wrote:
>> CC-ing Andy and Ben.
>>
>> On 3/3/20 7:43 AM, Han Zhou wrote:
>>>
>>>
>>> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dce...@redhat.com
>>> <mailto:dce...@redhat.com>> wrote:
>>>>
>>>> On 2/29/20 12:00 AM, Dumitru Ceara wrote:
>>>>> If the ovsdb-server reply to "monitor_cond_since" requests has
>>>>> "found" == false then ovsdb_idl_db_parse_monitor_reply() calls
>>>>> ovsdb_idl_db_clear() which iterates through all tables and
>>>>> unconditionally sets table->cond_changed to false.
>>>>>
>>>>> However, if the client had already set a new condition for some of the
>>>>> tables, this new condition request will never be sent to ovsdb-server
>>>>> until the condition is reset to a different value. This is due to the
>>>>> check in ovsdb_idl_db_set_condition().
>>>>>
>>>>> In order to fix this we now also call ovsdb_idl_condition_clear() for
>>>>> each table condition in ovsdb_idl_db_clear(). This ensures that
>>>>> resetting the condition to the same value as before the
>>>>> "monitor_cond_since" reply will trigger a new request.
>>>>>
>>>>> One way to replicate the issue is described in the bugzilla reporting
>>>>> the bug, when ovn-controller is configured to use "ovn-monitor-all":
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6
>>>>>
>>>>> Reported-by: Dan Williams <d...@redhat.com <mailto:d...@redhat.com>>
>>>>> Reported-at: https://bugzilla.redhat.com/1808125
>>>>> CC: Han Zhou <hz...@ovn.org <mailto:hz...@ovn.org>>
>>>>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when
>>> connection reset.")
>>>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com
>>> <mailto:dce...@redhat.com>>
>>>>
>>>> Hi Han,
>>>>
>>>> I was wondering if you could review this change because ovn-kubernetes
>>>> is trying to enable ovn-monitor-all for scalability reasons but their CI
>>>> fails due to the bug fixed by the patch below.
>>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>> ---
>>>>>  lib/ovsdb-idl.c | 6 +++++-
>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>>> index 190143f..64721ca 100644
>>>>> --- a/lib/ovsdb-idl.c
>>>>> +++ b/lib/ovsdb-idl.c
>>>>> @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db)
>>>>>          struct ovsdb_idl_table *table = &db->tables[i];
>>>>>          struct ovsdb_idl_row *row, *next_row;
>>>>>
>>>>> -        table->cond_changed = false;
>>>>> +        if (table->cond_changed) {
>>>>> +            table->cond_changed = false;
>>>>> +            ovsdb_idl_condition_clear(&table->condition);
>>>>> +        }
>>>>> +
>>>>>          if (hmap_is_empty(&table->rows)) {
>>>>>              continue;
>>>>>          }
>>>>>
>>>>
>>>
>>> Hi Dumitru>
>>> If I understand the problem, it is when the client updated monitor
>>> condition and at the same time it received monitor_cond_since reply from
>>> server for it's previous conditional monitor request, and the handling
>>> of the reply overwrite the flag table->cond_changed from true to false,
>>> and because of this, the new condition is not sent to server.
>>>
>> Hi Han,
>>
>> Yes, that's what's happening.
>>
>>> However, I don't understand the fix. It seems that the fix should be
>>> "don't overwrite the flag", instead of clearing the conditions as well.
>>> Also, by simply looking at this code I don't understand why the flag
>>> needs to be set to false when clearing data in IDL. What would be the
>>> problem if we keep it as what it is? If you understand the details,
>>> could you explain more about the fix?
>>>
>>> In addition, I didn't change this logic when implementing
>>> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server
>>> when connection reset."). In that commit it only reduced the chance of
>>> calling ovsdb_idl_db_clear() as an optimization. Before that commit,
>>> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and
>>> it would result in same problem.
>> You're right, my bad, I messed up the "git blame", sorry about that. It
>> was actually introduced by:
>>
>> https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a
>>
>> I'll fix up the commit message in V2 once we decide on the best approach.
>>
>> Hi Andy, Ben,
>>
>> As far as I understand, whenever ovsdb_idl_db_clear() is called during
>> reconnect, any buffered but unsent conditions should be cleared because
>> they will be anyway resent with the following monitoring condition
>> update message after reconnect has completed.
>>
>> However, clearing the "table->cond_changed" flag is not enough because
>> when the IDL client (ovn-controller in my case) will try to set the same
>> condition as before (e.g., "true" if ovn-monitor-all=true)
>> ovsdb_idl_db_set_condition() will return early:
>>
>> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518
>>
>> Therefore in the cases when ovsdb_idl_db_clear() is called, we should
>> also clear the existing conditions on the DB tables.
>>
>> As Han suggested, we could also revert the changes from 5351980b047f
>> ("ovsdb-idl: Avoid sending redundant conditional monitoring updates")
>> and we wouldn't hit the issue anymore. I'm not sure however if this has
>> any other major implications.
>

Hi Ilya,

Thanks for the review.

> If we're clearing existing conditions, we will send empty conditions in
> initial monitor_cond after reconnect, AFAIU, i.e. application will stop
> receiving any updates.  I think that breaks the logic that was behind commit
> 5351980b047f.  The application that uses idl library will have to set up
> conditions from scratch in order to avoid logic changes.  I think that
> reconnection process should not affect application anyhow and we can't force
> application to track all the reconnections and reconfigure conditions.
> 
Ok, makes sense. I was looking at ovn-controller as the standard
application that uses the idl library and ovn-controller tracks
reconnects and also reconfigures conditions at every iteration (also
when reconnects happen). But you're right, this change would force all
idl user apps to do the same thing.

> So, I think that it's better to revert commit 5351980b047f instead, because
> otherwise we will break its logic anyway.  And, in the end, this was just a
> performance optimization.
> Also, after commit 403a6a0cb003 ("ovsdb-idl: Fast resync from server when
> connection reset."),  ovsdb_idl_db_parse_monitor_reply() doesn't call
> ovsdb_idl_db_clear() in most cases, i.e. the performance optimization of
> sending same conditions twice might be less important.
>

Ack, I'll send a v3 that reverts 5351980b047f.

Thanks,
Dumitru


> Best regards, Ilya Maximets.
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to