Re: [ovs-dev] [PATCH ovn] utilities: disable OVSDB inactivity probes for non-daemon ovn-nbctl

Vladislav Odintsov Tue, 18 Apr 2023 08:32:04 -0700


> On 18 Apr 2023, at 16:59, Dumitru Ceara <dce...@redhat.com> wrote:
> 
> On 4/14/23 15:37, Vladislav Odintsov wrote:
>> Hi Dumitru,
>> 
>>> On 13 Apr 2023, at 11:55, Dumitru Ceara <dce...@redhat.com> wrote:
>>> 
>>> Hi Vladislav,
>>> 
>>> On 4/3/23 12:08, Ilya Maximets wrote:
>>>> On 3/31/23 17:17, Dumitru Ceara wrote:
>>>>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl
>>>>>> (non-daemon mode), it doesn’t make sense to have client -> server
>>>>>> inactivity probes. If cable is unplugged, will just hit tcp session
>>>>>> timeout, IIUC.
>>>>>> Please correct me if I’m wrong.
>>>>>> 
>>>>> 
>>>>> You're right but the default timeouts are quite large for TCP sessions:
>>>>> 
>>>>> With keepalives default:
>>>>> tcp_keepalive_time - INTEGER
>>>>> How often TCP sends out keepalive messages when keepalive is enabled.
>>>>> Default: 2hours.
>>>>> 
>>>>> tcp_keepalive_probes - INTEGER
>>>>> How many keepalive probes TCP sends out, until it decides that the
>>>>> connection is broken. Default value: 9.
>>>>> 
>>>>> tcp_keepalive_intvl - INTEGER
>>>>> How frequently the probes are send out. Multiplied by
>>>>> tcp_keepalive_probes it is time to kill not responding connection,
>>>>> after probes started. Default value: 75sec i.e. connection
>>>>> will be aborted after ~11 minutes of retries.
>>>>> 
>>>>> DB clients don't even enable tcp keepalives IIRC.
>>>>> 
>>>>> So then we depend retransmits timing out:
>>>>> tcp_retries2 - INTEGER
>>>>> This value influences the timeout of an alive TCP connection,
>>>>> when RTO retransmissions remain unacknowledged.
>>>>> Given a value of N, a hypothetical TCP connection following
>>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would
>>>>> retransmit N times before killing the connection at the (N+1)th RTO.
>>>>> 
>>>>> The default value of 15 yields a hypothetical timeout of 924.6
>>>>> seconds and is a lower bound for the effective timeout.
>>>>> TCP will effectively time out at the first RTO which exceeds the
>>>>> hypothetical timeout.
>>>>> 
>>>>> RFC 1122 recommends at least 100 seconds for the timeout,
>>>>> which corresponds to a value of at least 8.
>>>>> 
>>>>> Isn't it possible that the connection suddenly goes down in between a
>>>>> transaction request and its reply but with all TCP segments being
>>>>> ACKed?
>>>> 
>>>> Right.  If the client already sent the request and only waits for a
>>>> reply,
>>>> the retransmission timeout will not have any effect.  An idle TCP session
>>>> may stay open practically forever, because it is designed to survive
>>>> network
>>>> disruptions.  There is no such thing as a TCP timeout in a general case.
>>>> 
>>> 
>>> I was wondering if you have more thoughts on this?  Should we go for the
>>> approach where OVN sets the probe interval to a "reasonably large"
>>> value by default?  For example, using 30s would mean failure after ~60
>>> seconds of inactivity.
>> 
>> Actually I’m okay with your proposal to set a reasonable high inactivity
>> probe (120/180 seconds?).
>> Maybe it should be not only for non-daemon operation of dbctl but also
>> for all modes?
>> 
> 
> It's probably a good idea, yes.
> 
>> Alternatively, I thought it could be useful to add a new command line
>> option like --inactivity-probe=N for the utilities to allow the user to
>> choose and set interval which is needed in each case.
>> 
>> What do you think about it?
>> 
> 
> Northd uses NB_Global.options:northd_probe_interval as inactivity
> interval.  Can we do the same thing and use as default the high value
> until the first successful connection to the NB?


Good idea, I’ll dig into this next week, thanks.

> 
>>> 
>>> Thanks,
>>> Dumitru
>>> 
>>>>> 
>>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>> 
>>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>>>>> Hi Dumitru,
>>>>>>>> 
>>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>>> 
>>>>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov
>>>>>>>>>> <odiv...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms
>>>>>>>>>>> could be not
>>>>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for
>>>>>>>>>>> ovn-*ctl
>>>>>>>>>>> running
>>>>>>>>>>> in non-daemon mode.
>>>>>>>>>>> 
>>>>>>>>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com>
>>>>>>>>>>> ---
>>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>>> 
>>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>>>>   if (daemon_mode) {
>>>>>>>>>>>       server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>>>>   } else {
>>>>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>>>> 
>>>>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>>>> 
>>>>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>>>> 
>>>>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch
>>>>>>>>> that at
>>>>>>>>> apply time if it looks OK to you.
>>>>>>>> 
>>>>>>>> I’ve got no objections here.
>>>>>>>> Small nit: set_idl_probe_interval function needs also a remote.
>>>>>>>> Like this:
>>>>>>>> 
>>>>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>>>> 
>>>>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>>>> 
>>>>>>> 
>>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>>>>> this probe interval to a very high value instead?  That's for the case
>>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies
>>>>>>> because of
>>>>>>> for example cable being unplugged somewhere on the way between the
>>>>>>> two.
>>>>>>> 
>>>>>>> [0]
>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> +
>>>>>>>>>>>       struct ctl_command *commands;
>>>>>>>>>>>       size_t n_commands;
>>>>>>>>>>>       char *error;
>>>>>>>>>>> --
>>>>>>>>>>> 2.36.1
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> dev mailing list
>>>>>>>>>>> d...@openvswitch.org
>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> Looks good to me, thanks.
>>>>>>>>>> 
>>>>>>>>>> Reviewed-by: Ales Musil <amu...@redhat.com>
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable
>>>>>>>>> branches
>>>>>>>>> too, what do you think?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Dumitru
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Vladislav Odintsov
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> d...@openvswitch.org <mailto:d...@openvswitch.org>
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> Vladislav Odintsov
>>>>>> 
>>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> d...@openvswitch.org
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>> 
>>> 
>> 
>> 
>> Regards,
>> Vladislav Odintsov


Regards,
Vladislav Odintsov

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn] utilities: disable OVSDB inactivity probes for non-daemon ovn-nbctl

Reply via email to