> On 18 Apr 2023, at 16:59, Dumitru Ceara <dce...@redhat.com> wrote: > > On 4/14/23 15:37, Vladislav Odintsov wrote: >> Hi Dumitru, >> >>> On 13 Apr 2023, at 11:55, Dumitru Ceara <dce...@redhat.com> wrote: >>> >>> Hi Vladislav, >>> >>> On 4/3/23 12:08, Ilya Maximets wrote: >>>> On 3/31/23 17:17, Dumitru Ceara wrote: >>>>> On 3/31/23 16:51, Vladislav Odintsov wrote: >>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl >>>>>> (non-daemon mode), it doesn’t make sense to have client -> server >>>>>> inactivity probes. If cable is unplugged, will just hit tcp session >>>>>> timeout, IIUC. >>>>>> Please correct me if I’m wrong. >>>>>> >>>>> >>>>> You're right but the default timeouts are quite large for TCP sessions: >>>>> >>>>> With keepalives default: >>>>> tcp_keepalive_time - INTEGER >>>>> How often TCP sends out keepalive messages when keepalive is enabled. >>>>> Default: 2hours. >>>>> >>>>> tcp_keepalive_probes - INTEGER >>>>> How many keepalive probes TCP sends out, until it decides that the >>>>> connection is broken. Default value: 9. >>>>> >>>>> tcp_keepalive_intvl - INTEGER >>>>> How frequently the probes are send out. Multiplied by >>>>> tcp_keepalive_probes it is time to kill not responding connection, >>>>> after probes started. Default value: 75sec i.e. connection >>>>> will be aborted after ~11 minutes of retries. >>>>> >>>>> DB clients don't even enable tcp keepalives IIRC. >>>>> >>>>> So then we depend retransmits timing out: >>>>> tcp_retries2 - INTEGER >>>>> This value influences the timeout of an alive TCP connection, >>>>> when RTO retransmissions remain unacknowledged. >>>>> Given a value of N, a hypothetical TCP connection following >>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would >>>>> retransmit N times before killing the connection at the (N+1)th RTO. >>>>> >>>>> The default value of 15 yields a hypothetical timeout of 924.6 >>>>> seconds and is a lower bound for the effective timeout. >>>>> TCP will effectively time out at the first RTO which exceeds the >>>>> hypothetical timeout. >>>>> >>>>> RFC 1122 recommends at least 100 seconds for the timeout, >>>>> which corresponds to a value of at least 8. >>>>> >>>>> Isn't it possible that the connection suddenly goes down in between a >>>>> transaction request and its reply but with all TCP segments being >>>>> ACKed? >>>> >>>> Right. If the client already sent the request and only waits for a >>>> reply, >>>> the retransmission timeout will not have any effect. An idle TCP session >>>> may stay open practically forever, because it is designed to survive >>>> network >>>> disruptions. There is no such thing as a TCP timeout in a general case. >>>> >>> >>> I was wondering if you have more thoughts on this? Should we go for the >>> approach where OVN sets the probe interval to a "reasonably large" >>> value by default? For example, using 30s would mean failure after ~60 >>> seconds of inactivity. >> >> Actually I’m okay with your proposal to set a reasonable high inactivity >> probe (120/180 seconds?). >> Maybe it should be not only for non-daemon operation of dbctl but also >> for all modes? >> > > It's probably a good idea, yes. > >> Alternatively, I thought it could be useful to add a new command line >> option like --inactivity-probe=N for the utilities to allow the user to >> choose and set interval which is needed in each case. >> >> What do you think about it? >> > > Northd uses NB_Global.options:northd_probe_interval as inactivity > interval. Can we do the same thing and use as default the high value > until the first successful connection to the NB?
Good idea, I’ll dig into this next week, thanks. > >>> >>> Thanks, >>> Dumitru >>> >>>>> >>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dce...@redhat.com> wrote: >>>>>>> >>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>>>>>> Hi Dumitru, >>>>>>>> >>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dce...@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov >>>>>>>>>> <odiv...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms >>>>>>>>>>> could be not >>>>>>>>>>> sufficient. This patch disables OVSDB inactivity probes for >>>>>>>>>>> ovn-*ctl >>>>>>>>>>> running >>>>>>>>>>> in non-daemon mode. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com> >>>>>>>>>>> --- >>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>>>>>> if (daemon_mode) { >>>>>>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>>>>>> } else { >>>>>>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>>>>>> >>>>>>>>> I think I'd avoid using the idl function directly and call instead: >>>>>>>>> >>>>>>>>> set_idl_probe_interval(idl, 0); >>>>>>>>> >>>>>>>>> Just to keep it aligned with all other uses in OVN. I can patch >>>>>>>>> that at >>>>>>>>> apply time if it looks OK to you. >>>>>>>> >>>>>>>> I’ve got no objections here. >>>>>>>> Small nit: set_idl_probe_interval function needs also a remote. >>>>>>>> Like this: >>>>>>>> >>>>>>>> set_idl_probe_interval(idl, db, 0); >>>>>>>> >>>>>>>> Also, please correct typo in commit message: defatult -> default. >>>>>>>> >>>>>>> >>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set >>>>>>> this probe interval to a very high value instead? That's for the case >>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies >>>>>>> because of >>>>>>> for example cable being unplugged somewhere on the way between the >>>>>>> two. >>>>>>> >>>>>>> [0] >>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>>>>>> >>>>>>>>> >>>>>>>>>>> + >>>>>>>>>>> struct ctl_command *commands; >>>>>>>>>>> size_t n_commands; >>>>>>>>>>> char *error; >>>>>>>>>>> -- >>>>>>>>>>> 2.36.1 >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> dev mailing list >>>>>>>>>>> d...@openvswitch.org >>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Looks good to me, thanks. >>>>>>>>>> >>>>>>>>>> Reviewed-by: Ales Musil <amu...@redhat.com> >>>>>>>>>> >>>>>>>>> >>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable >>>>>>>>> branches >>>>>>>>> too, what do you think? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Dumitru >>>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vladislav Odintsov >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> d...@openvswitch.org <mailto:d...@openvswitch.org> >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>> >>>>>> >>>>>> Regards, >>>>>> Vladislav Odintsov >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> d...@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>> >> >> >> Regards, >> Vladislav Odintsov Regards, Vladislav Odintsov _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev