On Wed, Aug 10, 2022 at 6:34 AM Mark Michelson <[email protected]
<mailto:[email protected]>> wrote:
>
> On 8/9/22 16:12, Han Zhou wrote:
> >
> >
> > On Tue, Aug 9, 2022 at 12:57 PM Han Zhou <[email protected]
<mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> > >
> > >
> > >
> > > On Tue, Aug 9, 2022 at 12:53 PM Mark Michelson
<[email protected] <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> > > >
> > > > On 8/9/22 15:12, Ilya Maximets wrote:
> > > > > On 8/9/22 20:59, Mark Michelson wrote:
> > > > >> On 8/8/22 09:31, Ilya Maximets wrote:
> > > > >>> On 8/5/22 08:28, Han Zhou wrote:
> > > > >>>> On Thu, Jun 30, 2022 at 12:55 PM Mark Michelson
> > <[email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>> wrote:
> > > > >>>>>
> > > > >>>>> Hi OVS and OVN devs,
> > > > >>>>>
> > > > >>>>> The OVN team has considered the idea of moving inactivity
> > probes (i.e.
> > > > >>>>> OVSDB echo requests/replies) into a background thread.
> > > > >>>>>
> > > > >>>>> OVN logical networks can be very large, meaning that ovn
> > components such
> > > > >>>>> as ovn-northd and ovn-controller may take a while to process
> > everything
> > > > >>>>> in an OVSDB database. On large clusters, we end up
seeing the
> > following
> > > > >>>>> loop occur:
> > > > >>>>>
> > > > >>>>> 1. The OVN component connects to the database.
> > > > >>>>> 2. The OVN component must compute the entire contents of
> > database.
> > > > >>>>> 3. While the OVN component is executing its main loop, the
> > inactivity
> > > > >>>>> probe interval expires. The OVN component is disconnected
> > from database.
> > > > >>>>> 4. The OVN component finishes its computation.
> > > > >>>>> 5. Since the OVN component is disconnected from the
database,
> > it must
> > > > >>>>> reconnect. Go to step 1.
> > > > >>>>>
> > > > >>>>> This makes for an unstable and slow experience. Typically if
> > OVN can get
> > > > >>>>> past the initial loop after connecting to the database, then
> > incremental
> > > > >>>>> processing will allow for subsequent loops to execute much
> > more quickly.
> > > > >>>>> However, the constant disconnect-reconnect makes OVN operate
> > at its
> > > > >>>>> slowest at all times.
> > > > >>>>>
> > > > >>>>> The way we've dealt with this before is to try to
optimize the
> > > > >>>>> performance of OVN components, while also advising that the
> > inactivity
> > > > >>>>> probe gets set to a high value. The problem is that as the
> > demand for
> > > > >>>>> larger and larger logical networks grows, the execution time
> > of OVN is
> > > > >>>>> hard to bring down much more, but the inactivity probes have
> > to keep
> > > > >>>>> getting higher and higher to avoid the described scenario.
> > Once OVN
> > > > >>>>> reaches its "stable" state where incremental processing
makes
> > loops
> > > > >>>>> execute quickly, this high inactivity probe becomes
> > detrimental. It
> > > > >>>>> means that if there is a legitimate disconnection, then we
> > don't detect
> > > > >>>>> it very quickly.
> > > > >>>>>
> > > > >>>> Hi Mark,
> > > > >>>>
> > > > >>>> Sorry for the late reply.
> > > > >>>> For the slowness of ovn-northd/ovn-controller, our
practice is
> > to disable
> > > > >>>> the inactivity probe from DB server side. It seems harmless
> > because if the
> > > > >>>> client network is recovered it would just reconnect, right?
> > > > >>>> The problem is more with the other direction: probe from
> > client to server.
> > > > >>>> This probe is required because if a client (e.g.
> > ovn-controller) doesn't
> > > > >>>> need to send any transactions to the server for a long time
> > then it would
> > > > >>>> not detect a server crash sooner, thus would not trigger
> > reconnecting to
> > > > >>>> another server in the cluster in time.
> > > > >>>> With the client -> server probe enabled, there is a scale
> > problem with the
> > > > >>>> SB server, if it is connected with a huge number of clients
> > and if the
> > > > >>>> probe interval is not high enough, because when it is busy
> > serving the
> > > > >>>> clients it may fail responding the probes in time, causing
> > some clients
> > > > >>>> reconnecting and re-transmitting data, a cascaded failure.
> > > > >>>> So for the background thread approach may appear to be
helpful
> > for the
> > > > >>>> server side.
> > > > >>>> However, there is already an alternative to the server side
> > scale problem
> > > > >>>> (I haven't tried yet), the OVSDB relay, which can reduce the
> > number of
> > > > >>>> clients per server to a very low number. In this case, a
short
> > probe should
> > > > >>>> not matter.
> > > > >>>> So based on the above thoughts, I am not sure if it is really
> > necessary to
> > > > >>>> have the background probe handling.
> > > > >>>>
> > > > >>>>> As mentioned at the top of the email, a possible solution is
> > to put the
> > > > >>>>> inactivity probes into a background thread. Is this in the
> > spirit of the
> > > > >>>>> inactivity probe? From my point of view, the inactivity
probe
> > should
> > > > >>>>> fail only in a serious error condition, such as a network
> > outage, or a
> > > > >>>>> program crash. If a program is "busy" it is still "active"
> > and should
> > > > >>>>> therefore not be subject to inactivity probe failures.
> > However, I want
> > > > >>>>> to get the opinions of the list on this.
> > > > >>>>
> > > > >>>> This is indeed a controversial point. If a program is "busy"
> > but "active",
> > > > >>>> I agree it shouldn't be subject to probe failures.
However, if
> > the program
> > > > >>>> is not responsive at all, due to bugs, e.g. the main
thread is
> > in a dead
> > > > >>>> loop (although it is not very likely to happen in OVN
> > components), should
> > > > >>>> the probe fail? I think the answer is yes in this case,
> > considering the
> > > > >>>> case when a SB server node is not responsive at all
because of
> > bugs while
> > > > >>>> the background thread is still responding to probes, the
> > client wouldn't
> > > > >>>> notice the problem and would not reconnect to a healthy node,
> > which defies
> > > > >>>> the purpose of the probe.
> > > > >>>
> > > > >>> I think, there is one more thing we should keep in mind -
plain
> > old TCP
> > > > >>> keepalive functionality. Instead of implementing some special
> > background
> > > > >>> threads, users can just disable all inactivity probes and
> > LD_PRELOAD
> > > > >>> keepalive library with desired configuration that will be
> > automatically
> > > > >>> applied to all connections and it will be kernel's
> > responsibility to
> > > > >>> handle probes. IIUC, that should cover all the same cases
as a
> > background
> > > > >>> thread, and will have the same drawbacks of not detecting the
> > process being
> > > > >>> stuck in an infinite loop.
> > > > >>>
> > > > >>> Best regards, Ilya Maximets.
> > > > >>>
> > > > >>
> > > > >> A long time ago (July 2020), Anton Ivanov provided a patch that
> > enabled keepalives at the kernel level. This is the latest version of
> > the patch I could find:
> >
https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371262.html
<https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371262.html>
> >
<https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371262.html
<https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371262.html>>
> > > > >>
> > > > >> I had some findings on that particular patch, and AFAIK, there
> > was never a follow-up patch sent (although I may have missed it when
> > searching).
> > > > >>
> > > > >> Would something like that be a viable alternative?
> > > > >
> > > > > The beauty of this is that you don't need to change the code!
> > > > > Just install libkeepalive in you favorite distribution and
> > > > > LD_PREVOAD the library while starting the application.
> > > > > See some examples here:
> > > > >
> >
https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#libkeepalive
<https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#libkeepalive>
> >
<https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#libkeepalive
<https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#libkeepalive>>
> > > > >
> > > > > Best regards, Ilya Maximets.
> > > > >
> > > > Wow, so if I understand correctly, we could handle this
entirely in
> > > > ovn-ctl and the systemd service files, and we'd get inactivity
probes
> > > > that work independent of what the application is doing. That's
> > pretty nifty.
> > > >
> > > > It doesn't address Han's concern about timing out if the code
is in a
> > > > dead loop or is deadlocked. But that's the sort of thing a local
> > > > watchdog could take care of instead.
> > >
> > > Hi Mark, this concern was about moving the probe handling to a
> > background thread. With the current probe handling mechanism in
> > ovsdb-server and ovn-controller/ovn-northd this is not a concern
because
> > it is handled by the main loop and would definitely reflect the
> > dead-loop/lock situation.
> > >
> > Sorry, maybe I should make it more clear:
> > 1. The server to client probe seems unnecessary, and in practice it can
> > be disabled (setting to 0)
> > 2. The client to server probe is needed to detect server failures, but
> > it can cause unnecessary reconnections when the server is busy but
still
> > working. Moving probe to background thread can solve the problem, but
> > enabling TCP-keepalive can achieve the same without code change.
> > 3. Background probe or TCP-keepalive would not detect server
> > dead-loop/lock situations, and if that's the concern, we can still
> > enable the current probe mechanism for client->server direction, but to
> > avoid the problem of "unnecessary reconnections when the server is
> > busy", we need to set it to a big interval.
> > 4. It is not ideal to set the probe to a big interval because it would
> > slow-down the detection of real problems. So ideally with OVSDB relay
> > (or other optimizations), the server load wouldn't be too high and a
> > short probe interval works.
>
> Thanks for the breakdown, Han. I thought about this overnight and using
> the preloaded libkeepalive is not quite as straightforward a solution as
> I initially thought.
>
> First, it will require code changes to OVS/OVN. We would need to
> disable/ignore current inactivity probe options if the TCP keepalives
> are enabled. It's not much of a change, but it still needs to be done.
>
I haven't thought about this, but wondering why wouldn't the two
co-exist? If the user wants to disable inactivity probes, there are
already settings (on both client and server sides) for that. Why need
code change? If they prefer TCP keepalives to detect failures such node
crash faster, and at the same time inactivity probes to detect dead
loop/lock of the server, that's also fine, right?