On Tue, Aug 23, 2022 at 7:15 AM Han Zhou <[email protected]> wrote:
>
>
>
> On Tue, Aug 23, 2022 at 6:45 AM Ilya Maximets <[email protected]> wrote:
> >
> > On 8/22/22 19:39, Han Zhou wrote:
> > >
> > >
> > > On Mon, Aug 22, 2022 at 3:33 AM Ilya Maximets <[email protected]> wrote:
> > >>
> > >> On 8/22/22 12:21, Dumitru Ceara wrote:
> > >> > Hi Ilya,
> > >> >
> > >> > On 8/22/22 11:19, Ilya Maximets wrote:
> > >> >> daemon_started_recently() concept is flawed in terms that it uses
> > >> >> fixed number of iterations for a countdown and a fixed timeout, so
> > >> >> the undesired removal of configured resources can still happen
> > >> >> under certain conditions even with this mechanism in place.
> > >> >>
> > >> >> The root cause of the original problem is that conditional
> > >> >> monitoring in ovn-controller works in stages.  ovn-controller
> > >> >> doesn't know all the information it needs to monitor until the
> > >> >> first monitor reply arrives, then it adjusts conditions and
> > >> >> receives all the data.  That doesn't work well with the
> > >> >> ovsdb_idl_has_ever_connected() check that will return 'true'
> > >> >> right after the first monitor reply.  This leads to a situation
> > >> >> where resources can be removed after the first monitor reply
> > >> >> and re-added back after the condition change.
> > >> >>
> > >> >> Instead of waiting a fixed number of iterations/seconds, the
> > >> >> correct solution is to always start with unconditional
> > >> >> monitoring.  When the full initial snapshot is received, switch
> > >> >> to conditional monitoring, as the data is complete.  This way
> > >> >> the ovsdb_idl_has_ever_connected() check will provide an
> > >> >> accurate answer to the question: do we have a sufficient amount
> > >> >> of data to make decisions about resource removal?
> > >> >
> > >> > This does sound like the cleanest solution, I agree.  But does
> > >> > it create an issue in large scale deployments where conditional
> > >> > monitoring is used?  The full initial SB snapshot is likely
> > >> > quite large.
> > >>
> > >> It can be large, yes, though it shouldn't be critical.  We will
> > >> not hold all this memory for long.  If it is critical, we can
> > >> fall back on the "Alternative solution" below.  It should not be
> > >> hard to implement.
> > >> I'll wait for the discussion here to conclude before working on that.
> > >
> > > Thanks Ilya.  I admit this is an approach I didn't think about
> > > before.  It is indeed cleaner.
> > > However, the spike in memory may be undesirable, given that one of
> > > the major advantages of conditional monitoring is saving memory on
> > > the client side.  I guess many environments don't care much about
> > > memory, but some environments do care, and in such cases it is
> > > critical.  ovn-controller is the distributed part, so at scale the
> > > total difference is actually significant.  Even if the memory spike
> > > won't last for long, the memory still needs to be reserved for it.
> > > Otherwise it is possible to trigger OOM just when ovn-controller
> > > restarts.  I just did a scale test in a simulated 800-node ovn-k8s
> > > environment; with ovn-monitor-all enabled the memory usage is at
> > > least 10x more than with conditional monitoring (and my machine hit
> > > OOM before the memory spike completed).
> >
> >
> > OK, makes sense.  I'll work towards checking the condition change
> > sequence number then.
> >
> > >
> > >>
> > >> >
> > >> >>
> > >> >> Once switched to conditional monitoring, the southbound
> > >> >> database will remove all the unnecessary data from the
> > >> >> ovn-controller, so it will not need to keep it for long.  And
> > >> >> since the setup is using conditional monitoring, the data is
> > >> >> likely not that big anyway to be concerned about a brief spike
> > >> >> in memory consumption on startup.
> > >> >>
> > >> >
> > >> > For OpenShift and RedHat OpenStack this stands, we use
> > >> > ovn-monitor-all=true for now.  But I'm not sure about other
> > >> > large scale deployments.  I'll let Han comment more on this too.
> > >> >
> > >> >> This change will also remove unnecessary waiting on startup
> > >> >> when all the data is already available.
> > >> >>
> > >> >> The main change is the ovsdb_idl_has_ever_connected() check in
> > >> >> update_sb_monitors(); the rest is mostly a revert of the
> > >> >> commits that introduced the startup delay.
> > >> >>
> > >> >> The test is enhanced to run with and without conditional
> > >> >> monitoring, since the issue is closely related to it.
> > >> >>
> > >> >
> > >> > Good idea.
> > >> >
> > >> >> An alternative solution might be to store the first condition
> > >> >> change sequence number and replace all
> > >> >> ovsdb_idl_has_ever_connected() calls with a function that also
> > >> >> checks that the current sequence number is higher than the
> > >> >> condition change one, but that will require slightly more code.
> > >> >> It can be implemented in the future, if the first unconditional
> > >> >> monitor reply becomes a problem some day.
> > >> >>
> > >
> > > I believe this was the solution we thought about initially.  The
> > > problem is that, when new data comes (as a result of a monitor
> > > condition change), it may trigger a change of condition again, and
> > > so on.  In theory, we don't know for sure at what point all
> > > "initial" data has been downloaded.
> > > We could assume there are at most "N" monitor condition changes
> > > required to download the initial snapshot of data, but then this is
> > > almost equivalent to the current "N" iterations approach.
> > > Any ideas to solve this problem?
> >
> > I don't think there is a problem, unless we have a circular
> > dependency in condition updates.  But wouldn't that be a much more
> > serious problem on its own?  I'll recheck all the conditions that we
> > have to make sure we're not running in circles, but my understanding
> > is that it should be enough to just wait for the first or second
> > condition update to complete.
> > Any new data that we receive afterwards should be actual new data,
> > created after ovn-controller is already connected, so it will not
> > affect pre-existing ports and OF rules.
> >
> It's not about a circular dependency.  It is about how we determine
> whether a datapath is related to the chassis.  We start from the local
> VIFs and the DPs they belong to, and each next iteration changes the
> monitor conditions to monitor the port-bindings of the next level of
> connected DPs according to the patch ports already monitored.  So, how
> many iterations are required to download the data depends on the
> number of *hops* in the logical topology.
>
> A possible solution is to always monitor all port-bindings, but the
> side effect would be memory and CPU cost at scale.  (The CPU cost may
> be OK if we optimize and eliminate loops of
> SBREC_PORT_BINDING_TABLE_FOR_EACH.)
>
Sorry, I just realized that it may be sufficient to just monitor all
"patch" ports, and the code is already doing that.  So this alternative
should work in theory: in the first round we can monitor the directly
connected DPs for the local VIFs, and in the second round we should
know all indirectly connected DPs through patch ports and initiate
monitoring for them.
If we go down this path, there are a few more details to be figured
out.  I observed that today ovn-controller adds monitor conditions
hop-by-hop, so we need to figure out why and fix that (assuming the
theory is correct).  Also, what if there are no VIFs attached to the
chassis yet?  Then there won't be any monitor condition change, and we
should not wait in this case.
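For what it's worth, the readiness check based on the condition change
sequence number could be modeled roughly like below.  This is only a
sketch with made-up names (sb_idl_state, sb_data_ready, etc.), not the
actual IDL API; the point is just that "ready" should mean "the last
requested monitor conditions have been acked and applied", rather than
"we have received one monitor reply":

```c
#include <assert.h>
#include <stdbool.h>

/* Models the state ovn-controller would track for the SB IDL. */
struct sb_idl_state {
    unsigned int cond_seqno;     /* Condition seqno acked by the server. */
    unsigned int expected_seqno; /* Seqno at which our last requested
                                  * conditions take effect. */
    bool ever_connected;         /* First monitor reply received. */
};

/* Called whenever we change monitor conditions; the (modeled) IDL
 * returns the seqno at which the new conditions will be in effect. */
unsigned int
set_conditions(struct sb_idl_state *st)
{
    st->expected_seqno = st->cond_seqno + 1;
    return st->expected_seqno;
}

/* Called when a monitor reply acking a condition change arrives. */
void
ack_condition(struct sb_idl_state *st)
{
    st->ever_connected = true;
    st->cond_seqno++;
}

/* Replacement for a bare ovsdb_idl_has_ever_connected() check: only
 * trust the data (e.g. for resource removal) once the last requested
 * conditions have been acked. */
bool
sb_data_ready(const struct sb_idl_state *st)
{
    return st->ever_connected && st->cond_seqno >= st->expected_seqno;
}
```

With this shape, each hop-by-hop condition change simply pushes
expected_seqno forward again, so sb_data_ready() stays false until the
last round of conditions is acked, and the "no local VIFs, no condition
change" case is ready as soon as the first reply arrives.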

Thanks,
Han

> > >
> > > Thanks,
> > > Han
> > >
> > >> >> Fixes: 2b27eb3482a3 ("patch.c: Avoid patch interface deletion
> > >> >> & recreation during restart.")
> > >> >
> > >> > It seems to me we need to do something similar for OpenFlow
> > >> > clearing too.  Right now we wait a configured amount of time:
> > >> >
> > >> > https://github.com/ovn-org/ovn/commit/896adfd2d
> > >>
> > >> Hmm, yes.  It seems like with the correct handling of Sb updates
> > >> in place we should be able to just implement an "ideal" solution:
> > >>
> > >> """
> > >>   Ideally, ovn-controller should clear the flows after it
> > >>   completes the new flows computing.
> > >> """
> > >>
> > >> Best regards, Ilya Maximets.
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
