On Thu, 2022-12-15 at 19:46 +0100, Frode Nordahl wrote:
> On Thu, Dec 15, 2022 at 7:16 PM Dan Williams <d...@redhat.com> wrote:
> > On Thu, 2022-12-15 at 19:04 +0100, Frode Nordahl wrote:
> 
> [ snip ]
> 
> > > Thank you for proposing this patch!
> > > 
> > > We've been seeing reports of schema upgrades failing in the field
> > > too and have waited for some way to reproduce and see whether this
> > > would be a fix.
> > > 
> > > Are you seeing this with clustered databases, and could your
> > > problem be related to the election timer? If it is, raising the
> > > client-side timer alone could be problematic.
> > 
> > Yes, clustered.
> > 
> > A 450MB LB-heavy database generated by ovn-kubernetes with OVN 22.06
> > (which lacks DP groups for SB load balancers) was being upgraded from
> > OVN 22.06 -> 22.09 (but using the same OVS 2.17 version, so no change
> > to ovsdb-server). When the ovsdb-servers got restarted as part of the
> > ovn-kube container upgrade, they took longer to read+parse the
> > database than the 30s upgrade_cluster() timer, so the container
> > failed and was put in CrashLoopBackoff by Kubernetes.
> > 
> > ovn-ctl was never able to update the schema, and thus 22.09
> > ovn-northd was never able to reduce the DB size by rewriting it to
> > use DP groups for SB LBs and recover.
> > 
> > This patch is really a workaround and needs a corresponding ovn-ctl
> > patch to accept a timeout for the NB/SB DB start functions that our
> > OpenShift container scripts would pass.
> 
> Right, so you plan to couple the value of the ovsdb-client timeout to
> the value of the ovsdb-server election timer; that makes sense.

Actually no, I hadn't. AFAIK they aren't related. The election timer
matters most while ovsdb-server is running (e.g., the number of
connected clients and what monitors they requested), while the timeout
in this patch is only about how long the schema upgrade process waits
for ovsdb-server to respond at startup (e.g., the size of the on-disk
database it has to parse in and allocate memory for).
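
To make that distinction concrete (the socket and file paths below are
just the usual defaults; adjust for your deployment):

  # The election timer is a runtime raft setting, visible in the
  # cluster status output:
  ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound

  # The timeout this patch touches is about how long it takes to parse
  # this file before ovsdb-server answers any RPC at startup:
  ls -lh /var/lib/ovn/ovnnb_db.db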

FWIW our OpenShift + ovn-kubernetes election timers are 16 seconds, and
that works in clusters of the same scale as the problem DB we have. But
reading that DB in takes ~28 or 30 seconds. So I'd probably bump the DB
startup timeout we use to > 45s, since there are 2 other databases
still running to take the load while the 3rd one is restarting.
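
In the container start script I'd then pass something like the
following, assuming the ovn-ctl side grows such a knob (the option name
below is purely illustrative; it doesn't exist today):

  # Give ovsdb-server up to 60s to come back up and finish the schema
  # conversion before ovn-ctl gives up:
  ovn-ctl --ovn-nb-db-upgrade-timeout=60 start_nb_ovsdb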

> 
> > The real fix is, like Ilya suggests, "reduce the size of the DB" as
> > we've found that the most effective scale strategy for OpenShift
> > and
> > ovn-kubernetes. And I think that strategy has paid off tremendously
> > over the last 2 years we've been working on OVN & ovsdb-server
> > scale.
> > 
> > Huge credit to Ilya and the OVN team for making that happen...
> 
> Hear hear! Wonders have been worked on many of the OVS and OVN
> components over the past years; my hat is off to everyone involved.
> 
> The best place to continue the discussion would probably be in the
> thread below, but in short, I agree that reducing the size of the DB
> is of course the best medicine. However, this specific piece of the
> machinery is involved in upgrades, and upgrades are unfortunately
> required to move our shared user base forward to the new promised
> land of software that produces smaller databases :)

Yep. Maybe some of this can be backported to help ease that upgrade
process.

Dan

> 
> > > I recently raised a discussion about this on the list to figure
> > > out possible paths forward [0][1].
> > > 
> > > 0: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
> > > 1: https://bugs.launchpad.net/bugs/1999605
> 
