On Thu, Dec 15, 2022 at 7:16 PM Dan Williams <[email protected]> wrote:
> On Thu, 2022-12-15 at 19:04 +0100, Frode Nordahl wrote:

[ snip ]

> > Thank you for proposing this patch!
> >
> > We've been seeing reports of schema upgrades failing too, and have
> > been waiting for some way to reproduce it and see if this would fix it.
> >
> > Are you seeing this with clustered databases, and could your problem
> > be related to the election timer? If it is, raising the client side
> > timer alone could be problematic.
>
> Yes clustered.
>
> A 450MB LB-heavy database generated by ovn-kubernetes with OVN 22.06
> (which lacks DP groups for SB load balancers) was being upgraded from
> OVN 22.06 -> 22.09 (but using same OVS 2.17 version, so no change to
> ovsdb-server) and when the ovsdb-servers got restarted as part of the
> ovn-kube container upgrade, they took longer to read+parse the database
> than the 30s upgrade_cluster() timer, thus the container failed and was
> put in CrashLoopBackoff by Kubernetes.
>
> ovn-ctl was never able to update the schema, and thus 22.09 ovn-northd
> was never able to reduce the DB size by rewriting it to use DP groups
> for SB LBs and recover.
>
> This patch is really a workaround and needs a corresponding ovn-ctl
> patch to accept a timeout for the NB/SB DB start functions that our
> OpenShift container scripts would pass.

Right, so you plan to couple the value of the ovsdb-client timeout to
the value of the ovsdb-server election timer; that makes sense.

> The real fix is, as Ilya suggests, to "reduce the size of the DB", as
> we've found that to be the most effective scale strategy for OpenShift and
> ovn-kubernetes. And I think that strategy has paid off tremendously
> over the last 2 years we've been working on OVN & ovsdb-server scale.
>
> Huge credit to Ilya and the OVN team for making that happen...

Hear, hear! Wonders have been worked in many of the OVS and OVN
components over the past years; hats off to everyone involved.

The best place to continue the discussion would probably be in the
thread below, but in short, I agree that reducing the size of the DB
is of course the best medicine. However, this specific piece of the
machinery is involved in upgrades, and upgrades are unfortunately
required to move our shared user base forward to the new promised land
of software that produces smaller databases :)

> > I recently raised a discussion about this on the list to figure out
> > possible paths forward [0][1].
> >
> > 0:
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
> > 1: https://bugs.launchpad.net/bugs/1999605

-- 
Frode Nordahl
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
