On Tue, Jun 28, 2022 at 8:20 AM Dumitru Ceara <[email protected]> wrote:
>
> On 6/28/22 02:49, Han Zhou wrote:
> > When ovn-controller is restarted, it may need multiple iterations of
> > main loop before completely download all related data from SB DB,
> > especially when ovn-monitor-all=false, so after restart, before it sees
> > the related localnet ports from SB DB, it treats the related patch
> > ports on the chassis as not needed, and deleted them. Later when it
> > downloads thoses port-bindings it recreates them.  For a graceful
> > upgrade, we don't this to happen, because it would break the traffic.
> >
> > This is especially problematic at ovn-k8s setups because the external
> > side of the patch port is used to program some flows for external
> > network communication. When the patch ports are recreated, the OF port
> > number changes and those flows are broken. The CMS would detect and fix
> > the flows but it would result in even longer downtime.
> >
> > This patch postpone the deletion of the patch ports, with the assumption
> > that leaving the unused ports on the chassis for little longer is not
> > harmful.
> >
> > Signed-off-by: Han Zhou <[email protected]>
> > ---
> >  controller/patch.c      | 15 ++++++++-
> >  tests/ovn-controller.at | 71 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+), 1 deletion(-)
> >
> > diff --git a/controller/patch.c b/controller/patch.c
> > index ed831302c..9faae5879 100644
> > --- a/controller/patch.c
> > +++ b/controller/patch.c
> > @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
> >
> >      /* Now 'existing_ports' only still contains patch ports that exist
in the
> >       * database but shouldn't.  Delete them from the database. */
> > +
> > +    /* Wait for some iterations before really deleting any patch
ports, because
> > +     * with conditional monitoring it is possible that related SB data
is not
> > +     * completely downloaded yet after last restart of ovn-controller.
> > +     * Otherwise it may cause unncessary dataplane interruption during
> > +     * restart/upgrade. */
> > +    static int delete_patch_port_delay = 10;
>
> Hi Han,
>
> It's possible that ovn-controller wakes up 10 times in a row immediately
> due to local OVSDB changes and doesn't process any new SB updates in
> that time so 10 might not be enough in some cases.
>

Thanks Dumitru for the review!
In theory I agree with you 10 times is not 100% guaranteeing SB update
completed if other things are triggering the wakeups. However, in practice,
the purpose here is for the start/restart scenario. I think it is very
unlikely that local OVSDB is changing that frequently at the same time when
ovn-controller is being restarted. We have some similar logic (not ideal
for sure) at vif-plug.c:vif_plug_run() for similar reasons.

> Does it make sense to wait a given amount of time instead?  E.g., a few
> seconds?  Can we make this configurable somehow?  Maybe an
> ovn-controller command line argument to override the default?

Waiting for a given amount of time is also not ideal. It is possible that
when ovn-controller starts the SB IDL is not connected (due to server side
problems/ control plane network problems, etc.) so we don't know how long
it should wait at all.

I am ok with adding command line arguments to adjust, but I'd really want
to avoid that unless it is proved to be necessary. I'd rather use a bigger
hardcoded value to avoid another knob which is not easy to understand by
users - it should be something handled by the code itself.

Thanks,
Han

>
> Thanks,
> Dumitru
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to