On Tue, Jun 28, 2022 at 8:20 AM Dumitru Ceara <[email protected]> wrote: > > On 6/28/22 02:49, Han Zhou wrote: > > When ovn-controller is restarted, it may need multiple iterations of > > main loop before completely download all related data from SB DB, > > especially when ovn-monitor-all=false, so after restart, before it sees > > the related localnet ports from SB DB, it treats the related patch > > ports on the chassis as not needed, and deleted them. Later when it > > downloads thoses port-bindings it recreates them. For a graceful > > upgrade, we don't this to happen, because it would break the traffic. > > > > This is especially problematic at ovn-k8s setups because the external > > side of the patch port is used to program some flows for external > > network communication. When the patch ports are recreated, the OF port > > number changes and those flows are broken. The CMS would detect and fix > > the flows but it would result in even longer downtime. > > > > This patch postpone the deletion of the patch ports, with the assumption > > that leaving the unused ports on the chassis for little longer is not > > harmful. > > > > Signed-off-by: Han Zhou <[email protected]> > > --- > > controller/patch.c | 15 ++++++++- > > tests/ovn-controller.at | 71 +++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 85 insertions(+), 1 deletion(-) > > > > diff --git a/controller/patch.c b/controller/patch.c > > index ed831302c..9faae5879 100644 > > --- a/controller/patch.c > > +++ b/controller/patch.c > > @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn, > > > > /* Now 'existing_ports' only still contains patch ports that exist in the > > * database but shouldn't. Delete them from the database. */ > > + > > + /* Wait for some iterations before really deleting any patch ports, because > > + * with conditional monitoring it is possible that related SB data is not > > + * completely downloaded yet after last restart of ovn-controller. > > + * Otherwise it may cause unncessary dataplane interruption during > > + * restart/upgrade. */ > > + static int delete_patch_port_delay = 10; > > Hi Han, > > It's possible that ovn-controller wakes up 10 times in a row immediately > due to local OVSDB changes and doesn't process any new SB updates in > that time so 10 might not be enough in some cases. >
Thanks Dumitru for the review! In theory I agree with you 10 times is not 100% guaranteeing SB update completed if other things are triggering the wakeups. However, in practice, the purpose here is for the start/restart scenario. I think it is very unlikely that local OVSDB is changing that frequently at the same time when ovn-controller is being restarted. We have some similar logic (not ideal for sure) at vif-plug.c:vif_plug_run() for similar reasons. > Does it make sense to wait a given amount of time instead? E.g., a few > seconds? Can we make this configurable somehow? Maybe an > ovn-controller command line argument to override the default? Waiting for a given amount of time is also not ideal. It is possible that when ovn-controller starts the SB IDL is not connected (due to server side problems/ control plane network problems, etc.) so we don't know how long it should wait at all. I am ok with adding command line arguments to adjust, but I'd really want to avoid that unless it is proved to be necessary. I'd rather use a bigger hardcoded value to avoid another knob which is not easy to understand by users - it should be something handled by the code itself. Thanks, Han > > Thanks, > Dumitru > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
