On 6/28/22 02:49, Han Zhou wrote:
> When ovn-controller is restarted, it may need multiple iterations of
> main loop before completely download all related data from SB DB,
> especially when ovn-monitor-all=false, so after restart, before it sees
> the related localnet ports from SB DB, it treats the related patch
> ports on the chassis as not needed, and deleted them. Later when it
> downloads thoses port-bindings it recreates them.  For a graceful
> upgrade, we don't this to happen, because it would break the traffic.
> 
> This is especially problematic at ovn-k8s setups because the external
> side of the patch port is used to program some flows for external
> network communication. When the patch ports are recreated, the OF port
> number changes and those flows are broken. The CMS would detect and fix
> the flows but it would result in even longer downtime.
> 
> This patch postpone the deletion of the patch ports, with the assumption
> that leaving the unused ports on the chassis for little longer is not
> harmful.
> 
> Signed-off-by: Han Zhou <[email protected]>
> ---
>  controller/patch.c      | 15 ++++++++-
>  tests/ovn-controller.at | 71 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 1 deletion(-)
> 
> diff --git a/controller/patch.c b/controller/patch.c
> index ed831302c..9faae5879 100644
> --- a/controller/patch.c
> +++ b/controller/patch.c
> @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>  
>      /* Now 'existing_ports' only still contains patch ports that exist in the
>       * database but shouldn't.  Delete them from the database. */
> +
> +    /* Wait for some iterations before really deleting any patch ports, 
> because
> +     * with conditional monitoring it is possible that related SB data is not
> +     * completely downloaded yet after last restart of ovn-controller.
> +     * Otherwise it may cause unncessary dataplane interruption during
> +     * restart/upgrade. */
> +    static int delete_patch_port_delay = 10;

Hi Han,

It's possible that ovn-controller wakes up 10 times in a row immediately
due to local OVSDB changes and doesn't process any new SB updates in
that time so 10 might not be enough in some cases.

Does it make sense to wait a given amount of time instead?  E.g., a few
seconds?  Can we make this configurable somehow?  Maybe an
ovn-controller command line argument to override the default?

Thanks,
Dumitru


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to