On 2/8/22 13:15, Xavier Simonart wrote:
> Before this patch, when all flows for a port have been installed in OVS,
> ovn-installed was immediately written to the OVS DB; the notification of
> this change to the OVN controller (handled by OVS interface handler)
> might cause a full recompute if the SBDB is readonly / still handling
> the chassis update in PortBinding.
> With this patch, ovn-installed is not written to the OVS DB while the
> SBDB is readonly.  This reduces the number of (un-necessary) full
> recomputes.
> Note that this was causing some spurious full recompute in test "ACL
> with Port Group conjunction flow efficiency", causing the test to fail
> from time to time. The test case was also updated as part of this patch
> to check that ovn-installed was always properly set.
> 
> Signed-off-by: Xavier Simonart <[email protected]>
> ---

Hi Xavier,

>  controller/binding.c |  2 +-
>  tests/ovn.at         | 10 +++++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/controller/binding.c b/controller/binding.c
> index 4d62b0858..536cbc513 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -688,7 +688,7 @@ local_binding_set_up(struct shash *local_bindings, const 
> char *pb_name,
>          local_binding_find(local_bindings, pb_name);
>      struct binding_lport *b_lport = 
> local_binding_get_primary_lport(lbinding);
>  
> -    if (!ovs_readonly && lbinding && lbinding->iface
> +    if (!ovs_readonly && !sb_readonly && lbinding && lbinding->iface
>              && !smap_get_bool(&lbinding->iface->external_ids,
>                                OVN_INSTALLED_EXT_ID, false)) {

This makes sense but we're adding a dependency on the SB.  In some
cases, e.g., at scale when the SB ovn-controller is connected to is
snapshotting, it can take a while (a few seconds) until a locally
initiated transaction commits and ovn-controller is notified.

I'm guessing those are corner cases, but it would be nice if we could
test this change at scale with a CMS that uses the "ovn-installed" OVS
external-id.

AFAIK the only CMS doing that is ovn-kubernetes.  Pending some scale
test results with OpenShift/Kubernetes with ovn-k8s, and with a few
minor comments below:

Acked-by: Dumitru Ceara <[email protected]>

>          VLOG_INFO("Setting lport %s ovn-installed in OVS", pb_name);
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 957eb7850..1a009561b 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -28325,6 +28325,8 @@ sim_add hv1
>  as hv1
>  ovs-vsctl add-br br-phys
>  ovn_attach n1 br-phys 192.168.0.1
> +#ovn-appctl -t ovn-controller vlog/set dbg
> +ovn-appctl -t ovn-sb vlog/set dbg
>  

I'm guessing these were debug leftovers.

>  ovn-nbctl lr-add lr0
>  
> @@ -28353,6 +28355,8 @@ ovs-vsctl add-port br-int lsp0-1 -- set interface 
> lsp0-1 external_ids:iface-id=l
>  
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | 
> wc -l) == 22])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 
> external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-1 
> external_ids:ovn-installed` = '"true"'])
>  
>  # Save the current lflow_run counter
>  lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> @@ -28377,11 +28381,13 @@ AT_CHECK([test $(ovs-ofctl dump-flows br-int 
> table=44 | grep 192.168 | wc -l) ==
>  ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0 
> external_ids:iface-id=lsp0-0
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | 
> wc -l) == 12])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 
> external_ids:ovn-installed` = '"true"'])
>  
>  # 4. Bind a lsp (lsp9-0) that doesn't belong to pg1, should not see any 
> change.
>  ovs-vsctl add-port br-int lsp9-0 -- set interface lsp9-0 
> external_ids:iface-id=lsp9-0
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | 
> wc -l) == 12])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp9-0 
> external_ids:ovn-installed` = '"true"'])
>  
>  # 5. Bind another 2 lsps (lsp1-0 lsp1-1) that belong to pg1 and on a 
> different
>  #    LS (ls1), should see conjunction flows doubled (12 x 2 = 24)
> @@ -28389,6 +28395,8 @@ ovs-vsctl add-port br-int lsp1-0 -- set interface 
> lsp1-0 external_ids:iface-id=l
>  ovs-vsctl add-port br-int lsp1-1 -- set interface lsp1-1 
> external_ids:iface-id=lsp1-1
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | 
> wc -l) == 24])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-0 
> external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-1 
> external_ids:ovn-installed` = '"true"'])
>  
>  # 6. Simulate a SB port-group "del and add" notification to ovn-controller 
> in the
>  #    same IDL iteration. ovn-controller should still program the same flows. 
> In
> @@ -28414,11 +28422,11 @@ for i in $(seq 1 10); do
>  
>      # Finally check flow count is the same as before.
>      AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction 
> | wc -l) == 24])
> +    OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 
> external_ids:ovn-installed` = '"true"'])
>  done
>  
>  # Make sure all the above was performed with I-P (no recompute)
>  AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter 
> lflow_run) == $lflow_run])
> -

Accidental?  I would've missed it but it actually generates a conflict
when applying on top of current main branch.

>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])

Thanks,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to