On 8/8/24 13:53, Ales Musil wrote:
> 
> 
> On Thu, Aug 8, 2024 at 1:26 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 8/8/24 13:18, Ales Musil wrote:
>     >
>     >
>     > On Thu, Aug 8, 2024 at 12:55 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>> wrote:
>     >
>     >     On 8/8/24 12:38, Ales Musil wrote:
>     >     > The test was checking if there was any recompute after all 
> operations.
>     >     > This proven to be flaky because there might have been "random"
>     >     > recompute happening in the middle due to the following events:
>     >     >
>     >     > 1) SB becomes read only
>     >     > 2) non_vif_data handler for OVS_interface fails
>     >     > 3) Recompute of non_vif_data is not allowed because SB is read 
> only
>     >     > 4) non_vif_data is marked as cancelled
>     >     > 5) Next engine run will do force recompute of all nodes
>     >     >
>     >     > To prevent that check specifically for the handler that is 
> supposed
>     >     > to gracefully handle the interface changes.
>     >     >
>     >     > Fixes: 603f38da062b ("controller: Remove OvS iface type check in 
> I-P processing.")
>     >     > Signed-off-by: Ales Musil <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>>
>     >     > ---
>     >     >  tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>> | 19 ++++++++++---------
>     >     >  1 file changed, 10 insertions(+), 9 deletions(-)
>     >     >
>     >     > diff --git a/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>> 
> b/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>>
>     >     > index 898981867..f61c8c432 100644
>     >     > --- a/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>>
>     >     > +++ b/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>>
>     >     > @@ -3308,51 +3308,52 @@ net_add n1
>     >     >  sim_add hv1
>     >     >  ovs-vsctl add-br br-phys
>     >     >  ovn_attach n1 br-phys 192.168.0.20
>     >     > +ovn-appctl vlog/set inc_proc_eng:dbg
>     >     > 
>     >     >  check ovn-nbctl ls-add ls0
>     >     >  check ovn-nbctl lsp-add ls0 vif
>     >     > 
>     >     > -ovn-appctl inc-engine/clear-stats
>     >     > +m4_define([HANDLER_MESSAGE], [runtime_data, recompute (failed 
> handler for input ovs_interface_shadow)])
>     >     > 
>     >     >  ovs-vsctl -- add-port br-int vif -- \
>     >     >      set Interface vif external-ids:iface-id=vif
>     >     >  wait_row_count Port_Binding 1 logical_port="vif" up=true
>     >     > +AT_CHECK([test 0 = $(grep -c "HANDLER_MESSAGE" 
> hv1/ovn-controller.log)])
>     >
>     >     Hi, Ales.  Thanks for the fix!
>     >
>     >     The change seems fine at a glance (I didn't test), but the test 
> itself
>     >     becomes fragile.  All the checks are negative here, i.e. we're 
> always
>     >     checking that there was no match in the log.  If the log message 
> changes
>     >     slightly in the future, the test will keep silently passing while 
> not
>     >     actually testing anything useful.
>     >
>     >     There could be a few ways to avoid that:
>     >     1. Collect recompute stats per failed node, so we can explicitly 
> check
>     >        for a specific reason.  May still be a little flaky though.
>     >     2. Add at least one positive check that ensures that log we're 
> looking
>     >        for can be actually emitted.
>     >
>     >     Best regards, Ilya Maximets.
>     >
>     >
>     > Hi Ilya,
>     >
>     > thank you for the review, how about the following diff?
>     > It adds one positive check for the case which should
>     > definitely cause the handler failure.
>     >
>     > diff --git a/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>> 
> b/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>>
>     > index f61c8c432..68314ad5d 100644
>     > --- a/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>>
>     > +++ b/tests/ovn-controller.at <http://ovn-controller.at> 
> <http://ovn-controller.at <http://ovn-controller.at>>
>     > @@ -3355,5 +3355,18 @@ ovs-vsctl del-port br-int vif
>     >  wait_row_count Port_Binding 1 logical_port="vif" up=false
>     >  AT_CHECK([test 0 = $(grep -c "HANDLER_MESSAGE" 
> hv1/ovn-controller.log)])
>     >  
>     > +# Make sure that setting dummy iface in two different transaction
> 
>     * transactions
> 
>     > +# causes recompute.
> 
>     I'm not sure if we can rely on these transactions not be squashed.
>     Maybe add a port with external id, wait for it to be claimed, then remove
>     the external id?  Will that cause re-compute?
> 
> 
> No, the removal should be without recompute.

OK.  I guess, we can sort of rely on these to be two separate transactions
indirectly.  ovs-vsctl waits for reply from ovs-vswitchd, so hopefully it's
enough time for ovn-controller to trigger processing and not receive both
updates in the same IDL run.  But still a little time-dependent.

>  
> 
> 
>     > +ovs-vsctl add-port br-int vif -- \
>     > +    set Interface vif type=dummy
>     > +ovs-vsctl set Interface vif external-ids:iface-id=vif
> 
>     All ovs-vsctl operations should be under check.
> 
> 
> Ack
>  
> 
> 
>     > +wait_row_count Port_Binding 1 logical_port="vif" up=true
>     > +cat hv1/ovn-controller.log
> 
>     This log file should be already captures with AT_CAPTURE_FILE, isn't it?
> 
> 
> That was a mistake, it won't be there in v2.
> 
> 
>     > +AT_CHECK([test $(grep -c "HANDLER_MESSAGE" hv1/ovn-controller.log) -ge 
> 1])
>     > +
>     > +ovs-vsctl del-port br-int vif
> 
>     check
> 
> 
> Ack
>  
> 
> 
>     > +wait_row_count Port_Binding 1 logical_port="vif" up=false
>     > +AT_CHECK([test $(grep -c "HANDLER_MESSAGE" hv1/ovn-controller.log) -ge 
> 1])
> 
>     '-ge 1' is always true here.
> 
> 
> The removal is probably pointless here because it shouldn't
> cause anything extra and it's checked above anyway.
>  
> 
> 
>     > +
>     >  OVN_CLEANUP([hv1])
>     >  AT_CLEANUP
>     >
>     > Thanks,
>     > Ales
> 
> 
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to