On 10 Mar 2023, at 17:20, Simon Horman wrote:

> On Fri, Mar 10, 2023 at 10:15:44AM +0100, Simon Horman wrote:
>> On Thu, Mar 09, 2023 at 05:22:43PM +0100, Eelco Chaudron wrote:
>>>
>>>
>>> On 9 Mar 2023, at 15:42, Simon Horman wrote:
>>>
>>>> On Wed, Mar 08, 2023 at 04:18:47PM +0100, Eelco Chaudron wrote:
>>>>> Run "make check-offloads" as part of the GitHub actions tests.
>>>>>
>>>>> This test was run 25 times using GitHub actions, and the
>>>>> failing rerun test cases were excluded. There are quite some
>>>>> first-run failures, but unfortunately, there is no other
>>>>> more stable kernel available as a GitHub-hosted runner.
>>>>>
>>>>> Did not yet include sanitizers in the run, as it's causing
>>>>> the test to run too long >30min and there seems to be (timing)
>>>>> issues with some of the tests.
>>>>>
>>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>>
>>>> Hi Eelco,
>>>>
>>>> I like this patch a lot.
>>>> But I am observing reliability problems when executing the new job.
>>>>
>>>> For 5 runs, on each occasion some tests failed the first time.
>>>> And on 3 of those runs at least one test failed on the recheck,
>>>> so the job failed.
>>>
>>> Damn :)
>>
>> Yes, it pained me to report this.
>>
>>> I did 25 runs (I did not check for re-runs), and they were fine. I also 
>>> cleaned up my jobs recently, so I no longer have them.
>>>
>>> I can do this again and figure out which tests are failing. Then analyze the 
>>> failures to see if we need to exclude them or can fine-tune them.
>>
>> I will see if I can spend some cycles on reproducing this (outside of GHA).
>> I'll likely start with the tests that show up in the summary below.
>
> I started off by looking at check-offloads test:
>
> 50. system-traffic.at:1524: testing datapath - basic truncate action ...
>
> I haven't dug into the code to debug the problem yet.
> But I have collected some results that might be interesting.
>
>
> My testing was on a low-end VM with Ubuntu 18.04, with no HW offload:
> $ uname -psv
> Linux #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022 x86_64


So I took the same approach: I have a local Vagrant VM with Ubuntu 22.11 (like on 
GitHub) and ran the tests there.

I thought I had fixed it with this old commit:

  https://github.com/openvswitch/ovs/commit/22968988b820aa17a9b050c901208b7d4bed9dac

However, as you can see, even after excluding the remaining failures that I could 
not figure out, it still fails randomly:

  https://github.com/chaudron/ovs/actions

Note that none of the above tests failed in the 25 runs I did before…

I was not able to make any of these tests fail on my local Ubuntu, and 
analysing the results did not point to anything specific to fix.

As this works fine on my Fedora (VM) setup across multiple runs without any 
problem, I’ll abandon this patch now :( I’ll try to get buy-in from the Robot 
to run the datapath tests as part of its sanity check for now…

//Eelco

> Using the latest main branch:
> d2f6fbe9fe6c ("ofproto-dpif-upcall: Remove redundant time_msec() in revalidate().")
>
> I ran this in a for loop that exited on failure.
>
> for i in $(seq 50); do echo $i; TESTSUITEFLAGS="50 -v" make check-offloads >& 50.log || break; done
>
> I did this 5 times.
> On one run it failed on the first go, on 3 other runs it failed in less than 10
> iterations, and on the remaining one it made it to the 16th iteration.
>
> On one of the runs the error looked like this:
>
> ...
> ./system-traffic.at:1573: ovs-appctl revalidator/purge
> ./system-traffic.at:1574: ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
> --- /dev/null   2023-03-10 15:43:42.948000000 +0000
> +++ /home/horms/ovs/tests/system-offloads-testsuite.dir/at-groups/50/stderr   2023-03-10 16:00:02.904493675 +0000
> @@ -0,0 +1 @@
> +ovs-ofctl: br0: failed to connect to socket (Connection reset by peer)
> ...
>
> The others looked like this:
>
>
> ./system-traffic.at:1620: ovs-appctl revalidator/purge
> ./system-traffic.at:1621: ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
> ./system-traffic.at:1626: ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
> ./system-traffic.at:1630: check_logs
> --- /dev/null   2023-03-10 15:43:42.948000000 +0000
> +++ /home/horms/ovs/tests/system-offloads-testsuite.dir/at-groups/50/stdout   2023-03-10 16:03:06.996925562 +0000
> @@ -0,0 +1,2 @@
> +2023-03-10T16:03:04.666Z|00137|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:9505babe-8c30-4ee3-bd58-aa4f6f310219 recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0xd4),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=a6:02:03:38:97:15,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=fe80::a402:3ff:fe38:9715,dst=ff02::16,label=0,proto=58,tclass=0,hlimit=1,frag=no),icmpv6(type=143,code=0)
> +2023-03-10T16:03:04.666Z|00138|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:c86015ff-4a41-4879-92b5-518c52be4b46 recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=a6:02:03:38:97:15,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=fe80::a402:3ff:fe38:9715,dst=ff02::2,label=0,proto=58,tclass=0,hlimit=255,frag=no),icmpv6(type=133,code=0)
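
A side note on the rerun loop quoted above: it overwrites 50.log on every
iteration, so only the last run's log is left. Below is a minimal sketch of
a variant that keeps one log per iteration and reports the first failing
iteration. The helper name run_until_fail and the log layout are
hypothetical (not part of OVS); it is only an illustration of the same idea
in plain POSIX sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): rerun a command up to MAX times,
# keep one log file per iteration, and stop at the first failure.
# Usage: run_until_fail MAX LOGDIR COMMAND [ARGS...]
run_until_fail() {
    max=$1
    logdir=$2
    shift 2
    mkdir -p "$logdir" || return 2
    i=1
    while [ "$i" -le "$max" ]; do
        # Capture stdout and stderr of this iteration in its own log file.
        if ! "$@" > "$logdir/run-$i.log" 2>&1; then
            echo "failed on iteration $i (log: $logdir/run-$i.log)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max iterations passed"
    return 0
}
```

Assuming it is sourced in the OVS build directory, the quoted loop would
roughly become:

  run_until_fail 50 logs env TESTSUITEFLAGS="50 -v" make check-offloads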
