On 10 Mar 2023, at 17:20, Simon Horman wrote:
> On Fri, Mar 10, 2023 at 10:15:44AM +0100, Simon Horman wrote:
>> On Thu, Mar 09, 2023 at 05:22:43PM +0100, Eelco Chaudron wrote:
>>>
>>>
>>> On 9 Mar 2023, at 15:42, Simon Horman wrote:
>>>
>>>> On Wed, Mar 08, 2023 at 04:18:47PM +0100, Eelco Chaudron wrote:
>>>>> Run "make check-offloads" as part of the GitHub actions tests.
>>>>>
>>>>> This test was run 25 times using GitHub actions, and the
>>>>> failing rerun test cases were excluded. There are quite some
>>>>> first-run failures, but unfortunately, there is no other
>>>>> more stable kernel available as a GitHub-hosted runner.
>>>>>
>>>>> Did not yet include sanitizers in the run, as it's causing
>>>>> the test to run too long >30min and there seems to be (timing)
>>>>> issues with some of the tests.
>>>>>
>>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>>
>>>> Hi Eelco,
>>>>
>>>> I like this patch a lot.
>>>> But I am observing reliability problems when executing the new job.
>>>>
>>>> For 5 runs, on each occasion some tests failed the first time.
>>>> And on 3 of those runs at least one test failed on the recheck,
>>>> so the job failed.
>>>
>>> Damn :)
>>
>> Yes, it pained me to report this.
>>
>>> I did 25 runs (I did not check for re-runs), and they were fine. I also
>>> cleaned up my jobs recently, so I no longer have them.
>>>
>>> I can do this again and figure out which tests are failing. Then analyze the
>>> failures to see if we need to exclude them or can fine-tune them.
>>
>> I will see if I can spend some cycles on reproducing this (outside of GHA).
>> I'll likely start with the tests that show up in the summary below.
>
> I started off by looking at check-offloads test:
>
> 50. system-traffic.at:1524: testing datapath - basic truncate action ...
>
> I haven't dug into the code to debug the problem yet.
> But I have collected some results that might be interesting.
>
>
> My testing was on a low-end VM with Ubuntu 18.04, with no HW offload:
> $ uname -psv
> Linux #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022 x86_64

So I took the same approach: I have a local Vagrant VM with Ubuntu 22.11 (like on GitHub) and ran the tests there. I thought I had fixed it with this old commit:

https://github.com/openvswitch/ovs/commit/22968988b820aa17a9b050c901208b7d4bed9dac

However, as you can see, even after excluding the remaining failures that I could not figure out, it still fails randomly:

https://github.com/chaudron/ovs/actions

Note that I did the 25 runs mentioned above before this, and none of these tests failed…

I was not able to make any of these tests fail on my local Ubuntu, and analysing the results did not point to a specific thing to fix either.

As this works fine on my Fedora (VM) setup for multiple runs without any problem, I’ll abandon this patch for now :(

I’ll try to get buy-in from the Robot to run the datapath tests as part of its sanity check for now…

//Eelco

> Using the latest main branch:
> d2f6fbe9fe6c ("ofproto-dpif-upcall: Remove redundant time_msec() in revalidate().")
>
> I ran this in a for loop that exited on failure.
>
> for i in $(seq 50); do echo $i; TESTSUITEFLAGS="50 -v" make check-offloads >& 50.log || break; done
>
> I did this 5 times.
> On one run it failed 1st go, on 3 other runs it failed in less than 10
> iterations, and on the other it made it to the 16th iteration.
>
> On one of the runs the error looked like this:
>
> ...
> ./system-traffic.at:1573: ovs-appctl revalidator/purge
> ./system-traffic.at:1574: ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
> --- /dev/null 2023-03-10 15:43:42.948000000 +0000
> +++ /home/horms/ovs/tests/system-offloads-testsuite.dir/at-groups/50/stderr 2023-03-10 16:00:02.904493675 +0000
> @@ -0,0 +1 @@
> +ovs-ofctl: br0: failed to connect to socket (Connection reset by peer)
> ...
>
> The others looked like this:
>
>
> ./system-traffic.at:1620: ovs-appctl revalidator/purge
> ./system-traffic.at:1621: ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
> ./system-traffic.at:1626: ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
> ./system-traffic.at:1630: check_logs
> --- /dev/null 2023-03-10 15:43:42.948000000 +0000
> +++ /home/horms/ovs/tests/system-offloads-testsuite.dir/at-groups/50/stdout 2023-03-10 16:03:06.996925562 +0000
> @@ -0,0 +1,2 @@
> +2023-03-10T16:03:04.666Z|00137|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:9505babe-8c30-4ee3-bd58-aa4f6f310219 recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0xd4),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=a6:02:03:38:97:15,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=fe80::a402:3ff:fe38:9715,dst=ff02::16,label=0,proto=58,tclass=0,hlimit=1,frag=no),icmpv6(type=143,code=0)
> +2023-03-10T16:03:04.666Z|00138|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:c86015ff-4a41-4879-92b5-518c52be4b46 recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=a6:02:03:38:97:15,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=fe80::a402:3ff:fe38:9715,dst=ff02::2,label=0,proto=58,tclass=0,hlimit=255,frag=no),icmpv6(type=133,code=0)

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
