On Fri, Mar 10, 2023 at 10:15:44AM +0100, Simon Horman wrote:
> On Thu, Mar 09, 2023 at 05:22:43PM +0100, Eelco Chaudron wrote:
> > 
> > 
> > On 9 Mar 2023, at 15:42, Simon Horman wrote:
> > 
> > > On Wed, Mar 08, 2023 at 04:18:47PM +0100, Eelco Chaudron wrote:
> > >> Run "make check-offloads" as part of the GitHub actions tests.
> > >>
> > >> This test was run 25 times using GitHub actions, and the
> > >> failing rerun test cases were excluded. There are quite a few
> > >> first-run failures, but unfortunately, there is no other
> > >> more stable kernel available as a GitHub-hosted runner.
> > >>
> > >> Did not yet include sanitizers in the run, as it's causing
> > >> the test to run too long (>30min) and there seem to be (timing)
> > >> issues with some of the tests.
> > >>
> > >> Signed-off-by: Eelco Chaudron <[email protected]>
> > >
> > > Hi Eelco,
> > >
> > > I like this patch a lot.
> > > But I am observing reliability problems when executing the new job.
> > >
> > > For 5 runs, on each occasion some tests failed the first time.
> > > And on 3 of those runs at least one test failed on the recheck,
> > > so the job failed.
> > 
> > Damn :)
> 
> Yes, it pained me to report this.
> 
> > I did 25 runs (I did not check for re-runs), and they were fine. I also 
> > cleaned up my jobs recently, so I no longer have them.
> > 
> > I can do this again and figure out which tests are failing. Then analyze the 
> > failures to see if we need to exclude them or can fine-tune them.
> 
> I will see if I can spend some cycles on reproducing this (outside of GHA).
> I'll likely start with the tests that show up in the summary below.

I started off by looking at this check-offloads test:

50. system-traffic.at:1524: testing datapath - basic truncate action ...

I haven't dug into the code to debug the problem yet.
But I have collected some results that might be interesting.


My testing was on a low-end VM with Ubuntu 18.04, with no HW offload:
$ uname -psv
Linux #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022 x86_64

Using the latest main branch:
d2f6fbe9fe6c ("ofproto-dpif-upcall: Remove redundant time_msec() in revalidate().")

I ran this in a for loop that exited on failure.

for i in $(seq 50); do echo $i; TESTSUITEFLAGS="50 -v" make check-offloads >& 50.log || break; done
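
One limitation of that loop is that every iteration writes to the same
50.log. A variant that keeps a separate log for the failing iteration
might look like the sketch below. The run_until_fail name and the log
file naming are made up for illustration; nothing like it exists in the
OVS tree as far as I know.

```shell
# Hypothetical helper (not part of OVS): run a command up to MAX times and
# stop at the first failure. Each iteration logs to PREFIX.<n>.log; logs of
# passing iterations are removed so only the failing one is left behind.
run_until_fail() {
    max=$1
    prefix=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        echo "iteration $i"
        if ! "$@" > "$prefix.$i.log" 2>&1; then
            echo "failed on iteration $i, see $prefix.$i.log"
            return 1
        fi
        rm -f "$prefix.$i.log"
        i=$((i + 1))
    done
    return 0
}

# Usage, mirroring the loop above (env passes TESTSUITEFLAGS to make):
# run_until_fail 50 50 env TESTSUITEFLAGS="50 -v" make check-offloads
```

The function returns non-zero on the first failing iteration, so it can
be chained with || in the same way as the original one-liner.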

I did this 5 times.
On one run it failed on the first iteration, on 3 other runs it failed in
fewer than 10 iterations, and on the remaining run it made it to the 16th
iteration.

On one of the runs the error looked like this:

...
./system-traffic.at:1573: ovs-appctl revalidator/purge
./system-traffic.at:1574: ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
--- /dev/null   2023-03-10 15:43:42.948000000 +0000
+++ /home/horms/ovs/tests/system-offloads-testsuite.dir/at-groups/50/stderr   2023-03-10 16:00:02.904493675 +0000
@@ -0,0 +1 @@
+ovs-ofctl: br0: failed to connect to socket (Connection reset by peer)
...

The others looked like this:


./system-traffic.at:1620: ovs-appctl revalidator/purge
./system-traffic.at:1621: ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
./system-traffic.at:1626: ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[0-9]*\).*/\1/p'
./system-traffic.at:1630: check_logs 
--- /dev/null   2023-03-10 15:43:42.948000000 +0000
+++ /home/horms/ovs/tests/system-offloads-testsuite.dir/at-groups/50/stdout   2023-03-10 16:03:06.996925562 +0000
@@ -0,0 +1,2 @@
+2023-03-10T16:03:04.666Z|00137|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:9505babe-8c30-4ee3-bd58-aa4f6f310219 recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0xd4),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=a6:02:03:38:97:15,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=fe80::a402:3ff:fe38:9715,dst=ff02::16,label=0,proto=58,tclass=0,hlimit=1,frag=no),icmpv6(type=143,code=0)
+2023-03-10T16:03:04.666Z|00138|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:c86015ff-4a41-4879-92b5-518c52be4b46 recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=a6:02:03:38:97:15,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=fe80::a402:3ff:fe38:9715,dst=ff02::2,label=0,proto=58,tclass=0,hlimit=255,frag=no),icmpv6(type=133,code=0)
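
To tally which of the two signatures above a saved log contains, something
like the following could be used. This is only a sketch: classify_failure
and its output labels are hypothetical, and it matches nothing beyond the
two message fragments shown in the logs above.

```shell
# Hypothetical helper: print a label for the failure signature found in
# the log file given as $1. Only the two messages seen above are known.
classify_failure() {
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    if grep -F -q 'failed to connect to socket (Connection reset by peer)' "$1"; then
        echo "connection-reset"
    elif grep -F -q 'failed to flow_del' "$1"; then
        echo "flow-del-warning"
    else
        echo "other"
    fi
}

# Usage, assuming the failing run's output was saved as 50.log:
# classify_failure 50.log
```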
