On 3/29/23 17:34, Simon Horman wrote:
> On Tue, Mar 28, 2023 at 01:45:22PM +0200, Eelco Chaudron wrote:
>>
>>
>> On 10 Mar 2023, at 17:20, Simon Horman wrote:
>>
>>> On Fri, Mar 10, 2023 at 10:15:44AM +0100, Simon Horman wrote:
>>>> On Thu, Mar 09, 2023 at 05:22:43PM +0100, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 9 Mar 2023, at 15:42, Simon Horman wrote:
>>>>>
>>>>>> On Wed, Mar 08, 2023 at 04:18:47PM +0100, Eelco Chaudron wrote:
>>>>>>> Run "make check-offloads" as part of the GitHub actions tests.
>>>>>>>
>>>>>>> This test was run 25 times using GitHub actions, and the test
>>>>>>> cases that still failed on rerun were excluded. There are quite
>>>>>>> a few first-run failures, but unfortunately, there is no other
>>>>>>> more stable kernel available as a GitHub-hosted runner.
>>>>>>>
>>>>>>> Did not yet include sanitizers in the run, as they cause
>>>>>>> the test to run too long (>30min) and there seem to be (timing)
>>>>>>> issues with some of the tests.
>>>>>>>
>>>>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>>>>
>>>>>> Hi Eelco,
>>>>>>
>>>>>> I like this patch a lot.
>>>>>> But I am observing reliability problems when executing the new job.
>>>>>>
>>>>>> For 5 runs, on each occasion some tests failed the first time.
>>>>>> And on 3 of those runs at least one test failed on the recheck,
>>>>>> so the job failed.
>>>>>
>>>>> Damn :)
>>>>
>>>> Yes, it pained me to report this.
>>>>
>>>>> I did 25 runs (I did not check for re-runs), and they were fine. I also
>>>>> cleaned up my jobs recently, so I no longer have them.
>>>>>
>>>>> I can do this again and figure out which tests are failing. Then analyze
>>>>> the failures to see if we need to exclude them or can fine-tune them.
>>>>
>>>> I will see if I can spend some cycles on reproducing this (outside of GHA).
>>>> I'll likely start with the tests that show up in the summary below.
>>>
>>> I started off by looking at check-offloads test:
>>>
>>> 50. system-traffic.at:1524: testing datapath - basic truncate action ...
>>>
>>> I haven't dug into the code to debug the problem yet.
>>> But I have collected some results that might be interesting.
>>>
>>>
>>> My testing was on a low-end VM with Ubuntu 18.04, with no HW offload:
>>> $ uname -psv
>>> Linux #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022 x86_64
>>
>>
>> So I took the same approach: I have a local Vagrant VM with Ubuntu 22.11 (like
>> on GitHub) and ran the tests.
>>
>> I thought I fixed it with this old commit:
>>
>>
>> https://github.com/openvswitch/ovs/commit/22968988b820aa17a9b050c901208b7d4bed9dac
>>
>> However, as you can see, even after excluding the remaining failures I could
>> not figure out, it still fails randomly:
>>
>> https://github.com/chaudron/ovs/actions
>>
>> Note that I did the above 25x runs before, and none of the above tests failed…
>>
>> I was not able to make any of these tests fail on my local Ubuntu, and
>> analysing the results did not point to a specific thing to fix either.
>>
>> As this works fine on my Fedora (VM) setup for multiple runs without
>> any problem, I’ll abandon this patch now :( I’ll try to get buy-in from the
>> Robot to run the datapath tests as part of its sanity check for now…
>
> Thanks Eelco,
>
> it's a shame this proved to be elusive.
>
> Perhaps with a newer, as yet unreleased, Ubuntu version things will
> improve - perhaps it is a kernel issue.
>
> Or perhaps we have some deep problem related to running in
> resource constrained environments, that we may uncover some day.
>
> In any case, thanks for looking at this.
> I agree that it makes sense to abandon this patchset for now.
> And that using the Robot may be a promising alternative, for now.
FWIW, we can spin up a Fedora VM in Cirrus CI, if that helps.
We may even ask for more CPUs if needed.

See how we do CI in ovn-heater, for example:
  https://github.com/ovn-org/ovn-heater/blob/main/.cirrus.yml

If a VM is not necessary, we may also use containers, similarly
to GitHub Actions.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
