On 3/29/23 17:34, Simon Horman wrote:
> On Tue, Mar 28, 2023 at 01:45:22PM +0200, Eelco Chaudron wrote:
>>
>>
>> On 10 Mar 2023, at 17:20, Simon Horman wrote:
>>
>>> On Fri, Mar 10, 2023 at 10:15:44AM +0100, Simon Horman wrote:
>>>> On Thu, Mar 09, 2023 at 05:22:43PM +0100, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 9 Mar 2023, at 15:42, Simon Horman wrote:
>>>>>
>>>>>> On Wed, Mar 08, 2023 at 04:18:47PM +0100, Eelco Chaudron wrote:
>>>>>>> Run "make check-offloads" as part of the GitHub actions tests.
>>>>>>>
>>>>>>> This test was run 25 times using GitHub actions, and the
>>>>>>> failing rerun test cases were excluded. There are quite some
>>>>>>> first-run failures, but unfortunately, there is no other
>>>>>>> more stable kernel available as a GitHub-hosted runner.
>>>>>>>
>>>>>>> Did not yet include sanitizers in the run, as it's causing
>>>>>>> the test to run too long (>30min) and there seem to be (timing)
>>>>>>> issues with some of the tests.
>>>>>>>
>>>>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>>>>
>>>>>> Hi Eelco,
>>>>>>
>>>>>> I like this patch a lot.
>>>>>> But I am observing reliability problems when executing the new job.
>>>>>>
>>>>>> For 5 runs, on each occasion some tests failed the first time.
>>>>>> And on 3 of those runs at least one test failed on the recheck,
>>>>>> so the job failed.
>>>>>
>>>>> Damn :)
>>>>
>>>> Yes, it pained me to report this.
>>>>
>>>>> I did 25 runs (I did not check for re-runs), and they were fine. I also 
>>>>> cleaned up my jobs recently, so I no longer have them.
>>>>>
>>>>> I can do this again and figure out which tests are failing. Then analyze 
>>>>> the failures to see if we need to exclude them or can fine-tune them.
>>>>
>>>> I will see if I can spend some cycles on reproducing this (outside of GHA).
>>>> I'll likely start with the tests that show up in the summary below.
>>>
>>> I started off by looking at this check-offloads test:
>>>
>>> 50. system-traffic.at:1524: testing datapath - basic truncate action ...
>>>
>>> I haven't dug into the code to debug the problem yet.
>>> But I have collected some results that might be interesting.
>>>
>>>
>>> My testing was on a low-end VM with Ubuntu 18.04, with no HW offload:
>>> $ uname -psv
>>> Linux #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022 x86_64
>>
>>
>> So I took the same approach: I have a local Vagrant VM with Ubuntu 22.11 (like
>> on GitHub) and ran the tests.
>>
>> I thought I had fixed it with this old commit:
>>
>>   https://github.com/openvswitch/ovs/commit/22968988b820aa17a9b050c901208b7d4bed9dac
>>
>> However, as you can see even after excluding the remaining failures I could 
>> not figure out, it still fails randomly:
>>
>>   https://github.com/chaudron/ovs/actions
>>
>> Note that in the 25 runs I did before, none of the above tests failed…
>>
>> I was not able to make any of these tests fail on my local Ubuntu, and also 
>> analysing the results did not lead to a specific thing to fix.
>>
>> As this is working fine on my Fedora (VM) setup for multiple runs without 
>> any problem, I’ll abandon this patch now :( I’ll try to get buy-in from the 
>> Robot to run the datapath tests as part of its sanity check for now…
> 
> Thanks Eelco,
> 
> it's a shame this proved to be elusive.
> 
> Perhaps with a newer, as yet unreleased, Ubuntu version things will
> improve - perhaps it is a kernel issue.
> 
> Or perhaps we have some deep problem related to running in
> resource constrained environments, that we may uncover some day.
> 
> In any case, thanks for looking at this.
> I agree that it makes sense to abandon this patchset for now.
> And that using the Robot may be a promising alternative, for now.

FWIW, we can spin up a Fedora VM in Cirrus CI, if that helps.
We may even ask for more CPUs if needed.

See how we do CI in ovn-heater for example:
  https://github.com/ovn-org/ovn-heater/blob/main/.cirrus.yml

If a VM is not necessary, we may also use containers similarly
to GitHub Actions.

Best regards, Ilya Maximets.
