Ian Stokes <[email protected]> writes:

> On 9/11/2018 4:51 PM, Aaron Conole wrote:
>> "Eelco Chaudron" <[email protected]> writes:
>>
>>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>>
>>>> As of June, the 0-day robot has tested over 450 patch series.
>>>> Occasionally it spams the list (apologies for that), but for the
>>>> majority of the time it has caught issues before they made it to the
>>>> tree - so it's accomplishing the initial goal just fine.
>>>>
>>>> I see lots of ways it can improve.  Currently, the bot runs on a
>>>> light system.  It takes ~20 minutes to complete a set of tests,
>>>> including all the checkpatch and rebuild runs.  That's not a big
>>>> issue.  BUT, it does mean that the machine isn't able to perform
>>>> all the kinds of regression tests that we would want.  I want to
>>>> improve this in a way that various contributors can bring their
>>>> own hardware and regression tests to the party.  In that way,
>>>> various projects can detect potential issues before they would
>>>> ever land on the tree and it could flag functional changes earlier
>>>> in the process.
>>>>
>>>> I'm not sure of the best way to do that.  One thing I'll be doing
>>>> is updating the bot to push a series that successfully builds and
>>>> passes checkpatch to a special branch on a GitHub repository to
>>>> kick off Travis builds.  That will give us more complete regression
>>>> coverage, and we could be confident that a series won't break
>>>> something major.  After that, I'm not sure how to notify various
>>>> alternate test infrastructures how to kick off their own tests
>>>> using the patched sources.
>>>>
>>>> My goal is to get really early feedback on patch series.  I've
>>>> sent this out to the folks I know are involved in testing and test
>>>> discussions in the hopes that we can talk about how best to get
>>>> more CI happening.  The open questions:
>>>>
>>>> 1. How can we notify various downstream consumers of OvS of these
>>>>     0-day builds?  Should we just rely on people rolling their own?
>>>>     Should there be a more formalized framework?  How will these other
>>>>     test frameworks report any kind of failures?
>>>>
>>>> 2. What kinds of additional testing do we want to see the robot
>>>>     include?
>>>
>>> First of all, thanks for the 0-day robot; I really like the idea…
> +1, great work on this.
>
>>>
>>> One thing I feel we would really benefit from is some basic
>>> performance testing, like a PVP test for the kernel/DPDK datapath.
>>> This would make it easy to identify performance-impacting patches
>>> as they happen… rather than people figuring out after a release why
>>> their performance has dropped.
>>
>
> To date I've been using vsperf to conduct p2p, pvp, pvvp, vxlan
> tests, etc.  The framework for a lot of these is already in place.
> It also supports a number of traffic generators such as T-Rex and
> MoonGen, as well as the usual commercial suspects.

I think we have some VSPerf tests as well.  I'm happy to use
whatever :)
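
From the robot side I imagine we'd just shell out to a small subset of
those tests per series, something like the sketch below.  The install
path and test names are from memory, so treat them as placeholders
rather than a tested recipe.

# Rough sketch - the directory and test names below are assumptions.
import subprocess

VSPERF_DIR = "/opt/vswitchperf"          # placeholder install location
TESTS = ["phy2phy_tput", "pvp_tput"]     # guessing at canonical names

def run_vsperf_subset():
    """Run a small set of vsperf tests; traffic-gen and DPDK settings
    are assumed to live in the vsperf conf files, not here."""
    for test in TESTS:
        subprocess.check_call(["./vsperf", test], cwd=VSPERF_DIR)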

> The vsperf CI also published results for a large number of tests with
> both the OVS DPDK and OVS kernel datapaths.  I'm not sure if it is
> still running, however; I'll look into it, as the graphs in the link
> below seem out of date.
>
> https://wiki.opnfv.org/display/vsperf/VSPERF+CI+Results#VSPERFCIResults-OVSwithDPDK
>
> Currently it uses DPDK 17.08 and OVS 2.9 by default but I have it
> working with DPDK 17.11 and OVS master on my own system easily enough.

Awesome.  I'll pull a Bane:

"Your precious test suite, gratefully accepted!"

>> Yes - I hope to pull in the work you've done for ovs_perf to have
>> some kind of baseline.
>>
>> For this to make sense, I think it also needs a bunch of hardware
>> that we can benchmark on (hint hint to some of the folks in the CC
>> list :).  Not for absolute numbers, but at least to detect
>> significant changes.
>
> Working on it :). It leads to another discussion, though: if we have
> hardware ready to ship, then where should it go?  Where's the best
> place to host and maintain the CI system?
>
>>
>> I'm also not sure how to measure a 'problem.'  Do we run a test
>> pre-series, and then run it post-series?  In that case, we could
>> slowly degrade performance over time without anyone noticing.  Do we
>> take it from the previous release, and compare?  Might make more
>> sense, but I don't know if it has other problems associated with it.
>> What are the thresholds we use for saying something is a regression?
>> How do we report it to developers?
>
> It's a good point; we typically run perf tests nightly in order to
> gauge any degradation on OVS master.  Possibly this could help with
> the comparison, as long as the HW is the same.  I would be anxious
> not to overburden the robot test system from the get-go, however.
> Its primary purpose initially would be to provide feedback on patch
> series, so I'd like to avoid having it tied up performance-checking
> what has already been upstreamed.

I agree with the burdening part.  I think from our side we'd only be
using the same hardware each time, because anything else is too
difficult to keep automated (imagine setting up DPDK parameters over and
over again, or realizing that a NIC is not in the correct PCI slot).

> In that case, maybe it would make sense to run a baseline performance
> test once a week and use that as the comparison point for incoming
> patch series?

That could make sense - okay, food for thought.
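
Just to make the comparison itself concrete, something like the sketch
below is what I'm picturing.  The 5% threshold and the Mpps numbers are
straw men, not agreed values.

def check_regression(baseline_mpps, current_mpps, threshold=0.05):
    """Return (ok, relative_change); ok is False when throughput drops
    more than `threshold` (as a fraction) below the weekly baseline."""
    change = (current_mpps - baseline_mpps) / baseline_mpps
    return change >= -threshold, change

# e.g. weekly baseline 11.8 Mpps, patched series measured at 11.1 Mpps:
ok, delta = check_regression(11.8, 11.1)
print("%s (%.1f%% vs baseline)" % ("PASS" if ok else "FAIL", delta * 100))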

> Ian
>>
>>>>     Should the test results be made available in general on some
>>>>     kind of public facing site?  Should it just stay as a "bleep
>>>>     bloop - failure!" marker?
>>>>
>>>> 3. What other concerns should be addressed?