On 9/11/2018 4:51 PM, Aaron Conole wrote:
"Eelco Chaudron" <[email protected]> writes:
On 6 Sep 2018, at 10:56, Aaron Conole wrote:
As of June, the 0-day robot has tested over 450 patch series. Occasionally it
spams the list (apologies for that), but the majority of the time it has caught
issues before they made it to the tree - so it's accomplishing the initial goal
just fine.
I see lots of ways it can improve. Currently, the bot runs on a lightweight
system. It takes ~20 minutes to complete a set of tests, including all the
checkpatch and rebuild runs. That's not a big issue. BUT, it does mean that the
machine isn't able to perform all the kinds of regression tests that we would
want. I want to improve this in a way that lets various contributors bring
their own hardware and regression tests to the party. That way, various
projects can detect potential issues before they ever land on the tree, and we
could flag functional changes earlier in the process.
I'm not sure of the best way to do that. One thing I'll be doing is updating
the bot to push a series that successfully builds and passes checkpatch to a
special branch on a GitHub repository to kick off Travis builds. That will give
us more complete regression coverage, and we could be confident that a series
won't break something major. After that, I'm not sure how to notify the various
alternate test infrastructures so they can kick off their own tests using the
patched sources.
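
To make the intent concrete, here's a rough sketch of what that push step could
look like. This is not the bot's actual code; the remote URL, branch naming
scheme and mbox handling are all placeholders/assumptions:

#!/usr/bin/env python3
# Rough sketch only -- not the bot's actual code.  Applies a patch
# series on top of master and pushes it to a per-series branch so a
# Travis-enabled mirror picks it up and builds it.  The remote URL,
# branch naming and mbox handling are placeholders/assumptions.
import subprocess
import sys

CI_REMOTE = "https://github.com/example/ovs-ci.git"   # hypothetical mirror

def push_series_for_ci(series_mbox, series_id):
    branch = "series_%s" % series_id                   # assumed naming scheme
    # Start from a clean master, apply the series, then push the branch.
    subprocess.check_call(["git", "checkout", "-B", branch, "origin/master"])
    subprocess.check_call(["git", "am", series_mbox])  # failure -> series rejected
    subprocess.check_call(["git", "push", CI_REMOTE, branch])

if __name__ == "__main__":
    push_series_for_ci(sys.argv[1], sys.argv[2])
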
My goal is to get really early feedback on patch series. I've sent this out to
the folks I know are involved in testing and test discussions in the hopes that
we can talk about how best to get more CI happening. The open questions:

1. How can we notify various downstream consumers of OvS of these 0-day
   builds? Should we just rely on people rolling their own? Should there be a
   more formalized framework? How will these other test frameworks report any
   kind of failures?

2. What kinds of additional testing do we want to see the robot include?
First of all, thanks for the 0-day robot; I really like the idea…
+1, great work on this.
One thing I feel would really benefit the robot is some basic performance
testing, like a PVP test for the kernel/DPDK datapaths. This would help
identify performance-impacting patches as they happen, rather than people
figuring out after a release why their performance has dropped.
To date I've been using vsperf to run P2P, PVP, PVVP, VXLAN tests, etc. The
framework for a lot of these is already in place. It also supports a number of
traffic generators, such as T-Rex and MoonGen, as well as the usual commercial
suspects.
The vsperf CI has also published results for a large number of tests with both
OVS-DPDK and the OVS kernel datapath. I'm not sure if it is still running,
however; I'll look into it, as the graphs in the link below seem out of date.
https://wiki.opnfv.org/display/vsperf/VSPERF+CI+Results#VSPERFCIResults-OVSwithDPDK
Currently it uses DPDK 17.08 and OVS 2.9 by default, but I have it working with
DPDK 17.11 and OVS master on my own system easily enough.
Yes - I hope to pull in the work you've done for ovs_perf to have some kind of
baseline.
For this to make sense, I think we also need a bunch of hardware that we can
benchmark on (hint hint to some of the folks in the CC list :). Not for
absolute numbers, but at least to detect significant changes.
Working on it :). It leads to another discussion, though: if we have hardware
ready to ship, where should it go? Where's the best place to host and maintain
the CI system?
I'm also not sure how to measure a 'problem.' Do we run a test pre-series, and
then run it post-series? In that case, we could slowly degrade performance over
time without anyone noticing. Do we take the baseline from the previous release
and compare? That might make more sense, but I don't know if it has other
problems associated with it. What thresholds do we use for saying something is
a regression? How do we report it to developers?
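
One possible starting point for the threshold question is a simple relative
check against whatever baseline we agree on. The sketch below is only an
illustration; the JSON layout, the test names and the 5% figure are
assumptions, not an agreed policy:

#!/usr/bin/env python3
# Illustration only: flag a potential regression when a result drops
# more than THRESHOLD relative to the stored baseline.  The JSON layout
# (test name -> throughput) and the 5% figure are assumptions.
import json

THRESHOLD = 0.05   # flag drops larger than 5%

def check_regressions(baseline_file, result_file):
    with open(baseline_file) as f:
        baseline = json.load(f)    # e.g. {"pvp_64B_mpps": 3.8, ...}
    with open(result_file) as f:
        result = json.load(f)
    regressions = []
    for test, base_val in baseline.items():
        new_val = result.get(test)
        if new_val is None:
            continue               # test not run for this series
        drop = (base_val - new_val) / base_val
        if drop > THRESHOLD:
            regressions.append((test, base_val, new_val, drop))
    return regressions

if __name__ == "__main__":
    for test, base, new, drop in check_regressions("baseline.json", "result.json"):
        print("possible regression: %s %.2f -> %.2f (-%.1f%%)"
              % (test, base, new, drop * 100))
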
That's a good point. We typically run performance tests nightly in order to
gauge any degradation on OVS master; possibly those could help as a comparison,
as long as the hardware is the same. I would be anxious not to overburden the
robot's test system from the get-go, however. Its primary purpose initially
would be to provide feedback on patch series, so I'd like to avoid having it
tied up performance-checking what has already been upstreamed.
In that case, maybe it would make sense to run a baseline performance test once
a week and compare incoming patch series against it, along the lines of the
sketch below?
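
To illustrate the weekly-baseline idea, something like the following could
work. Again, this is just a sketch; the file name, the schema and how the
measurements get produced are all assumptions:

#!/usr/bin/env python3
# Sketch of the weekly-baseline idea: once a week, record the master
# commit and its measured numbers; per-series runs then compare against
# the most recent entry.  File name, schema and how the measurements
# are produced are all assumptions.
import json
import subprocess
import time

BASELINE_DB = "weekly_baselines.json"    # hypothetical results store

def record_weekly_baseline(measurements):
    """Append this week's numbers (dict of test -> value), keyed by the
    master commit they were taken on."""
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"]).decode().strip()
    try:
        with open(BASELINE_DB) as f:
            db = json.load(f)
    except FileNotFoundError:
        db = []
    db.append({"date": time.strftime("%Y-%m-%d"),
               "commit": commit,
               "results": measurements})
    with open(BASELINE_DB, "w") as f:
        json.dump(db, f, indent=2)

def latest_baseline():
    """Return the most recent weekly baseline for comparisons."""
    with open(BASELINE_DB) as f:
        db = json.load(f)
    return db[-1]["results"] if db else {}
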
Ian
Should the test results be made available in general on some kind of
public-facing site? Should it just stay as a "bleep bloop - failure!"
marker?

3. What other concerns should be addressed?