On 11 Sep 2018, at 17:51, Aaron Conole wrote:
"Eelco Chaudron" <[email protected]> writes:
On 6 Sep 2018, at 10:56, Aaron Conole wrote:
As of June, the 0-day robot has tested over 450 patch series.
Occasionally it spams the list (apologies for that), but for the
majority of the time it has caught issues before they made it to the
tree - so it's accomplishing the initial goal just fine.
I see lots of ways it can improve. Currently, the bot runs on a light system. It takes ~20 minutes to complete a set of tests, including all the checkpatch and rebuild runs. That's not a big issue. BUT, it does mean that the machine isn't able to perform all the kinds of regression tests that we would want. I want to improve this in a way that various contributors can bring their own hardware and regression tests to the party. In that way, various projects can detect potential issues before they would ever land on the tree and it could flag functional changes earlier in the process.
I'm not sure of the best way to do that. One thing I'll be doing is updating the bot to push a series that successfully builds and passes checkpatch to a special branch on a GitHub repository to kick off Travis builds. That will give us more complete regression coverage, and we could be confident that a series won't break something major. After that, I'm not sure how to notify the various alternate test infrastructures so they can kick off their own tests using the patched sources.
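For illustration, a minimal sketch of that "push to a special branch" step in Python might look like the following; the repository path, remote name, and branch naming scheme are assumptions for the sake of the example, not how the bot is actually wired up:

#!/usr/bin/env python3
# Minimal sketch: apply a patch series on top of master and push it to a
# per-series branch so that hosted CI (e.g. Travis) picks it up. The paths,
# remote name, and branch naming below are illustrative assumptions.
import subprocess

WORK_TREE = "/srv/0day/ovs"   # assumed local clone kept in sync with upstream
CI_REMOTE = "ci"              # assumed remote pointing at a GitHub fork with CI enabled


def push_series_for_ci(series_mbox: str, series_id: str) -> None:
    """Apply the series from an mbox file and push it to a per-series branch."""
    git = ["git", "-C", WORK_TREE]
    subprocess.run(git + ["checkout", "-B", f"series_{series_id}", "origin/master"], check=True)
    subprocess.run(git + ["am", series_mbox], check=True)
    # The push is what triggers the hosted build for this series.
    subprocess.run(git + ["push", "--force", CI_REMOTE, f"series_{series_id}"], check=True)


if __name__ == "__main__":
    push_series_for_ci("/tmp/series_12345.mbox", "12345")

Force-pushing keeps one branch per series even when a series is re-tested, at the cost of rewriting that branch's history.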
My goal is to get really early feedback on patch series. I've sent this out to the folks I know are involved in testing and test discussions in the hopes that we can talk about how best to get more CI happening. The open questions:
1. How can we notify various downstream consumers of OvS of these 0-day builds? Should we just rely on people rolling their own? Should there be a more formalized framework? How will these other test frameworks report any kind of failures?
2. What kinds of additional testing do we want to see the robot include?
First of all, thanks for the 0-day robot, I really like the idea…

One thing I feel would really benefit us is some basic performance testing, like a PVP test for the kernel/DPDK datapath. This would make it easy to identify performance-impacting patches as they happen, rather than people figuring out after a release why their performance has dropped.
Yes - I hope to pull in the work you've done for ovs_perf to have some kind of baselines.

For this to make sense, I think it also needs a bunch of hardware that we can benchmark on (hint hint to some of the folks in the CC list :). Not for absolute numbers, but at least to detect significant changes.
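To make the baseline idea concrete, here is a rough sketch of the bookkeeping side in Python; the throughput numbers are assumed to come from an external harness such as ovs_perf, and the file layout and field names are invented for illustration:

import json
import time

BASELINE_FILE = "pvp_baseline.json"   # assumed location for the stored baseline


def record_baseline(commit: str, throughput_mpps: dict, path: str = BASELINE_FILE) -> None:
    """Store per-packet-size PVP throughput (in Mpps) for a given commit.

    `throughput_mpps` might look like {"64": 7.1, "512": 4.3, "1500": 2.0};
    this sketch only covers the bookkeeping, not the measurement itself.
    """
    record = {"commit": commit, "timestamp": time.time(),
              "throughput_mpps": throughput_mpps}
    with open(path, "w") as f:
        json.dump(record, f, indent=2)


def load_baseline(path: str = BASELINE_FILE) -> dict:
    """Load the stored baseline so a new run can be compared against it."""
    with open(path) as f:
        return json.load(f)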
I'm also not sure how to measure a 'problem.' Do we run a test pre-series, and then run it post-series? In that case, we could slowly degrade performance over time without anyone noticing. Do we take it from the previous release and compare? That might make more sense, but I don't know if it has other problems associated with it. What are the thresholds we use for saying something is a regression? How do we report it to developers?
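As one possible answer to the threshold question, here is a sketch that compares pre-series and post-series numbers gathered on the same hardware and flags anything that drops by more than a fixed relative margin; the 5% figure and the data shape are assumptions, not an agreed policy:

REGRESSION_THRESHOLD = 0.05  # assumption: flag any drop larger than 5%


def find_regressions(pre: dict, post: dict, threshold: float = REGRESSION_THRESHOLD) -> dict:
    """Return tests whose post-series throughput dropped by more than `threshold`.

    `pre` and `post` map a test name (e.g. "pvp_64byte") to throughput in Mpps,
    measured before and after applying the series.
    """
    regressions = {}
    for test, before in pre.items():
        after = post.get(test)
        if after is None or before <= 0:
            continue
        drop = (before - after) / before
        if drop > threshold:
            regressions[test] = {"before": before, "after": after,
                                 "drop_pct": round(drop * 100, 1)}
    return regressions


# Example: an ~8% drop on a hypothetical 64-byte PVP test would be reported.
print(find_regressions({"pvp_64byte": 7.1}, {"pvp_64byte": 6.5}))

The same helper works whether the reference numbers come from a pre-series run or from the previous release; only the choice of `pre` changes.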
I guess both, in an ideal world, and maybe add a weekly baseline for master :)

Having a graph of this would be really nice. However, this might be a whole project in itself, i.e. performance runs on all commits to master…
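The weekly-baseline/graph idea could start as simply as appending each run to a CSV and plotting the trend; the file names and the use of matplotlib below are just one hypothetical way to do it:

import csv
from datetime import date

import matplotlib.pyplot as plt

HISTORY_FILE = "pvp_history.csv"   # assumed: one row per weekly baseline run on master


def append_result(commit: str, throughput_mpps: float, path: str = HISTORY_FILE) -> None:
    """Append one weekly baseline measurement for master."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), commit, throughput_mpps])


def plot_history(path: str = HISTORY_FILE, out: str = "pvp_trend.png") -> None:
    """Plot throughput over time so slow degradation is visible at a glance."""
    with open(path, newline="") as f:
        rows = [r for r in csv.reader(f) if r]
    dates = [r[0] for r in rows]
    mpps = [float(r[2]) for r in rows]
    plt.plot(dates, mpps, marker="o")
    plt.ylabel("PVP throughput (Mpps)")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig(out)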
Should the test results be made available in general on some kind of public-facing site? Should it just stay as a "bleep bloop - failure!" marker?

3. What other concerns should be addressed?
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev