On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <[email protected]> wrote:
On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
>  On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <[email protected]> wrote:
>
>   On 30/11/15 12:51, Ruby Loo wrote:
>
>    On 30 November 2015 at 10:19, Derek Higgins <[email protected]> wrote:
>
>      Hi All,
>
>      A few months ago tripleo switched from its devtest based CI to one
>      that was based on instack. Before doing this we anticipated
>      disruption in the ci jobs and removed them from non tripleo
>      projects.
>
>      We'd like to investigate adding it back to heat and ironic, as these
>      are the two projects where we find our ci provides the most value.
>      But we can only do this if the results from the job are treated as
>      voting.
>
>    What does this mean? That the tripleo job could vote and do a -1 and
>    block ironic's gate?
>
>      In the past most of the non tripleo projects tended to ignore the
>      results from the tripleo job, as it wasn't unusual for the job to be
>      broken for days at a time. The thing is, ignoring the results of the
>      job is the reason (the majority of the time) it was broken in the
>      first place.
>
>      To decrease the number of breakages we are no longer running master
>      code for everything (for the non tripleo projects we periodically
>      bump the versions we use if they are working). I believe with this
>      model the CI jobs we run have become a lot more reliable; there are
>      still breakages, but far less frequently.
>
>      What I'm proposing is that we add at least one of our tripleo jobs
>      back to both heat and ironic (and other projects associated with
>      them, e.g. clients, ironic-inspector, etc.); tripleo will switch to
>      running latest master of those repositories, and the cores approving
>      on those projects should wait for a passing CI job before hitting
>      approve.
>
>      So how do people feel about doing this? Can we give it a go? A
>      couple of people have already expressed an interest in doing this,
>      but I'd like to make sure we're all in agreement before switching it
>      on.
>
>    This seems to indicate that the tripleo jobs are non-voting, or at least
>    won't block the gate -- so I'm fine with adding tripleo jobs to ironic.
>    But if you want cores to wait/make sure they pass, then shouldn't they
>    be voting? (Guess I'm a bit confused.)
>
>   +1
>
>   I don't think it hurts to turn it on, but tbh I'm uncomfortable with the
>   mental overhead of a non-voting job that I have to manually treat as a
>   voting job. If it's stable enough to make it a voting job, I'd prefer we
>   just make it voting. And if it's not, then I'd like to see it made
>   stable enough to be a voting job and then make it voting.
>
>  This is roughly where I sit as well -- if it's non-voting, experience
>  tells me that it will largely be ignored, and as such, isn't a good use
>  of resources.

I'm sure you can appreciate it's something of a chicken/egg problem,
though: if everyone always ignores non-voting jobs, they never become
voting.

That effect is magnified with TripleO, because it consumes so many
OpenStack projects, any one of which has the capability to break our CI.
So in an ideal world we'd have voting feedback on all-the-things, but
that's not where we are right now, due in large part to the steady
stream of regressions (from Heat, Ironic and other projects).

>  I haven't looked at tripleo or tripleoci in a while, so I won't assume
>  that my recollection of the CI jobs bears any resemblance to what exists
>  today. Could you explain what areas of ironic (or its subprojects) will
>  be covered by these tests? If they are already covered by existing
>  tests, then I don't see the benefit of adding another job; conversely,
>  if this is testing areas we don't cover today, then there's probably
>  value in running tripleoci in a voting fashion for now and then moving
>  that coverage into ironic's project testing.

I like to think of TripleO as a trunk-chasing "power user", and as such
it gives very valuable "user" feedback, including breaking things in
exciting ways you hadn't anticipated in your project integration tests.

This has, in the case of Heat at least, made TripleO an extremely
effective "kitchen sink" stress test, and it has uncovered numerous
issues we failed to find with our internal tests (obviously we do add
coverage when we find them).

In the case of Ironic, I think the usage is somewhat less demanding, but
no less "real world". Here's a good example for you:

https://bugs.launchpad.net/ironic/+bug/1507738

In this case, Ironic landed a change to master which broke all existing
deployments using CentOS/RHEL-derived distributions, so master Ironic
has been broken for folks using those distros for over 6 weeks.

I know in that case the problem was a really old ipxe image in the
distro, and yes, there were several possible workarounds, but as a
developer who cares about users, I'd personally rather get gate feedback
than angry users on IRC/email when I unwittingly break the world for
them ;)

(Note: I'm not assigning any blame above; it's one of *many* examples of
unexpected breakage due to insufficient gate feedback of real usage
across many projects.)

Great example, Steve, and I agree that more and faster feedback from
users into patches is a good thing. I'm also sad that it was broken for
that long and no one raised the issue in our meeting until this week.

This particular bug highlights a gap in Ironic's test coverage that I
would be delighted to see someone close: we aren't testing support for
RH-based distros. Closing that gap doesn't require TripleoCI at all; we
should simply add a dsvm job for Ironic on Fedora, using a Fedora-based
ramdisk. That will help prevent similar regressions in the future.

Anyway, I have big reservations about putting TripleoCI on a path to
ever gating Ironic patches. I started to bikeshed on that and then
deleted it... tl;dr: I believe it is important for this job to vote in a
non-gating way. As a reviewer, I'm unlikely to pay attention to it if it
doesn't vote, and there's a good reason for this:

Non-voting jobs are used for experimentation. A non-voting job is a job
that we want to vote, but which we don't trust enough yet. It has been
promoted from the experimental pipeline to the check pipeline so that it
gets a lot more runs and so that we can stabilize it enough to make it
voting.