Re: [openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

Derek Higgins Wed, 02 Dec 2015 08:05:17 -0800


On 02/12/15 12:53, Steven Hardy wrote:

On Tue, Dec 01, 2015 at 05:10:57PM -0800, Devananda van der Veen wrote:

    On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <[email protected]> wrote:

      On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
      >    On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <[email protected]> wrote:
      >
      >      On 30/11/15 12:51, Ruby Loo wrote:
      >
      >        On 30 November 2015 at 10:19, Derek Higgins <[email protected]> wrote:
      >
      >            Hi All,
      >
      >            A few months ago tripleo switched from its devtest based CI to
      >            one that was based on instack. Before doing this we anticipated
      >            disruption in the ci jobs and removed them from non tripleo
      >            projects.
      >
      >            We'd like to investigate adding it back to heat and ironic as
      >            these are the two projects where we find our ci provides the
      >            most value. But we can only do this if the results from the job
      >            are treated as voting.
      >
      >        What does this mean? That the tripleo job could vote and do a -1
      >        and block ironic's gate?
      >
      >            In the past most of the non tripleo projects tended to ignore
      >            the results from the tripleo job as it wasn't unusual for the
      >            job to be broken for days at a time. The thing is, ignoring the
      >            results of the job is the reason (the majority of the time) it
      >            was broken in the first place.
      >
      >            To decrease the number of breakages we are now no longer
      >            running master code for everything (for the non tripleo
      >            projects we bump the versions we use periodically if they are
      >            working). I believe with this model the CI jobs we run have
      >            become a lot more reliable; there are still breakages but far
      >            less frequently.
      >
      >            What I'm proposing is we add at least one of our tripleo jobs
      >            back to both heat and ironic (and other projects associated
      >            with them, e.g. clients, ironic-inspector etc.), tripleo will
      >            switch to running latest master of those repositories and the
      >            cores approving on those projects should wait for a passing CI
      >            job before hitting approve.
      >
      >            So how do people feel about doing this? Can we give it a go? A
      >            couple of people have already expressed an interest in doing
      >            this but I'd like to make sure we're all in agreement before
      >            switching it on.
      >
      >        This seems to indicate that the tripleo jobs are non-voting, or at
      >        least won't block the gate -- so I'm fine with adding tripleo jobs
      >        to ironic. But if you want cores to wait/make sure they pass, then
      >        shouldn't they be voting? (Guess I'm a bit confused.)
      >
      >      +1
      >
      >      I don't think it hurts to turn it on, but tbh I'm uncomfortable with
      >      the mental overhead of a non-voting job that I have to manually treat
      >      as a voting job. If it's stable enough to make it a voting job, I'd
      >      prefer we just make it voting. And if it's not then I'd like to see
      >      it be made stable enough to be a voting job and then make it voting.
      >
      >    This is roughly where I sit as well -- if it's non-voting, experience
      >    tells me that it will largely be ignored, and as such, isn't a good
      >    use of resources.

      I'm sure you can appreciate it's something of a chicken/egg problem
      though - if everyone always ignores non-voting jobs, they never become
      voting.

      That effect is magnified with TripleO though, because it consumes so
      many OpenStack projects, any one of which has the capability to break
      our CI, so in an ideal world we'd have voting feedback on
      all-the-things, but that's not where we are right now due in large part
      to the steady stream of regressions (from Heat, Ironic and other
      projects).

      >    I haven't looked at tripleo or tripleoci in a while, so I won't
      >    assume that my recollection of the CI jobs bears any resemblance to
      >    what exists today. Could you explain what areas of ironic (or its
      >    subprojects) will be covered by these tests? If they are already
      >    covered by existing tests, then I don't see the benefit of adding
      >    another job; conversely, if this is testing areas we don't cover
      >    today, then there's probably value in running tripleoci in a voting
      >    fashion for now and then moving that coverage into ironic's project
      >    testing.

      I like to think of TripleO as a trunk-chasing "power user", and as such
      it gives very valuable "user" feedback, including breaking things in
      exciting ways you hadn't anticipated in your project integration tests.

      This has, in the case of Heat at least, made TripleO an extremely
      effective "kitchen sink" stress test, and has uncovered numerous issues
      we failed to find with our internal tests (obviously we do add coverage
      when we find them).

      In the case of Ironic, I think the usage is somewhat less demanding, but
      no less "real world" - here's a good example for you:

      https://bugs.launchpad.net/ironic/+bug/1507738

      In this case, Ironic landed a change to master, which broke all existing
      deployments using CentOS/RHEL derived distributions, so master Ironic
      has been broken for folks using those distros for over 6 weeks.

      I know in that case, the problem was a really old ipxe image in the
      distro, and yes there were several possible workarounds, but as a
      developer who cares about users, I personally would rather get gate
      feedback than angry users on IRC/email when I unwittingly break the
      world for them ;)

      (note, I'm not assigning any blame above, it's one of *many* examples of
      unexpected breakage due to insufficient gate feedback of real usage
      across many projects).

    Great example, Steve, and I agree that more and faster feedback from users
    into patches is a good thing. I'm also sad that it was broken for that
    long and no one raised the issue in our meeting until this week.

    This particular bug highlights a gap in Ironic's test coverage which I
    would be delighted if someone wants to close -- that we aren't testing
    support for RH-based distros. Closing that gap doesn't require TripleoCI
    at all; we should simply add a dsvm job for Ironic on Fedora, using a
    Fedora-based ramdisk. That will help prevent similar regressions in the
    future.

    Anyway, I have big reservations about putting TripleoCI on a path to ever
    gating Ironic patches. I started to bikeshed on that and then deleted it
    ... tldr; I believe it is important for this job to vote in a non-gating
    way. As a reviewer, I'm unlikely to pay attention to it if it doesn't
    vote, and there's a good reason for this:

    Non-voting jobs are used for experimentation. A non-voting job is a job
    that we want to vote, but which we don't trust enough yet. It has been
    promoted from the experimental pipeline to the check pipeline so that it
    gets a lot more runs and so that we can stabilize it enough to make it
    voting.


Ah, I think all we have here is a terminology mismatch around "non voting"
vs "non gating".

AFAIK what is being proposed is to reinstate the TripleO jobs so they *do*
vote on any change (+1/-1), but they do not block the gate, so we won't get
in the way if occasional outages happen.

Yes, this is exactly what I wanted to do, nothing would be changing from
how it used to be, the tripleo jobs would vote with a -1/+1 but
approvers could still approve if they wanted to (i.e. not in the gate).
The only thing I am asking we do differently to the way it used to be is
an agreement to not blindly ignore the results of the tripleo job as
ignoring the results is what causes a lot of the breakages in the first
place.

As for the gating side of the conversation, I don't think actual gating
is feasible at least in the short term. This would put a higher demand
on our resources (a demand I'm not sure we have the hardware to meet)
and I don't think we have the redundancy necessary in our cloud.

    I was going to suggest that tripleoci vote as a third party CI system (I
    know, it's not actually a third-party CI system, but I'd like to vote like
    one). And then I noticed that it used to do just that. [0] If I'm
    interpreting it correctly, the "gate-tripleo-ironic*" jobs voted from a
    separate account, left an informative -1, but did not block the gate.
    That's exactly what I would like in this case.


+1, I think that's what's being proposed, so we're in agreement! :)

Steve

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


