Re: [openstack-dev] [openstack-qa] Post job failures

2014-10-09 Thread Joe Gordon
What about using graphite + logstash to power a
post-job/nightly-job/post-merge-periodic (the new thing we talked about in
Germany) dashboard?

There are a few different use cases for a dashboard for jobs that don't
report on gerrit changes.

* Track the success and failure rates over time
  * If I am maintaining a job that doesn't vote anywhere, I will check
    this daily
  * If I am part of the core team of a project where one feature is tested
    post-merge, I want to periodically check this to see if that feature is
    being maintained.
* Provide links to logs for failed jobs so the cause of the failure can be
investigated
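
To make the first use case concrete, here is a minimal sketch of computing a
per-day failure rate from job results. The (day, result) input shape and the
function name are assumptions for illustration, not something Zuul or
graphite emits in this form.

```python
from collections import defaultdict


def failure_rate_by_day(runs):
    """Return {day: fraction of runs that did not succeed}.

    `runs` is an iterable of (day, result) pairs, for example
    ("2014-10-01", "FAILURE"). This input shape is hypothetical.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for day, result in runs:
        totals[day] += 1
        # Anything other than SUCCESS is counted as a failure here.
        if result != "SUCCESS":
            failures[day] += 1
    return {day: failures[day] / totals[day] for day in sorted(totals)}
```

Each resulting value could then be sent to graphite as one data point per
day to get the trend line.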


We can do all this with graphite and logstash: graphite for tracking the
trends (something like http://jogo.github.io/gate/) and logstash to find
the logs for failed jobs (we can get around the 10 day logstash window by
saving the results instead of overwriting them every time we regenerate the
list of log links)
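
A rough sketch of the "save instead of overwrite" idea, assuming the list of
log links is kept in a JSON file keyed by some build identifier. The file
layout, the key choice, and the function name are all hypothetical.

```python
import json
from pathlib import Path


def merge_log_links(store: Path, fresh: dict) -> dict:
    """Merge newly found log links into the saved list and write it back.

    `fresh` maps a build identifier to a log URL (hypothetical layout).
    Entries already in `store` are kept even after logstash has expired
    them, so the saved list can outlive the logstash retention window.
    """
    saved = json.loads(store.read_text()) if store.exists() else {}
    saved.update(fresh)
    store.write_text(json.dumps(saved, indent=2, sort_keys=True))
    return saved
```

Running this on every regeneration means an entry only disappears if we
prune it deliberately.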

And if we really want some sort of alerts, there are a lot of graphite
tools (http://graphite.readthedocs.org/en/latest/tools.html) that can give
us alerts on metrics (alert me if the last X runs of job-foo-bar failed).
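
The alert condition itself is simple. As a sketch, assuming we already have
the job's results as a list ordered oldest to newest (the result strings and
the function name are assumptions, not any particular graphite tool's API):

```python
from typing import Sequence


def last_runs_all_failed(results: Sequence[str], x: int) -> bool:
    """Return True if the most recent `x` results are all "FAILURE".

    `results` is ordered oldest to newest. With fewer than `x` results
    recorded, no alert is raised.
    """
    if x <= 0 or len(results) < x:
        return False
    return all(result == "FAILURE" for result in results[-x:])
```

For example, with results SUCCESS, FAILURE, FAILURE, FAILURE this alerts for
X=3 but not for X=4.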


On Wed, Oct 1, 2014 at 9:46 AM, Jeremy Stanley wrote:

> On 2014-10-01 10:39:40 -0400 (-0400), Matthew Treinish wrote:
> [...]
> > So I actually think as a first pass this would be the best way to
> > handle it. You can leave comments on closed gerrit changes,
> [...]
>
> Not so easy as it sounds. Jobs in post are running on an arbitrary
> Git commit (more often than not, a merge commit), and mapping that
> back to a change in Gerrit is nontrivial.
> --
> Jeremy Stanley
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [openstack-qa] Post job failures

2014-10-01 Thread Jeremy Stanley
On 2014-10-01 10:39:40 -0400 (-0400), Matthew Treinish wrote:
[...]
> So I actually think as a first pass this would be the best way to
> handle it. You can leave comments on closed gerrit changes,
[...]

Not so easy as it sounds. Jobs in post are running on an arbitrary
Git commit (more often than not, a merge commit), and mapping that
back to a change in Gerrit is nontrivial.
-- 
Jeremy Stanley



Re: [openstack-dev] [openstack-qa] Post job failures

2014-10-01 Thread Matthew Treinish
Hi Josh,

Just a heads up that you shouldn't use this list for any discussion. We've moved
all of the discussion off this list into openstack-dev. The only reason we
haven't removed the openstack-qa list is so we have a separate address for the
periodic job results (which honestly hasn't been the most effective approach
for handling those jobs).

On Wed, Oct 01, 2014 at 07:39:44PM +1000, Joshua Hesketh wrote:
> Hello QA team,
> 
> When a job fails in the post queue (which has jobs that are triggered on a
> change being merged), no warning or failure message is sent anywhere, so it
> fails silently. This has caused an issue in the past[0] and there are
> likely more cases we don't know about.
> 
> We should report failures somewhere but since post jobs don't come from
> gerrit they can't be reported back to gerrit trivially. And even if we could
> it would be a comment on a closed change.

So I actually think as a first pass this would be the best way to handle it. You
can leave comments on closed gerrit changes, and they would still generate the
same notifications for people who have that enabled. They would also be picked
up in the CI results table at the top, which I think might be convenient.

Long term I'm thinking we might need to make a separate dashboard view for all
of these jobs so we can track the results over time. I don't think instantaneous
reporting is actually important for post or periodic jobs, because if it were,
they'd be running in check or gate instead. Back in the days when there was a
single jenkins, the jenkins dashboard could be used for this to a certain
extent, which was useful.

> 
> My feeling is an easy solution is to email somewhere when a post job fails.
> However I'm not sure where might be an appropriate location for that. Would
> this mailing list, for example, be a good place to start and then see how we
> go?

I really don't think this is the right approach. The issue is that most of these
things are project-specific failures, so you'd either be spamming everyone with
the failure or sending it to a small set of people who aren't interested. I also
feel that we run the post jobs far too frequently to have the results sent to
any ML.
> 
> I've set up the change here: https://review.openstack.org/#/c/125298/
> 
> Cheers,
> Josh
> 
> [0]
> http://lists.openstack.org/pipermail/openstack-dev/2014-September/046481.html
> 

-Matt Treinish

