Re: current build failure automated messages

2016-08-07 Thread Andreas Gustafsson
On July 20, Paul Goyette wrote:
> For me, I'd be interested in another message that detailed changes in 
> the sets of Pass/Fail ATF tests.  The "New test failures" and "Tests No 
> Longer Failing" lines are useful for me.

As you may have noticed, email is now sent to current-users when an
ATF test case goes from consistently passing to consistently failing,
for some definition of "consistently".

There is currently no corresponding email when a test case gets fixed.
-- 
Andreas Gustafsson, g...@gson.org


Re: current build failure automated messages

2016-08-06 Thread Andreas Gustafsson
> Please make it thread properly to the original failure...

Should be threaded now.
-- 
Andreas Gustafsson, g...@gson.org


Re: current build failure automated messages

2016-08-01 Thread Andreas Gustafsson
Joerg Sonnenberger wrote:
> Please make it thread properly to the original failure...

I'll see what I can do about that.
-- 
Andreas Gustafsson, g...@gson.org


Re: current build failure automated messages

2016-07-31 Thread Joerg Sonnenberger
On Sun, Jul 31, 2016 at 12:26:58PM +0300, Andreas Gustafsson wrote:
> As you may have noticed, the build server now sends email to
> current-users not only when the i386 build breaks, but also
> when it has been fixed, with the subject line
> 
>   Automated report: NetBSD-current/i386 build success

Please make it thread properly to the original failure...

Joerg


Re: current build failure automated messages

2016-07-31 Thread Andreas Gustafsson
As you may have noticed, the build server now sends email to
current-users not only when the i386 build breaks, but also
when it has been fixed, with the subject line

  Automated report: NetBSD-current/i386 build success

-- 
Andreas Gustafsson, g...@gson.org


Re: current build failure automated messages

2016-07-20 Thread Andreas Gustafsson
Paul Goyette wrote:
> For me, I'd be interested in another message that detailed changes in 
> the sets of Pass/Fail ATF tests.  The "New test failures" and "Tests No 
> Longer Failing" lines are useful for me.

I implemented the "new test failures" part some time ago.
I have been testing it by running it on my personal testbed with the
emails going to myself, and I have been forwarding them to the
relevant committers as needed.  I do intend to deploy it on b5 with
the emails going to current-users eventually.  New test failures
actually happen pretty rarely compared to build breakage anyway.

The hard part of this is filtering out spurious reports from tests
that fail randomly.  My current heuristic is that a test has a "new
failure" if the last three runs all failed and the 20 or so runs
before that all passed, and this seems to be working pretty well.
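
The heuristic above can be sketched in Python roughly as follows.  This
is only an illustration, not the code running on the testbed: the
function name, the list-of-booleans history format, and the exact
threshold of 20 (the text says "20 or so") are assumptions.

```python
def is_new_failure(history, recent_failures=3, prior_passes=20):
    """Return True if the newest runs all failed after a passing streak.

    history is a list of booleans, oldest run first; True means the
    test case passed in that run.
    """
    needed = recent_failures + prior_passes
    if len(history) < needed:
        return False  # too little data to call anything "consistent"
    window = history[-needed:]
    before, recent = window[:prior_passes], window[prior_passes:]
    return all(before) and not any(recent)


# A test that failed once at random in the middle is not reported...
flaky = [True] * 10 + [False] + [True] * 10 + [False] * 3
# ...but a clean break after 20 straight passes is.
broken = [True] * 20 + [False] * 3
print(is_new_failure(flaky), is_new_failure(broken))
```

With these thresholds the condition holds only on the run where the
third consecutive failure appears, so a persistent failure would be
reported once rather than on every subsequent run.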
-- 
Andreas Gustafsson, g...@gson.org


Re: current build failure automated messages

2016-07-20 Thread Andreas Gustafsson
Greg Troxel wrote:
> I would like to see not only a build-ok message (on transition from fail
> to pass),

I have now implemented this, and it's being tested on my personal
testbed before I deploy it on the TNF one.

> but also a fail message on every fresh build during the
> failure time,

I think "on every fresh build" is too often, especially if many
commits are being made - it could end up being more than one email
per hour.  I'd be OK with one every 12 hours.

> with 3 separate subject lines for new-fail still-fail
> now-ok, so that these are easy to filter out in one's MUA.

Sure.
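
The three-subject scheme with a 12-hour reminder could look something
like the Python sketch below.  It is not the build server's code.  The
"build success" subject is the one quoted elsewhere in this thread; the
"failure" and "still failing" wordings, the function name, and its
parameters are assumptions made for the example.

```python
from datetime import datetime, timedelta

REMINDER_INTERVAL = timedelta(hours=12)


def pick_subject(prev_ok, now_ok, last_mail, now, port="i386"):
    """Return the subject line to send, or None if no mail is due.

    prev_ok/now_ok are the previous and current build results;
    last_mail is when the last failure mail was sent (it is only
    consulted while the build is already failing).
    """
    prefix = "Automated report: NetBSD-current/%s build " % port
    if prev_ok and not now_ok:
        return prefix + "failure"        # new-fail (assumed wording)
    if not prev_ok and now_ok:
        return prefix + "success"        # now-ok
    if not prev_ok and not now_ok and now - last_mail >= REMINDER_INTERVAL:
        return prefix + "still failing"  # still-fail (assumed wording)
    return None
```

Keeping the three cases on distinct, fixed subject strings is what makes
them easy to filter in an MUA.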
-- 
Andreas Gustafsson, g...@gson.org


Re: current build failure automated messages

2016-07-19 Thread Paul Goyette

On Tue, 19 Jul 2016, Greg Troxel wrote:

> Andreas Gustafsson writes:
>
>> I can appreciate that - different people have different preferences
>> and workflows.  I'd like to hear the opinions of other developers -
>> if there is a consensus that "build has been fixed" email notifications
>> would be useful, I can certainly add them.  And even if the consensus
>> is that they are not useful, I just might add them anyway, but
>> hardcode the recipient address as "kre" :)
>
> I would like to see not only a build-ok message (on transition from fail
> to pass), but also a fail message on every fresh build during the
> failure time, with 3 separate subject lines for new-fail still-fail
> now-ok, so that these are easy to filter out in one's MUA.

For me, I'd be interested in another message that detailed changes in 
the sets of Pass/Fail ATF tests.  The "New test failures" and "Tests No 
Longer Failing" lines are useful for me.



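+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+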


Re: current build failure automated messages

2016-07-19 Thread Greg Troxel

Andreas Gustafsson writes:

> I can appreciate that - different people have different preferences
> and workflows.  I'd like to hear the opinions of other developers -
> if there is a consensus that "build has been fixed" email notifications
> would be useful, I can certainly add them.  And even if the consensus
> is that they are not useful, I just might add them anyway, but
> hardcode the recipient address as "kre" :)

I would like to see not only a build-ok message (on transition from fail
to pass), but also a fail message on every fresh build during the
failure time, with 3 separate subject lines for new-fail still-fail
now-ok, so that these are easy to filter out in one's MUA.




Re: current build failure automated messages

2016-07-19 Thread Robert Elz
Date:        Tue, 19 Jul 2016 12:40:34 +0300
From:        Andreas Gustafsson
Message-ID:  <22413.62866.53313.424...@guava.gson.org>


  | You don't have to screen scrape the HTML reports - you can get the
  | underlying data by anonymous rsync, as described in
  | 
  |   https://mail-index.netbsd.org/current-users/2015/10/18/msg028217.html

Thanks - I haven't seen that message (yet) - I have a kind of bizarre
way of reading NetBSD e-mail, I cherry pick current messages, reading any
where the subject looks interesting, and then I have two "current" pointers,
where I am really reading - one is back in early 2014 somewhere, and is
where I was up to in reading (more or less) everything - until a busy period
in 2013 (I think) made me just start the cherry picking of current messages
and only very occasionally moving the real read pointer forwards (though
it has moved 5 or 6 months - probably more, since then.)  Then early this
year I just established a second read pointer, and went back to reading all
the messages again, just leaving a big gap in the messages I had read.
Then of course, that one slipped behind as well - it is currently up to
the start of May, and I'm back cherry picking again...  That one I am trying
to get caught up, but the gap between April 2014 (1st read pointer) and
Jan 2016 (where I started reading again) is not closing very fast - every
now and again when I'm bored I go process a couple of hundred messages
from back then, but it will take a while before I reach Oct 2015.

  | With those, finding out if the
  | latest build succeeded is a one-liner in sh, for example:

Yes, just finding the success/fail of the build, even using the HTML
version, is trivial - but I also wanted the commit messages to be available.
I am not going to use them often, but a few times I have seen a build
failure, looked into it, and failed to work out how it should be fixed.
But then, obviously, it gets fixed by someone who knows how - by looking
at how it was fixed, hopefully I can learn more.   Some old dogs actually
appreciate new tricks...   For that I need to know what happened between the
last failure to build and when it started working again.

I will take a look at what is there and see if processing those logs
would be easier than the current processing of the HTML (which is not
really very difficult.)

  | The Python code that generates the existing HTML reports and email
  | notifications is also available if you want it.

Not really.   python is on my "I hope I never have to go there" list...

  | A new monthly report page is created when there is a build result to
  | report from building sources with a CVS source date in that month.

OK, thanks.

  | Since the internal date storage format of CVS has a "month" field, in
  | principle the above definition is complete without introducing the
  | concept of a time zone.

Not directly in the report generating code, but it is there in the way
CVS works, and is always UTC (or always on a unix type system anyway.)
I will adapt my script.   By adopting CVS dates for this, you're effectively
adopting UTC for the dates in the file names - which is as it should be.

  | so a commit made after 0:00 UTC on the 1st will trigger the
  | creation of a new report page once it has been built.

That's fine - a failure to fetch the log, if the script requests it before
it is created will (should, and almost does) just cause the script to
assume that there is no status change - which must be true, if it was not
changed (to fixed or to broken) after the last commit of the previous month,
and there has been no commit this month, then it is still in the same state.

I realise at the minute my script doesn't quite handle this properly, but
I will fix that, the file fetching part of it is the easy part...

  | Again, I think it would be better to use the underlying data than
  | to screen scrape HTML reports that were never intended for machine
  | parsing.

Understood.   But for now, what I have works, so at least until the
generated HTML changes, I'm happy...

  | If I end up adding the "build fixed" notifications to the TNF test
  | server, it will be a reimplementation in Python anyway, sharing code
  | with the existing build failure notifications.

Sure, with the raw data, and knowledge how to use it (and much of the code
already existing) that would be a much better way.

kre

ps: if anyone has any actual interest in the script I posted, let me know,
and I will either send (via private e-mail) it after it is fixed, or
send it to the list again if there are many requests.



Re: current build failure automated messages

2016-07-19 Thread Andreas Gustafsson
Robert Elz wrote:
>   | If the page says "Build: OK" at the end, the issue has
>   | been fixed.  At least for me, this is less work overall than it would
>   | be to handle twice the number of emails.
> 
> I actually cannot imagine that being possible for me, one more e-mail to
> delete every few days is nothing, just switching to a browser and waiting
> for it to page in takes an order of magnitude longer - let alone the
> startup time if I don't have one running (which is not unusual) - plus
> that I can read e-mail on a text terminal trivially, and while that web
> page would not be hard for a text browser to process, it just seems wrong
> to me...

I can appreciate that - different people have different preferences
and workflows.  I'd like to hear the opinions of other developers -
if there is a consensus that "build has been fixed" email notifications
would be useful, I can certainly add them.  And even if the consensus
is that they are not useful, I just might add them anyway, but
hardcode the recipient address as "kre" :)

>   | If we're going to start sending more emails, I think adding notifications
>   | saying "build is still failing after 24 hours" would be more useful
>   | than "build now succeeding again".
> 
> I'm not sure about "more" useful, but that would certainly be useful,
> though I think 12 hours would be a better timeout - that's long enough
> for whoever broke it to have had time to fix things before causing others
> to be provoked into getting involved.

Noted.

> I am appending the script I am now using below.   One caveat - as is the way
> with these things - I made a couple of minor adjustments to the script after
> it worked earlier - and there has not been another failure since to validate
> it still works (it should, but ...)

You don't have to screen scrape the HTML reports - you can get the
underlying data by anonymous rsync, as described in

  https://mail-index.netbsd.org/current-users/2015/10/18/msg028217.html

This will probably yield more data than you want, but rsync has plenty
of options and can hopefully be coaxed into mirroring only the
"bracket.db" files, for example.  With those, finding out if the
latest build succeeded is a one-liner in sh, for example:

  find i386 -name bracket.db | sort | tail | xargs grep build_status | grep build_status=0
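
For comparison, roughly the same check can be written in Python.  This
is a sketch under assumptions, not part of the report generator: it
assumes each bracket.db is a text file containing a line of the form
"build_status=N" with 0 meaning success (as the grep above implies), and
that the path names sort chronologically (as the "sort | tail" implies).
The function name is made up for the example.

```python
import os


def latest_build_ok(root):
    """Return True/False for the newest bracket.db under root.

    Returns None if no bracket.db, or no build_status line, is found.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "bracket.db" in filenames:
            paths.append(os.path.join(dirpath, "bracket.db"))
    if not paths:
        return None
    newest = sorted(paths)[-1]  # assumes names sort chronologically
    with open(newest) as f:
        for line in f:
            if line.strip().startswith("build_status="):
                return line.strip() == "build_status=0"
    return None
```

Unlike the shell pipeline, this looks only at the single newest file
rather than the last ten.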

The Python code that generates the existing HTML reports and email
notifications is also available if you want it.

> Also, I have no idea of the timezone in which the log files are created, so
> I am currently running the script using just local time (for whoever runs it.)
> That only affects the name of the file that is fetched (and, if right near the
> beginning of the month, the previous one - just in case the commit list that
> causes a failure, or corrects a failure, spans the month boundary).  As it
> is now, I am probably going to start attempting to fetch August's log before
> it first gets created (as August will come earlier for me than many of you).
> Of course, if the timezone for those files is from Japan or Australia then
> all would be fine (for me).   It should probably be, and probably is, UTC,
> but before I make the script work that way, I'd appreciate confirmation from
> someone who knows (that is: what timezone is used when deciding it is time
> to create a new log file -- i.e.: that a new month has started?)

A new monthly report page is created when there is a build result to
report from building sources with a CVS source date in that month.

Since the internal date storage format of CVS has a "month" field, in
principle the above definition is complete without introducing the
concept of a time zone.  In practice, the NetBSD CVS repository uses
UTC dates, so a commit made after 0:00 UTC on the 1st will trigger the
creation of a new report page once it has been built.
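
For a script that needs to decide which monthly pages to fetch, the
consequence is that the month should be computed in UTC rather than
local time.  A minimal Python sketch follows; the function name is an
assumption, and how a (year, month) pair maps to a report file name is
deliberately left out because it is not specified here.

```python
from datetime import datetime, timedelta, timezone


def report_months(now=None):
    """Return [(year, month)] for the previous and current UTC month."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    year, month = now.year, now.month
    prev = (year - 1, 12) if month == 1 else (year, month - 1)
    return [prev, (year, month)]


# 05:00 on 1 August in a UTC+10 zone is still 31 July in UTC.
local = datetime(2016, 8, 1, 5, 0, tzinfo=timezone(timedelta(hours=10)))
print(report_months(local))  # prints [(2016, 6), (2016, 7)]
```

Even in UTC, the page for a new month will not exist until the first
build of a commit from that month has finished, so a fetcher still has
to tolerate a missing file.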

> It would also be easier if the html markup actually marked the content
> rather than just for appearance (class="build" means a different background
> colour, class="ok" just means "text is green" and class="fail" "text is red",
> and they're used that way... ideally there should be different classes for
> different purposes, and if several of them all just happen to result in the
> same appearance, that would be fine...)

Again, I think it would be better to use the underlying data than
to screen scrape HTML reports that were never intended for machine
parsing.

> Also, the script, attached below, attempts to make a directory
>   /var/db/build-status
> to keep track of what the current status is, for each architecture monitored,
> (and some other stuff) but unless it is run as root (not recommended), it is
> probably going to fail...   So just make the directory by hand before running
> the script, and give it a suitable owner and permissions.   The first time
> the script is run for an architecture it will send a more or less useless
> e-mail which tells the current build status for that architecture (that it
> does 

Re: current build failure automated messages

2016-07-19 Thread Robert Elz
Date:        Sat, 16 Jul 2016 12:15:41 +0300
From:        Andreas Gustafsson
Message-ID:  <22409.64317.493648.115...@guava.gson.org>

  | It would not be hard to implement, but I'm not sure it would be useful
  | enough to justify doubling the number of messages to the list.

With the current rate of build failures, "doubling the number of messages
to the list" means one extra message every few days normally - not something
I would be too concerned about.

  | What I do when I want to know if a reported build failure has been
  | fixed is to visit the web page whose URL is given at the very end of
  | the email.

I have used that if I want to get to the log of the actual build that
failed, when the extract in the e-mail is insufficient to work out what
actually happened.  I guess it relates to how old I am these days, but
jumping on a browser is rarely my first reaction to anything - I mostly
predate www.* and basically consider http a total botch of a protocol, so,
I am almost always looking for an alternative.   Something simpler to deal
with.

  | If the page says "Build: OK" at the end, the issue has
  | been fixed.  At least for me, this is less work overall than it would
  | be to handle twice the number of emails.

I actually cannot imagine that being possible for me, one more e-mail to
delete every few days is nothing, just switching to a browser and waiting
for it to page in takes an order of magnitude longer - let alone the
startup time if I don't have one running (which is not unusual) - plus
that I can read e-mail on a text terminal trivially, and while that web
page would not be hard for a text browser to process, it just seems wrong
to me...

  | Most build failures are fixed quickly anyway.

Yes, that is exactly the point.   There was a build failure early this
morning (my time) - when I saw it (aside from the current-users message
about it) I had 2 e-mails from a script I created, one telling me of the
failure, and another telling me it was fixed.   Then I knew immediately
I could simply ignore the failure mail (the current-users message wasn't
even worth looking at.)   All I needed to see was the Subject lines of the
messages, and all the info was available for me to delete all of them
(in this case I didn't, as this was the first failure since I got the
script finished, and I wanted to see how well it worked, or if it worked
at all ... which is also why I waited to reply here until now, I wanted
to have something productive, IMO anyway, to share...)

  | If we're going to start sending more emails, I think adding notifications
  | saying "build is still failing after 24 hours" would be more useful
  | than "build now succeeding again".

I'm not sure about "more" useful, but that would certainly be useful,
though I think 12 hours would be a better timeout - that's long enough
for whoever broke it to have had time to fix things before causing others
to be provoked into getting involved.

I am appending the script I am now using below.   One caveat - as is the way
with these things - I made a couple of minor adjustments to the script after
it worked earlier - and there has not been another failure since to validate
it still works (it should, but ...)

Currently I am just running it from cron every 15 minutes, but probably
better would be to have it triggered by the current-users build failure mail,
and then run it every N minutes until it goes to OK again, and then just
send the "now OK" mail and terminate.  That part (aside from the mail sending)
would get done by another script.
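
The triggered variant described above (start polling when a failure
mail arrives, stop after sending one "now OK" mail) might be structured
like this Python sketch.  It is not the script referred to in this
message; build_ok and notify are hypothetical caller-supplied hooks, and
the 15-minute default simply mirrors the cron interval mentioned above.

```python
import time


def watch_until_fixed(build_ok, notify, interval_minutes=15,
                      sleep=time.sleep):
    """Poll build_ok() until it returns True, then notify once.

    build_ok and notify are callables supplied by the caller; sleep is
    injectable so the loop can be exercised without waiting.  Returns
    the number of polls made.
    """
    polls = 0
    while True:
        polls += 1
        if build_ok():
            notify("build now OK")
            return polls
        sleep(interval_minutes * 60)
```

Because the loop ends as soon as the build is OK again, nothing needs to
run between failures, unlike the cron approach.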

Also, I have no idea of the timezone in which the log files are created, so
I am currently running the script using just local time (for whoever runs it.)
That only affects the name of the file that is fetched (and, if right near the
beginning of the month, the previous one - just in case the commit list that
causes a failure, or corrects a failure, spans the month boundary).  As it
is now, I am probably going to start attempting to fetch August's log before
it first gets created (as August will come earlier for me than many of you).
Of course, if the timezone for those files is from Japan or Australia then
all would be fine (for me).   It should probably be, and probably is, UTC,
but before I make the script work that way, I'd appreciate confirmation from
someone who knows (that is: what timezone is used when deciding it is time
to create a new log file -- i.e.: that a new month has started?)

It would also be easier if the html markup actually marked the content
rather than just for appearance (class="build" means a different background
colour, class="ok" just means "text is green" and class="fail" "text is red",
and they're used that way... ideally there should be different classes for
different purposes, and if several of them all just happen to result in the
same appearance, that would be fine...)

Also, the script, attached below, attempts to make a directory

Re: current build failure automated messages

2016-07-16 Thread Andreas Gustafsson
Robert Elz wrote:
> From time to time there are messages to current-users about
> build failures (the messages are generally useful even if sometimes
> the content - just what failed - can be most obscure ... but that's
> not the point of this message.)
> 
> I was wondering if it would be possible to also send "build now succeeding
> again" messages ?
>
> There are times when a build failure looks to be something I could
> investigate and perhaps fix - sometimes just looking at the commits
> around the time of the build failure report are enough to see that
> it already has been fixed, and so there is nothing to investigate,
> but other times it is not nearly so obvious.
> 
> A message indicating that the failing build is not failing any more
> seems like it would be useful to me (well, I know it would be useful
> to me, I suspect it might be useful to others as well.)
> 
> It could just contain that message, or better, as clearly from the
> failure reports, the info is available somewhere, a list of commits
> from the version that (first, since the previous success) failed, to
> the first version that succeeded building again.

It would not be hard to implement, but I'm not sure it would be useful
enough to justify doubling the number of messages to the list.

What I do when I want to know if a reported build failure has been
fixed is to visit the web page whose URL is given at the very end of
the email.  If the page says "Build: OK" at the end, the issue has
been fixed.  At least for me, this is less work overall than it would
be to handle twice the number of emails.

Most build failures are fixed quickly anyway.  If we're going to start
sending more emails, I think adding notifications saying "build is still
failing after 24 hours" would be more useful than "build now
succeeding again".
-- 
Andreas Gustafsson, g...@gson.org