Re: [HACKERS] Buildfarm alarms

2006-09-27 Thread Andrew Dunstan

I wrote:
> Tom Lane wrote:
>> Andrew Dunstan [EMAIL PROTECTED] writes:
>>> It could certainly be done. In general, I have taken the view
>>> that owners have the responsibility for monitoring their own machines.
>>
>> Sure, but providing them tools to do that seems within buildfarm's
>> purview.
>>
>> For some types of failure, the buildfarm script could make a local
>> notification without bothering the server --- but a timeout on the
>> server side would cover a wider variety of failures, including "this
>> machine is dead and ought to be removed from the farm".
>
> Nothing gets removed. If a machine does not report on a branch for 30 days
> it drops off the dashboard, but apart from that it is a retained historic
> artifact. This buildup in history has been gradually slowing down the
> dashboard, in fact, but Ian Barwick tells me that he has rewritten my
> lousy SQL to make it fast again, so we'll soon get that working better.
>
> Anyway, I think we can do something fairly simple for these alarms. We'll
> just have a special stanza in the config file, and a cron job that checks,
> say, once a day, to see if we have exceeded the alarm period on any
> machine/branch combination.

  


OK, I have a gadget to do this in place.


It looks at the config of the last build registered on each branch for a 
stanza called 'alerts' that would look like this:


 alerts => {
   HEAD => { alert_after => 24, alert_every => 48 },
   REL8_1_STABLE => { alert_after => 168, alert_every => 48 },
 }

The settings are in hours, so this says that if we haven't seen a HEAD
build in 1 day or a stable branch build in 1 week, alert the owner by
email, and keep repeating the alert in each case every 2 days.
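
A minimal sketch of how the hourly check could decide whether to send mail for one machine/branch, assuming the server records the time of the last build and of the last alert sent. It is written in Python for illustration only and is not the buildfarm's own code; `alert_due`, `last_build` and `last_alert` are hypothetical names, while `alert_after` and `alert_every` are the settings described above.

```python
from datetime import datetime, timedelta
from typing import Optional


def alert_due(now: datetime,
              last_build: datetime,
              last_alert: Optional[datetime],
              alert_after: int,
              alert_every: int) -> bool:
    """Decide whether an alert email is due on this cron run.

    alert_after and alert_every are in hours, as in the config stanza.
    last_alert is None if no alert has ever been sent (hypothetical field).
    """
    # No alert while the most recent build is still within the alarm period.
    if now - last_build < timedelta(hours=alert_after):
        return False
    # First alert for this gap: nothing sent since the last build arrived.
    if last_alert is None or last_alert < last_build:
        return True
    # Otherwise repeat only once alert_every hours have passed.
    return now - last_alert >= timedelta(hours=alert_every)
```

With the HEAD settings above (24 and 48), a build last seen 25 hours ago would trigger the first alert, and the next one would follow 48 hours later if nothing was reported in between.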


If some intrepid buildfarm owner wants to test this out by using low
settings that would trigger an alert, that would be good. The cron job
runs every hour.


cheers

andrew


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Buildfarm alarms

2006-09-27 Thread Dave Page
 

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Dunstan
> Sent: 27 September 2006 14:56
> To: [EMAIL PROTECTED]
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Buildfarm alarms
>
> If some intrepid buildfarm owner wants to test this out by using low
> settings that would trigger an alert that would be good - the cron job
> runs every hour.

Dunno about intrepid, but I've added the following to Snake:

alerts => {
 HEAD => { alert_after => 1, alert_every => 2 },
 REL8_1_STABLE => { alert_after => 168, alert_every => 48 },
 REL8_0_STABLE => { alert_after => 168, alert_every => 48 },
}

Thanks for your work on this.

Regards, Dave



Re: [HACKERS] Buildfarm alarms

2006-09-27 Thread Kris Jurka



On Wed, 27 Sep 2006, Andrew Dunstan wrote:

> The settings are in hours, so this says that if we haven't seen a HEAD build
> in 1 day or a stable branch build in 1 week, alert the owner by email, and
> keep repeating the alert in each case every 2 days.




How does this know if there wasn't a build because nothing in CVS changed 
over that time period?  Especially on the back branches it is normal to go 
weeks without a build.


Kris Jurka



Re: [HACKERS] Buildfarm alarms

2006-09-27 Thread Andrew Dunstan

Kris Jurka wrote:



> On Wed, 27 Sep 2006, Andrew Dunstan wrote:
>> The settings are in hours, so this says that if we haven't seen a
>> HEAD build in 1 day or a stable branch build in 1 week, alert the
>> owner by email, and keep repeating the alert in each case every 2 days.
>
> How does this know if there wasn't a build because nothing in CVS
> changed over that time period?  Especially on the back branches it is
> normal to go weeks without a build.
>
> Kris Jurka



Indeed. The short answer is it doesn't. But there is a buildfarm config
option to allow you to force a build every so often even if there hasn't
been a CVS change, and I'm thinking of providing an option for this to
be branch specific. Then you would make this setting shorter than your
alarm period for any branch you had an alarm set for.
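
A small sketch of the consistency rule just described: for every branch with an alarm, the forced-build interval should be shorter than `alert_after`, otherwise a quiet branch can raise a false alarm. This is illustrative Python, not buildfarm code; `force_hours` is a hypothetical per-branch mapping standing in for the proposed branch-specific option, and `misconfigured_branches` is likewise an invented name.

```python
from typing import Dict, List, Optional


def misconfigured_branches(alerts: Dict[str, Dict[str, int]],
                           force_hours: Dict[str, Optional[int]]) -> List[str]:
    """Return branches whose alarm could fire merely because CVS was quiet.

    alerts mirrors the 'alerts' stanza (values in hours).
    force_hours maps branch -> forced-build interval in hours (hypothetical);
    a missing or None entry means builds are never forced on that branch.
    """
    bad = []
    for branch, settings in alerts.items():
        forced = force_hours.get(branch)
        # Without a forced build inside the alarm period, silence is ambiguous.
        if forced is None or forced >= settings["alert_after"]:
            bad.append(branch)
    return sorted(bad)
```

For example, with an alarm of 168 hours on a stable branch and no forced build configured for it, the function would report that branch.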


cheers

andrew



Re: [Pgbuildfarm-members] [HACKERS] Buildfarm alarms

2006-09-27 Thread Jim C. Nasby
On Wed, Sep 27, 2006 at 01:55:21PM -0400, Andrew Dunstan wrote:
> Kris Jurka wrote:
>> On Wed, 27 Sep 2006, Andrew Dunstan wrote:
>>> The settings are in hours, so this says that if we haven't seen a
>>> HEAD build in 1 day or a stable branch build in 1 week, alert the
>>> owner by email, and keep repeating the alert in each case every 2 days.
>>
>> How does this know if there wasn't a build because nothing in CVS
>> changed over that time period?  Especially on the back branches it is
>> normal to go weeks without a build.
>>
>> Kris Jurka
>
> Indeed. The short answer is it doesn't. But there is a buildfarm config
> option to allow you to force a build every so often even if there hasn't
> been a CVS change, and I'm thinking of providing an option for this to
> be branch specific. Then you would make this setting shorter than your
> alarm period for any branch you had an alarm set for.

Another possibility is just having the client report "no CVS changes
detected" to the server, as a form of a ping.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [Pgbuildfarm-members] [HACKERS] Buildfarm alarms

2006-09-27 Thread Andrew Dunstan

Jim C. Nasby wrote:


> Another possibility is just having the client report "no CVS changes
> detected" to the server, as a form of a ping.
  


I am not going to re-architect the buildfarm client and server for this. 
I think what I have done will be quite sufficient. I suspect most people 
will only want alarms on HEAD anyway.


cheers

andrew



Re: [HACKERS] Buildfarm alarms

2006-09-26 Thread Dave Page
 

> -Original Message-
> From: Michael Meskes [mailto:[EMAIL PROTECTED]
> Sent: 26 September 2006 08:57
> To: Joachim Wieland
> Cc: Dave Page; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] Buildfarm alarms
>
> On Mon, Sep 25, 2006 at 09:20:19PM +0200, Joachim Wieland wrote:
>> Michael, could you please check and apply?
>
> Works for me, so I applied it. But then I only tested on Linux. :-)

OK, I now see just one failure, and it is date-format related:

== running regression test queries ==
/usr/local/src/postgresql-8.2-dev/src/interfaces/ecpg/test/./tmp_check/install//usr/local/pgsql/bin/createuser -R -S -D -q regressuser1
/usr/local/src/postgresql-8.2-dev/src/interfaces/ecpg/test/./tmp_check/install//usr/local/pgsql/bin/createuser -R -S -D -q connectuser
/usr/local/src/postgresql-8.2-dev/src/interfaces/ecpg/test/./tmp_check/install//usr/local/pgsql/bin/createuser -R -S -D -q connectdb
testing connect/test1.pgc                  ... ok
testing connect/test2.pgc                  ... ok
testing connect/test3.pgc                  ... ok
testing connect/test4.pgc                  ... ok
testing connect/test5.pgc                  ... ok
testing compat_informix/charfuncs.pgc      ... ok
testing compat_informix/dec_test.pgc       ... ok
testing compat_informix/rfmtdate.pgc       ... ok
testing compat_informix/rfmtlong.pgc       ... ok
testing compat_informix/rnull.pgc          ... ok
testing compat_informix/test_informix.pgc  ... ok
testing compat_informix/test_informix2.pgc ... ok
testing preproc/comment.pgc                ... ok
testing preproc/define.pgc                 ... ok
testing preproc/init.pgc                   ... ok
testing preproc/type.pgc                   ... ok
testing preproc/variable.pgc               ... FAILED (log, output)
testing preproc/whenever.pgc               ... ok
testing pgtypeslib/dt_test.pgc             ... ok
testing pgtypeslib/dt_test2.pgc            ... ok
testing pgtypeslib/num_test.pgc            ... ok
testing pgtypeslib/num_test2.pgc           ... ok
testing sql/array.pgc                      ... ok
testing sql/binary.pgc                     ... ok
testing sql/code100.pgc                    ... ok
testing sql/copystdout.pgc                 ... ok
testing sql/define.pgc                     ... ok
testing sql/desc.pgc                       ... ok
testing sql/dynalloc.pgc                   ... ok
testing sql/dynalloc2.pgc                  ... ok
testing sql/dyntest.pgc                    ... ok
testing sql/execute.pgc                    ... ok
testing sql/fetch.pgc                      ... ok
testing sql/func.pgc                       ... ok
testing sql/indicators.pgc                 ... ok
testing sql/quote.pgc                      ... ok
testing sql/show.pgc                       ... ok
testing sql/update.pgc                     ... ok
testing thread/thread.pgc                  ... ok
testing thread/thread_implicit.pgc         ... ok
== shutting down postmaster ==
server stopped
make[1]: *** [check] Error 1
make[1]: Leaving directory `/usr/local/src/postgresql-8.2-dev/src/interfaces/ecpg/test'
make: *** [check] Error 2



*** expected/preproc-variable.stderr    Fri Sep  8 10:03:40 2006
--- results/preproc-variable.stderr     Tue Sep 26 09:51:00 2006
***************
*** 44,50 ****
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGstore_result: line 68: allocating memory for 1 tuples
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGget_data line 68: RESULT: 07-14-1987 offset: -1 array: Yes
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGget_data line 68: RESULT: 3 offset: -1 array: Yes
  [NO_PID]: sqlca: code: 0, state: 0
--- 44,50 ----
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGstore_result: line 68: allocating memory for 1 tuples
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGget_data line 68: RESULT: 14-07-1987 offset: -1 array: Yes
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGget_data line 68: RESULT: 3 offset: -1 array: Yes
  [NO_PID]: sqlca: code: 0, state: 0
***************
*** 60,66 ****
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGstore_result: line 68: allocating memory for 1 tuples
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGget_data line 68: RESULT: 07-14-1987 offset: -1 array: Yes
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGget_data line 68: RESULT: 3 offset: -1 array: Yes
  [NO_PID]: sqlca: code: 0, state: 0
--- 60,66 ----
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGstore_result: line 68: allocating memory for 1 tuples
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGget_data line 68: RESULT: 14-07-1987 offset: -1 array: Yes
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGget_data line 68: RESULT: 3 offset

Re: [HACKERS] Buildfarm alarms

2006-09-26 Thread Michael Meskes
On Tue, Sep 26, 2006 at 09:57:16AM +0100, Dave Page wrote:
> OK, I now see just one, date format related failure:
> ...

Did you run it with Joachim's patch or with an up-to-date CVS checkout? It
seems to me that you do not have the latest changes from CVS. We added a
"set datestyle" to variable.pgc that should fix this failure.
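
To make the failure concrete: the expected file shows the date as 07-14-1987 while the results show 14-07-1987, i.e. the same date printed month-first versus day-first. The following is only a Python illustration of that difference, not ECPG or server code; the function names are invented for the example.

```python
from datetime import date


def month_first(d: date) -> str:
    """Render a date as MM-DD-YYYY, the order seen in the expected output."""
    return d.strftime("%m-%d-%Y")


def day_first(d: date) -> str:
    """Render a date as DD-MM-YYYY, the order seen in the failing results."""
    return d.strftime("%d-%m-%Y")


sample = date(1987, 7, 14)
print(month_first(sample))  # 07-14-1987
print(day_first(sample))    # 14-07-1987
```

A test that pins the date style itself should no longer depend on which of the two orders the machine happens to default to.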

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!



Re: [HACKERS] Buildfarm alarms

2006-09-26 Thread Dave Page
 

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dave Page
> Sent: 26 September 2006 10:41
> To: Michael Meskes
> Cc: Joachim Wieland; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Buildfarm alarms
>
>> -Original Message-
>> From: Michael Meskes [mailto:[EMAIL PROTECTED]
>> Sent: 26 September 2006 10:39
>> To: Dave Page
>> Cc: Joachim Wieland; pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] Buildfarm alarms
>>
>> On Tue, Sep 26, 2006 at 09:57:16AM +0100, Dave Page wrote:
>>> OK, I now see just one, date format related failure:
>>> ...
>>
>> Did you run it with Joachim's patch or with up-to-date CVS checkout? It
>> seems to me that you do not have the latest changes to CVS. We added a
>> set datestyle to variable.pgc that should fix this failure.
>
> No, I used Joachim's patch as anoncvs hadn't caught up. I'll run it
> again - thanks.

Yep - passes all tests now :-)

Thanks, Dave.



Re: [HACKERS] Buildfarm alarms

2006-09-26 Thread Joachim Wieland
On Mon, Sep 25, 2006 at 02:23:39PM +0100, Dave Page wrote:
> testing connect/test1.pgc            ... FAILED (log)
> testing compat_informix/dec_test.pgc ... FAILED (output)
> testing preproc/variable.pgc         ... FAILED (log, output)
> testing pgtypeslib/dt_test.pgc       ... FAILED (log, output)
> testing pgtypeslib/num_test.pgc      ... FAILED (output)
> testing pgtypeslib/num_test2.pgc     ... FAILED (output)

All should be fine now. I tested successfully with both cygwin and MinGW.


Joachim

-- 
Joachim Wieland  [EMAIL PROTECTED]
   GPG key available



Re: [HACKERS] Buildfarm alarms

2006-09-26 Thread Dave Page
 

> -Original Message-
> From: Michael Meskes [mailto:[EMAIL PROTECTED]
> Sent: 26 September 2006 10:39
> To: Dave Page
> Cc: Joachim Wieland; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Buildfarm alarms
>
> On Tue, Sep 26, 2006 at 09:57:16AM +0100, Dave Page wrote:
>> OK, I now see just one, date format related failure:
>> ...
>
> Did you run it with Joachim's patch or with up-to-date CVS checkout? It
> seems to me that you do not have the latest changes to CVS. We added a
> set datestyle to variable.pgc that should fix this failure.

No, I used Joachim's patch as anoncvs hadn't caught up. I'll run it
again - thanks.

Regards Dave



Re: [HACKERS] Buildfarm alarms

2006-09-25 Thread Joachim Wieland
On Sun, Sep 24, 2006 at 11:51:49AM +0100, Dave Page wrote:
> wrong to the monitoring processes - what had happened was that both had
> hung or got in an infinite loop in ECPG-check, the machine was running
> just fine

Is this still an issue? Can you provide more information? What happens if you
run ecpg-check manually? Which test hangs?


Joachim

-- 
Joachim Wieland  [EMAIL PROTECTED]
   GPG key available



Re: [HACKERS] Buildfarm alarms

2006-09-25 Thread Dave Page
 

> -Original Message-
> From: Joachim Wieland [mailto:[EMAIL PROTECTED]
> Sent: 25 September 2006 13:25
> To: Dave Page
> Cc: Andrew Dunstan; pgsql-hackers@postgresql.org; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] Buildfarm alarms
>
> On Sun, Sep 24, 2006 at 11:51:49AM +0100, Dave Page wrote:
>> wrong to the monitoring processes - what had happened was that both had
>> hung or got in an infinite loop in ECPG-check, the machine was running
>> just fine
>
> Is this still an issue? Can you provide more information? What happens
> if you run ecpg-check manually? Which test hangs?

dt_test is the one that hangs, though what is actually happening is that
it's crashing and popping up a 'do you wanna debug' dialogue, which
doesn't get seen in a non-interactive buildfarm run. After saying no to
that, the complete list of failed tests is (see Snake/Bandicoot's logs
for more info):

testing connect/test1.pgc            ... FAILED (log)
testing compat_informix/dec_test.pgc ... FAILED (output)
testing preproc/variable.pgc         ... FAILED (log, output)
testing pgtypeslib/dt_test.pgc       ... FAILED (log, output)
testing pgtypeslib/num_test.pgc      ... FAILED (output)
testing pgtypeslib/num_test2.pgc     ... FAILED (output)

Regards, Dave.



Re: [HACKERS] Buildfarm alarms

2006-09-24 Thread Dave Page
 

> -Original Message-
> From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
> Sent: 24 September 2006 03:13
> To: Dave Page
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: Buildfarm alarms
>
> It could certainly be done. In general, I have taken the view
> that owners have the responsibility for monitoring their own machines.
> I'll think about it some more.

We are monitoring the machine; however, in this case nothing appeared
wrong to the monitoring processes. What had happened was that both had
hung or got into an infinite loop in ECPG-check. The machine was running
just fine, and a glance at the process list showed everything I'd expect
to see during a normal run. A system for detecting lack of reports from
a member would definitely have helped in this case.

Regards, Dave



Re: [HACKERS] Buildfarm alarms

2006-09-24 Thread Andrew Dunstan
Tom Lane wrote:
> Andrew Dunstan [EMAIL PROTECTED] writes:
>> It could certainly be done. In general, I have taken the view
>> that owners have the responsibility for monitoring their own machines.
>
> Sure, but providing them tools to do that seems within buildfarm's
> purview.
>
> For some types of failure, the buildfarm script could make a local
> notification without bothering the server --- but a timeout on the
> server side would cover a wider variety of failures, including "this
> machine is dead and ought to be removed from the farm".


Nothing gets removed. If a machine does not report on a branch for 30 days
it drops off the dashboard, but apart from that it is a retained historic
artifact. This buildup in history has been gradually slowing down the
dashboard, in fact, but Ian Barwick tells me that he has rewritten my
lousy SQL to make it fast again, so we'll soon get that working better.

Anyway, I think we can do something fairly simple for these alarms. We'll
just have a special stanza in the config file, and a cron job that checks,
say, once a day, to see if we have exceeded the alarm period on any
machine/branch combination.

cheers

andrew






[HACKERS] Buildfarm alarms

2006-09-23 Thread Dave Page
Hi Andrew,

I'm just investigating a problem with beta 1 running on Windows 2K and
XP, and noticed that neither Snake nor Bandicoot has built -HEAD for
nearly 3 weeks. I'm investigating why and will fix the problem, but it
strikes me that an alarm email from the server, noting that a run hasn't
been reported for a while, would have helped spot this earlier. This
could be configured with an admin-specified maximum number of days
between reports to allow for those machines that connect far less
frequently.

Does that sound feasible to you?

Regards, Dave.



Re: [HACKERS] Buildfarm alarms

2006-09-23 Thread Andrew Dunstan
Dave Page wrote:

> I'm just investigating a problem with beta 1 running on Windows 2K and
> XP, and noticed that neither Snake nor Bandicoot has built -HEAD for
> nearly 3 weeks. I'm investigating why and will fix the problem, but it
> strikes me that an alarm email from the server, noting that a run hasn't
> been reported for a while, would have helped spot this earlier. This
> could be configured with an admin-specified maximum number of days
> between reports to allow for those machines that connect far less
> frequently.
>
> Does that sound feasible to you?




It could certainly be done. In general, I have taken the view
that owners have the responsibility for monitoring their own machines.
I'll think about it some more.

cheers

andrew




Re: [HACKERS] Buildfarm alarms

2006-09-23 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
> It could certainly be done. In general, I have taken the view
> that owners have the responsibility for monitoring their own machines.

Sure, but providing them tools to do that seems within buildfarm's
purview.

For some types of failure, the buildfarm script could make a local
notification without bothering the server --- but a timeout on the
server side would cover a wider variety of failures, including "this
machine is dead and ought to be removed from the farm".
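
As a rough illustration of the local half of that idea, a wrapper can put a time limit on a test step so that a hang is noticed on the machine itself. This is a Python sketch under assumptions, not how the buildfarm script works; `run_with_timeout` is a hypothetical helper and the command and limit are up to the caller.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_with_timeout(cmd: List[str],
                     timeout_seconds: float) -> Tuple[bool, Optional[int]]:
    """Run cmd and report (finished, returncode).

    finished is False if the command was still running when the limit
    expired; subprocess.run kills the child in that case.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The caller can now notify the owner locally about the hang.
        return False, None
    return True, proc.returncode


if __name__ == "__main__":
    # Example: a child process that sleeps longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, 1))
```

A local limit like this only helps while the machine and the script are alive, which is why the server-side timeout covers more cases.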

regards, tom lane
