Andrew Dunstan <[EMAIL PROTECTED]> writes:
> OK, for anyone that wants to play, I have created an extract that
> contains a summary of every non-CVS-related failure we've had. It's a
> single table looking like this:

I did some analysis on this data. Attached is a text dump of a table
declared as
CREATE TABLE mreasons (
    sysname text,
    snapshot timestamp without time zone,
    branch text,
    reason text,
    known boolean
);
where the sysname/snapshot/branch data is taken from your table,
"reason" is a brief sketch of the failure, and "known" indicates
whether the cause is known ... although as I went along it sort
of evolved into "does this seem worthy of more investigation?".

I looked at every failure back through early December. I'd intended to
go back further, but decided I'd hit a point of diminishing returns.
However, failures back to the beginning of July that matched grep
searches for recent symptoms are classified in the table.

The gross stats are: 2231 failures classified, 71 distinct reason
codes, 81 failures (with 18 reasons) that seem worthy of closer
investigation:

bfarm=# select reason,branch,max(snapshot) as latest, count(*) from mreasons where not known group by 1,2 order by 1,2 ;
                              reason                              |    branch     |       latest        | count
------------------------------------------------------------------+---------------+---------------------+-------
 Input/output error - possible hardware problem                   | HEAD          | 2007-03-06 10:30:01 |     1
 No rule to make target                                           | HEAD          | 2007-02-08 15:30:01 |     6
 No rule to make target                                           | REL8_0_STABLE | 2007-02-28 03:15:02 |     9
 No rule to make target                                           | REL8_2_STABLE | 2006-12-17 20:00:01 |     1
 could not open relation with OID                                 | HEAD          | 2007-03-16 16:45:01 |     2
 could not open relation with OID                                 | REL8_1_STABLE | 2006-08-29 23:30:07 |     2
 createlang not found?                                            | REL8_1_STABLE | 2007-02-28 02:50:00 |     1
 irreproducible contrib/sslinfo build failure, likely not our bug | HEAD          | 2007-02-03 07:03:02 |     1
 irreproducible opr_sanity failure                                | HEAD          | 2006-12-18 19:15:02 |     2
 libintl.h rejected by configure                                  | HEAD          | 2007-01-11 20:35:00 |     3
 libintl.h rejected by configure                                  | REL8_0_STABLE | 2007-03-01 20:28:04 |    22
 postmaster failed to start                                       | REL7_4_STABLE | 2007-02-28 22:23:20 |     1
 postmaster failed to start                                       | REL8_0_STABLE | 2007-02-28 22:30:44 |     1
 random Solaris configure breakage                                | HEAD          | 2007-01-14 05:30:00 |     1
 random Windows breakage                                          | HEAD          | 2007-03-16 09:48:31 |     3
 random Windows breakage                                          | REL8_0_STABLE | 2007-03-15 03:15:09 |     7
 segfault during bootstrap                                        | HEAD          | 2007-03-12 23:03:03 |     1
 server does not shut down                                        | HEAD          | 2007-01-08 03:03:03 |     3
 tablespace is not empty                                          | HEAD          | 2007-02-24 15:00:10 |     6
 tablespace is not empty                                          | REL8_1_STABLE | 2007-01-25 02:30:01 |     2
 unexpected statement_timeout failure                             | HEAD          | 2007-01-25 05:05:06 |     1
 unexplained tsearch2 crash                                       | HEAD          | 2007-01-10 22:05:02 |     1
 weird DST-transition-like timestamp test failure                 | HEAD          | 2007-02-04 07:25:04 |     1
 weird assembler failure, likely not our bug                      | HEAD          | 2006-12-26 17:02:01 |     1
 weird assembler failure, likely not our bug                      | REL8_2_STABLE | 2007-02-03 23:47:01 |     1
 weird install failure                                            | HEAD          | 2007-01-25 12:35:00 |     1
(26 rows)

I think I know the cause of the recent 'could not open relation with
OID' failures in HEAD, but the rest of these may need a look.
Any volunteers?
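
For anyone who wants to sanity-check the "81 failures (with 18 reasons)" figure without loading the dump, here is a minimal sketch that re-aggregates the per-branch rows from the "where not known" output. It is not part of the original analysis. It assumes Python's stdlib sqlite3 as a stand-in for PostgreSQL, and the table name "unexplained" and column "n" are made up for the illustration; the rows themselves are copied from the output above.

```python
import sqlite3

# (reason, branch, count) rows copied from the "where not known" output.
ROWS = [
    ("Input/output error - possible hardware problem", "HEAD", 1),
    ("No rule to make target", "HEAD", 6),
    ("No rule to make target", "REL8_0_STABLE", 9),
    ("No rule to make target", "REL8_2_STABLE", 1),
    ("could not open relation with OID", "HEAD", 2),
    ("could not open relation with OID", "REL8_1_STABLE", 2),
    ("createlang not found?", "REL8_1_STABLE", 1),
    ("irreproducible contrib/sslinfo build failure, likely not our bug", "HEAD", 1),
    ("irreproducible opr_sanity failure", "HEAD", 2),
    ("libintl.h rejected by configure", "HEAD", 3),
    ("libintl.h rejected by configure", "REL8_0_STABLE", 22),
    ("postmaster failed to start", "REL7_4_STABLE", 1),
    ("postmaster failed to start", "REL8_0_STABLE", 1),
    ("random Solaris configure breakage", "HEAD", 1),
    ("random Windows breakage", "HEAD", 3),
    ("random Windows breakage", "REL8_0_STABLE", 7),
    ("segfault during bootstrap", "HEAD", 1),
    ("server does not shut down", "HEAD", 3),
    ("tablespace is not empty", "HEAD", 6),
    ("tablespace is not empty", "REL8_1_STABLE", 2),
    ("unexpected statement_timeout failure", "HEAD", 1),
    ("unexplained tsearch2 crash", "HEAD", 1),
    ("weird DST-transition-like timestamp test failure", "HEAD", 1),
    ("weird assembler failure, likely not our bug", "HEAD", 1),
    ("weird assembler failure, likely not our bug", "REL8_2_STABLE", 1),
    ("weird install failure", "HEAD", 1),
]

# Hypothetical scratch table holding the already-aggregated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unexplained (reason text, branch text, n integer)")
conn.executemany("INSERT INTO unexplained VALUES (?, ?, ?)", ROWS)

# Distinct reason codes and total failures across all branches.
reasons, failures = conn.execute(
    "SELECT count(DISTINCT reason), sum(n) FROM unexplained"
).fetchone()

# Failures per branch, to see where the unexplained ones cluster.
by_branch = dict(conn.execute(
    "SELECT branch, sum(n) FROM unexplained GROUP BY branch ORDER BY branch"
))

print(reasons, failures)
print(by_branch)
```

The first line printed should be "18 81", agreeing with the gross stats. The per-branch totals show that HEAD and REL8_0_STABLE account for most of the unexplained failures.
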
Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

bfarm=# select reason,max(snapshot) as latest, count(*) from mreasons where known group by 1 order by 1 ;
                                reason                                |       latest        | count
----------------------------------------------------------------------+---------------------+-------
 DST transition test failure                                          | 2007-03-13 04:04:47 |    26
 ISO-week-patch regression test breakage                              | 2007-02-16 15:00:08 |    23
 No rule to make Makefile.port                                        | 2007-03-02 12:30:02 |    40
 Out of disk space                                                    | 2007-02-16 22:30:01 |    67
 Out of semaphores                                                    | 2007-02-20 02:03:31 |    14
 Python not installed                                                 | 2007-02-19 22:45:05 |     2
 Solaris random conn-refused bug                                      | 2007-03-06 01:20:00 |    37
 TCP socket already in use                                            | 2007-01-09 07:03:04 |    13
 Too many clients                                                     | 2007-02-26 06:06:02 |    90
 Too many open files in system                                        | 2007-02-27 20:30:59 |    17
 another icc crash                                                    | 2007-02-03 10:50:01 |     1
 apparently a malloc bug                                              | 2007-03-04 23:00:20 |    27
 bogus system clock setting                                           | 1997-12-21 15:20:11 |     6
 breakage from changing := to = in makefiles                          | 2007-02-10 02:15:01 |     4
 broken GUC patch                                                     | 2007-03-13 15:15:01 |    92
 broken float8 hacking                                                | 2007-01-06 20:00:09 |   120
 broken fsync-revoke patch                                            | 2007-01-17 16:21:01 |    77
 broken inet hacking                                                  | 2007-01-03 00:05:01 |     4
 broken log_error patch                                               | 2007-01-28 08:15:01 |    15
 broken money patch                                                   | 2007-01-03 19:05:01 |    78
 broken pg_regress change for msvc support                            | 2007-01-19 22:03:00 |    46
 broken plpython patch                                                | 2007-01-25 14:21:00 |    22
 broken sys_siglist patch                                             | 2007-01-28 06:06:02 |    18
 bug in btree page split patch                                        | 2007-02-08 11:35:03 |     7
 buildfarm pilot error                                                | 2007-01-19 03:28:07 |    69
 cache flush bug in operator-family patch                             | 2006-12-31 10:30:03 |     8
 ccache failure                                                       | 2007-01-25 23:00:34 |     2
 could not create shared memory                                       | 2007-02-13 07:00:05 |    32
 ecpg regression test teething pains                                  | 2007-02-03 13:30:02 |   516
 failure to update PL expected files for may/can/might rewording      | 2007-02-01 20:15:01 |     8
 failure to update contrib expected files for may/can/might rewording | 2007-02-01 21:15:02 |    11
 failure to update expected files for may/can/might rewording         | 2007-02-01 19:35:02 |     3
 icc "internal error"                                                 | 2007-03-16 16:30:01 |    29
 image not found (possibly related to too-many-open-files)            | 2006-10-25 08:05:02 |     1
 largeobject test bugs                                                | 2007-02-17 23:35:03 |     4
 ld segfaulted                                                        | 2007-03-16 15:30:02 |     3
 missing BYTE_ORDER definition for Solaris                            | 2007-01-10 14:18:23 |     1
 pg_regress patch breakage                                            | 2007-02-08 18:30:01 |     1
 plancache test race condition                                        | 2007-03-16 11:15:01 |     5
 pltcl regression test broken by ORDER BY semantics tightening        | 2007-01-09 03:15:01 |     9
 previous contrib test still running                                  | 2007-02-13 20:49:33 |    21
 random Solaris breakage                                              | 2007-01-05 17:20:01 |     1
 random Windows breakage                                              | 2006-12-27 03:15:07 |     1
 random Windows permission-denied failures                            | 2007-02-12 11:00:09 |     5
 random ccache breakage                                               | 2007-01-04 01:34:33 |     1
 readline misconfiguration                                            | 2007-02-12 17:19:41 |    33
 row-ordering discrepancy in rowtypes test                            | 2007-02-10 03:00:02 |     3
 stats test failed                                                    | 2007-03-14 13:00:02 |   319
 threaded Python library                                              | 2007-01-10 04:05:02 |     6
 undefined symbol pg_mic2ascii                                        | 2007-02-03 01:13:40 |   101
 unexpected signal 9                                                  | 2006-12-31 06:30:02 |    15
 unportable uuid patch                                                | 2007-01-31 17:30:01 |    16
 use of // comment                                                    | 2007-02-16 09:23:02 |     1
 xml code teething problems                                           | 2007-02-16 16:01:05 |    79
(54 rows)

Some of these might possibly be interesting to other people ...
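
The two summaries can also be reconciled with the gross stats by simple arithmetic. The sketch below is an illustration only, not something from the original analysis: it copies the count column of the "where known" output in row order, and takes the 81 failures and 18 reasons from the "where not known" output as given. The one reason that appears in both outputs is "random Windows breakage", so it is counted once.

```python
# Counts from the "where known" output, in the order the rows are listed.
KNOWN_COUNTS = [
    26, 23, 40, 67, 14, 2, 37, 13, 90, 17,
    1, 27, 6, 4, 92, 120, 77, 4, 15, 78,
    46, 22, 18, 7, 69, 8, 2, 32, 516, 8,
    11, 3, 29, 1, 4, 3, 1, 1, 5, 9,
    21, 1, 1, 5, 1, 33, 3, 319, 6, 101,
    15, 16, 1, 79,
]

UNEXPLAINED_FAILURES = 81  # total of the "where not known" counts
UNEXPLAINED_REASONS = 18   # distinct reasons in the "where not known" output
SHARED_REASONS = 1         # "random Windows breakage" is listed in both outputs

known_failures = sum(KNOWN_COUNTS)
total_failures = known_failures + UNEXPLAINED_FAILURES
distinct_reasons = len(KNOWN_COUNTS) + UNEXPLAINED_REASONS - SHARED_REASONS

print(known_failures, total_failures, distinct_reasons)
```

This should print "2150 2231 71", matching the 2231 classified failures and 71 distinct reason codes quoted earlier.
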
regards, tom lane
Attachment: mreasons.dump.gz
