Andrew Dunstan <[EMAIL PROTECTED]> writes:
> OK, for anyone that wants to play, I have created an extract that 
> contains a summary of every non-CVS-related failure we've had. It's a 
> single table looking like this:

I did some analysis on this data.  Attached is a text dump of a table
declared as

CREATE TABLE mreasons (
    sysname text,
    snapshot timestamp without time zone,
    branch text,
    reason text,
    known boolean
);

where the sysname/snapshot/branch data is taken from your table,
"reason" is a brief sketch of the failure, and "known" indicates
whether the cause is known ... although as I went along it sort
of evolved into "does this seem worthy of more investigation?".

I looked at every failure back through early December.  I'd intended to
go back further, but decided I'd hit a point of diminishing returns.
However, failures back to the beginning of July that matched grep
searches for recent symptoms are classified in the table.

The gross stats are: 2231 failures classified, 71 distinct reason
codes, 81 failures (with 18 reasons) that seem worthy of closer
investigation:

bfarm=# select reason,branch,max(snapshot) as latest, count(*) from mreasons where not known group by 1,2 order by 1,2 ;
                              reason                              |    branch     |       latest        | count
------------------------------------------------------------------+---------------+---------------------+-------
 Input/output error - possible hardware problem                   | HEAD          | 2007-03-06 10:30:01 |     1
 No rule to make target                                           | HEAD          | 2007-02-08 15:30:01 |     6
 No rule to make target                                           | REL8_0_STABLE | 2007-02-28 03:15:02 |     9
 No rule to make target                                           | REL8_2_STABLE | 2006-12-17 20:00:01 |     1
 could not open relation with OID                                 | HEAD          | 2007-03-16 16:45:01 |     2
 could not open relation with OID                                 | REL8_1_STABLE | 2006-08-29 23:30:07 |     2
 createlang not found?                                            | REL8_1_STABLE | 2007-02-28 02:50:00 |     1
 irreproducible contrib/sslinfo build failure, likely not our bug | HEAD          | 2007-02-03 07:03:02 |     1
 irreproducible opr_sanity failure                                | HEAD          | 2006-12-18 19:15:02 |     2
 libintl.h rejected by configure                                  | HEAD          | 2007-01-11 20:35:00 |     3
 libintl.h rejected by configure                                  | REL8_0_STABLE | 2007-03-01 20:28:04 |    22
 postmaster failed to start                                       | REL7_4_STABLE | 2007-02-28 22:23:20 |     1
 postmaster failed to start                                       | REL8_0_STABLE | 2007-02-28 22:30:44 |     1
 random Solaris configure breakage                                | HEAD          | 2007-01-14 05:30:00 |     1
 random Windows breakage                                          | HEAD          | 2007-03-16 09:48:31 |     3
 random Windows breakage                                          | REL8_0_STABLE | 2007-03-15 03:15:09 |     7
 segfault during bootstrap                                        | HEAD          | 2007-03-12 23:03:03 |     1
 server does not shut down                                        | HEAD          | 2007-01-08 03:03:03 |     3
 tablespace is not empty                                          | HEAD          | 2007-02-24 15:00:10 |     6
 tablespace is not empty                                          | REL8_1_STABLE | 2007-01-25 02:30:01 |     2
 unexpected statement_timeout failure                             | HEAD          | 2007-01-25 05:05:06 |     1
 unexplained tsearch2 crash                                       | HEAD          | 2007-01-10 22:05:02 |     1
 weird DST-transition-like timestamp test failure                 | HEAD          | 2007-02-04 07:25:04 |     1
 weird assembler failure, likely not our bug                      | HEAD          | 2006-12-26 17:02:01 |     1
 weird assembler failure, likely not our bug                      | REL8_2_STABLE | 2007-02-03 23:47:01 |     1
 weird install failure                                            | HEAD          | 2007-01-25 12:35:00 |     1
(26 rows)

I think I know the cause of the recent 'could not open relation with
OID' failures in HEAD, but the rest of these may need a look.
Any volunteers?
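
For anyone who does pick one up, a query of roughly this shape should
list the individual reports behind a reason code, so that they can be
looked up on the buildfarm.  It is only a sketch against the mreasons
table declared above; the reason string is just one example taken from
the listing.

```sql
-- Sketch: individual failures behind one still-open reason code,
-- newest first.  Substitute any reason string from the listing above.
SELECT sysname, branch, snapshot
FROM mreasons
WHERE NOT known
  AND reason = 'tablespace is not empty'
ORDER BY snapshot DESC;
```

Going by the counts in the listing, that example should come back with
8 rows (6 on HEAD, 2 on REL8_1_STABLE).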

Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

bfarm=# select reason,max(snapshot) as latest, count(*) from mreasons where known group by 1 order by 1 ;
                                reason                                |       latest        | count
----------------------------------------------------------------------+---------------------+-------
 DST transition test failure                                          | 2007-03-13 04:04:47 |    26
 ISO-week-patch regression test breakage                              | 2007-02-16 15:00:08 |    23
 No rule to make Makefile.port                                        | 2007-03-02 12:30:02 |    40
 Out of disk space                                                    | 2007-02-16 22:30:01 |    67
 Out of semaphores                                                    | 2007-02-20 02:03:31 |    14
 Python not installed                                                 | 2007-02-19 22:45:05 |     2
 Solaris random conn-refused bug                                      | 2007-03-06 01:20:00 |    37
 TCP socket already in use                                            | 2007-01-09 07:03:04 |    13
 Too many clients                                                     | 2007-02-26 06:06:02 |    90
 Too many open files in system                                        | 2007-02-27 20:30:59 |    17
 another icc crash                                                    | 2007-02-03 10:50:01 |     1
 apparently a malloc bug                                              | 2007-03-04 23:00:20 |    27
 bogus system clock setting                                           | 1997-12-21 15:20:11 |     6
 breakage from changing := to = in makefiles                          | 2007-02-10 02:15:01 |     4
 broken GUC patch                                                     | 2007-03-13 15:15:01 |    92
 broken float8 hacking                                                | 2007-01-06 20:00:09 |   120
 broken fsync-revoke patch                                            | 2007-01-17 16:21:01 |    77
 broken inet hacking                                                  | 2007-01-03 00:05:01 |     4
 broken log_error patch                                               | 2007-01-28 08:15:01 |    15
 broken money patch                                                   | 2007-01-03 19:05:01 |    78
 broken pg_regress change for msvc support                            | 2007-01-19 22:03:00 |    46
 broken plpython patch                                                | 2007-01-25 14:21:00 |    22
 broken sys_siglist patch                                             | 2007-01-28 06:06:02 |    18
 bug in btree page split patch                                        | 2007-02-08 11:35:03 |     7
 buildfarm pilot error                                                | 2007-01-19 03:28:07 |    69
 cache flush bug in operator-family patch                             | 2006-12-31 10:30:03 |     8
 ccache failure                                                       | 2007-01-25 23:00:34 |     2
 could not create shared memory                                       | 2007-02-13 07:00:05 |    32
 ecpg regression test teething pains                                  | 2007-02-03 13:30:02 |   516
 failure to update PL expected files for may/can/might rewording      | 2007-02-01 20:15:01 |     8
 failure to update contrib expected files for may/can/might rewording | 2007-02-01 21:15:02 |    11
 failure to update expected files for may/can/might rewording         | 2007-02-01 19:35:02 |     3
 icc "internal error"                                                 | 2007-03-16 16:30:01 |    29
 image not found (possibly related to too-many-open-files)            | 2006-10-25 08:05:02 |     1
 largeobject test bugs                                                | 2007-02-17 23:35:03 |     4
 ld segfaulted                                                        | 2007-03-16 15:30:02 |     3
 missing BYTE_ORDER definition for Solaris                            | 2007-01-10 14:18:23 |     1
 pg_regress patch breakage                                            | 2007-02-08 18:30:01 |     1
 plancache test race condition                                        | 2007-03-16 11:15:01 |     5
 pltcl regression test broken by ORDER BY semantics tightening        | 2007-01-09 03:15:01 |     9
 previous contrib test still running                                  | 2007-02-13 20:49:33 |    21
 random Solaris breakage                                              | 2007-01-05 17:20:01 |     1
 random Windows breakage                                              | 2006-12-27 03:15:07 |     1
 random Windows permission-denied failures                            | 2007-02-12 11:00:09 |     5
 random ccache breakage                                               | 2007-01-04 01:34:33 |     1
 readline misconfiguration                                            | 2007-02-12 17:19:41 |    33
 row-ordering discrepancy in rowtypes test                            | 2007-02-10 03:00:02 |     3
 stats test failed                                                    | 2007-03-14 13:00:02 |   319
 threaded Python library                                              | 2007-01-10 04:05:02 |     6
 undefined symbol pg_mic2ascii                                        | 2007-02-03 01:13:40 |   101
 unexpected signal 9                                                  | 2006-12-31 06:30:02 |    15
 unportable uuid patch                                                | 2007-01-31 17:30:01 |    16
 use of // comment                                                    | 2007-02-16 09:23:02 |     1
 xml code teething problems                                           | 2007-02-16 16:01:05 |    79
(54 rows)

Some of these might possibly be interesting to other people ...
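
As a cross-check on the gross stats, something like the following
should work against the same table.  It is a sketch rather than a
tested recipe, and the output column names are invented for
illustration.  The second query is there because the two listings show
18 and 54 reasons respectively, which adds up to 72 rather than 71;
going by the listings, 'random Windows breakage' appears on both sides.

```sql
-- Sketch: overall totals, plus the subset not marked "known".
-- Output column names are illustrative only.
SELECT count(*)                                            AS failures,
       count(DISTINCT reason)                              AS reasons,
       sum(CASE WHEN NOT known THEN 1 ELSE 0 END)          AS open_failures,
       count(DISTINCT CASE WHEN NOT known THEN reason END) AS open_reasons
FROM mreasons;

-- Sketch: reason codes that occur both with known = true and
-- with known = false.
SELECT reason
FROM mreasons
GROUP BY reason
HAVING count(DISTINCT known) > 1;
```

If the dump matches the figures quoted earlier, the first query should
report 2231, 71, 81 and 18.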

                        regards, tom lane

Attachment: bin4d7U6AAlQx.bin
Description: mreasons.dump.gz
