Hey Tom,

Here are some recent logs from our system. Unfortunately, I didn't think to grab the logs at the time I killed those processes, and now they are gone. I found those processes using ps, and then I killed them with a simple kill *processid*. Here are samples of our current log files:

FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
PANIC: right sibling's left-link doesn't match: block 175337 links to 243096 instead of expected 29675 in index "dbmail_headervalue_3"
STATEMENT: INSERT INTO dbmail_headervalue (headername_id, physmessage_id, headervalue) VALUES (4,12335778,'from [76.13.13.25] by n6.bullet.mail.ac4.yahoo.com with NNFMP; 25 Sep 2008 04:01:36 -0000')
LOG: server process (PID 13888) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
FATAL: the database system is in recovery mode
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2008-09-25 09:12:41 MDT
LOG: database system was not properly shut down; automatic recovery in progress
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
...
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: redo starts at 3A/2D0DEA78
LOG: record with zero length at 3A/2D1B8D68
LOG: redo done at 3A/2D1B8D3C
LOG: last completed transaction was at log time 2008-09-25 09:12:45.204162-06
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
...
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: redo starts at 3A/2D1B8DA8
LOG: unexpected pageaddr 3A/2520A000 in log file 58, segment 45, offset 2138112
LOG: redo done at 3A/2D208660
LOG: last completed transaction was at log time 2008-09-25 09:12:47.971207-06
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
...
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
ERROR: missing chunk number 0 for toast value 554365
STATEMENT: SELECT messageblk, is_header FROM dbmail_messageblks WHERE physmessage_id = 12111760 ORDER BY messageblk_idnr
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection

To be honest, I don't know whether all of these logs are relevant. I halfway suspect that Nagios causes the "unexpected EOF on client connection" notices, but I can't be certain.

You also asked in what way it is unstable. It drops connections seemingly at random. When a connection is dropped, the client receives the following error:

WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.

Please let me know if there are any other questions I can answer for you.

Thanks,
BJ

On Thu, Sep 25, 2008 at 7:24 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "BJ Taylor" <[EMAIL PROTECTED]> writes:
> > We are using version 8.3.1. And to be precise, when I started the vacuum
> > (analyze), I started it as a cron job to run daily around midnight. The
> > next day I came in and checked on it and it was still running. Not thinking
> > that it would take more than a full 24 hours to run, I let it be, and the
> > next day I came in and the server started acting weird. I believe the
> > vacuum process continued to run, and a second vacuum process was started.
> > The server became unstable, and refused incoming connections.
>
> Unstable how? What error did you get on the refused connections? What
> was showing up in the postmaster log?
>
> > At which
> > point, I killed all vacuum processes, and restarted postgresql.
>
> How did you do that killing exactly?
>
> > I believe
> > it was somewhere during this process that the database became corrupted. I
> > am not certain what happens when two vacuum processes run at the same time.
>
> Nothing of interest, it's done all the time.
>
> > That may have been the problem, or it may not have. Or it may have been
> > that I killed the vacuum process in the middle of what it was doing. One
> > way or another, the problem that we have now, is that we are unable to get a
> > dump of the database for backups, and the database seems less stable than it
> > was previously (dropping connections, and refusing connections seemingly at
> > random).
>
> Again, what errors are you getting exactly, and what shows up in the
> postmaster log?
>
> regards, tom lane
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>