[HACKERS] Can't Restart ver 8.0b3

2004-12-22 Thread Dave Hartwig
Anyone,

I would not describe my situation as desprate (yet), but it is painful,
and I am looking for some information and a degree of confidence. I have
yet to communicate w/ our sysadmin about backup/restore.

We are using version 8.0 beta 3, on Red Hat Linux, to develop a product
that we plan to released in several months.

The problem began when I tried to make some table modifications.  A
CREATE statement hung. I canceled.  I then noticed many backend
processes hung idle in transaction.  I had everyone exit their
database activities, but several hung transactions still remained.  I
then did a pg_ctl stop followed by a pg_ctl start.  Here is the log
file activity:

(I got lots of these.  Probably people exiting their clients)
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection

(Then this; probably after pg_ctl stop)
LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down

(Now this after pg_ctl start)
LOG:  could not create IPv6 socket: Address family not supported by
protocol
LOG:  database system was shut down at 2004-12-21 16:30:25 EST
LOG:  checkpoint record is at 0/278B3430
LOG:  redo record is at 0/278B3430; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 884736; next OID: 306834
PANIC:  could not access status of transaction 884736
DETAIL:  could not read from file
/usr/local/pgsql8b3/data/pg_clog/ at offset 221184: Success
LOG:  startup process (PID 17774) was terminated by signal 6
LOG:  aborting startup due to startup process failure

It turns out that /usr/local/pgsql8b3/data/pg_clog/ exists, is
read/writable by postgres, and is exactly 221184 bytes long.  So I
started pursuing recovery documents.  Came across pg_resetxlog.   First
I tried without the force option.  It complained that it would not work.
Then, using the -f option; that changed nothing.   Then I began using
the -f, -x and -l options: as follows:

First some useful information for you:

$ ls -l /usr/local/pgsql8b3/data/pg_clog/
-rw---1 postgres postgres   221184 Dec 21 16:12 
 and
$ls -l /usr/local/pgsql8b3/data/pg_xlog
-rw---1 postgres postgres 16777216 Dec 22 06:16
000100010068
drwx--2 postgres postgres 4096 Sep 28 04:57 archive_status

Like the doc describes, I tried:
$ postgres pg_resetxlog -f -x 0x10 -l 0x1,0x1,0x65
/usr/local/pgsql8b3/data

 (Now I get this in the log)
LOG:  could not create IPv6 socket: Address family not supported by
protocol
LOG:  database system was shut down at 2004-12-21 20:11:12 EST
LOG:  checkpoint record is at 1/641C
LOG:  redo record is at 1/641C; undo record is at 1/641C;
shutdown TRUE
LOG:  next transaction ID: 1048576; next OID: 306834
PANIC:  could not access status of transaction 1048576
DETAIL:  could not open file /usr/local/pgsql8b3/data/pg_clog/0001: No
such file or directory
LOG:  startup process (PID 304) was terminated by signal 6
LOG:  aborting startup due to startup process failure

Which is correct, there is no file:
/usr/local/pgsql8b3/data/pg_clog/0001.

Also, this is the output of pg_controldata after running pg_resetxlog:

pg_control version number:74
Catalog version number:   200408031
Database system identifier:   4708838347263258567
Database cluster state:   in production
pg_control last modified: Wed 22 Dec 2004 06:17:55 AM EST
Current log file ID:  1
Next log file segment:105
Latest checkpoint location:   1/681C
Prior checkpoint location:0/0
Latest checkpoint's REDO location:1/681C
Latest checkpoint's UNDO location:1/681C
Latest checkpoint's TimeLineID:   1
Latest checkpoint's NextXID:  1048576
Latest checkpoint's NextOID:  306834
Time of latest checkpoint:Wed 22 Dec 2004 06:16:27 AM EST
Database block size:  8192
Blocks per segment of large relation: 131072
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum number of function arguments: 32
Date/time type storage:   floating-point numbers
Maximum length of locale name:128
LC_COLLATE:   en_US.UTF-8
LC_CTYPE: en_US.UTF-8


Can someone help me make sense of all this?

Thanks,
David

---(end of 

Re: [HACKERS] Can't Restart ver 8.0b3

2004-12-22 Thread Tom Lane
Dave Hartwig [EMAIL PROTECTED] writes:
 LOG:  next transaction ID: 884736; next OID: 306834
 PANIC:  could not access status of transaction 884736
 DETAIL:  could not read from file
 /usr/local/pgsql8b3/data/pg_clog/ at offset 221184: Success
 LOG:  startup process (PID 17774) was terminated by signal 6
 LOG:  aborting startup due to startup process failure

Hmm ... do we have a problem when the next XID is exactly at a page
boundary?  I'll look into that.

 LOG:  next transaction ID: 1048576; next OID: 306834
 PANIC:  could not access status of transaction 1048576
 DETAIL:  could not open file /usr/local/pgsql8b3/data/pg_clog/0001: No
 such file or directory

IIRC, pg_resetxlog doesn't have any provision to create new pg_clog
segments.  Which is probably an oversight, but it's easy enough to
do it by hand.  Do something like
dd bs=8k count=1 /dev/zero /usr/local/pgsql8b3/data/pg_clog/0001
and everything should be fine.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Can't Restart ver 8.0b3

2004-12-22 Thread Dave Hartwig
Bingo!  Thanks Tom.

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 22, 2004 11:33 AM
To: Dave Hartwig
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Can't Restart ver 8.0b3 

Dave Hartwig [EMAIL PROTECTED] writes:
 LOG:  next transaction ID: 884736; next OID: 306834
 PANIC:  could not access status of transaction 884736
 DETAIL:  could not read from file
 /usr/local/pgsql8b3/data/pg_clog/ at offset 221184: Success
 LOG:  startup process (PID 17774) was terminated by signal 6
 LOG:  aborting startup due to startup process failure

Hmm ... do we have a problem when the next XID is exactly at a page
boundary?  I'll look into that.

 LOG:  next transaction ID: 1048576; next OID: 306834
 PANIC:  could not access status of transaction 1048576
 DETAIL:  could not open file /usr/local/pgsql8b3/data/pg_clog/0001:
No
 such file or directory

IIRC, pg_resetxlog doesn't have any provision to create new pg_clog
segments.  Which is probably an oversight, but it's easy enough to
do it by hand.  Do something like
dd bs=8k count=1 /dev/zero
/usr/local/pgsql8b3/data/pg_clog/0001
and everything should be fine.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Can't Restart ver 8.0b3

2004-12-22 Thread Tom Lane
I wrote:
 Dave Hartwig [EMAIL PROTECTED] writes:
 LOG:  next transaction ID: 884736; next OID: 306834
 PANIC:  could not access status of transaction 884736
 DETAIL:  could not read from file
 /usr/local/pgsql8b3/data/pg_clog/ at offset 221184: Success
 LOG:  startup process (PID 17774) was terminated by signal 6
 LOG:  aborting startup due to startup process failure

 Hmm ... do we have a problem when the next XID is exactly at a page
 boundary?  I'll look into that.

Indeed, this is a bug I introduced into 8.0 awhile back :-(.  Many
thanks for the trouble report.

 IIRC, pg_resetxlog doesn't have any provision to create new pg_clog
 segments.  Which is probably an oversight, but it's easy enough to
 do it by hand.  Do something like
   dd bs=8k count=1 /dev/zero /usr/local/pgsql8b3/data/pg_clog/0001
 and everything should be fine.

There isn't any need to change pg_resetxlog, since the postmaster
can handle creating new segments at need.  But I'd modified StartupCLOG
in a way that assumed the current clog page already exists, which is
wrong in this boundary case.  Fixed for RC3.  In the meantime, creating
an all-zeroes page as per the above suggestion should get you out of
trouble.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org