Anyone,

I would not describe my situation as desprate (yet), but it is painful,
and I am looking for some information and a degree of confidence. I have
yet to communicate w/ our sysadmin about backup/restore.

We are using version 8.0 beta 3, on Red Hat Linux, to develop a product
that we plan to released in several months.

The problem began when I tried to make some table modifications.  A
CREATE statement hung. I canceled.  I then noticed many backend
processes hung "idle in transaction".  I had everyone exit their
database activities, but several hung transactions still remained.  I
then did a "pg_ctl stop" followed by a "pg_ctl start".  Here is the log
file activity:

    (I got lots of these.  Probably people exiting their clients)
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection

    (Then this; probably after "pg_ctl stop")
LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down

    (Now this after "pg_ctl start")
LOG:  could not create IPv6 socket: Address family not supported by
protocol
LOG:  database system was shut down at 2004-12-21 16:30:25 EST
LOG:  checkpoint record is at 0/278B3430
LOG:  redo record is at 0/278B3430; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 884736; next OID: 306834
PANIC:  could not access status of transaction 884736
DETAIL:  could not read from file
"/usr/local/pgsql8b3/data/pg_clog/0000" at offset 221184: Success
LOG:  startup process (PID 17774) was terminated by signal 6
LOG:  aborting startup due to startup process failure

It turns out that "/usr/local/pgsql8b3/data/pg_clog/0000" exists, is
read/writable by postgres, and is exactly 221184 bytes long.  So I
started pursuing recovery documents.  Came across pg_resetxlog.   First
I tried without the force option.  It complained that it would not work.
Then, using the -f option; that changed nothing.   Then I began using
the -f, -x and -l options: as follows:

First some useful information for you:

$ ls -l /usr/local/pgsql8b3/data/pg_clog/
-rw-------    1 postgres postgres   221184 Dec 21 16:12 0000
     and
$ls -l /usr/local/pgsql8b3/data/pg_xlog
-rw-------    1 postgres postgres 16777216 Dec 22 06:16
000000010000000100000068
drwx------    2 postgres postgres     4096 Sep 28 04:57 archive_status

Like the doc describes, I tried:
$ postgres pg_resetxlog -f -x 0x100000 -l 0x1,0x1,0x65
/usr/local/pgsql8b3/data

     (Now I get this in the log)
LOG:  could not create IPv6 socket: Address family not supported by
protocol
LOG:  database system was shut down at 2004-12-21 20:11:12 EST
LOG:  checkpoint record is at 1/6400001C
LOG:  redo record is at 1/6400001C; undo record is at 1/6400001C;
shutdown TRUE
LOG:  next transaction ID: 1048576; next OID: 306834
PANIC:  could not access status of transaction 1048576
DETAIL:  could not open file "/usr/local/pgsql8b3/data/pg_clog/0001": No
such file or directory
LOG:  startup process (PID 304) was terminated by signal 6
LOG:  aborting startup due to startup process failure

Which is correct, there is no file:
"/usr/local/pgsql8b3/data/pg_clog/0001".

Also, this is the output of pg_controldata after running pg_resetxlog:

pg_control version number:            74
Catalog version number:               200408031
Database system identifier:           4708838347263258567
Database cluster state:               in production
pg_control last modified:             Wed 22 Dec 2004 06:17:55 AM EST
Current log file ID:                  1
Next log file segment:                105
Latest checkpoint location:           1/6800001C
Prior checkpoint location:            0/0
Latest checkpoint's REDO location:    1/6800001C
Latest checkpoint's UNDO location:    1/6800001C
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          1048576
Latest checkpoint's NextOID:          306834
Time of latest checkpoint:            Wed 22 Dec 2004 06:16:27 AM EST
Database block size:                  8192
Blocks per segment of large relation: 131072
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum number of function arguments: 32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.UTF-8
LC_CTYPE:                             en_US.UTF-8


Can someone help me make sense of all this?

Thanks,
David

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to