We're running a number of warm standby instances on one server.  One of them 
stalled on Saturday.  When I found it this morning, I confirmed that the WAL 
files it should have been restoring were present in the directory it pulls 
them from.  The logs showed normal processing up to the stall, with no 
messages afterwards.  A restart brought it back to warm standby status, and 
it caught up quickly.
 
It seems likely that this is a PostgreSQL bug of some sort, although maybe 
someone can spot a problem in our recovery script.  If it happens again, what 
information should I gather before restarting, to help find the cause?
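For next time, this is the sort of capture I have in mind before touching
anything (an untested sketch; the data directory default is just our layout,
and the output location is arbitrary):

```shell
#!/bin/bash
# Hypothetical capture script: snapshot the state of a stalled standby
# before restarting it.  DATADIR defaults to our Ozaukee layout.
DATADIR=${1:-/var/pgsql/data/county/ozaukee/data}
OUT=/tmp/standby-stall-$(date +%Y%m%d%H%M%S)
mkdir -p "$OUT"

# Which postgres processes are alive (postmaster, startup process)
ps auxww > "$OUT/ps.out" 2>&1 || true

# Control-file state (last checkpoint, minimum recovery location)
pg_controldata "$DATADIR" > "$OUT/pg_controldata.out" 2>&1 || true

# What the restore script could be waiting on
ls -l "$(dirname "$DATADIR")/wal-files" > "$OUT/wal-files.out" 2>&1 || true

# Last log activity before the stall
tail -n 100 "$DATADIR"/pg_log/*.log > "$OUT/pg_log-tail.out" 2>&1 || true

# (It would also be worth attaching to the startup process with
#  gdb -p <pid> and grabbing a backtrace before killing anything.)
echo "diagnostics saved to $OUT"
```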
 
-Kevin
 
PGBACKUP:/var/pgsql/data/county/ozaukee/data/pg_log # tail postgresql-2007-07-07_000000.log
[2007-07-07 09:44:41.392 CDT] 5962 LOG:  restored log file "000000010000000C000000D2" from archive
[2007-07-07 12:24:53.597 CDT] 5962 LOG:  restored log file "000000010000000C000000D3" from archive
[2007-07-07 12:24:53.984 CDT] 5962 LOG:  restored log file "000000010000000C000000D4" from archive
[2007-07-07 12:24:54.351 CDT] 5962 LOG:  restored log file "000000010000000C000000D5" from archive
[2007-07-07 14:10:42.208 CDT] 5962 LOG:  restored log file "000000010000000C000000D6" from archive
[2007-07-07 14:10:42.634 CDT] 5962 LOG:  restored log file "000000010000000C000000D7" from archive
[2007-07-07 15:23:41.717 CDT] 5962 LOG:  restored log file "000000010000000C000000D8" from archive
[2007-07-07 18:24:26.933 CDT] 5962 LOG:  restored log file "000000010000000C000000D9" from archive
[2007-07-07 18:24:27.692 CDT] 5962 LOG:  restored log file "000000010000000C000000DA" from archive
[2007-07-07 18:24:28.051 CDT] 5962 LOG:  restored log file "000000010000000C000000DB" from archive
PGBACKUP:/var/pgsql/data/county/ozaukee/data/pg_log # cat postgresql-2007-07-09_000000.log
[2007-07-09 08:21:50.200 CDT] 5904 LOG:  received fast shutdown request
[2007-07-09 08:21:50.201 CDT] 5962 FATAL:  could not restore file "000000010000000C000000DC" from archive: return code 15
[2007-07-09 08:21:50.485 CDT] 5904 LOG:  startup process (PID 5962) exited with exit code 1
[2007-07-09 08:21:50.485 CDT] 5904 LOG:  aborting startup due to startup process failure
[2007-07-09 08:21:50.555 CDT] 5960 LOG:  logger shutting down
PGBACKUP:/var/pgsql/data/county/ozaukee/data/pg_log # cat postgresql-2007-07-09_082151.log | grep -v 'starting up'
[2007-07-09 08:21:51.718 CDT] 19076 LOG:  database system was interrupted while in recovery at log time 2007-07-07 15:00:02 CDT
[2007-07-09 08:21:51.718 CDT] 19076 HINT:  If this has occurred more than once some data may be corrupted and you may need to choose an earlier recovery target.
[2007-07-09 08:21:51.718 CDT] 19076 LOG:  starting archive recovery
[2007-07-09 08:21:51.725 CDT] 19076 LOG:  restore_command = "/usr/local/backup/recovery.sh %f %p"
[2007-07-09 08:21:52.079 CDT] 19076 LOG:  restored log file "000000010000000C000000D9" from archive
[2007-07-09 08:21:52.079 CDT] 19076 LOG:  checkpoint record is at C/D9000020
[2007-07-09 08:21:52.079 CDT] 19076 LOG:  redo record is at C/D9000020; undo record is at 0/0; shutdown FALSE
[2007-07-09 08:21:52.079 CDT] 19076 LOG:  next transaction ID: 0/10700115; next OID: 1387338
[2007-07-09 08:21:52.079 CDT] 19076 LOG:  next MultiXactId: 1; next MultiXactOffset: 0
[2007-07-09 08:21:52.079 CDT] 19076 LOG:  automatic recovery in progress
[2007-07-09 08:21:52.081 CDT] 19076 LOG:  redo starts at C/D9000068
[2007-07-09 08:21:52.429 CDT] 19076 LOG:  restored log file "000000010000000C000000DA" from archive
[2007-07-09 08:21:52.860 CDT] 19076 LOG:  restored log file "000000010000000C000000DB" from archive


PGBACKUP:/var/pgsql/data/county/ozaukee/data # grep -Ev '^([[:space:]]+)?($|#)' postgresql.conf
listen_addresses = '*'          # what IP address(es) to listen on;
port = 5445                             # (change requires restart)
max_connections = 50                    # (change requires restart)
shared_buffers = 160MB                  # min 128kB or max_connections*16kB
temp_buffers = 10MB                     # min 800kB
work_mem = 30MB                         # min 64kB
maintenance_work_mem = 160MB            # min 1MB
max_fsm_pages = 204800          # min max_fsm_relations*16, 6 bytes each
bgwriter_lru_percent = 20.0             # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 200             # 0-1000 buffers max written/round
bgwriter_all_percent = 10.0             # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 600             # 0-1000 buffers max written/round
wal_buffers = 1MB                       # min 32kB
checkpoint_segments = 10                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min              # range 30s-1h
archive_timeout = 3600          # force a logfile segment switch after this
seq_page_cost = 0.1                     # measured on an arbitrary scale
random_page_cost = 0.1                  # same scale as above
effective_cache_size = 3GB
redirect_stderr = on                    # Enable capturing of stderr into log
log_line_prefix = '[%m] %p %q<%u %d %r> '                       # Special values:
stats_block_level = on
stats_row_level = on
autovacuum = on                         # enable autovacuum subprocess?
autovacuum_naptime = 10s                # time between autovacuum runs
autovacuum_vacuum_threshold = 1 # min # of tuple updates before
autovacuum_analyze_threshold = 1        # min # of tuple updates before
datestyle = 'iso, mdy'
lc_messages = 'C'                       # locale for system error message
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting
escape_string_warning = off
standard_conforming_strings = on
sql_inheritance = off
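For completeness, the relevant recovery.conf line, as echoed in the startup
log above, is just:

```
restore_command = '/usr/local/backup/recovery.sh %f %p'
```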


PGBACKUP:/var/pgsql/data/county/ozaukee/data # pg_controldata /var/pgsql/data/county/ozaukee/data/
pg_control version number:            822
Catalog version number:               200611241
Database system identifier:           5062210045799347511
Database cluster state:               in archive recovery
pg_control last modified:             Mon 09 Jul 2007 08:21:52 AM CDT
Current log file ID:                  8
Next log file segment:                150
Latest checkpoint location:           C/D9000020
Prior checkpoint location:            C/D9000020
Latest checkpoint's REDO location:    C/D9000020
Latest checkpoint's UNDO location:    C/D9000020
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/10700115
Latest checkpoint's NextOID:          1387338
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Time of latest checkpoint:            Sat 07 Jul 2007 03:00:02 PM CDT
Minimum recovery ending location:     8/9500A530
Maximum data alignment:               4
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               64-bit integers
Maximum length of locale name:        128
LC_COLLATE:                           C
LC_CTYPE:                             C
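As a sanity check on the numbers above: the checkpoint location C/D9000020
lives in segment 000000010000000C000000D9, since a WAL file name is the
timeline, the log id, and the segment number (offset divided by the 16MB
segment size shown above), each as eight hex digits. A quick sketch of the
mapping (my own helper, not a PostgreSQL tool):

```shell
#!/bin/bash
# Map a timeline plus an LSN of the form LOG/OFFSET (as printed by
# pg_controldata) to its WAL segment file name, assuming the 16MB
# (0x1000000-byte) segments reported above.
lsn_to_segment() {
  tli=$1
  log=${2%%/*}
  off=${2##*/}
  printf '%08X%08X%08X\n' "$tli" $(( 0x$log )) $(( 0x$off / 0x1000000 ))
}

lsn_to_segment 1 C/D9000020   # -> 000000010000000C000000D9
```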


PGBACKUP:/var/pgsql/data/county/ozaukee/data # cat /usr/local/backup/recovery.sh
#!/bin/bash
# restore_command script, invoked by the standby as:
#   restore_command = '/usr/local/backup/recovery.sh %f %p'
# Pick out the county name from the back of the path.
# The value of $PWD will be: /var/pgsql/data/county/<countyName>/data
countyName=$(basename "$(dirname "$PWD")")
countyDir=/var/pgsql/data/county/$countyName
# Wait while the requested segment is absent and no failover has been
# signalled (DONE), or while a transfer is in flight (rsync-in-progress);
# note that -a binds more tightly than -o.
while [ ! -f "$countyDir/wal-files/$1.gz" \
     -a ! -f "$countyDir/DONE" \
     -o   -f "$countyDir/wal-files/rsync-in-progress" ]
do
  # Anything that is not a WAL segment name (e.g. a timeline history
  # file) will never show up in the archive; fail fast instead of looping.
  if ! echo "$1" | grep -Eq '^[0-9A-F]{24}$' ; then
    exit 1
  fi
  sleep 10    # wait ~10 seconds before checking again
done
gunzip < "$countyDir/wal-files/$1.gz" > "$2"
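One thing worth spelling out about the loop condition: inside [ ], -a binds
more tightly than -o, so the script waits while (no compressed segment AND no
DONE flag) OR an rsync is in flight. A sketch of the same predicate with the
grouping made explicit (hypothetical helper, throwaway paths):

```shell
#!/bin/bash
# should_wait: succeed while the restore script ought to keep sleeping.
# Same logic as the bracketed test in the script, grouped explicitly.
should_wait() {
  dir=$1; seg=$2
  { [ ! -f "$dir/wal-files/$seg.gz" ] && [ ! -f "$dir/DONE" ]; } \
    || [ -f "$dir/wal-files/rsync-in-progress" ]
}

# Demo with a throwaway directory layout
d=$(mktemp -d)
mkdir "$d/wal-files"
should_wait "$d" 000000010000000C000000DC && echo "segment missing: wait"
touch "$d/wal-files/000000010000000C000000DC.gz"
should_wait "$d" 000000010000000C000000DC || echo "segment arrived: restore"
touch "$d/wal-files/rsync-in-progress"
should_wait "$d" 000000010000000C000000DC && echo "rsync in flight: wait"
```

In particular, a stale rsync-in-progress flag left behind by an interrupted
transfer would keep this predicate true indefinitely, with nothing logged;
that seems like one scenario worth ruling out here.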

