[GENERAL] Streaming Replication Error

Andrew Hannon Mon, 30 Apr 2012 14:24:34 -0700

Hello,

We were auditing our logs on one of our PG 9.0.6 standby servers that we use 
for nightly snapshotting. The high-level process is:


1. Stop PG
2. Snapshot
3. Start PG

Here, "Snapshot" includes several steps to ensure data/filesystem integrity. 
The archive command on the master continues throughout this process, so the 
standby does have all of the log files. When we restart the cluster, we see the 
typical startup message about restoring files from the archive. However, we 
have noticed that occasionally the following occurs:

LOG:  restored log file "00000001000044560000007F" from archive
LOG:  restored log file "000000010000445600000080" from archive
cp: cannot stat `/ebs-raid0/archive/000000010000445600000081': No such file or directory
LOG:  unexpected pageaddr 4454/74000000 in log file 17494, segment 129, offset 0
cp: cannot stat `/ebs-raid0/archive/000000010000445600000081': No such file or directory
LOG:  streaming replication successfully connected to primary
FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000445600000091 has already been removed

LOG:  restored log file "000000010000445600000091" from archive
LOG:  restored log file "000000010000445600000092" from archive
LOG:  restored log file "000000010000445600000093" from archive
…
LOG:  restored log file "000000010000445700000092" from archive
cp: cannot stat `/ebs-raid0/archive/000000010000445700000093': No such file or directory
LOG:  streaming replication successfully connected to primary

------

The concerning bit here is that we receive the FATAL message "requested WAL 
segment 000000010000445600000091 has already been removed" after streaming 
replication connects successfully, which seems to trigger an additional 
sequence of log restores.
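
For anyone matching the file names against the "log file 17494, segment 129" 
message, here is a minimal sketch of how the 24-hex-digit segment names break 
down into timeline, log file and segment. The function name is made up for 
illustration; the fixed 8/8/8 hex layout is the only thing it relies on.

```python
def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % (name,))
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log_id = int(name[8:16], 16)     # middle 8 hex digits: log file number
    segment = int(name[16:24], 16)   # last 8 hex digits: segment within the log file
    return timeline, log_id, segment


# The segment that "cp" could not find in the archive:
print(parse_wal_segment_name("000000010000445600000081"))  # (1, 17494, 129)
```

So the "unexpected pageaddr" line refers to the same segment 
(000000010000445600000081) that was missing from the archive.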

The questions we have are:

1. Is our data intact? PG eventually starts up, and it seems that once 
streaming hits the FATAL error, it falls back to performing log restores.
2. What triggers this error? Is it too much time passing between log recovery 
and streaming startup, combined with a low wal_keep_segments value (currently 
128)?
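
As a rough sanity check on question 2, the sketch below counts how many 
segments lie between the one streaming asked for and the last one later 
restored from the archive, and compares that with wal_keep_segments. It 
assumes the default 16 MB segment size and 255 segments per log file (00 
through FE), which I believe is how 9.0 numbers them; the helper names are 
made up. It does not prove the cause, it only shows the sizes involved.

```python
SEGMENTS_PER_LOG = 255    # assumption: 16 MB segments, numbered 00..FE per log file
SEGMENT_SIZE_MB = 16      # assumption: default WAL segment size
WAL_KEEP_SEGMENTS = 128   # value from our configuration


def linear_segment_number(name):
    """Map a 24-hex-digit segment name to a single increasing counter."""
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return log_id * SEGMENTS_PER_LOG + segment


def segments_between(first, last):
    """Number of segments you advance going from `first` to `last`."""
    return linear_segment_number(last) - linear_segment_number(first)


# Segment requested over streaming vs. last segment restored from the archive
gap = segments_between("000000010000445600000091",
                       "000000010000445700000092")
retained_mb = WAL_KEEP_SEGMENTS * SEGMENT_SIZE_MB

print(gap, gap > WAL_KEEP_SEGMENTS, retained_mb)
```

Under those assumptions the gap is larger than the 128 segments the master 
is asked to keep, which would be consistent with the segment having already 
been removed there.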

Thank you very much,

Andrew Hannon
