On Sun, Oct 31, 2010 at 5:31 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> Which is confusing because that file is certainly on the master still, and
> hasn't even been considered archived yet much less removed:
>
> [mas...@pyramid pg_log]$ ls -l $PGDATA/pg_xlog
> -rw------- 1 master master 16777216 Oct 31 16:29 000000010000000000000000
> drwx------ 2 master master 4096 Oct 4 12:28 archive_status
> [mas...@pyramid pg_log]$ ls -l $PGDATA/pg_xlog/archive_status/
> total 0
>
> So why isn't SR handing that data over? Is there some weird unhandled
> corner case this exposes, but that wasn't encountered by the systems the
> tutorial was tried out on? I'm not familiar enough with the SR internals to
> reason out what's going wrong myself yet. Wanted to validate that Matt's
> report wasn't a unique one though, with a bit more detail included about the
> state the system gets into, and one potential fix (increasing
> wal_keep_segments) already tried without improvement.
There seem to be two cases in the code that can generate that error. First,
attempting to open the file returns ENOENT. Second, after the data has been
read, the last-removed position returned by XLogGetLastRemoved has already
reached the segment we think we just read, implying that the segment was
removed or recycled (and its contents possibly overwritten) while we were in
the process of reading it.

Does your installation have debugging symbols? Can you figure out which case
is triggering (inside XLogRead) and what the values of log, seg,
lastRemovedLog, and lastRemovedSeg are? Even if you lack debugging symbols,
if you have gdb, you might be able to figure out which case is triggering by
checking whether XLogGetLastRemoved gets called before the error message is
printed (put a breakpoint on that function).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
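For readers following along, here is a minimal, self-contained C sketch of the two error conditions described in the reply. It is not PostgreSQL's actual XLogRead code: the names SegPos, ReadOutcome, seg_at_or_before, and classify_read are hypothetical, and it assumes the second check fires when the segment just read is at or before the last-removed (log, seg) position.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL segment position: the (log, seg) pair. */
typedef struct
{
    uint32_t log;   /* log file number */
    uint32_t seg;   /* segment number within that log file */
} SegPos;

/* Hypothetical classification of which path would raise an error. */
typedef enum
{
    READ_OK,                  /* neither check tripped */
    READ_ENOENT_ON_OPEN,      /* case one: open() failed with ENOENT */
    READ_OPEN_FAILED,         /* some other open() failure (not discussed) */
    READ_REMOVED_DURING_READ  /* case two: segment removed/recycled mid-read */
} ReadOutcome;

/* True if segment a is at or before segment b. */
static bool
seg_at_or_before(SegPos a, SegPos b)
{
    return a.log < b.log || (a.log == b.log && a.seg <= b.seg);
}

/*
 * open_errno:   errno from opening the segment file, 0 if the open succeeded.
 * read_pos:     segment the data was read from.
 * last_removed: removal horizon fetched AFTER the read (analogous to
 *               lastRemovedLog/lastRemovedSeg from XLogGetLastRemoved).
 */
static ReadOutcome
classify_read(int open_errno, SegPos read_pos, SegPos last_removed)
{
    /* Case one: the file is already gone when we try to open it. */
    if (open_errno == ENOENT)
        return READ_ENOENT_ON_OPEN;
    if (open_errno != 0)
        return READ_OPEN_FAILED;

    /*
     * Case two, checked only after reading: if the removal horizon has
     * reached the segment we read, its contents may have been overwritten.
     */
    if (seg_at_or_before(read_pos, last_removed))
        return READ_REMOVED_DURING_READ;

    return READ_OK;
}
```

The ordering mirrors the gdb suggestion: case one can fire before any removal position is consulted, while case two depends on a lookup made after the read, so seeing whether XLogGetLastRemoved is called before the error is printed tells the two apart.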