On Oct 25, 2011, at 14:51, Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <f...@phlo.org> wrote:
>
>> What I don't understand is how this affects the CLOG. How does
>> oldestActiveXID factor into CLOG initialization?
>
> It is an entirely different error.
Ah, OK. I assumed you believed the wrong oldestActiveXID computation explained both the SUBTRANS-related *and* the CLOG-related errors, since you said "We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly" at the start of the mail.

> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextxid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.

Yep.

> That isn't a Hot Standby problem, a recovery problem nor is it certain
> it's a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably not pilot error, since it happens even for backups taken with pg_basebackup(), so the only explanation other than a PostgreSQL bug is broken hardware or a pretty serious kernel/filesystem bug.

> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers