Hi, Abhijit and I investigated a customer problem which has showed that crash recovery + unlogged relations don't always work well together:
A condensed version of how crash recovery works is: StartupXLOG() { ... if (ControlFile->state != DB_SHUTDOWNED) InRecovery = true; if (InRecovery) { ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP); .... /* perform crash recovery till the end of WAL */ ... CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE); ... } PreallocXlogFiles(EndOfLog); if (InRecovery) ResetUnloggedRelations(UNLOGGED_RELATION_INIT); ... /* finish startup */ } the second important part is: CreateCheckPoint(flags) { ... if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) shutdown = true; ... if (shutdown) ControlFile->state = DB_SHUTDOWNED; UpdateControlFile(); ... } If you consider a crash due to ENOSPC the problem is that first crash recovery is performed. Then a checkpoint is performed which is marked as END_OF_RECOVERY - which marks the database as being properly shut down! So far, while not pretty, so good. The problem is that if we crash after the CreateCheckPoint(), e.g. because of xlog preallocation or the new files created in ResetUnloggedRelations(), the server will restart *but* will *not* perform crash recovery anymore as pg_control marked as DB_SHUTDOWNED! That leaves you with a database which has all the _init forks, but not the main forks... Leading to scary an unexpected errors. Should somebody google this: The problem can be fixed by forcing the server into crash recovery again using an immediate shutdown. Personally I think it's quite the mistake that END_OF_RECOVERY checkpoints are treated as shutdown checkpoints. The claimed reason that: * * Note that we write a shutdown checkpoint rather than an on-line * one. This is not particularly critical, but since we may be * assigning a new TLI, using a shutdown checkpoint allows us to have * the rule that TLI only changes in shutdown checkpoints, which * allows some extra error checking in xlog_redo. * and /* * An end-of-recovery checkpoint is really a shutdown checkpoint, just * issued at a different time. */ isn't very convincing as those checks could just as well be saved in a flags argument in the checkpoint. The likelihood of this confusion causing further bugs (IIRC it already has caused a couple) seems high. What I like even less is that pg_control is actually marked as DB_SHUTDOWNED due to END_OF_RECOVERY. That's just plain wrong. Obviously the database was *NOT* shutdown peacefully. I don't see active bugs due it besides this, but I think it's likely to either have or create futher ones. Because at least the former is something that obviously we can't (and don't want) to change in the back branches I think the solution for this particular problem is to simply move the ResetUnloggedRelations() call a couple lines up to before the CreateCheckPoint(). That should fix this. Comments, other opinions? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers