[HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

Andres Freund Fri, 12 Sep 2014 04:25:05 -0700

Hi,

Abhijit and I investigated a customer problem which showed that crash
recovery + unlogged relations don't always work well together:


A condensed version of how crash recovery works is:

StartupXLOG()
{
...
    if (ControlFile->state != DB_SHUTDOWNED)
        InRecovery = true;
    
    if (InRecovery)
    {
        ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
        ....
        /* perform crash recovery till the end of WAL */
        ...
        CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
        ...
    }
    
    PreallocXlogFiles(EndOfLog);
    
    if (InRecovery)
        ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
...
    /* finish startup */
}

The second important part is:

CreateCheckPoint(flags)
{
...
    if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
        shutdown = true;
...
    if (shutdown)
        ControlFile->state = DB_SHUTDOWNED;
    UpdateControlFile();
...
}

If you consider a crash due to ENOSPC, the sequence is this: first, crash
recovery is performed. Then a checkpoint is performed which is marked as
END_OF_RECOVERY - which marks the database as being properly shut down!
So far, while not pretty, so good. The problem is that if we crash after
the CreateCheckPoint(), e.g. because of xlog preallocation or the new
files created in ResetUnloggedRelations(), the server will restart *but*
will *not* perform crash recovery anymore, as pg_control is marked as
DB_SHUTDOWNED!

That leaves you with a database which has all the _init forks, but not
the main forks... leading to scary and unexpected errors.
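
To make the window concrete, here is a toy model of the sequence above.
It is only a sketch: ToyCluster, toy_startup() and the TOY_* names are
invented for illustration and are not PostgreSQL code, and the on-disk
state is reduced to two fields (what pg_control says, and whether the
main forks of unlogged relations exist).

```c
#include <stdbool.h>

/* Toy stand-ins for pg_control states; NOT the real definitions. */
typedef enum { TOY_DB_IN_PRODUCTION, TOY_DB_SHUTDOWNED } ToyDBState;

/* Hypothetical crash points within one startup attempt. */
typedef enum {
    TOY_NO_CRASH,
    TOY_CRASH_AFTER_CHECKPOINT  /* e.g. ENOSPC in preallocation or reset */
} ToyCrashPoint;

typedef struct {
    ToyDBState control_state;   /* what pg_control would say on disk */
    bool       main_forks_ok;   /* main forks recreated from _init forks? */
} ToyCluster;

/*
 * One startup attempt, in the order of the condensed StartupXLOG() above.
 * Returns true if startup finished, false if we "crashed" on the way.
 */
bool
toy_startup(ToyCluster *c, ToyCrashPoint crash)
{
    bool in_recovery = (c->control_state != TOY_DB_SHUTDOWNED);

    if (in_recovery)
    {
        /* models ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP) */
        c->main_forks_ok = false;

        /* ... WAL replay would happen here ... */

        /* end-of-recovery checkpoint: control file now says "shut down" */
        c->control_state = TOY_DB_SHUTDOWNED;
    }

    /* crash between the checkpoint and the second reset */
    if (crash == TOY_CRASH_AFTER_CHECKPOINT)
        return false;

    /* models ResetUnloggedRelations(UNLOGGED_RELATION_INIT) */
    if (in_recovery)
        c->main_forks_ok = true;

    c->control_state = TOY_DB_IN_PRODUCTION;
    return true;
}
```

Starting from a crashed cluster (control_state = TOY_DB_IN_PRODUCTION),
one call with TOY_CRASH_AFTER_CHECKPOINT followed by a call with
TOY_NO_CRASH ends with a running server and main_forks_ok still false,
which is the state described above.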

Should somebody google this: The problem can be fixed by forcing the
server into crash recovery again using an immediate shutdown.


Personally I think it's quite the mistake that END_OF_RECOVERY
checkpoints are treated as shutdown checkpoints. The claimed reasoning,
namely:
     *
     * Note that we write a shutdown checkpoint rather than an on-line
     * one. This is not particularly critical, but since we may be
     * assigning a new TLI, using a shutdown checkpoint allows us to have
     * the rule that TLI only changes in shutdown checkpoints, which
     * allows some extra error checking in xlog_redo.
     *
and
    /*
     * An end-of-recovery checkpoint is really a shutdown checkpoint, just
     * issued at a different time.
     */

isn't very convincing as those checks could just as well be saved in a
flags argument in the checkpoint. The likelihood of this confusion
causing further bugs (IIRC it already has caused a couple) seems high.

What I like even less is that pg_control is actually marked as
DB_SHUTDOWNED due to END_OF_RECOVERY. That's just plain wrong. Obviously
the database was *NOT* shut down peacefully. I don't see active bugs due
to it besides this one, but I think it's likely to either have or create
further ones.


Because at least the former is something that we obviously can't (and
don't want to) change in the back branches, I think the solution for this
particular problem is to simply move the
ResetUnloggedRelations(UNLOGGED_RELATION_INIT) call a couple of lines up,
to before the CreateCheckPoint(). That should fix this.
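
As a sanity check of that reordering, a toy model (again only a sketch:
SketchCluster, sketch_startup_reordered() and the SK_* names are made up
for illustration, not PostgreSQL code) with the reset done before the
checkpoint looks like this:

```c
#include <stdbool.h>

/* Toy stand-ins for pg_control states; NOT the real definitions. */
typedef enum { SK_DB_IN_PRODUCTION, SK_DB_SHUTDOWNED } SketchDBState;

/* Hypothetical crash points within one startup attempt. */
typedef enum {
    SK_NO_CRASH,
    SK_CRASH_IN_RESET,      /* while recreating main forks */
    SK_CRASH_IN_PREALLOC    /* after the end-of-recovery checkpoint */
} SketchCrashPoint;

typedef struct {
    SketchDBState control_state;  /* what pg_control would say on disk */
    bool          main_forks_ok;  /* main forks recreated from _init forks? */
} SketchCluster;

/*
 * One startup attempt with the UNLOGGED_RELATION_INIT step moved to
 * before the end-of-recovery checkpoint. Returns false on a "crash".
 */
bool
sketch_startup_reordered(SketchCluster *c, SketchCrashPoint crash)
{
    bool in_recovery = (c->control_state != SK_DB_SHUTDOWNED);

    if (in_recovery)
    {
        /* models ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP) */
        c->main_forks_ok = false;

        /* ... WAL replay would happen here ... */

        /* crash here leaves pg_control unchanged, so recovery reruns */
        if (crash == SK_CRASH_IN_RESET)
            return false;

        /* models ResetUnloggedRelations(UNLOGGED_RELATION_INIT) */
        c->main_forks_ok = true;

        /* only now does the checkpoint mark the cluster as shut down */
        c->control_state = SK_DB_SHUTDOWNED;
    }

    /* crash after the checkpoint: main forks already exist */
    if (crash == SK_CRASH_IN_PREALLOC)
        return false;

    c->control_state = SK_DB_IN_PRODUCTION;
    return true;
}
```

In this model a crash at either point, followed by a clean restart,
ends with main_forks_ok set: either pg_control still forces recovery,
or the reset had already completed before the state was changed.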

Comments, other opinions?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


