Last week we saw a problem with a horribly configured machine. The disk filled up (bad FSM ;) ) and once this happened the sysadmin killed the system (-9). After two days PostgreSQL had still not started up, and they tried to restart it again and again, which made the consistency check start over and over again (thus causing more and more downtime). From the admin's point of view there was no way to find out whether the machine was actually dead or still recovering. Here is a small patch which issues a log message indicating that the recovery process can take ages. Maybe this can prevent some admins from interrupting the recovery process.

Wait, are you saying that the time was spent in the rm_cleanup phase? That sounds unbelievable. Surely the time was spent in the redo phase, no?

it was a seek heavy workload, with backtraces like this one

redo was done fast ...

[2008-02-07 22:24:50 CET ]LOG: database system was interrupted while in recovery at 2008-02-04 11:09:04 CET
[2008-02-07 22:24:50 CET ]HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
[2008-02-07 22:24:50 CET ]LOG: checkpoint record is at 35/3BAA131C
[2008-02-07 22:24:50 CET ]LOG: redo record is at 35/3BA11AB4; undo record is at 0/0; shutdown FALSE
[2008-02-07 22:24:50 CET ]LOG: next transaction ID: 194549334; next OID: 16586886
[2008-02-07 22:24:50 CET ]LOG: next MultiXactId: 1; next MultiXactOffset: 0
[2008-02-07 22:24:50 CET ]LOG: database system was not properly shut down; automatic recovery in progress
[2008-02-07 22:24:50 CET ]LOG: redo starts at 35/3BA11AB4
[2008-02-07 22:24:53 CET ]LOG: record with zero length at 35/3C8317C8
[2008-02-07 22:24:53 CET ]LOG: redo done at 35/3C8317A0
note that redo was finished fast ...
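
To put a number on "fast": a rough sketch (not part of the patch) that works out how much WAL lies between the "redo starts at" and "redo done at" locations in the log above. The helper names parse_lsn and redo_span_bytes are made up for illustration, and the subtraction is only valid because both locations share the same first component (35); crossing into another log file would need version-specific arithmetic that is not attempted here.

```python
def parse_lsn(lsn):
    """Split an 'xlogid/xrecoff' location into two integers (both are hex)."""
    xlogid, xrecoff = lsn.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def redo_span_bytes(start, end):
    """Bytes of WAL between two locations that share the same xlogid."""
    start_id, start_off = parse_lsn(start)
    end_id, end_off = parse_lsn(end)
    if start_id != end_id:
        # Assumption of this sketch: we refuse to guess how offsets wrap
        # from one log file to the next.
        raise ValueError("locations are in different log files")
    return end_off - start_off


# Locations copied from the log lines above.
span = redo_span_bytes("35/3BA11AB4", "35/3C8317A0")
print(f"{span} bytes, about {span / 2**20:.1f} MiB")
# prints: 14810348 bytes, about 14.1 MiB
```

So only about 14 MiB of WAL was replayed in the three seconds between 22:24:50 and 22:24:53, which fits the point that the time was not spent in redo.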

In our case, the recovery process took 3.5 days !!

That's a ridiculously long time. Was this a normal recovery, not a PITR archive recovery? Any idea why the recovery took so long? Given the max. checkpoint timeout of 1h, I would expect the recovery to take a few hours at most, even with an extremely write-heavy workload.

there was no PITR in place - just a normal DB ...
my first idea when they called me was that it must be related to checkpoint_segments - maybe some 2 million segments and some insanely long timeout. but no, this was not the case. segments were at 12 and the timeout was just several minutes. basically from the "outside" everything was looking fine ...

we used a binary copy of the data on two boxes (one for debugging). the entire process worked like a charm - it just took ages.
we have seen a lot of random I/O here.

this was quite a small machine with insanely small memory settings.
no errors have been issued during the process - all fine; just long ...

the DB version is 8.1.11. The entire DB is 116 GB. It is more or less one table along with a 65 GB GiST index.

