Hans-Juergen Schoenig wrote:
Last week we saw a problem with a horribly configured machine.
The disk filled up (bad FSM ;) ) and once this happened the sysadmin killed the
system (-9).
After two days PostgreSQL had still not started up, and they tried to restart it
again and again, so the consistency check was started over and over
again (thus causing more and more downtime).
From the admin's point of view there was no way to find out whether the machine
was actually dead or still recovering.
Here is a small patch which issues a log message indicating that the recovery
process can take ages.
Maybe this can prevent some admins from interrupting the recovery process.
Wait, are you saying that the time was spent in the rm_cleanup phase?
That sounds unbelievable. Surely the time was spent in the redo phase, no?
In our case, the recovery process took 3.5 days !!
That's a ridiculously long time. Was this a normal recovery, not a PITR
archive recovery? Any idea why the recovery took so long? Given the max.
checkpoint timeout of 1h, I would expect that the recovery would take a
maximum of a few hours even with an extremely write-heavy workload.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com