Re: [PATCHES] Endless recovery

Heikki Linnakangas Mon, 11 Feb 2008 02:17:07 -0800

Hans-Juergen Schoenig wrote:

Last week we saw a problem on a horribly configured machine.
The disk filled up (bad FSM ;) ) and once this happened the sysadmin killed the system (-9). After two days PostgreSQL had still not started up, and they tried to restart it again and again, making sure that the consistency check was started over and over again (thus causing more and more downtime). From the admin's point of view there was no way to find out whether the machine was actually dead or still recovering.
Here is a small patch which issues a log message indicating that the recovery process can take ages.
Maybe this can prevent some admins from interrupting the recovery process.

Wait, are you saying that the time was spent in the rm_cleanup phase? That sounds unbelievable. Surely the time was spent in the redo phase, no?

In our case, the recovery process took 3.5 days !!

That's a ridiculously long time. Was this a normal recovery, not a PITR archive recovery? Any idea why the recovery took so long? Given the max. checkpoint timeout of 1h, I would expect the recovery to take a maximum of a few hours even with an extremely write-heavy workload.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

