On Thu, 3 Apr 2008, Tom Lane wrote:

> "the system stopped checkpointing" does not strike me as a routine occurrence that we should be making provisions for DBAs to watch for. What, pray tell, is the DBA supposed to do when and if he notices that?

Schedule downtime, rather than waiting for the inevitable crash to happen at some unpredictable time.

> I'd much rather be spending our time and effort on understanding what
> broke for you, and fixing the code so it doesn't happen again.

(Here I start laughing all over again as I recall Robert's talk; we really need to get you in particular a copy of the video.) The possible causes in their situation included a bit flipping in bad memory, which is pretty hard to code around (unless you're a Core Wars veteran). I'm familiar with that part of the checkpoint code path, and everything I could think of while hearing the outline of events had already been considered and rejected as an unlikely cause. This patch comes out of a pragmatic acceptance that sometimes things will happen that you can't easily explain. That doesn't mean they aren't worth keeping an eye on, so they don't sneak up on you again.

Anyway, I think this whole thing would be better handled by a larger internals view, something the codebase as a whole could use a dose of. What I really want is an interface like this:

psql> select pg_internals('last_checkpoint_time');

and then start sprinkling exports of those probe points into the popular places people would like to look at.

I will apologize now for suggesting this, followed by not having enough time to code it in the near future.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)