On Thu, 3 Apr 2008, Tom Lane wrote:
> "the system stopped checkpointing" does not strike me as a routine
> occurrence that we should be making provisions for DBAs to watch for.
> What, pray tell, is the DBA supposed to do when and if he notices that?

Schedule downtime, rather than waiting for it to arrive unpredictably when
the inevitable crash happens.

> I'd much rather be spending our time and effort on understanding what
> broke for you, and fixing the code so it doesn't happen again.

(Here I start laughing all over again as I recall Robert's talk, which we
really need to get you in particular the video of.) Their situation had
possible causes that included a bit flipping in bad memory, which is
pretty hard to code around (unless you're a Core Wars veteran). I'm
familiar with that part of the checkpoint code path, and everything I was
able to think of when hearing the outline of events was already considered
and rejected as not being a likely cause. This patch comes out of
pragmatic acceptance that, sometimes, stuff will happen you can't easily
explain, but that doesn't mean it's not worth keeping an eye on it anyway
so it doesn't sneak up on you again.
Anyway, I think this whole thing would be better handled by a larger
internals view, something this codebase could use a dose of in any case.
What I really want is an interface like this:
psql> select pg_internals('last_checkpoint_time');
and then start sprinkling exports of those probe points in some popular
places people would like to look at.
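
To make the shape of that idea concrete, here is a rough sketch of a
name-to-probe registry, written in Python rather than backend C purely for
brevity. Nothing here exists in PostgreSQL: pg_internals is only the name
proposed above, and export_probe, the _PROBES table, and the hard-coded
timestamp are all made up for illustration.

```python
from datetime import datetime, timezone
from typing import Callable, Dict

# Hypothetical registry: probe name -> zero-argument function returning text.
_PROBES: Dict[str, Callable[[], str]] = {}


def export_probe(name: str, reader: Callable[[], str]) -> None:
    """Export a named probe point (hypothetical helper, not a real API)."""
    if name in _PROBES:
        raise ValueError(f"probe {name!r} already exported")
    _PROBES[name] = reader


def pg_internals(name: str) -> str:
    """Return the current value of a named probe as text."""
    try:
        reader = _PROBES[name]
    except KeyError:
        raise KeyError(f"unknown probe {name!r}") from None
    return reader()


# Stand-in for state the checkpoint code would update; the value is invented.
_last_checkpoint = datetime(2008, 4, 3, 12, 0, 0, tzinfo=timezone.utc)

# "Sprinkling an export" next to the code that owns the value.
export_probe("last_checkpoint_time", lambda: _last_checkpoint.isoformat())
```

The point of the single lookup function is that adding a new probe only
takes one export call next to the code that owns the value, and the caller
side never changes.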
I will apologize now for suggesting this, followed by not having enough
time to code it in the near future.
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Sent via pgsql-patches mailing list (email@example.com)