Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-05 Thread Joshua D. Drake
On Sat, 05 Apr 2008 16:37:15 +0100 Heikki Linnakangas wrote: May I just say that every person that is currently talking on this thread is offtopic? Move it to -hackers please. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-05 Thread Heikki Linnakangas
Robert Treat wrote: 1) Alert if checkpointing stops occurring within a reasonable time frame (note there are failure cases and normal use cases where this might occur) (also note I'll agree, this isn't common, but the results are pretty disastrous if it does happen) What are the normal use

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-05 Thread Alvaro Herrera
Heikki Linnakangas wrote: Robert Treat wrote: 2) Can be graphed over time (using rrdtool and others) for trending checkpoint activity Hmm. You'd need the historical data to do that properly. In particular, if two checkpoints happen between the polling interval, you'd miss that. Yes,

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Tom Lane
Greg Smith writes: On Thu, 3 Apr 2008, Tom Lane wrote: I'd much rather be spending our time and effort on understanding what broke for you, and fixing the code so it doesn't happen again. [ shit happens... ] Completely fair, but I still don't see how this particular patch

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Greg Smith
On Fri, 4 Apr 2008, Tom Lane wrote: (And you still didn't tell me what the actual failure case was.) Database stops checkpointing. WAL files pile up. In the middle of backup, system finally dies, and when it starts recovery there's a bad record in the WAL files--which there are now

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Tom Lane
Greg Smith writes: On Fri, 4 Apr 2008, Tom Lane wrote: (And you still didn't tell me what the actual failure case was.) Database stops checkpointing. WAL files pile up. In the middle of backup, system finally dies, and when it starts recovery there's a bad record in the

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Greg Smith
On Fri, 4 Apr 2008, Tom Lane wrote: The actual advice I'd give to a DBA faced with such a case is to kill -ABRT the bgwriter and send the stack trace to -hackers. And that's a perfect example of where they're trying to get to. They didn't notice the problem until after the crash. The

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Tom Lane
Greg Smith writes: ... If they'd have noticed it while the server was up, perhaps because the last checkpoint value hadn't changed in a long time (which seems like it might be available via stats even if, as you say, the background writer is out of its mind at that point),

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Alvaro Herrera
Tom Lane wrote: Greg Smith writes: ... If they'd have noticed it while the server was up, perhaps because the last checkpoint value hadn't changed in a long time (which seems like it might be available via stats even if, as you say, the background writer is out of its

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Robert Treat
On Friday 04 April 2008 01:59, Tom Lane wrote: Greg Smith writes: On Thu, 3 Apr 2008, Tom Lane wrote: I'd much rather be spending our time and effort on understanding what broke for you, and fixing the code so it doesn't happen again. [ shit happens... ] Completely

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Gregory Stark
Alvaro Herrera writes: Tom Lane wrote: Greg Smith writes: ... If they'd have noticed it while the server was up, perhaps because the last checkpoint value hadn't changed in a long time (which seems like it might be available via stats even if, as you

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-04 Thread Tom Lane
Alvaro Herrera writes: These kinds of things can be monitored externally very easily, say by Nagios, when the values are available via the database. If you have to troll the logs, it's quite a bit harder to do it. I'm not sure about the right values to export -- last

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Alvaro Herrera
Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. I suggest using GetCurrentTimestamp() directly instead of time_t and converting. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting,

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Heikki Linnakangas
Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Joshua D. Drake
On Thu, 03 Apr 2008 23:21:49 +0100 Heikki Linnakangas wrote: Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? For knowing how long checkpoints are taking.

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Tom Lane
Heikki Linnakangas writes: Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? Does this implementation even work? It looks to me like the globalStats.last_checkpoint_start/done fields will go back to zero the

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Andrew Dunstan
Joshua D. Drake wrote: Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? For knowing how long checkpoints are taking. If they are taking too long you may need to adjust your bgwriter settings, and it is a

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Tom Lane
Joshua D. Drake writes: Heikki Linnakangas wrote: Why is that useful? For knowing how long checkpoints are taking. If they are taking too long you may need to adjust your bgwriter settings, and it is a serious drag to parse postgresql logs for this info.

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Andrew Dunstan
Robert Treat wrote: On Thursday 03 April 2008 19:08, Andrew Dunstan wrote: Joshua D. Drake wrote: Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? For knowing how long checkpoints are

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Robert Treat
On Thursday 03 April 2008 19:08, Andrew Dunstan wrote: Joshua D. Drake wrote: Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? For knowing how long checkpoints are taking. If they are taking too long you may need

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Joshua D. Drake
On Thu, 03 Apr 2008 20:29:18 -0400 Tom Lane wrote: Joshua D. Drake writes: Heikki Linnakangas wrote: Why is that useful? For knowing how long checkpoints are taking. If they are taking

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Joshua D. Drake
On Thu, 03 Apr 2008 20:45:37 -0400 Andrew Dunstan wrote: Exposing everything into the log files isn't always sufficient (says the guy who maintains a remote admin tool) It should be now that you can have machine readable

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Tom Lane
Joshua D. Drake writes: I would agree with this. We would need a history of checkpoints that didn't reset until we told it to. Indeed, but the submitted patch has nought whatsoever to do with that. It exposes some instantaneous state. You could perhaps *build* a log facility

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Joshua D. Drake
On Thu, 03 Apr 2008 21:26:46 -0400 Tom Lane wrote: Joshua D. Drake writes: I would agree with this. We would need a history of checkpoints that didn't reset until we told it to. Indeed, but the submitted

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Theo Schlossnagle
On Apr 3, 2008, at 7:08 PM, Andrew Dunstan wrote: Joshua D. Drake wrote: Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? For knowing how long checkpoints are taking. If they are taking too long you may need to

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Andrew Dunstan
Joshua D. Drake wrote: Exposing everything into the log files isn't always sufficient (says the guy who maintains a remote admin tool) It should be now that you can have machine readable logs (says the guy who literally spent weeks making that happen) ;-) And how does the

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Joshua D. Drake
On Thu, 03 Apr 2008 21:44:00 -0400 Andrew Dunstan wrote: I think there is quite possibly a good case for keeping some diagnostics in a table or tables, on a rolling basis, maybe. But then that's a facility that needs to be

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Tom Lane
Theo Schlossnagle writes: Heikki: It is useful for knowing when the last checkpoint occurred. I guess I'm wondering why that's important. In the current bgwriter design, the system spends half its time checkpointing (or in general checkpoint_completion_target % of the

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Andrew Dunstan
Theo Schlossnagle wrote: Has this feature been discussed on -hackers? I don't recall it (and my memory has plenty of holes in it), but I'm sure that after attending my talk last Sunday Theo hasn't sent in a patch for an undiscussed feature ;-) Andrew: I don't think this feature has

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Robert Treat
On Thursday 03 April 2008 21:14, Joshua D. Drake wrote: On Thu, 03 Apr 2008 20:29:18 -0400 Tom Lane wrote: Joshua D. Drake writes: Heikki Linnakangas wrote: Why is that useful? For knowing how long checkpoints are taking. If

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Tom Lane
Robert Treat writes: Tom Lane wrote: 3. As of PG 8.3, the bgwriter tries very hard to make the elapsed time of a checkpoint be just about checkpoint_timeout * checkpoint_completion_target, regardless of load factors. So unless your settings are completely

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Joshua D. Drake
On Thu, 03 Apr 2008 22:33:15 -0400 Tom Lane wrote: JD seems to be on record that the existing logging mechanism sucks and he needs something else. That's fine, but I think it means that we need to improve logging in general, not

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Theo Schlossnagle
On Apr 3, 2008, at 10:33 PM, Tom Lane wrote: Theo claimed he had a reason for wanting to know the latest checkpoint time, *without* any intention of time-extended tracking of that; but he didn't say what it was. If there is a credible reason for that then it might justify a patch of this

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Tom Lane
Theo Schlossnagle writes: On Apr 3, 2008, at 10:33 PM, Tom Lane wrote: Theo claimed he had a reason for wanting to know the latest checkpoint time, *without* any intention of time-extended tracking of that; but he didn't say what it was. We had a recent event where the

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Greg Smith
On Thu, 3 Apr 2008, Robert Treat wrote: You can plug a single item graphed over time into things like rrdtool to get good trending information. And it's often easier to do this using sql interfaces to get the data than pulling it out of log files (almost like the db was designed for that :-)

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Bruce Momjian
Greg Smith wrote: On Thu, 3 Apr 2008, Robert Treat wrote: You can plug a single item graphed over time into things like rrdtool to get good trending information. And it's often easier to do this using sql interfaces to get the data than pulling it out of log files (almost like the db

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Greg Smith
On Thu, 3 Apr 2008, Joshua D. Drake wrote: For knowing how long checkpoints are taking. If they are taking too long you may need to adjust your bgwriter settings, and it is a serious drag to parse postgresql logs for this info. There's some disconnect here between what I think you want here

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Greg Smith
On Thu, 3 Apr 2008, Tom Lane wrote: As of PG 8.3, the bgwriter tries very hard to make the elapsed time of a checkpoint be just about checkpoint_timeout * checkpoint_completion_target, regardless of load factors. In the cases where the timing on checkpoint writes are timeout driven. When

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Robert Treat
On Friday 04 April 2008 00:09, Greg Smith wrote: On Thu, 3 Apr 2008, Robert Treat wrote: You can plug a single item graphed over time into things like rrdtool to get good trending information. And it's often easier to do this using sql interfaces to get the data than pulling it out of log

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Tom Lane
Robert Treat writes: I have to add, given that we already provide the time of last checkpoint information via pg_controldata, I don't understand why people are against making that information accessible to remote clients. So, I can expect to see a patch next week that

Re: [PATCHES] Expose checkpoint start/finish times into SQL.

2008-04-03 Thread Greg Smith
On Thu, 3 Apr 2008, Tom Lane wrote: the system stopped checkpointing does not strike me as a routine occurrence that we should be making provisions for DBAs to watch for. What, pray tell, is the DBA supposed to do when and if he notices that? Schedule downtime rather than wait for it to