Re: [HACKERS] Logging idle checkpoints
At Thu, 5 Oct 2017 13:41:42 +0200, Alvaro Herrera wrote in <20171005114142.dupjeqe2cnplhgkx@alvherre.pgsql> > Kyotaro HORIGUCHI wrote: > > > # This reminded me of a concern. I'd like to count vacuums that > > # are required but skipped by lock-failure, or killed by other > > # backend. > > We clearly need to improve the stats and logs related to vacuuming work > executed, both by autovacuum and manually invoked. One other item I > have in my head is to report numbers related to the truncation phase of > a vacuum run, since in some cases it causes horrible and hard to > diagnose problems. (Also, add an reloption to stop vacuum from doing > the truncation phase at all -- for some usage patterns that is a serious > problem.) > > However, please do open a new thread about it. Thanks! Will do after a bit time of organization of the thougts. reagareds, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
Kyotaro HORIGUCHI wrote: > # This reminded me of a concern. I'd like to count vacuums that > # are required but skipped by lock-failure, or killed by other > # backend. We clearly need to improve the stats and logs related to vacuuming work executed, both by autovacuum and manually invoked. One other item I have in my head is to report numbers related to the truncation phase of a vacuum run, since in some cases it causes horrible and hard to diagnose problems. (Also, add an reloption to stop vacuum from doing the truncation phase at all -- for some usage patterns that is a serious problem.) However, please do open a new thread about it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
At Tue, 3 Oct 2017 08:22:27 -0400, Stephen Frost wrote in <2017100317.gj4...@tamriel.snowman.net> > Greetings, > > * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote: > > At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier > > wrote in > > > > > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost wrote: > > > Since their introduction in > > > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the > > > number of checkpoint requests, not if a checkpoint has been actually > > > executed or not, I am not sure that this should be changed after 10 > > > years. So, to put it in other words, wouldn't we want a way to track > > > checkpoints that are *executed*, meaning that we could increment a > > > counter after doing the skip checks in CreateRestartPoint() and > > > CreateCheckPoint()? > > > > This sounds reasonable to me. > > I agree that tracking executed checkpoints is valuable, but, and perhaps > I'm missing something, isn't that the same as tracking non-skipped > checkpoints? I suppose we could have both, if we really feel the need, > provided that doesn't result in more work or effort being done than > simply keeping the count. I'd hate to end up in a situation where we're > writing things out unnecessairly just to keep track of checkpoints that > were requested but ultimately skipped because there wasn't anything to > do. I'm fine with counting both executed and skipped. But perhaps the time of lastest checkpoint fits the concern better, like vacuum. It is seen in control file but not in system views. If we have count skipped checkpoints, I'd like to see the time (or LSN) of last checkpoint in system views. checkpoints_timed | bigint | | | checkpoints_req | bigint | | | + checkpoints_skipped | bigint + last_checkpint| timestamp with time zone or LSN? # This reminded me of a concern. I'd like to count vacuums that # are required but skipped by lock-failure, or killed by other # backend. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
Greetings, * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote: > At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier > wrote in > > > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost wrote: > > > I certainly don't care for the idea of adding log messages saying we > > > aren't doing anything just to match a count that's incorrectly claiming > > > that checkpoints are happening when they aren't. > > > > > > The down-thread suggestion of keeping track of skipped checkpoints might > > > be interesting, but I'm not entirely convinced it really is. We have > > > time to debate that, of course, but I don't really see how that's > > > helpful. At the moment, it seems like the suggestion to add that column > > > is based on the assumption that we're going to start logging skipped > > > checkpoints and having that column would allow us to match up the count > > > between the new column and the "skipped checkpoint" messages in the logs > > > and I can not help but feel that this is a ridiculous amount of effort > > > being put into the analysis of something that *didn't* happen. > > > > Being able to look at how many checkpoints are skipped can be used as > > a tuning indicator of max_wal_size and checkpoint_timeout, or in short > > increase them if those remain idle. > > We ususally adjust the GUCs based on how often checkpoint is > *executed* and how many of the executed checkpoints have been > triggered by xlog progress (or with shorter interval than > timeout). It seems enough. Counting skipped checkpoints gives > just a rough estimate of how long the system was getting no > substantial updates. I doubt that users get something valuable by > counting skipped checkpoints. Yeah, I tend to agree. I don't really see how counting skipped checkpoints helps to size max_wal_size or even checkpoint_timeout. The whole point here is that nothing is happening and if nothing is happening then there's no real need to adjust max_wal_size or checkpoint_timeout or, well, much of anything really.. > > Since their introduction in > > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the > > number of checkpoint requests, not if a checkpoint has been actually > > executed or not, I am not sure that this should be changed after 10 > > years. So, to put it in other words, wouldn't we want a way to track > > checkpoints that are *executed*, meaning that we could increment a > > counter after doing the skip checks in CreateRestartPoint() and > > CreateCheckPoint()? > > This sounds reasonable to me. I agree that tracking executed checkpoints is valuable, but, and perhaps I'm missing something, isn't that the same as tracking non-skipped checkpoints? I suppose we could have both, if we really feel the need, provided that doesn't result in more work or effort being done than simply keeping the count. I'd hate to end up in a situation where we're writing things out unnecessairly just to keep track of checkpoints that were requested but ultimately skipped because there wasn't anything to do. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Logging idle checkpoints
At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier wrote in > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost wrote: > > I certainly don't care for the idea of adding log messages saying we > > aren't doing anything just to match a count that's incorrectly claiming > > that checkpoints are happening when they aren't. > > > > The down-thread suggestion of keeping track of skipped checkpoints might > > be interesting, but I'm not entirely convinced it really is. We have > > time to debate that, of course, but I don't really see how that's > > helpful. At the moment, it seems like the suggestion to add that column > > is based on the assumption that we're going to start logging skipped > > checkpoints and having that column would allow us to match up the count > > between the new column and the "skipped checkpoint" messages in the logs > > and I can not help but feel that this is a ridiculous amount of effort > > being put into the analysis of something that *didn't* happen. > > Being able to look at how many checkpoints are skipped can be used as > a tuning indicator of max_wal_size and checkpoint_timeout, or in short > increase them if those remain idle. We ususally adjust the GUCs based on how often checkpoint is *executed* and how many of the executed checkpoints have been triggered by xlog progress (or with shorter interval than timeout). It seems enough. Counting skipped checkpoints gives just a rough estimate of how long the system was getting no substantial updates. I doubt that users get something valuable by counting skipped checkpoints. > Since their introduction in > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the > number of checkpoint requests, not if a checkpoint has been actually > executed or not, I am not sure that this should be changed after 10 > years. So, to put it in other words, wouldn't we want a way to track > checkpoints that are *executed*, meaning that we could increment a > counter after doing the skip checks in CreateRestartPoint() and > CreateCheckPoint()? This sounds reasonable to me. CreateRestartPoint() is already returning ckpt_performed, it is used to let checkpointer retry in 15 seconds rather than waiting the next checkpoint_timeout. Checkpoint might deserve the same treatment on skipping. By the way RestartCheckPoint emits DEBUG2 messages on skipping. Although restartpoint has different characteristics from checkpoint, if we change the message level for CreateCheckPoint (currently DEBUG1), CreateRestartPoint might should get the same change. (Elsewise at least they ought to have the same message level?) regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost wrote: > I certainly don't care for the idea of adding log messages saying we > aren't doing anything just to match a count that's incorrectly claiming > that checkpoints are happening when they aren't. > > The down-thread suggestion of keeping track of skipped checkpoints might > be interesting, but I'm not entirely convinced it really is. We have > time to debate that, of course, but I don't really see how that's > helpful. At the moment, it seems like the suggestion to add that column > is based on the assumption that we're going to start logging skipped > checkpoints and having that column would allow us to match up the count > between the new column and the "skipped checkpoint" messages in the logs > and I can not help but feel that this is a ridiculous amount of effort > being put into the analysis of something that *didn't* happen. Being able to look at how many checkpoints are skipped can be used as a tuning indicator of max_wal_size and checkpoint_timeout, or in short increase them if those remain idle. Since their introduction in 335feca4, m_timed_checkpoints and m_requested_checkpoints track the number of checkpoint requests, not if a checkpoint has been actually executed or not, I am not sure that this should be changed after 10 years. So, to put it in other words, wouldn't we want a way to track checkpoints that are *executed*, meaning that we could increment a counter after doing the skip checks in CreateRestartPoint() and CreateCheckPoint()? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
Vik, all, * Vik Fearing (vik.fear...@2ndquadrant.com) wrote: > I recently had a sad because I noticed that checkpoint counts were > increasing in pg_stat_bgwriter, but weren't accounted for in my logs > with log_checkpoints enabled. > After some searching, I found that it was the idle checkpoints that > weren't being logged. I think this is a missed trick in 6ef2eba3f57. > Attached is a one-liner fix. I realize how imminent we are from > releasing v10 but I hope there is still time for such a minor issue as this. Idle checkpoints aren't, well, really checkpoints though. If anything, seems like we shouldn't be including skipped checkpoints in the pg_stat_bgwriter count because we aren't actually doing a checkpoint. I certainly don't care for the idea of adding log messages saying we aren't doing anything just to match a count that's incorrectly claiming that checkpoints are happening when they aren't. The down-thread suggestion of keeping track of skipped checkpoints might be interesting, but I'm not entirely convinced it really is. We have time to debate that, of course, but I don't really see how that's helpful. At the moment, it seems like the suggestion to add that column is based on the assumption that we're going to start logging skipped checkpoints and having that column would allow us to match up the count between the new column and the "skipped checkpoint" messages in the logs and I can not help but feel that this is a ridiculous amount of effort being put into the analysis of something that *didn't* happen. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Logging idle checkpoints
On 2017-10-02 07:43:31 +0900, Michael Paquier wrote: > On Mon, Oct 2, 2017 at 7:41 AM, Andres Freund wrote: > > On 2017-10-02 07:39:18 +0900, Michael Paquier wrote: > >> On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund wrote: > >> > On 2017-10-02 00:19:33 +0200, Vik Fearing wrote: > >> > I'd be ok with applying this now, or in 10.1 - but I do think we should > >> > fix this before 11. If nobody protests I'll push later today, so we can > >> > get some bf cycles for the very remote case that this causes problems. > >> > >> This point has been discussed during review and removed from the patch > >> (adding Stephen in the loop here): > >> https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com > > > > I find that reasoning unconvincing. log_checkpoints is enabled after > > all. And we're not talking about 10 log messages a second. There's > > plenty systems that analyze the logs that'd possibly be affected by > > this. > > No real objections from here, actually. Vik, because there was some, even though mild, objections, I'd rather not push this right now. Stephen deserves a chance to reply. So this'll have to wait for 10.1, sorry :( - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
On Mon, Oct 2, 2017 at 7:41 AM, Andres Freund wrote: > On 2017-10-02 07:39:18 +0900, Michael Paquier wrote: >> On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund wrote: >> > On 2017-10-02 00:19:33 +0200, Vik Fearing wrote: >> > I'd be ok with applying this now, or in 10.1 - but I do think we should >> > fix this before 11. If nobody protests I'll push later today, so we can >> > get some bf cycles for the very remote case that this causes problems. >> >> This point has been discussed during review and removed from the patch >> (adding Stephen in the loop here): >> https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com > > I find that reasoning unconvincing. log_checkpoints is enabled after > all. And we're not talking about 10 log messages a second. There's > plenty systems that analyze the logs that'd possibly be affected by > this. No real objections from here, actually. >> Actually, shouldn't we make BgWriterStats a bit smarter? We could add >> a counter for skipped checkpoints in v11 (too late for v10). > > Wouldn't hurt, but seems orthogonal. Sure. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
On 2017-10-02 07:39:18 +0900, Michael Paquier wrote: > On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund wrote: > > On 2017-10-02 00:19:33 +0200, Vik Fearing wrote: > > I'd be ok with applying this now, or in 10.1 - but I do think we should > > fix this before 11. If nobody protests I'll push later today, so we can > > get some bf cycles for the very remote case that this causes problems. > > This point has been discussed during review and removed from the patch > (adding Stephen in the loop here): > https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com I find that reasoning unconvincing. log_checkpoints is enabled after all. And we're not talking about 10 log messages a second. There's plenty systems that analyze the logs that'd possibly be affected by this. > Actually, shouldn't we make BgWriterStats a bit smarter? We could add > a counter for skipped checkpoints in v11 (too late for v10). Wouldn't hurt, but seems orthogonal. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund wrote: > On 2017-10-02 00:19:33 +0200, Vik Fearing wrote: > I'd be ok with applying this now, or in 10.1 - but I do think we should > fix this before 11. If nobody protests I'll push later today, so we can > get some bf cycles for the very remote case that this causes problems. This point has been discussed during review and removed from the patch (adding Stephen in the loop here): https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com Actually, shouldn't we make BgWriterStats a bit smarter? We could add a counter for skipped checkpoints in v11 (too late for v10). -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging idle checkpoints
Hi, On 2017-10-02 00:19:33 +0200, Vik Fearing wrote: > I recently had a sad because I noticed that checkpoint counts were > increasing in pg_stat_bgwriter, but weren't accounted for in my logs > with log_checkpoints enabled. > > After some searching, I found that it was the idle checkpoints that > weren't being logged. I think this is a missed trick in 6ef2eba3f57. > > Attached is a one-liner fix. I realize how imminent we are from > releasing v10 but I hope there is still time for such a minor issue as this. > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/transam/xlog.c > index dd028a12a4..75f6bd4cc1 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -8724,7 +8724,7 @@ CreateCheckPoint(int flags) > WALInsertLockRelease(); > LWLockRelease(CheckpointLock); > END_CRIT_SECTION(); > - ereport(DEBUG1, > + ereport(log_checkpoints ? LOG : DEBUG1, > (errmsg("checkpoint skipped because > system is idle"))); > return; > } I'd be ok with applying this now, or in 10.1 - but I do think we should fix this before 11. If nobody protests I'll push later today, so we can get some bf cycles for the very remote case that this causes problems. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers