At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier <michael.paqu...@gmail.com> wrote in <cab7npqq3q1j_wbc7ypxk39do0rgvbm4-nyp2gmrcj7pfpjx...@mail.gmail.com>
> On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost <sfr...@snowman.net> wrote:
> > I certainly don't care for the idea of adding log messages saying we
> > aren't doing anything just to match a count that's incorrectly claiming
> > that checkpoints are happening when they aren't.
> >
> > The down-thread suggestion of keeping track of skipped checkpoints might
> > be interesting, but I'm not entirely convinced it really is. We have
> > time to debate that, of course, but I don't really see how that's
> > helpful. At the moment, it seems like the suggestion to add that column
> > is based on the assumption that we're going to start logging skipped
> > checkpoints and having that column would allow us to match up the count
> > between the new column and the "skipped checkpoint" messages in the logs
> > and I can not help but feel that this is a ridiculous amount of effort
> > being put into the analysis of something that *didn't* happen.
>
> Being able to look at how many checkpoints are skipped can be used as
> a tuning indicator of max_wal_size and checkpoint_timeout, or in short
> increase them if those remain idle.

We usually adjust the GUCs based on how often checkpoints are
*executed* and how many of the executed checkpoints were triggered
by xlog progress (or at an interval shorter than the timeout). That
seems sufficient. Counting skipped checkpoints gives only a rough
estimate of how long the system went without substantial updates. I
doubt that users would get anything valuable from counting skipped
checkpoints.

> Since their introduction in
> 335feca4, m_timed_checkpoints and m_requested_checkpoints track the
> number of checkpoint requests, not if a checkpoint has been actually
> executed or not, I am not sure that this should be changed after 10
> years. So, to put it in other words, wouldn't we want a way to track
> checkpoints that are *executed*, meaning that we could increment a
> counter after doing the skip checks in CreateRestartPoint() and
> CreateCheckPoint()?

This sounds reasonable to me. CreateRestartPoint() already returns
ckpt_performed, which is used to let the checkpointer retry in 15
seconds rather than wait for the next checkpoint_timeout.
CreateCheckPoint() might deserve the same treatment when it skips.

By the way, CreateRestartPoint() emits DEBUG2 messages on skipping.
Although a restartpoint has different characteristics from a
checkpoint, if we change the message level for CreateCheckPoint()
(currently DEBUG1), CreateRestartPoint() should probably get the same
change. (Otherwise, they ought at least to have the same message
level?)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
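
To make the tuning heuristic from the first paragraph of this reply
concrete, here is a minimal sketch in Python. It assumes the two
inputs are the checkpoints_timed and checkpoints_req counters read
from pg_stat_bgwriter; the function names and the 0.5 threshold are
hypothetical and chosen only for illustration. As discussed in this
thread, those counters track checkpoint *requests*, so on an idle
system the timed count may include skipped checkpoints.

```python
def requested_fraction(checkpoints_timed: int, checkpoints_req: int) -> float:
    """Return the share of checkpoint requests that were not timeout-driven.

    checkpoints_req covers checkpoints requested because of xlog progress
    as well as explicitly requested ones (e.g. the CHECKPOINT command).
    """
    total = checkpoints_timed + checkpoints_req
    if total == 0:
        # No checkpoint requests recorded yet; nothing to judge.
        return 0.0
    return checkpoints_req / total


def suggest_tuning(checkpoints_timed: int, checkpoints_req: int,
                   threshold: float = 0.5) -> str:
    """Very rough advice; the threshold is an assumption, not a rule."""
    frac = requested_fraction(checkpoints_timed, checkpoints_req)
    if frac > threshold:
        # Most checkpoints come earlier than checkpoint_timeout.
        return "consider raising max_wal_size"
    return "no change suggested"


if __name__ == "__main__":
    # Example values only; real numbers would come from pg_stat_bgwriter.
    print(suggest_tuning(checkpoints_timed=10, checkpoints_req=90))
```

Nothing in this sketch needs a count of skipped checkpoints, which is
the point made above: the ratio of the existing counters already
indicates whether max_wal_size is too small.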