Re: [HACKERS] Logging idle checkpoints

2017-10-05 Thread Kyotaro HORIGUCHI
At Thu, 5 Oct 2017 13:41:42 +0200, Alvaro Herrera  
wrote in <20171005114142.dupjeqe2cnplhgkx@alvherre.pgsql>
> Kyotaro HORIGUCHI wrote:
> 
> > # This reminded me of a concern. I'd like to count vacuums that
> > # are required but skipped by lock-failure, or killed by other
> > # backend.
> 
> We clearly need to improve the stats and logs related to vacuuming work
> executed, both by autovacuum and manually invoked.  One other item I
> have in my head is to report numbers related to the truncation phase of
> a vacuum run, since in some cases it causes horrible and hard to
> diagnose problems.  (Also, add an reloption to stop vacuum from doing
> the truncation phase at all -- for some usage patterns that is a serious
> problem.)
> 
> However, please do open a new thread about it.

Thanks! Will do after a bit time of organization of the thougts.

reagareds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-05 Thread Alvaro Herrera
Kyotaro HORIGUCHI wrote:

> # This reminded me of a concern. I'd like to count vacuums that
> # are required but skipped by lock-failure, or killed by other
> # backend.

We clearly need to improve the stats and logs related to vacuuming work
executed, both by autovacuum and manually invoked.  One other item I
have in my head is to report numbers related to the truncation phase of
a vacuum run, since in some cases it causes horrible and hard to
diagnose problems.  (Also, add an reloption to stop vacuum from doing
the truncation phase at all -- for some usage patterns that is a serious
problem.)

However, please do open a new thread about it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-04 Thread Kyotaro HORIGUCHI
At Tue, 3 Oct 2017 08:22:27 -0400, Stephen Frost  wrote in 
<2017100317.gj4...@tamriel.snowman.net>
> Greetings,
> 
> * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> > At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier 
> >  wrote in 
> > 
> > > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost  wrote:
> > > Since their introduction in
> > > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the
> > > number of checkpoint requests, not if a checkpoint has been actually
> > > executed or not, I am not sure that this should be changed after 10
> > > years. So, to put it in other words, wouldn't we want a way to track
> > > checkpoints that are *executed*, meaning that we could increment a
> > > counter after doing the skip checks in CreateRestartPoint() and
> > > CreateCheckPoint()?
> > 
> > This sounds reasonable to me.
> 
> I agree that tracking executed checkpoints is valuable, but, and perhaps
> I'm missing something, isn't that the same as tracking non-skipped
> checkpoints? I suppose we could have both, if we really feel the need,
> provided that doesn't result in more work or effort being done than
> simply keeping the count.  I'd hate to end up in a situation where we're
> writing things out unnecessairly just to keep track of checkpoints that
> were requested but ultimately skipped because there wasn't anything to
> do.

I'm fine with counting both executed and skipped. But perhaps the
time of lastest checkpoint fits the concern better, like
vacuum. It is seen in control file but not in system views. If we
have count skipped checkpoints, I'd like to see the time (or LSN)
of last checkpoint in system views.

  checkpoints_timed | bigint   |   |  | 
  checkpoints_req   | bigint   |   |  | 
+ checkpoints_skipped   | bigint
+ last_checkpint| timestamp with time zone or LSN?


# This reminded me of a concern. I'd like to count vacuums that
# are required but skipped by lock-failure, or killed by other
# backend.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-03 Thread Stephen Frost
Greetings,

* Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier 
>  wrote in 
> 
> > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost  wrote:
> > > I certainly don't care for the idea of adding log messages saying we
> > > aren't doing anything just to match a count that's incorrectly claiming
> > > that checkpoints are happening when they aren't.
> > >
> > > The down-thread suggestion of keeping track of skipped checkpoints might
> > > be interesting, but I'm not entirely convinced it really is.  We have
> > > time to debate that, of course, but I don't really see how that's
> > > helpful.  At the moment, it seems like the suggestion to add that column
> > > is based on the assumption that we're going to start logging skipped
> > > checkpoints and having that column would allow us to match up the count
> > > between the new column and the "skipped checkpoint" messages in the logs
> > > and I can not help but feel that this is a ridiculous amount of effort
> > > being put into the analysis of something that *didn't* happen.
> > 
> > Being able to look at how many checkpoints are skipped can be used as
> > a tuning indicator of max_wal_size and checkpoint_timeout, or in short
> > increase them if those remain idle.
> 
> We ususally adjust the GUCs based on how often checkpoint is
> *executed* and how many of the executed checkpoints have been
> triggered by xlog progress (or with shorter interval than
> timeout). It seems enough. Counting skipped checkpoints gives
> just a rough estimate of how long the system was getting no
> substantial updates. I doubt that users get something valuable by
> counting skipped checkpoints.

Yeah, I tend to agree.  I don't really see how counting skipped
checkpoints helps to size max_wal_size or even checkpoint_timeout.  The
whole point here is that nothing is happening and if nothing is
happening then there's no real need to adjust max_wal_size or
checkpoint_timeout or, well, much of anything really..

> > Since their introduction in
> > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the
> > number of checkpoint requests, not if a checkpoint has been actually
> > executed or not, I am not sure that this should be changed after 10
> > years. So, to put it in other words, wouldn't we want a way to track
> > checkpoints that are *executed*, meaning that we could increment a
> > counter after doing the skip checks in CreateRestartPoint() and
> > CreateCheckPoint()?
> 
> This sounds reasonable to me.

I agree that tracking executed checkpoints is valuable, but, and perhaps
I'm missing something, isn't that the same as tracking non-skipped
checkpoints?  I suppose we could have both, if we really feel the need,
provided that doesn't result in more work or effort being done than
simply keeping the count.  I'd hate to end up in a situation where we're
writing things out unnecessairly just to keep track of checkpoints that
were requested but ultimately skipped because there wasn't anything to
do.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Logging idle checkpoints

2017-10-02 Thread Kyotaro HORIGUCHI
At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier  
wrote in 
> On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost  wrote:
> > I certainly don't care for the idea of adding log messages saying we
> > aren't doing anything just to match a count that's incorrectly claiming
> > that checkpoints are happening when they aren't.
> >
> > The down-thread suggestion of keeping track of skipped checkpoints might
> > be interesting, but I'm not entirely convinced it really is.  We have
> > time to debate that, of course, but I don't really see how that's
> > helpful.  At the moment, it seems like the suggestion to add that column
> > is based on the assumption that we're going to start logging skipped
> > checkpoints and having that column would allow us to match up the count
> > between the new column and the "skipped checkpoint" messages in the logs
> > and I can not help but feel that this is a ridiculous amount of effort
> > being put into the analysis of something that *didn't* happen.
> 
> Being able to look at how many checkpoints are skipped can be used as
> a tuning indicator of max_wal_size and checkpoint_timeout, or in short
> increase them if those remain idle.

We ususally adjust the GUCs based on how often checkpoint is
*executed* and how many of the executed checkpoints have been
triggered by xlog progress (or with shorter interval than
timeout). It seems enough. Counting skipped checkpoints gives
just a rough estimate of how long the system was getting no
substantial updates. I doubt that users get something valuable by
counting skipped checkpoints.

> Since their introduction in
> 335feca4, m_timed_checkpoints and m_requested_checkpoints track the
> number of checkpoint requests, not if a checkpoint has been actually
> executed or not, I am not sure that this should be changed after 10
> years. So, to put it in other words, wouldn't we want a way to track
> checkpoints that are *executed*, meaning that we could increment a
> counter after doing the skip checks in CreateRestartPoint() and
> CreateCheckPoint()?

This sounds reasonable to me.

CreateRestartPoint() is already returning ckpt_performed, it is
used to let checkpointer retry in 15 seconds rather than waiting
the next checkpoint_timeout. Checkpoint might deserve the same
treatment on skipping.

By the way RestartCheckPoint emits DEBUG2 messages on skipping.
Although restartpoint has different characteristics from
checkpoint, if we change the message level for CreateCheckPoint
(currently DEBUG1), CreateRestartPoint might should get the same
change.  (Elsewise at least they ought to have the same message
level?)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-02 Thread Michael Paquier
On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost  wrote:
> I certainly don't care for the idea of adding log messages saying we
> aren't doing anything just to match a count that's incorrectly claiming
> that checkpoints are happening when they aren't.
>
> The down-thread suggestion of keeping track of skipped checkpoints might
> be interesting, but I'm not entirely convinced it really is.  We have
> time to debate that, of course, but I don't really see how that's
> helpful.  At the moment, it seems like the suggestion to add that column
> is based on the assumption that we're going to start logging skipped
> checkpoints and having that column would allow us to match up the count
> between the new column and the "skipped checkpoint" messages in the logs
> and I can not help but feel that this is a ridiculous amount of effort
> being put into the analysis of something that *didn't* happen.

Being able to look at how many checkpoints are skipped can be used as
a tuning indicator of max_wal_size and checkpoint_timeout, or in short
increase them if those remain idle. Since their introduction in
335feca4, m_timed_checkpoints and m_requested_checkpoints track the
number of checkpoint requests, not if a checkpoint has been actually
executed or not, I am not sure that this should be changed after 10
years. So, to put it in other words, wouldn't we want a way to track
checkpoints that are *executed*, meaning that we could increment a
counter after doing the skip checks in CreateRestartPoint() and
CreateCheckPoint()?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-02 Thread Stephen Frost
Vik, all,

* Vik Fearing (vik.fear...@2ndquadrant.com) wrote:
> I recently had a sad because I noticed that checkpoint counts were
> increasing in pg_stat_bgwriter, but weren't accounted for in my logs
> with log_checkpoints enabled.

> After some searching, I found that it was the idle checkpoints that
> weren't being logged.  I think this is a missed trick in 6ef2eba3f57.

> Attached is a one-liner fix.  I realize how imminent we are from
> releasing v10 but I hope there is still time for such a minor issue as this.

Idle checkpoints aren't, well, really checkpoints though.  If anything,
seems like we shouldn't be including skipped checkpoints in the
pg_stat_bgwriter count because we aren't actually doing a checkpoint.

I certainly don't care for the idea of adding log messages saying we
aren't doing anything just to match a count that's incorrectly claiming
that checkpoints are happening when they aren't.

The down-thread suggestion of keeping track of skipped checkpoints might
be interesting, but I'm not entirely convinced it really is.  We have
time to debate that, of course, but I don't really see how that's
helpful.  At the moment, it seems like the suggestion to add that column
is based on the assumption that we're going to start logging skipped
checkpoints and having that column would allow us to match up the count
between the new column and the "skipped checkpoint" messages in the logs
and I can not help but feel that this is a ridiculous amount of effort
being put into the analysis of something that *didn't* happen.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Logging idle checkpoints

2017-10-01 Thread Andres Freund
On 2017-10-02 07:43:31 +0900, Michael Paquier wrote:
> On Mon, Oct 2, 2017 at 7:41 AM, Andres Freund  wrote:
> > On 2017-10-02 07:39:18 +0900, Michael Paquier wrote:
> >> On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund  wrote:
> >> > On 2017-10-02 00:19:33 +0200, Vik Fearing wrote:
> >> > I'd be ok with applying this now, or in 10.1 - but I do think we should
> >> > fix this before 11.  If nobody protests I'll push later today, so we can
> >> > get some bf cycles for the very remote case that this causes problems.
> >>
> >> This point has been discussed during review and removed from the patch
> >> (adding Stephen in the loop here):
> >> https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com
> >
> > I find that reasoning unconvincing. log_checkpoints is enabled after
> > all. And we're not talking about 10 log messages a second. There's
> > plenty systems that analyze the logs that'd possibly be affected by
> > this.
> 
> No real objections from here, actually.

Vik, because there was some, even though mild, objections, I'd rather
not push this right now. Stephen deserves a chance to reply.  So this'll
have to wait for 10.1, sorry :(

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-01 Thread Michael Paquier
On Mon, Oct 2, 2017 at 7:41 AM, Andres Freund  wrote:
> On 2017-10-02 07:39:18 +0900, Michael Paquier wrote:
>> On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund  wrote:
>> > On 2017-10-02 00:19:33 +0200, Vik Fearing wrote:
>> > I'd be ok with applying this now, or in 10.1 - but I do think we should
>> > fix this before 11.  If nobody protests I'll push later today, so we can
>> > get some bf cycles for the very remote case that this causes problems.
>>
>> This point has been discussed during review and removed from the patch
>> (adding Stephen in the loop here):
>> https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com
>
> I find that reasoning unconvincing. log_checkpoints is enabled after
> all. And we're not talking about 10 log messages a second. There's
> plenty systems that analyze the logs that'd possibly be affected by
> this.

No real objections from here, actually.

>> Actually, shouldn't we make BgWriterStats a bit smarter? We could add
>> a counter for skipped checkpoints in v11 (too late for v10).
>
> Wouldn't hurt, but seems orthogonal.

Sure.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-01 Thread Andres Freund
On 2017-10-02 07:39:18 +0900, Michael Paquier wrote:
> On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund  wrote:
> > On 2017-10-02 00:19:33 +0200, Vik Fearing wrote:
> > I'd be ok with applying this now, or in 10.1 - but I do think we should
> > fix this before 11.  If nobody protests I'll push later today, so we can
> > get some bf cycles for the very remote case that this causes problems.
> 
> This point has been discussed during review and removed from the patch
> (adding Stephen in the loop here):
> https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com

I find that reasoning unconvincing. log_checkpoints is enabled after
all. And we're not talking about 10 log messages a second. There's
plenty systems that analyze the logs that'd possibly be affected by
this.


> Actually, shouldn't we make BgWriterStats a bit smarter? We could add
> a counter for skipped checkpoints in v11 (too late for v10).

Wouldn't hurt, but seems orthogonal.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-01 Thread Michael Paquier
On Mon, Oct 2, 2017 at 7:27 AM, Andres Freund  wrote:
> On 2017-10-02 00:19:33 +0200, Vik Fearing wrote:
> I'd be ok with applying this now, or in 10.1 - but I do think we should
> fix this before 11.  If nobody protests I'll push later today, so we can
> get some bf cycles for the very remote case that this causes problems.

This point has been discussed during review and removed from the patch
(adding Stephen in the loop here):
https://www.postgresql.org/message-id/CAOuzzgq8pHneMHy6JiNiG6Xm5V=cm+k2wcd2w-sctpjdg7x...@mail.gmail.com
Actually, shouldn't we make BgWriterStats a bit smarter? We could add
a counter for skipped checkpoints in v11 (too late for v10).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging idle checkpoints

2017-10-01 Thread Andres Freund
Hi,

On 2017-10-02 00:19:33 +0200, Vik Fearing wrote:
> I recently had a sad because I noticed that checkpoint counts were
> increasing in pg_stat_bgwriter, but weren't accounted for in my logs
> with log_checkpoints enabled.
> 
> After some searching, I found that it was the idle checkpoints that
> weren't being logged.  I think this is a missed trick in 6ef2eba3f57.
> 
> Attached is a one-liner fix.  I realize how imminent we are from
> releasing v10 but I hope there is still time for such a minor issue as this.


> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index dd028a12a4..75f6bd4cc1 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -8724,7 +8724,7 @@ CreateCheckPoint(int flags)
>   WALInsertLockRelease();
>   LWLockRelease(CheckpointLock);
>   END_CRIT_SECTION();
> - ereport(DEBUG1,
> + ereport(log_checkpoints ? LOG : DEBUG1,
>   (errmsg("checkpoint skipped because 
> system is idle")));
>   return;
>   }

I'd be ok with applying this now, or in 10.1 - but I do think we should
fix this before 11.  If nobody protests I'll push later today, so we can
get some bf cycles for the very remote case that this causes problems.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers