Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-27 Thread Kyotaro HORIGUCHI
At Fri, 23 Dec 2016 11:02:11 +0900, Michael Paquier  
wrote in 
> On Fri, Dec 23, 2016 at 8:13 AM, Robert Haas  wrote:
> > On Thu, Dec 22, 2016 at 2:34 PM, Andres Freund  wrote:
> >> On 2016-12-22 08:32:56 -0800, Andres Freund wrote:
> >>> I plan to commit this later today.  Hope I got the reviewers roughly 
> >>> right.
> >>
> >> And pushed. Thanks for the work on this everyone.
> >
> > Cool.  Also, +1 for the important/unimportant terminology.  I like that.
> 
> Thanks for the commit.

Thanks for commiting.

By the way this issue seems beeing in the ToDo list.

https://wiki.postgresql.org/wiki/Todo#Point-In-Time_Recovery_.28PITR.29

>Consider avoiding WAL switching via archive_timeout if there
>has been no database activity
> - archive_timeout behavior for no activity
> - Re: archive_timeout behavior for no activity

So I marked it as "done".

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-22 Thread Michael Paquier
On Fri, Dec 23, 2016 at 8:13 AM, Robert Haas  wrote:
> On Thu, Dec 22, 2016 at 2:34 PM, Andres Freund  wrote:
>> On 2016-12-22 08:32:56 -0800, Andres Freund wrote:
>>> I plan to commit this later today.  Hope I got the reviewers roughly right.
>>
>> And pushed. Thanks for the work on this everyone.
>
> Cool.  Also, +1 for the important/unimportant terminology.  I like that.

Thanks for the commit.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-22 Thread Robert Haas
On Thu, Dec 22, 2016 at 2:34 PM, Andres Freund  wrote:
> On 2016-12-22 08:32:56 -0800, Andres Freund wrote:
>> I plan to commit this later today.  Hope I got the reviewers roughly right.
>
> And pushed. Thanks for the work on this everyone.

Cool.  Also, +1 for the important/unimportant terminology.  I like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-22 Thread Andres Freund
Hi,

On 2016-12-22 08:32:56 -0800, Andres Freund wrote:
> I plan to commit this later today.  Hope I got the reviewers roughly right.

And pushed. Thanks for the work on this everyone.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-22 Thread Andres Freund
Hi,

On 2016-12-21 13:28:54 -0800, Andres Freund wrote:
> A mime-type of invalid/octet-stream? That's an, uh, odd choice.
>
> Working on committing this (tomorrow morning, not tonight).  There's
> some relatively minor things I want to change:
>
> - I don't like the name XLogSetFlags() - it's completely unclear what
>   that those flags refer to - it could just as well be replay
>   related. XLogSetRecordFlags()?
> - Similarly I don't like the name "progress LSN" much. What does
>   "progress" really mean in that". Maybe "consistency LSN"?
> - It's currently required to avoid triggering archive timeouts and
>   checkpoints triggering each other, but I'm nervous marking all xlog
>   switches as unimportant. I think it'd be better to only mark timeout
>   triggered switches as such.

Here's an updated version of this. Besides the above (with "consistency
LSN" now named "lastImportantAt" instead of the previous
lastProgressAt), I changed how the skipping works in the bgwriter: I
don't see any need to involve the checkpoint location there. This also
allows to drop GetLastCheckpointPtr().  Besides that I did a fair amount
of comment-smithing.

I plan to commit this later today.  Hope I got the reviewers roughly right.

Regards,

Andres
>From c5c9e9ce114d5c058e171caaa172ccb2ac066f13 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 22 Dec 2016 08:31:32 -0800
Subject: [PATCH] Skip checkpoints, archiving on idle systems.

Some background activity (like checkpoints, archive timeout, standby
snapshots) is not supposed to happen on an idle system. Unfortunately
so far it was not easy to determine when a system is idle, which
defeated some of the attempts to avoid redundant activity on an idle
system.

To make that easier, allow to make individual WAL insertions as not
being "important". By checking whether any important activity happened
since the last time an activity was performed, it now is easy to check
whether some action needs to be repeated.

Use the new facility for checkpoints, archive timeout and standby
snapshots.

Author: Michael Paquier, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI
Bug: #13685
Discussion:
https://www.postgresql.org/message-id/20151016203031.3019.72...@wrigleys.postgresql.org
https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkyv_45d...@mail.gmail.com
Backpatch: -
---
 doc/src/sgml/config.sgml  |  10 +--
 src/backend/access/heap/heapam.c  |  10 +--
 src/backend/access/transam/xact.c |   2 +-
 src/backend/access/transam/xlog.c | 118 +++---
 src/backend/access/transam/xlogfuncs.c|   2 +-
 src/backend/access/transam/xloginsert.c   |  24 --
 src/backend/postmaster/bgwriter.c |   8 +-
 src/backend/postmaster/checkpointer.c |  45 
 src/backend/replication/logical/message.c |   2 +-
 src/backend/storage/ipc/standby.c |  11 ++-
 src/include/access/xlog.h |  12 ++-
 src/include/access/xlog_internal.h|   4 +-
 src/include/access/xloginsert.h   |   2 +-
 13 files changed, 173 insertions(+), 77 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1b98c416e0..b6b20a368e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2852,12 +2852,10 @@ include_dir 'conf.d'
 parameter is greater than zero, the server will switch to a new
 segment file whenever this many seconds have elapsed since the last
 segment file switch, and there has been any database activity,
-including a single checkpoint.  (Increasing
-checkpoint_timeout will reduce unnecessary
-checkpoints on an idle system.)
-Note that archived files that are closed early
-due to a forced switch are still the same length as completely full
-files.  Therefore, it is unwise to use a very short
+including a single checkpoint (checkpoints are skipped if there is
+no database activity).  Note that archived files that are closed
+early due to a forced switch are still the same length as completely
+full files.  Therefore, it is unwise to use a very short
 archive_timeout  it will bloat your archive
 storage.  archive_timeout settings of a minute or so are
 usually reasonable.  You should consider using streaming replication,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b019bc1a0d..ea579a00be 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2507,7 +2507,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			heaptup->t_len - SizeofHeapTupleHeader);
 
 		/* filtering by origin on a row level is much more efficient */
-		XLogIncludeOrigin();
+		XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
 
 		recptr = XLogInsert(RM_HEAP_ID, info);
 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-22 Thread Amit Kapila
On Thu, Dec 22, 2016 at 3:10 AM, Andres Freund  wrote:
> On 2016-12-21 16:35:28 -0500, Robert Haas wrote:
>> On Wed, Dec 21, 2016 at 4:28 PM, Andres Freund  wrote:
>> > - Similarly I don't like the name "progress LSN" much. What does
>> >   "progress" really mean in that". Maybe "consistency LSN"?
>>
>> Whoa.  -1 from me for "consistency LSN".  Consistency has to with
>> whether the cluster has recovered up to the minimum recovery point or
>> whatever -- that is -- questions like "am i going to run into torn
>> pages?" and "should I expect some heap tuples to maybe be missing
>> index tuples, or the other way around?".
>
> That's imo pretty much what progress LSN currently describes. Have there
> been any records which are important for durability/consistency and
> hence need to be archived and such.
>
>
>> What I think "progress LSN"
>> is getting at -- actually fairly well -- is whether we're getting
>> anything *important* done, not whether we are consistent.  I don't
>> mind changing the name, but not to consistency LSN.
>
> Well, progress could just as well be replay. Or the actual insertion
> point. Or up to where we've written out. Or synced out. Or
> replicated
>
> Open to other suggestions - I'm not really happy with consistency LSN,
> but definitely unhappy with progress LSN.
>

last_essential_LSN?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread Michael Paquier
On Thu, Dec 22, 2016 at 6:41 AM, David Steele  wrote:
> On 12/21/16 4:28 PM, Andres Freund wrote:
>
>> Working on committing this (tomorrow morning, not tonight).  There's
>> some relatively minor things I want to change:

Thanks for looking at this patch.

>> - I don't like the name XLogSetFlags() - it's completely unclear what
>>   that those flags refer to - it could just as well be replay
>>   related. XLogSetRecordFlags()?
>
> That sounds a bit more clear.

Fine for me.

>> - Similarly I don't like the name "progress LSN" much. What does
>>   "progress" really mean in that". Maybe "consistency LSN"?
>
> Yes, please.  I think that really cuts to the core of what the patch is
> about.  Progress made perfect sense to me, but consistency is always the
> goal, and what we are saying here is that this is the last xlog record that
> is required to achieve consistency.  Anything that happens to be after it is
> informational only.

Fine as well.

>> - It's currently required to avoid triggering archive timeouts and
>>   checkpoints triggering each other, but I'm nervous marking all xlog
>>   switches as unimportant. I think it'd be better to only mark timeout
>>   triggered switches as such.
>
> That seems fine to me.  If the system is truly idle that might trigger one
> more xlog switch that is needed, but it seems like a reasonable compromise.

On a long-running embedded system the difference won't matter much. So
I guess I'm fine with this bit as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread Michael Paquier
On Thu, Dec 22, 2016 at 6:28 AM, Andres Freund  wrote:
> A mime-type of invalid/octet-stream? That's an, uh, odd choice.

Indeed. I am not sure what kind of accident happened here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread David Steele

On 12/21/16 4:40 PM, Andres Freund wrote:

On 2016-12-21 16:35:28 -0500, Robert Haas wrote:



What I think "progress LSN"
is getting at -- actually fairly well -- is whether we're getting
anything *important* done, not whether we are consistent.  I don't
mind changing the name, but not to consistency LSN.


Well, progress could just as well be replay. Or the actual insertion
point. Or up to where we've written out. Or synced out. Or
replicated

Open to other suggestions - I'm not really happy with consistency LSN,
but definitely unhappy with progress LSN.


MinConsistencyLSN?

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread David G. Johnston
On Wed, Dec 21, 2016 at 2:40 PM, Andres Freund  wrote:

> That's imo pretty much what progress LSN currently describes. Have there
> been any records which are important for durability/consistency and
> hence need to be archived and such.
>

The above, to me, describes a "milestone LSN"...​

David J.


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread David Steele

Hi Andres,

On 12/21/16 4:28 PM, Andres Freund wrote:


Working on committing this (tomorrow morning, not tonight).  There's
some relatively minor things I want to change:

- I don't like the name XLogSetFlags() - it's completely unclear what
  that those flags refer to - it could just as well be replay
  related. XLogSetRecordFlags()?


That sounds a bit more clear.


- Similarly I don't like the name "progress LSN" much. What does
  "progress" really mean in that". Maybe "consistency LSN"?


Yes, please.  I think that really cuts to the core of what the patch is 
about.  Progress made perfect sense to me, but consistency is always the 
goal, and what we are saying here is that this is the last xlog record 
that is required to achieve consistency.  Anything that happens to be 
after it is informational only.



- It's currently required to avoid triggering archive timeouts and
  checkpoints triggering each other, but I'm nervous marking all xlog
  switches as unimportant. I think it'd be better to only mark timeout
  triggered switches as such.


That seems fine to me.  If the system is truly idle that might trigger 
one more xlog switch that is needed, but it seems like a reasonable 
compromise.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread Andres Freund
On 2016-12-21 16:35:28 -0500, Robert Haas wrote:
> On Wed, Dec 21, 2016 at 4:28 PM, Andres Freund  wrote:
> > - Similarly I don't like the name "progress LSN" much. What does
> >   "progress" really mean in that". Maybe "consistency LSN"?
> 
> Whoa.  -1 from me for "consistency LSN".  Consistency has to with
> whether the cluster has recovered up to the minimum recovery point or
> whatever -- that is -- questions like "am i going to run into torn
> pages?" and "should I expect some heap tuples to maybe be missing
> index tuples, or the other way around?".

That's imo pretty much what progress LSN currently describes. Have there
been any records which are important for durability/consistency and
hence need to be archived and such.


> What I think "progress LSN"
> is getting at -- actually fairly well -- is whether we're getting
> anything *important* done, not whether we are consistent.  I don't
> mind changing the name, but not to consistency LSN.

Well, progress could just as well be replay. Or the actual insertion
point. Or up to where we've written out. Or synced out. Or
replicated

Open to other suggestions - I'm not really happy with consistency LSN,
but definitely unhappy with progress LSN.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread Robert Haas
On Wed, Dec 21, 2016 at 4:28 PM, Andres Freund  wrote:
> - Similarly I don't like the name "progress LSN" much. What does
>   "progress" really mean in that". Maybe "consistency LSN"?

Whoa.  -1 from me for "consistency LSN".  Consistency has to with
whether the cluster has recovered up to the minimum recovery point or
whatever -- that is -- questions like "am i going to run into torn
pages?" and "should I expect some heap tuples to maybe be missing
index tuples, or the other way around?".  What I think "progress LSN"
is getting at -- actually fairly well -- is whether we're getting
anything *important* done, not whether we are consistent.  I don't
mind changing the name, but not to consistency LSN.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-21 Thread Andres Freund
Hi,

A mime-type of invalid/octet-stream? That's an, uh, odd choice.

Working on committing this (tomorrow morning, not tonight).  There's
some relatively minor things I want to change:

- I don't like the name XLogSetFlags() - it's completely unclear what
  that those flags refer to - it could just as well be replay
  related. XLogSetRecordFlags()?
- Similarly I don't like the name "progress LSN" much. What does
  "progress" really mean in that". Maybe "consistency LSN"?
- It's currently required to avoid triggering archive timeouts and
  checkpoints triggering each other, but I'm nervous marking all xlog
  switches as unimportant. I think it'd be better to only mark timeout
  triggered switches as such.

Otherwise this seems to look good.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-03 Thread Amit Kapila
On Fri, Dec 2, 2016 at 9:50 AM, Michael Paquier
 wrote:
> On Wed, Nov 30, 2016 at 7:53 PM, Amit Kapila  wrote:
>>> + * Switch segment only when WAL has done some progress since the
>> + * > last time a segment has switched because of a timeout.
>>
>>> + if (GetProgressRecPtr() > last_switch_lsn)
>>
>> Either the above comment is wrong or the code after it has a problem.
>> last_switch_lsn aka XLogCtl->lastSegSwitchLSN is updated not only for
>> a timeout but also when there is a lot of WAL activity which makes WAL
>> Write to cross a segment boundary.
>
> Right, this should be reworded a bit better to mention both. Done as attached.
>

+ * Switch segment only when WAL has done some progress since the
+ * last time a segment has switched because of a timeout or because
+ * of some WAL activity.

I think it could be better written as below, but it is up to you to
retain your version or use below one.

Switch segment only when WAL has done some progress since the last
time a segment has switched due to timeout or WAL activity.  Apart
from that patch looks good to me.

Note to Committer:  As discussed above [1], this patch skips logging
for LogAccessExclusiveLocks which can be called from multiple places,
so for clarity purpose either we should document it or skip it only
when absolutely necessary.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1KJAXA3PdxH4T1QJKBNOvyUK8UKm_GCvTuT%2BFC5jpjmjg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-01 Thread Michael Paquier
On Wed, Nov 30, 2016 at 7:53 PM, Amit Kapila  wrote:
> On Mon, Nov 21, 2016 at 11:08 AM, Michael Paquier
>  wrote:
>> On Tue, Nov 15, 2016 at 9:59 PM, Amit Kapila  wrote:
>>> On Tue, Nov 15, 2016 at 9:27 AM, Kyotaro HORIGUCHI
>>>  wrote:
 The term "WAL activity' is used in the comment for
 GetProgressRecPtr. Its meaning is not clear but not well
 defined. Might need a bit detailed explanation about that or "WAL
 activity tracking". What do you think about this?

>>>
>>> I would have written it as below:
>>>
>>> GetProgressRecPtr -- Returns the WAL progress.  WAL progress is
>>> determined by scanning each WALinsertion lock by taking directly the
>>> light-weight lock associated to it.
>>
>> Not sure if that's better.. What about something as fancy as that?
>>  /*
>> - * Get the time of the last xlog segment switch
>> + * GetProgressRecPtr -- Returns the newest WAL progress position.  WAL
>> + * progress is determined by scanning each WALinsertion lock by taking
>> + * directly the light-weight lock associated to it.  The result of this
>> + * routine can be compared with the last checkpoint LSN to check if
>> + * a checkpoint can be skipped or not.
>> + *
>> It may be worth mentioning that the result of this routine is
>> basically used for checkpoint skip logic.
>>
>
> That's okay, but I think you are using it to skip switch segment stuff
> as well.  Today, again going through patch, I noticed small anomaly
>
>> + * Switch segment only when WAL has done some progress since the
> + * > last time a segment has switched because of a timeout.
>
>> + if (GetProgressRecPtr() > last_switch_lsn)
>
> Either the above comment is wrong or the code after it has a problem.
> last_switch_lsn aka XLogCtl->lastSegSwitchLSN is updated not only for
> a timeout but also when there is a lot of WAL activity which makes WAL
> Write to cross a segment boundary.

Right, this should be reworded a bit better to mention both. Done as attached.
-- 
Michael


hs-checkpoints-v19.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-30 Thread Amit Kapila
On Mon, Nov 21, 2016 at 11:08 AM, Michael Paquier
 wrote:
> On Tue, Nov 15, 2016 at 9:59 PM, Amit Kapila  wrote:
>> On Tue, Nov 15, 2016 at 9:27 AM, Kyotaro HORIGUCHI
>>  wrote:
>>> The term "WAL activity' is used in the comment for
>>> GetProgressRecPtr. Its meaning is not clear but not well
>>> defined. Might need a bit detailed explanation about that or "WAL
>>> activity tracking". What do you think about this?
>>>
>>
>> I would have written it as below:
>>
>> GetProgressRecPtr -- Returns the WAL progress.  WAL progress is
>> determined by scanning each WALinsertion lock by taking directly the
>> light-weight lock associated to it.
>
> Not sure if that's better.. What about something as fancy as that?
>  /*
> - * Get the time of the last xlog segment switch
> + * GetProgressRecPtr -- Returns the newest WAL progress position.  WAL
> + * progress is determined by scanning each WALinsertion lock by taking
> + * directly the light-weight lock associated to it.  The result of this
> + * routine can be compared with the last checkpoint LSN to check if
> + * a checkpoint can be skipped or not.
> + *
> It may be worth mentioning that the result of this routine is
> basically used for checkpoint skip logic.
>

That's okay, but I think you are using it to skip switch segment stuff
as well.  Today, again going through patch, I noticed small anomaly

> + * Switch segment only when WAL has done some progress since the
+ * > last time a segment has switched because of a timeout.

> + if (GetProgressRecPtr() > last_switch_lsn)

Either the above comment is wrong or the code after it has a problem.
last_switch_lsn aka XLogCtl->lastSegSwitchLSN is updated not only for
a timeout but also when there is a lot of WAL activity which makes WAL
Write to cross a segment boundary.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-28 Thread Michael Paquier
On Tue, Nov 22, 2016 at 6:27 PM, Kyotaro HORIGUCHI
 wrote:
> I have marked this as ready for committer again.

And moved to next CF for now.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-22 Thread Kyotaro HORIGUCHI
I almost forgot this.

At Mon, 21 Nov 2016 15:44:08 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20161121.154408.47398334.horiguchi.kyot...@lab.ntt.co.jp>
> Hello,
> 
> At Mon, 21 Nov 2016 14:41:27 +0900, Michael Paquier 
>  wrote in 
> 
> > On Mon, Nov 21, 2016 at 1:31 PM, Kyotaro HORIGUCHI
> >  wrote:
> > > So, all my original concern were cleared.
> > 
> > Cool. Perhaps this could be marked as ready for committer then?
> 
> ^^;
> 
> > > The last one is
> > > resetting by a checkpointer restart.. I'd like to remove that if
> > > Andres agrees.
> > 
> > Could you clarify this point? v18 makes sure that the last segment
> > switch stays in shared memory so as we could still skip the activity
> > of archive_timeout correctly.
> 
> I don't doubt that it works. (I don't comment on the comment:) My
> concern is complexity. I don't think we wish to save almost no
> harm behavior caused by a thing rarely happens.  But, if you and
> others on this thread don't mind the complexity, It's not worth
> asserting myself more.
> 
> So, after a day waiting, I'll mark this as ready for committer
> again.

I have marked this as ready for committer again.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-20 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 21 Nov 2016 14:41:27 +0900, Michael Paquier  
wrote in 
> On Mon, Nov 21, 2016 at 1:31 PM, Kyotaro HORIGUCHI
>  wrote:
> > So, all my original concern were cleared.
> 
> Cool. Perhaps this could be marked as ready for committer then?

^^;

> > The last one is
> > resetting by a checkpointer restart.. I'd like to remove that if
> > Andres agrees.
> 
> Could you clarify this point? v18 makes sure that the last segment
> switch stays in shared memory so as we could still skip the activity
> of archive_timeout correctly.

I don't doubt that it works. (I don't comment on the comment:) My
concern is complexity. I don't think we wish to save almost no
harm behavior caused by a thing rarely happens.  But, if you and
others on this thread don't mind the complexity, It's not worth
asserting myself more.

So, after a day waiting, I'll mark this as ready for committer
again.

reagards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-20 Thread Michael Paquier
On Mon, Nov 21, 2016 at 1:31 PM, Kyotaro HORIGUCHI
 wrote:
> So, all my original concern were cleared.

Cool. Perhaps this could be marked as ready for committer then?

> The last one is
> resetting by a checkpointer restart.. I'd like to remove that if
> Andres agrees.

Could you clarify this point? v18 makes sure that the last segment
switch stays in shared memory so as we could still skip the activity
of archive_timeout correctly.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-20 Thread Michael Paquier
On Tue, Nov 15, 2016 at 9:59 PM, Amit Kapila  wrote:
> On Tue, Nov 15, 2016 at 9:27 AM, Kyotaro HORIGUCHI
>  wrote:
>> The term "WAL activity' is used in the comment for
>> GetProgressRecPtr. Its meaning is not clear but not well
>> defined. Might need a bit detailed explanation about that or "WAL
>> activity tracking". What do you think about this?
>>
>
> I would have written it as below:
>
> GetProgressRecPtr -- Returns the WAL progress.  WAL progress is
> determined by scanning each WALinsertion lock by taking directly the
> light-weight lock associated to it.

Not sure if that's better.. What about something as fancy as that?
 /*
- * Get the time of the last xlog segment switch
+ * GetProgressRecPtr -- Returns the newest WAL progress position.  WAL
+ * progress is determined by scanning each WALinsertion lock by taking
+ * directly the light-weight lock associated to it.  The result of this
+ * routine can be compared with the last checkpoint LSN to check if
+ * a checkpoint can be skipped or not.
+ *
It may be worth mentioning that the result of this routine is
basically used for checkpoint skip logic.
-- 
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index adab2f8..d2a8ec2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2826,17 +2826,16 @@ include_dir 'conf.d'
 parameter is greater than zero, the server will switch to a new
 segment file whenever this many seconds have elapsed since the last
 segment file switch, and there has been any database activity,
-including a single checkpoint.  (Increasing
-checkpoint_timeout will reduce unnecessary
-checkpoints on an idle system.)
-Note that archived files that are closed early
-due to a forced switch are still the same length as completely full
-files.  Therefore, it is unwise to use a very short
-archive_timeout  it will bloat your archive
-storage.  archive_timeout settings of a minute or so are
-usually reasonable.  You should consider using streaming replication,
-instead of archiving, if you want data to be copied off the master
-server more quickly than that.
+including a single checkpoint.  Checkpoints can however be skipped
+if there is no database activity, making this parameter a safe
+setting for environments which are idle for a long period of time.
+Note that archived files that are closed early due to a forced switch
+are still the same length as completely full files.  Therefore, it is
+unwise to use a very short archive_timeout  it will
+bloat your archive storage.  archive_timeout settings of
+a minute or so are usually reasonable.  You should consider using
+streaming replication, instead of archiving, if you want data to
+be copied off the master server more quickly than that.
 This parameter can only be set in the
 postgresql.conf file or on the server command line.

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b019bc1..ac40731 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2507,7 +2507,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId 
cid,
heaptup->t_len - 
SizeofHeapTupleHeader);
 
/* filtering by origin on a row level is much more efficient */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
recptr = XLogInsert(RM_HEAP_ID, info);
 
@@ -2846,7 +2846,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, 
int ntuples,
XLogRegisterBufData(0, tupledata, totaldatalen);
 
/* filtering by origin on a row level is much more 
efficient */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
recptr = XLogInsert(RM_HEAP2_ID, info);
 
@@ -3308,7 +3308,7 @@ l1:
}
 
/* filtering by origin on a row level is much more efficient */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);
 
@@ -6035,7 +6035,7 @@ heap_finish_speculative(Relation relation, HeapTuple 
tuple)
XLogBeginInsert();
 
/* We want the same filtering on this as on a plain insert */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
XLogRegisterData((char *) , SizeOfHeapConfirm);
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
@@ -7703,7 +7703,7 @@ log_heap_update(Relation reln, Buffer oldbuf,
}
 
/* filtering by origin on a row level is much more efficient */
-

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-20 Thread Kyotaro HORIGUCHI
Thank you very much for the testing on the nice machine.

At Fri, 18 Nov 2016 20:35:43 -0800, Michael Paquier  
wrote in 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-19 Thread David Steele
On 11/18/16 12:38 PM, David Steele wrote:

> On 11/14/16 4:29 AM, Michael Paquier wrote:
>> On Mon, Nov 14, 2016 at 6:10 PM, Kyotaro HORIGUCHI
>>> If I'm not missing something, at the worst we have a checkpoint
>>> after a checkpointer restart that should have been supressed. Is
>>> it worth picking it up for the complexity?
> 
> That's the way I read it as well.  It's not clear to me how the
> checkpointer would get restarted under normal circumstances.
> 
> I did a kill on the checkpointer and it was ignored.  After a kill -9
> the checkpointer process came back but also switched the xlog.  Is this
> the expected behavior?

Ah, never mind.  I can see this caused a restart and recovery so the
archive timeout was reset and a switch occurred after timeout.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-18 Thread Michael Paquier
On Fri, Nov 18, 2016 at 7:00 PM, Amit Kapila  wrote:
> Okay, I have done some performance tests with this patch and found that it 
> doesn't have any noticeable impact which is good.  Details of performance 
> tests is below:
> Machine configuration:
> 2 sockets, 28 cores (56 including Hyper-Threading)
> RAM = 64GB
> Data directory is configured on the magnetic disk and WAL on SSD.

Nice spec!

> The conclusion from my tests is that this patch is okay as far as performance 
> is concerned.

Thank you a lot for doing those additional tests!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-18 Thread Amit Kapila
On Mon, Nov 14, 2016 at 9:33 AM, Michael Paquier 
wrote:
> On Mon, Nov 14, 2016 at 12:49 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Sat, 12 Nov 2016 10:28:56 +0530, Amit Kapila 
wrote in 
>>> I think it is good to check the performance impact of this patch on
>>> many core m/c.  Is it possible for you to once check with Alexander
>>> Korotkov to see if he can provide you access to his powerful m/c which
>>> has 70 cores (if I remember correctly)?
>
> I heard about a number like that, and there is no reason to not do
> tests to be sure.
>

Okay, I have done some performance tests with this patch and found that it
doesn't have any noticeable impact which is good.  Details of performance
tests is below:
Machine configuration:
2 sockets, 28 cores (56 including Hyper-Threading)
RAM = 64GB
Data directory is configured on the magnetic disk and WAL on SSD.

Non-default postgresql.conf parameters
shared_buffers=8GB
max_connections=200
bgwriter_delay=10ms
checkpoint_completion_target=0

Keeping above parameters as fixed, I have varied checkpoint_timeout for
various tests.  Each of the below results is a median of 3, 15min pgbench
TPC-B tests.  All the tests are performed at 64 and or 128 client-count
(Client Count = number of concurrent sessions and threads (ex. -c 8 -j
8)).  All the tests are done for pgbench scale factor - 300 which means
data fits in shared buffers.


checkpoint_timeout=30s
client_count/patch_ver 64 128
HEAD 5176 6853
Patch 4963 6556
checkpoint_timeout=60s
client_count/patch_ver
64 128
HEAD 4962 6894
Patch 5228 6814
checkpoint_timeout=120s
client_count/patch_ver
64 128
HEAD 5443 7308
Patch 5453 6937
checkpoint_timeout=150s
client_count/patch_ver
128
HEAD 7316
Patch 7188


In above results, you can see that in some cases (example, for
checkpoint_time=30s, @128-client count) TPS with the patch is slightly
lower(1~5%), but I find that as a run-to-run variation, because on
repeating the tests, I could not see such regression.  The reason of
keeping low values for checkpoint_timeout and bgwriter_delay is to test if
there is any impact due to new locking introduced in checkpointer and
bgwriter.  The conclusion from my tests is that this patch is okay as far
as performance is concerned.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-18 Thread David Steele
On 11/14/16 4:29 AM, Michael Paquier wrote:
> On Mon, Nov 14, 2016 at 6:10 PM, Kyotaro HORIGUCHI
>  wrote:
>> It applies the master and compiled cleanly and no error by
>> regtest.  (I didn't confirmed that the problem is still fixed but
>> seemingly no problem)
> 
> Thanks for double-checking.

Also looks good to me.  I like curinsert_flags and XLOG_SKIP_PROGRESS
better than the old names.

>> If I'm not missing something, at the worst we have a checkpoint
>> after a checkpointer restart that should have been supressed. Is
>> it worth picking it up for the complexity?

That's the way I read it as well.  It's not clear to me how the
checkpointer would get restarted under normal circumstances.

I did a kill on the checkpointer and it was ignored.  After a kill -9
the checkpointer process came back but also switched the xlog.  Is this
the expected behavior?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-15 Thread Amit Kapila
On Tue, Nov 15, 2016 at 9:27 AM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Mon, 14 Nov 2016 16:53:35 +0530, Amit Kapila  
> wrote in
>> >
>>
>> The progress parameter is used not only for checkpoint activity but by
>> bgwriter as well for logging standby snapshot.  If you want to keep
>> this record under no_progress category (which I don't endorse), then
>> it might be better to add a comment, so that it will be easier for the
>> readers of this code to understand the reason.
>
> I rather agree to that. But how detailed it should be?
>
>>LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks)
>>{
>>...
>>   XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
>>   /* Needs XLOG_SKIP_PROGRESS because called from LogStandbySnapshot */
>>   XLogSetFlags(XLOG_SKIP_PROGRESS);
>
> or
>
>>   /*
>>* Needs XLOG_SKIP_PROGRESS because called from LogStandbySnapshot.
>>* See the comment for LogCurrentRunningXact for the detail.
>>*/
>
> or more detiled?
>

I think referring to a place where we have explained why skipping XLOG
progress is okay for this or related WAL records (like comments for
struct WALInsertLock)  will be more suitable.  Also, maybe it is worth
mentioning that this code will skip updating XLOG progress even when
we want to log AccessExclusiveLocks for operations other than a
snapshot.


> The term "WAL activity' is used in the comment for
> GetProgressRecPtr. Its meaning is not clear but not well
> defined. Might need a bit detailed explanation about that or "WAL
> activity tracking". What do you think about this?
>

I would have written it as below:

GetProgressRecPtr -- Returns the WAL progress.  WAL progress is
determined by scanning each WALinsertion lock by taking directly the
light-weight lock associated to it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-14 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 14 Nov 2016 16:53:35 +0530, Amit Kapila  wrote 
in 
> On Mon, Nov 14, 2016 at 9:33 AM, Michael Paquier
> >>> Is it right to set XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
> >>> This function is called not only in LogStandbySnapshot(), but during
> >>> DDL operations as well where lockmode >= AccessExclusiveLock.
> >>
> >> This does not remove any record from WAL. So theoretically any
> >> kind of record can be NO_PROGRESS, but practically as long as
> >> checkpoints are not unreasonably suppressed. Any explicit
> >> database operation must be accompanied with at least commit
> >> record that triggers checkpoint. NO_PROGRESSing there doesn't
> >> seem to me to harm database durability for this reason.
> >>
> 
> By this theory, you can even mark the insert record as no progress
> which is not good.

Of course. So we carefully choose the kinds of records to be
so. If we mark all xlog records to be SKIP_PROGRESS,
archive_timeout gets useless and as its result vacuum may leave
certain number of records not removed for maybe problematic time.

> >> The objective of this patch is skipping WALs on completely-idle
> >> state and the NO_PROGRESSing is necessary to do its work. Of
> >> course we can distinguish exclusive lock with PROGRESS and
> >> without PROGRESS but it is unnecessary complexity.
> >
> > The point that applies here is that logging the exclusive lock
> > information is necessary for the *standby* recovery conflicts, not the
> > primary  which is why it should not influence the checkpoint activity
> > that is happening on the primary. So marking this record with
> > NO_PROGRESS is actually fine to me.
> >
> 
> The progress parameter is used not only for checkpoint activity but by
> bgwriter as well for logging standby snapshot.  If you want to keep
> this record under no_progress category (which I don't endorse), then
> it might be better to add a comment, so that it will be easier for the
> readers of this code to understand the reason.

I rather agree to that. But how detailed it should be?

>LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks)
>{
>...
>   XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
>   /* Needs XLOG_SKIP_PROGRESS because called from LogStandbySnapshot */
>   XLogSetFlags(XLOG_SKIP_PROGRESS);

or 

>   /*
>* Needs XLOG_SKIP_PROGRESS because called from LogStandbySnapshot.
>* See the comment for LogCurrentRunningXact for the detail.
>*/

or more detiled?

The term "WAL activity' is used in the comment for
GetProgressRecPtr. Its meaning is not clear but not well
defined. Might need a bit detailed explanation about that or "WAL
activity tracking". What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-14 Thread Amit Kapila
On Mon, Nov 14, 2016 at 9:33 AM, Michael Paquier
 wrote:
> On Mon, Nov 14, 2016 at 12:49 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Sat, 12 Nov 2016 10:28:56 +0530, Amit Kapila  
>> wrote in 
>>> I think it is good to check the performance impact of this patch on
>>> many core m/c.  Is it possible for you to once check with Alexander
>>> Korotkov to see if he can provide you access to his powerful m/c which
>>> has 70 cores (if I remember correctly)?
>
> I heard about a number like that, and there is no reason to not do
> tests to be sure. With that many cores we are more likely going to see
> the limitation of the number of XLOG insert slots popping up as a
> bottleneck, but that's just an assumption without any numbers.
> Alexander (added in CC), could it be possible to get an access to this
> machine?
>
>>> @@ -1035,6 +1038,7 @@ LogAccessExclusiveLocks(int nlocks,
>>> xl_standby_lock *locks)
>>>   XLogBeginInsert();
>>>   XLogRegisterData((char *) , offsetof(xl_standby_locks, locks));
>>>   XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
>>> + XLogSetFlags(XLOG_NO_PROGRESS);
>>>
>>>
>>> Is it right to set XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
>>> This function is called not only in LogStandbySnapshot(), but during
>>> DDL operations as well where lockmode >= AccessExclusiveLock.
>>
>> This does not remove any record from WAL. So theoretically any
>> kind of record can be NO_PROGRESS, but practically as long as
>> checkpoints are not unreasonably suppressed. Any explicit
>> database operation must be accompanied with at least commit
>> record that triggers checkpoint. NO_PROGRESSing there doesn't
>> seem to me to harm database durability for this reason.
>>

By this theory, you can even mark the insert record as no progress
which is not good.

>> The objective of this patch is skipping WALs on completely-idle
>> state and the NO_PROGRESSing is necessary to do its work. Of
>> course we can distinguish exclusive lock with PROGRESS and
>> without PROGRESS but it is unnecessary complexity.
>
> The point that applies here is that logging the exclusive lock
> information is necessary for the *standby* recovery conflicts, not the
> primary  which is why it should not influence the checkpoint activity
> that is happening on the primary. So marking this record with
> NO_PROGRESS is actually fine to me.
>

The progress parameter is used not only for checkpoint activity but by
bgwriter as well for logging standby snapshot.  If you want to keep
this record under no_progress category (which I don't endorse), then
it might be better to add a comment, so that it will be easier for the
readers of this code to understand the reason.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-14 Thread Michael Paquier
On Mon, Nov 14, 2016 at 6:10 PM, Kyotaro HORIGUCHI
 wrote:
> It applies the master and compiled cleanly and no error by
> regtest.  (I didn't confirmed that the problem is still fixed but
> seemingly no problem)

Thanks for double-checking.

> If I'm not missing something, at the worst we have a checkpoint
> after a checkpointer restart that should have been supressed. Is
> it worth picking it up for the complexity?

I think so, that's not that much code if you think about it as there
is already a routine to get the timestamp of the lastly switched
segment that gets used by checkpointer.c.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-14 Thread Kyotaro HORIGUCHI
Hello,

It applies the master and compiled cleanly and no error by
regtest.  (I didn't confirmed that the problem is still fixed but
seemingly no problem)

At Mon, 14 Nov 2016 15:09:09 +0900, Michael Paquier  
wrote in 
> On Sat, Nov 12, 2016 at 9:01 PM, Andres Freund  wrote:
> > On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:
> >
> >> + * This takes also
> >> + * advantage to avoid 8-byte torn reads on some platforms by using the
> >> + * fact that each insert lock is located on the same cache line.
> >
> > Something residing on the same cache line doens't provide that guarantee
> > on all platforms.
> 
> OK. Let's remove it then.
> 
> >> + * XXX: There is still room for more improvements here, particularly
> >> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
> >> + * update the progress LSN as those relations are reset during crash
> >> + * recovery so enforcing buffers of such relations to be flushed for
> >> + * example in the case of a load only on unlogged relations is a waste
> >> + * of disk write.
> >
> > Commit records still have to be written, everything else doesn't write
> > WAL. So I'm doubtful this matters much?
> 
> Hm, okay. In most cases this may not matter... Let's rip that off.
> 
> >> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr 
> >> fpw_lsn)
> >>   inserted = true;
> >>   }
> >>
> >> + /*
> >> +  * Update the LSN progress positions. At least one WAL insertion lock
> >> +  * is already taken appropriately before doing that, and it is 
> >> simpler
> >> +  * to do that here when the WAL record data and type are at hand.
> >
> > But we don't use the "WAL record data and type"?
> 
> Yes, at some point this patch did so...
> 
> >> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
> >> + * other words any activity not referring to standby logging or segment
> >> + * switches. Finding the last activity position is done by scanning each
> >> + * WAL insertion lock by taking directly the light-weight lock associated
> >> + * to it.
> >> + */
> >
> > I'd prefer not to list the specific records here - that's just
> > guaranteed to get out of date. Why not say something "any activity not
> > requiring a checkpoint to be triggered" or such?
> 
> OK. Makes sense to minimize maintenance.
> 
> >> +  * If this isn't a shutdown or forced checkpoint, and if there has 
> >> been no
> >> +  * WAL activity, skip the checkpoint.  The idea here is to avoid 
> >> inserting
> >> +  * duplicate checkpoints when the system is idle. That wastes log 
> >> space,
> >> +  * and more importantly it exposes us to possible loss of both 
> >> current and
> >> +  * previous checkpoint records if the machine crashes just as we're 
> >> writing
> >> +  * the update.
> >
> > Shouldn't this mention archiving and also that we also ignore some forms
> > of WAL activity?
> 
> I have reworded that as:
> "If this isn't a shutdown or forced checkpoint, and if there has been
> no WAL activity requiring a checkpoint, skip it."
> 
> >> -/* Should the in-progress insertion log the origin? */
> >> -static bool include_origin = false;
> >> +/* status flags of the in-progress insertion */
> >> +static uint8 status_flags = 0;
> >
> > that seems a bit too generic a name. 'curinsert_flags'?
> 
> OK.
> 
> >>   /*
> >> -  * only log if enough time has passed and some xlog 
> >> record has
> >> -  * been inserted.
> >> +  * Only log if enough time has passed, that some WAL 
> >> activity
> >> +  * has happened since last checkpoint, and that some 
> >> new WAL
> >> +  * records have been inserted since the last time we 
> >> came here.
> >
> > I think that sentence needs some polish.
> 
> Let's do this better:
> /*
> -* only log if enough time has passed and some xlog record has
> -* been inserted.
> +* Only log if one of the following conditions is satisfied since
> +* the last time we came here::
> +* - timeout has been reached.
> +* - WAL activity has happened since last checkpoint.
> +* - New WAL records have been inserted.
>  */
> 
> >>*/
> >>   if (now >= timeout &&
> >> - last_snapshot_lsn != GetXLogInsertRecPtr())
> >> + GetLastCheckpointRecPtr() < 
> >> current_progress_lsn &&
> >> + last_progress_lsn < current_progress_lsn)
> >>   {
> >
> > Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
> > Don't we need to do the comparisons here (and when doing the checkpoint
> > itself) with the REDO pointer 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-13 Thread Michael Paquier
On Sat, Nov 12, 2016 at 9:01 PM, Andres Freund  wrote:
> On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:
>
>> + * This takes also
>> + * advantage to avoid 8-byte torn reads on some platforms by using the
>> + * fact that each insert lock is located on the same cache line.
>
> Something residing on the same cache line doens't provide that guarantee
> on all platforms.

OK. Let's remove it then.

>> + * XXX: There is still room for more improvements here, particularly
>> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
>> + * update the progress LSN as those relations are reset during crash
>> + * recovery so enforcing buffers of such relations to be flushed for
>> + * example in the case of a load only on unlogged relations is a waste
>> + * of disk write.
>
> Commit records still have to be written, everything else doesn't write
> WAL. So I'm doubtful this matters much?

Hm, okay. In most cases this may not matter... Let's rip that off.

>> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr 
>> fpw_lsn)
>>   inserted = true;
>>   }
>>
>> + /*
>> +  * Update the LSN progress positions. At least one WAL insertion lock
>> +  * is already taken appropriately before doing that, and it is simpler
>> +  * to do that here when the WAL record data and type are at hand.
>
> But we don't use the "WAL record data and type"?

Yes, at some point this patch did so...

>> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
>> + * other words any activity not referring to standby logging or segment
>> + * switches. Finding the last activity position is done by scanning each
>> + * WAL insertion lock by taking directly the light-weight lock associated
>> + * to it.
>> + */
>
> I'd prefer not to list the specific records here - that's just
> guaranteed to get out of date. Why not say something "any activity not
> requiring a checkpoint to be triggered" or such?

OK. Makes sense to minimize maintenance.

>> +  * If this isn't a shutdown or forced checkpoint, and if there has 
>> been no
>> +  * WAL activity, skip the checkpoint.  The idea here is to avoid 
>> inserting
>> +  * duplicate checkpoints when the system is idle. That wastes log 
>> space,
>> +  * and more importantly it exposes us to possible loss of both current 
>> and
>> +  * previous checkpoint records if the machine crashes just as we're 
>> writing
>> +  * the update.
>
> Shouldn't this mention archiving and also that we also ignore some forms
> of WAL activity?

I have reworded that as:
"If this isn't a shutdown or forced checkpoint, and if there has been
no WAL activity requiring a checkpoint, skip it."

>> -/* Should the in-progress insertion log the origin? */
>> -static bool include_origin = false;
>> +/* status flags of the in-progress insertion */
>> +static uint8 status_flags = 0;
>
> that seems a bit too generic a name. 'curinsert_flags'?

OK.

>>   /*
>> -  * only log if enough time has passed and some xlog 
>> record has
>> -  * been inserted.
>> +  * Only log if enough time has passed, that some WAL 
>> activity
>> +  * has happened since last checkpoint, and that some 
>> new WAL
>> +  * records have been inserted since the last time we 
>> came here.
>
> I think that sentence needs some polish.

Let's do this better:
/*
-* only log if enough time has passed and some xlog record has
-* been inserted.
+* Only log if one of the following conditions is satisfied since
+* the last time we came here::
+* - timeout has been reached.
+* - WAL activity has happened since last checkpoint.
+* - New WAL records have been inserted.
 */

>>*/
>>   if (now >= timeout &&
>> - last_snapshot_lsn != GetXLogInsertRecPtr())
>> + GetLastCheckpointRecPtr() < 
>> current_progress_lsn &&
>> + last_progress_lsn < current_progress_lsn)
>>   {
>
> Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
> Don't we need to do the comparisons here (and when doing the checkpoint
> itself) with the REDO pointer of the last checkpoint?

Hm? The progress pointer is pointing to the lastly inserted LSN, which
is not the position of the REDO pointer, but the one of the checkpoint
record. Doing a comparison of the REDO pointer would be a moot
operation, because as the checkpoint completes, the progress LSN will
be updated as well. Or do you mean that the progress LSN should *not*
be updated for a checkpoint record? It seems to me that it should
but...

>> diff --git a/src/backend/postmaster/checkpointer.c 
>> b/src/backend/postmaster/checkpointer.c
>> 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-13 Thread Michael Paquier
On Mon, Nov 14, 2016 at 12:49 PM, Kyotaro HORIGUCHI
 wrote:
> At Sat, 12 Nov 2016 10:28:56 +0530, Amit Kapila  
> wrote in 
>> I think it is good to check the performance impact of this patch on
>> many core m/c.  Is it possible for you to once check with Alexander
>> Korotkov to see if he can provide you access to his powerful m/c which
>> has 70 cores (if I remember correctly)?

I heard about a number like that, and there is no reason to not do
tests to be sure. With that many cores we are more likely going to see
the limitation of the number of XLOG insert slots popping up as a
bottleneck, but that's just an assumption without any numbers.
Alexander (added in CC), could it be possible to get an access to this
machine?

>> @@ -1035,6 +1038,7 @@ LogAccessExclusiveLocks(int nlocks,
>> xl_standby_lock *locks)
>>   XLogBeginInsert();
>>   XLogRegisterData((char *) , offsetof(xl_standby_locks, locks));
>>   XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
>> + XLogSetFlags(XLOG_NO_PROGRESS);
>>
>>
>> Is it right to set XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
>> This function is called not only in LogStandbySnapshot(), but during
>> DDL operations as well where lockmode >= AccessExclusiveLock.
>
> This does not remove any record from WAL. So theoretically any
> kind of record can be NO_PROGRESS, but practically as long as
> checkpoints are not unreasonably suppressed. Any explicit
> database operation must be accompanied with at least commit
> record that triggers checkpoint. NO_PROGRESSing there doesn't
> seem to me to harm database durability for this reason.
>
> The objective of this patch is skipping WALs on completely-idle
> state and the NO_PROGRESSing is necessary to do its work. Of
> course we can distinguish exclusive lock with PROGRESS and
> without PROGRESS but it is unnecessary complexity.

The point that applies here is that logging the exclusive lock
information is necessary for the *standby* recovery conflicts, not the
primary  which is why it should not influence the checkpoint activity
that is happening on the primary. So marking this record with
NO_PROGRESS is actually fine to me.

> But rethinking about the above, the namging of "XLOG_NO_PROGRESS"
> might be inappropriate. "XLOG_NO_CKPT_TRIGGER" or any sainer name
> would be needed.

I got fond of NO_PROGRESS to be honest with the time, even if I don't
like much the negative meaning it has... Perhaps something like
XLOG_SKIP_PROGRESS would hold more meaning.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-13 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

At Sat, 12 Nov 2016 10:28:56 +0530, Amit Kapila  wrote 
in 
> On Tue, Nov 8, 2016 at 5:18 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello,
> >
> >> on something else than LW_EXCLUSIVE compared to now. To keep things
> >> more simple I' would still favor using WALInsertLocks for this patch,
> >> that looks more consistent, and also because there is no noticeable
> >> difference.
> >
> > Ok, the patch looks fine. So there's nothing for me but to accept
> > the current shape since the discussion about performance seems
> > not to be settled with out performance measurement with machines
> > with many cores.
> >
> 
> I think it is good to check the performance impact of this patch on
> many core m/c.  Is it possible for you to once check with Alexander
> Korotkov to see if he can provide you access to his powerful m/c which
> has 70 cores (if I remember correctly)?
> 
> 
> @@ -1035,6 +1038,7 @@ LogAccessExclusiveLocks(int nlocks,
> xl_standby_lock *locks)
>   XLogBeginInsert();
>   XLogRegisterData((char *) , offsetof(xl_standby_locks, locks));
>   XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
> + XLogSetFlags(XLOG_NO_PROGRESS);
> 
> 
> Is it right to set XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
> This function is called not only in LogStandbySnapshot(), but during
> DDL operations as well where lockmode >= AccessExclusiveLock.

This does not remove any record from WAL. So theoretically any
kind of record can be NO_PROGRESS, but practically as long as
checkpoints are not unreasonably suppressed. Any explicit
database operation must be accompanied with at least commit
record that triggers checkpoint. NO_PROGRESSing there doesn't
seem to me to harm database durability for this reason.

The objective of this patch is skipping WALs on completely-idle
state and the NO_PROGRESSing is necessary to do its work. Of
course we can distinguish exclusive lock with PROGRESS and
without PROGRESS but it is unnecessary complexity.


But rethinking about the above, the namging of "XLOG_NO_PROGRESS"
might be inappropriate. "XLOG_NO_CKPT_TRIGGER" or any sainer name
would be needed.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-12 Thread Andres Freund
Hi,

On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:

> + * This takes also
> + * advantage to avoid 8-byte torn reads on some platforms by using the
> + * fact that each insert lock is located on the same cache line.

Something residing on the same cache line doens't provide that guarantee
on all platforms.


> + * XXX: There is still room for more improvements here, particularly
> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
> + * update the progress LSN as those relations are reset during crash
> + * recovery so enforcing buffers of such relations to be flushed for
> + * example in the case of a load only on unlogged relations is a waste
> + * of disk write.

Commit records still have to be written, everything else doesn't write
WAL. So I'm doubtful this matters much?

> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
>   inserted = true;
>   }
>  
> + /*
> +  * Update the LSN progress positions. At least one WAL insertion lock
> +  * is already taken appropriately before doing that, and it is simpler
> +  * to do that here when the WAL record data and type are at hand.

But we don't use the "WAL record data and type"?


> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
> + * other words any activity not referring to standby logging or segment
> + * switches. Finding the last activity position is done by scanning each
> + * WAL insertion lock by taking directly the light-weight lock associated
> + * to it.
> + */

I'd prefer not to list the specific records here - that's just
guaranteed to get out of date. Why not say something "any activity not
requiring a checkpoint to be triggered" or such?


> +  * If this isn't a shutdown or forced checkpoint, and if there has been 
> no
> +  * WAL activity, skip the checkpoint.  The idea here is to avoid 
> inserting
> +  * duplicate checkpoints when the system is idle. That wastes log space,
> +  * and more importantly it exposes us to possible loss of both current 
> and
> +  * previous checkpoint records if the machine crashes just as we're 
> writing
> +  * the update.

Shouldn't this mention archiving and also that we also ignore some forms
of WAL activity?

> -/* Should the in-progress insertion log the origin? */
> -static bool include_origin = false;
> +/* status flags of the in-progress insertion */
> +static uint8 status_flags = 0;

that seems a bit too generic a name. 'curinsert_flags'?

>  /*

> @@ -317,19 +317,23 @@ BackgroundWriterMain(void)
>   {
>   TimestampTz timeout = 0;
>   TimestampTz now = GetCurrentTimestamp();
> + XLogRecPtr  current_progress_lsn = 
> GetProgressRecPtr();
>  
>   timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
>   
>   LOG_SNAPSHOT_INTERVAL_MS);
>  
>   /*
> -  * only log if enough time has passed and some xlog 
> record has
> -  * been inserted.
> +  * Only log if enough time has passed, that some WAL 
> activity
> +  * has happened since last checkpoint, and that some 
> new WAL
> +  * records have been inserted since the last time we 
> came here.

I think that sentence needs some polish.


>*/
>   if (now >= timeout &&
> - last_snapshot_lsn != GetXLogInsertRecPtr())
> + GetLastCheckpointRecPtr() < 
> current_progress_lsn &&
> + last_progress_lsn < current_progress_lsn)
>   {

Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
Don't we need to do the comparisons here (and when doing the checkpoint
itself) with the REDO pointer of the last checkpoint?


> diff --git a/src/backend/postmaster/checkpointer.c 
> b/src/backend/postmaster/checkpointer.c
> index 397267c..7ecc00e 100644
> --- a/src/backend/postmaster/checkpointer.c
> +++ b/src/backend/postmaster/checkpointer.c
> @@ -164,6 +164,7 @@ static double ckpt_cached_elapsed;
>  
>  static pg_time_t last_checkpoint_time;
>  static pg_time_t last_xlog_switch_time;
> +static XLogRecPtr last_xlog_switch_lsn = InvalidXLogRecPtr;

Hm. Is it a good idea to use a static for this? Did you consider
checkpointer restarts?


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-11 Thread Amit Kapila
On Tue, Nov 8, 2016 at 5:18 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
>> on something else than LW_EXCLUSIVE compared to now. To keep things
>> more simple I' would still favor using WALInsertLocks for this patch,
>> that looks more consistent, and also because there is no noticeable
>> difference.
>
> Ok, the patch looks fine. So there's nothing for me but to accept
> the current shape since the discussion about performance seems
> not to be settled with out performance measurement with machines
> with many cores.
>

I think it is good to check the performance impact of this patch on
many core m/c.  Is it possible for you to once check with Alexander
Korotkov to see if he can provide you access to his powerful m/c which
has 70 cores (if I remember correctly)?


@@ -1035,6 +1038,7 @@ LogAccessExclusiveLocks(int nlocks,
xl_standby_lock *locks)
  XLogBeginInsert();
  XLogRegisterData((char *) , offsetof(xl_standby_locks, locks));
  XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
+ XLogSetFlags(XLOG_NO_PROGRESS);


Is it right to set XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
This function is called not only in LogStandbySnapshot(), but during
DDL operations as well where lockmode >= AccessExclusiveLock.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-10 Thread Kyotaro HORIGUCHI
Thank you for the new patch.

At Fri, 11 Nov 2016 16:42:43 +0900, Michael Paquier  
wrote in 
> On Fri, Nov 11, 2016 at 12:28 AM, Stephen Frost  wrote:
> > We should probably include in here that we may skip a checkpoint if no
> > activity has happened, meaning that this is a safe setting to set for
> > environments which are idle for long periods.
> 
> OK, here is the interesting bit I just updated (I cut the diff a bit
> as the rest is just reformatting):
>  parameter is greater than zero, the server will switch to a new
>  segment file whenever this many seconds have elapsed since the last
>  segment file switch, and there has been any database activity,
> -including a single checkpoint.  (Increasing
> -checkpoint_timeout will reduce unnecessary
> -checkpoints on an idle system.)
> [...]
> +including a single checkpoint.  Checkpoints can however be skipped
> +if there is no database activity, making this parameter a safe
> +setting for environments which are idle for a long period of time.
> 
> > (I'm thinking embedded systems here).
> 
> (Those are most of my users :{) ).

Ok, (FWIW..,) it seems fine for me.

> On Fri, Nov 11, 2016 at 3:23 AM, David Steele  wrote:
> > On 11/10/16 1:03 PM, Stephen Frost wrote:
> >> Agreed. You certainly may wish to log checkpoints, even on an embedded
> >> or low I/o system, but logging that nothing is happening doesn't seem
> >> useful except perhaps for debugging.
> >
> > Sure, DEBUG1 or DEBUG2 makes sense.
> 
> OK. LOG was useful to avoid noise when debugging the thing, but DEBUG1
> is fine for me as well in the final version.

Agreed. DEBUG2 seems too deep for it.

Well, I think we had the final comment and it has been addressd
so I mark this as ready for committer soon.

Thank you all.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-10 Thread Michael Paquier
On Fri, Nov 11, 2016 at 12:28 AM, Stephen Frost  wrote:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> Thanks for the review! Waiting for a couple of days more is fine for
>> me. This won't change much. Attached is v15 with the fixes you
>> mentioned.
>
> I figured I'd go ahead and start looking into this (and it's pretty easy
> for me to discuss it with David, given he works in the same office ;).

Thanks!

> A couple initial comments:
>
>> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
>> index adab2f8..38c2385 100644
>> --- a/doc/src/sgml/config.sgml
>> +++ b/doc/src/sgml/config.sgml
>> @@ -2826,12 +2826,9 @@ include_dir 'conf.d'
>>  parameter is greater than zero, the server will switch to a new
>>  segment file whenever this many seconds have elapsed since the last
>>  segment file switch, and there has been any database activity,
>> -including a single checkpoint.  (Increasing
>> -checkpoint_timeout will reduce unnecessary
>> -checkpoints on an idle system.)
>> -Note that archived files that are closed early
>> -due to a forced switch are still the same length as completely full
>> -files.  Therefore, it is unwise to use a very short
>> +including a single checkpoint.  Note that archived files that are
>> +closed early due to a forced switch are still the same length as
>> +completely full files.  Therefore, it is unwise to use a very short
>>  archive_timeout  it will bloat your archive
>>  storage.  archive_timeout settings of a minute or so are
>>  usually reasonable.  You should consider using streaming 
>> replication,
>
> We should probably include in here that we may skip a checkpoint if no
> activity has happened, meaning that this is a safe setting to set for
> environments which are idle for long periods.

OK, here is the interesting bit I just updated (I cut the diff a bit
as the rest is just reformatting):
 parameter is greater than zero, the server will switch to a new
 segment file whenever this many seconds have elapsed since the last
 segment file switch, and there has been any database activity,
-including a single checkpoint.  (Increasing
-checkpoint_timeout will reduce unnecessary
-checkpoints on an idle system.)
[...]
+including a single checkpoint.  Checkpoints can however be skipped
+if there is no database activity, making this parameter a safe
+setting for environments which are idle for a long period of time.

> (I'm thinking embedded systems here).

(Those are most of my users :{) ).

On Fri, Nov 11, 2016 at 3:23 AM, David Steele  wrote:
> On 11/10/16 1:03 PM, Stephen Frost wrote:
>> Agreed. You certainly may wish to log checkpoints, even on an embedded
>> or low I/o system, but logging that nothing is happening doesn't seem
>> useful except perhaps for debugging.
>
> Sure, DEBUG1 or DEBUG2 makes sense.

OK. LOG was useful to avoid noise when debugging the thing, but DEBUG1
is fine for me as well in the final version.
-- 
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index adab2f8..d2a8ec2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2826,17 +2826,16 @@ include_dir 'conf.d'
 parameter is greater than zero, the server will switch to a new
 segment file whenever this many seconds have elapsed since the last
 segment file switch, and there has been any database activity,
-including a single checkpoint.  (Increasing
-checkpoint_timeout will reduce unnecessary
-checkpoints on an idle system.)
-Note that archived files that are closed early
-due to a forced switch are still the same length as completely full
-files.  Therefore, it is unwise to use a very short
-archive_timeout  it will bloat your archive
-storage.  archive_timeout settings of a minute or so are
-usually reasonable.  You should consider using streaming replication,
-instead of archiving, if you want data to be copied off the master
-server more quickly than that.
+including a single checkpoint.  Checkpoints can however be skipped
+if there is no database activity, making this parameter a safe
+setting for environments which are idle for a long period of time.
+Note that archived files that are closed early due to a forced switch
+are still the same length as completely full files.  Therefore, it is
+unwise to use a very short archive_timeout  it will
+bloat your archive storage.  archive_timeout settings of
+a minute or so are usually reasonable.  You should consider using
+streaming replication, instead of archiving, if you want data to
+be copied off the master server more quickly than that.
 This 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-10 Thread David Steele
On 11/10/16 1:03 PM, Stephen Frost wrote:
> On Thursday, November 10, 2016, Joshua D. Drake  > wrote:
> 
> On 11/10/2016 09:33 AM, David Steele wrote:
> 
> On 11/10/16 10:28 AM, Stephen Frost wrote:
> 
> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> 
> [...]
> 
> +   if (log_checkpoints)
> +   ereport(LOG,
> (errmsg("checkpoint skipped")));
> 
> 
> Do we really need to log that we're skipping a
> checkpoint..?  As the
> point of this is to avoid write activity on a system which
> is idle, it
> doesn't make sense to me to add a new cause for writes to
> happen when
> we're idle.
> 
> 
> log_checkpoints is not enabled by default, though, so if the
> user does
> enable it don't you think they would want to know when checkpoints
> *don't* happen?
> 
> 
> Yes but I don't know that it needs to be anywhere below DEBUG2 (vs
> log_checkpoints).
> 
> 
> Agreed. You certainly may wish to log checkpoints, even on an embedded
> or low I/o system, but logging that nothing is happening doesn't seem
> useful except perhaps for debugging. 

Sure, DEBUG1 or DEBUG2 makes sense.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-10 Thread Stephen Frost
On Thursday, November 10, 2016, Joshua D. Drake 
wrote:

> On 11/10/2016 09:33 AM, David Steele wrote:
>
>> On 11/10/16 10:28 AM, Stephen Frost wrote:
>>
>> diff --git a/src/backend/access/transam/xlog.c
 b/src/backend/access/transam/xlog.c

>>> [...]
>>>
 +   if (log_checkpoints)
 +   ereport(LOG, (errmsg("checkpoint
 skipped")));

>>>
>>> Do we really need to log that we're skipping a checkpoint..?  As the
>>> point of this is to avoid write activity on a system which is idle, it
>>> doesn't make sense to me to add a new cause for writes to happen when
>>> we're idle.
>>>
>>
>> log_checkpoints is not enabled by default, though, so if the user does
>> enable it don't you think they would want to know when checkpoints
>> *don't* happen?
>>
>
> Yes but I don't know that it needs to be anywhere below DEBUG2 (vs
> log_checkpoints).
>

Agreed. You certainly may wish to log checkpoints, even on an embedded or
low I/o system, but logging that nothing is happening doesn't seem useful
except perhaps for debugging.

Thanks!

Stephen


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-10 Thread Joshua D. Drake

On 11/10/2016 09:33 AM, David Steele wrote:

On 11/10/16 10:28 AM, Stephen Frost wrote:


diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c

[...]

+   if (log_checkpoints)
+   ereport(LOG, (errmsg("checkpoint skipped")));


Do we really need to log that we're skipping a checkpoint..?  As the
point of this is to avoid write activity on a system which is idle, it
doesn't make sense to me to add a new cause for writes to happen when
we're idle.


log_checkpoints is not enabled by default, though, so if the user does
enable it don't you think they would want to know when checkpoints
*don't* happen?


Yes but I don't know that it needs to be anywhere below DEBUG2 (vs 
log_checkpoints).


Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-10 Thread David Steele
On 11/10/16 10:28 AM, Stephen Frost wrote:

>> diff --git a/src/backend/access/transam/xlog.c 
>> b/src/backend/access/transam/xlog.c
> [...]
>> +if (log_checkpoints)
>> +ereport(LOG, (errmsg("checkpoint skipped")));
> 
> Do we really need to log that we're skipping a checkpoint..?  As the
> point of this is to avoid write activity on a system which is idle, it
> doesn't make sense to me to add a new cause for writes to happen when
> we're idle.

log_checkpoints is not enabled by default, though, so if the user does
enable it don't you think they would want to know when checkpoints
*don't* happen?

Or are you thinking the main use of this logging is to determine when
checkpoints are too close together and so skipped checkpoints aren't
very important?

Thanks,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-10 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> Thanks for the review! Waiting for a couple of days more is fine for
> me. This won't change much. Attached is v15 with the fixes you
> mentioned.

I figured I'd go ahead and start looking into this (and it's pretty easy
for me to discuss it with David, given he works in the same office ;).

A couple initial comments:

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index adab2f8..38c2385 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2826,12 +2826,9 @@ include_dir 'conf.d'
>  parameter is greater than zero, the server will switch to a new
>  segment file whenever this many seconds have elapsed since the last
>  segment file switch, and there has been any database activity,
> -including a single checkpoint.  (Increasing
> -checkpoint_timeout will reduce unnecessary
> -checkpoints on an idle system.)
> -Note that archived files that are closed early
> -due to a forced switch are still the same length as completely full
> -files.  Therefore, it is unwise to use a very short
> +including a single checkpoint.  Note that archived files that are
> +closed early due to a forced switch are still the same length as
> +completely full files.  Therefore, it is unwise to use a very short
>  archive_timeout  it will bloat your archive
>  storage.  archive_timeout settings of a minute or so are
>  usually reasonable.  You should consider using streaming replication,

We should probably include in here that we may skip a checkpoint if no
activity has happened, meaning that this is a safe setting to set for
environments which are idle for long periods (I'm thinking embedded
systems here).

> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
[...]
> + if (log_checkpoints)
> + ereport(LOG, (errmsg("checkpoint skipped")));

Do we really need to log that we're skipping a checkpoint..?  As the
point of this is to avoid write activity on a system which is idle, it
doesn't make sense to me to add a new cause for writes to happen when
we're idle.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-08 Thread Michael Paquier
On Tue, Nov 8, 2016 at 9:32 PM, David Steele  wrote:
> I had a bit of trouble parsing this paragraph:
>
> [...]
>
> So I did a little reworking:
>
> [...]
>
> If that still says what you think it should, then I believe it is clearer.

Thanks! I have included your suggestion.

> Also:
>
> +* last time a segment has switched because of a timeout.
> Segment
> +* switching because of other reasons, like manual
> trigerring of
>
> typo, should be "triggering".

Right.

> I don't see any further issues with this patch unless there are performance
> concerns about the locks taken in GetProgressRecPtr().  The locks seem
> reasonable to me but I'd like to see this committed so there's plenty of
> time to detect any regression before 10.0.
>
> As such, my vote is to mark this "Ready for Committer."  I'm fine with
> waiting a few days as Kyotaro suggested, or we can consider my review
> "additional comments" and do it now.

Thanks for the review! Waiting for a couple of days more is fine for
me. This won't change much. Attached is v15 with the fixes you
mentioned.
-- 
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index adab2f8..38c2385 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2826,12 +2826,9 @@ include_dir 'conf.d'
 parameter is greater than zero, the server will switch to a new
 segment file whenever this many seconds have elapsed since the last
 segment file switch, and there has been any database activity,
-including a single checkpoint.  (Increasing
-checkpoint_timeout will reduce unnecessary
-checkpoints on an idle system.)
-Note that archived files that are closed early
-due to a forced switch are still the same length as completely full
-files.  Therefore, it is unwise to use a very short
+including a single checkpoint.  Note that archived files that are
+closed early due to a forced switch are still the same length as
+completely full files.  Therefore, it is unwise to use a very short
 archive_timeout  it will bloat your archive
 storage.  archive_timeout settings of a minute or so are
 usually reasonable.  You should consider using streaming replication,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b019bc1..ac40731 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2507,7 +2507,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId 
cid,
heaptup->t_len - 
SizeofHeapTupleHeader);
 
/* filtering by origin on a row level is much more efficient */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
recptr = XLogInsert(RM_HEAP_ID, info);
 
@@ -2846,7 +2846,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, 
int ntuples,
XLogRegisterBufData(0, tupledata, totaldatalen);
 
/* filtering by origin on a row level is much more 
efficient */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
recptr = XLogInsert(RM_HEAP2_ID, info);
 
@@ -3308,7 +3308,7 @@ l1:
}
 
/* filtering by origin on a row level is much more efficient */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);
 
@@ -6035,7 +6035,7 @@ heap_finish_speculative(Relation relation, HeapTuple 
tuple)
XLogBeginInsert();
 
/* We want the same filtering on this as on a plain insert */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
XLogRegisterData((char *) , SizeOfHeapConfirm);
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
@@ -7703,7 +7703,7 @@ log_heap_update(Relation reln, Buffer oldbuf,
}
 
/* filtering by origin on a row level is much more efficient */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
recptr = XLogInsert(RM_HEAP_ID, info);
 
diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index e11b229..9130816 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5232,7 +5232,7 @@ XactLogCommitRecord(TimestampTz commit_time,
XLogRegisterData((char *) (_origin), sizeof(xl_xact_origin));
 
/* we allow filtering by xacts */
-   XLogIncludeOrigin();
+   XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
return XLogInsert(RM_XACT_ID, info);
 }
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 6cec027..37ecf9c 100644
--- a/src/backend/access/transam/xlog.c
+++ 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-08 Thread David Steele

On 10/5/16 7:18 AM, Michael Paquier wrote:


Note: I am moving this patch to next CF.


And I am back on it more seriously... And I am taking back what I said upthread.

I looked at the v12 that Horiguchi-san has written, and that seems
correct to me. So I have squashed everything into a single patch,
including the log entry that gets logged with log_checkpoints. Testing
with archive_timeout to 10s, checkpoint_timeout to 30s, sometimes
triggering manual activity with CREATE TABLE/whatever and manual
pg_switch_xlog(), I am able to see that checkpoints can be correctly
skipped or generated.

There was as well a compilation error with ereport(). Not sure how I
missed that... Likely too many patches handled these days.

I have also updated the description of archive_timeout that increasing
checkpoint_timeout would reduce unnecessary checkpoints on a idle
system. With this patch, on an idle system checkpoints are just
skipped as they should.

How does that look?


This looks much better now and exhibits exactly the behavior that I 
expect.


In theory it would be nice if the checkpoint records did not cause 
rotation, but this can be mitigated in the way you have described and 
perhaps for safety's sake it's best.


I had a bit of trouble parsing this paragraph:

+   /*
+* Update the progress LSN positions. At least one WAL insertion lock
+* is already taken appropriately before doing that, and it is just more
+* simple to do that here where WAL record data and type is at hand.
+* The progress is set at the start position of the record tracked that
+* is being added, making easier checkpoint progress tracking as the
+* control file already saves the start LSN position of the last
+* checkpoint run. If an exclusive lock is taken for WAL insertion,
+* there is actually no need to update all the progression fields, so

So I did a little reworking:

Update the LSN progress positions. At least one WAL insertion lock is 
already taken appropriately before doing that, and it is simpler to do 
that here when the WAL record data and type are at hand. Progress is set 
at the start position of the tracked record that is being added, making 
checkpoint progress tracking easier as the control file already saves 
the start LSN position of the last checkpoint. If an exclusive lock is 
taken for WAL insertion there is no need to update all the progress 
fields, only the first one.


If that still says what you think it should, then I believe it is 
clearer.  Also:


+* last time a segment has switched because of a timeout. 
Segment
+* switching because of other reasons, like manual trigerring of

typo, should be "triggering".

I don't see any further issues with this patch unless there are 
performance concerns about the locks taken in GetProgressRecPtr().  The 
locks seem reasonable to me but I'd like to see this committed so 
there's plenty of time to detect any regression before 10.0.


As such, my vote is to mark this "Ready for Committer."  I'm fine with 
waiting a few days as Kyotaro suggested, or we can consider my review 
"additional comments" and do it now.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-08 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 8 Nov 2016 14:45:59 +0900, Michael Paquier  
wrote in 
> On Tue, Nov 1, 2016 at 8:31 PM, Kyotaro HORIGUCHI
>  wrote:
> > Could you let me struggle a bit more to avoid LWLocks in
> > GetProgressRecPtr?
> 
> Be my guest :)
> 
> > I considered two alternatives for updating logic of progressAt
> > more seriously. One is, as Amit suggested, replacing progressAt
> > within the SpinLock section in
> > ReserverXLogInsertLocation. Another is using pg_atomic_u64 for
> > progressAt. The attached two patches rouhgly implement the aboves
> > respectively. (But I've not tested them. The patches are to show
> > the two alternatives concretely.)
> 
> Okay.
> 
> > I found that the former requires to take insertpos_lck also on
> > reading. I have to admit that this is too bad. (Even I saw no
> > degradation by pgbench on my poor environment. It marks 118tr/s
> > by 10 threads and that doesn't seem make any stress on xlog
> > logics...)
> 
> Interesting...
> 
> > For the latter, it is free from locks and doesn't reduce parallel
> > degree but I'm not sure it is proper to use it there and I'm not
> > sure about actual overheads.  In the worst case, it makes another
> > SpinLock section for every call to pg_atmoic_* functions.
> 
> The WAL insert slots have been introduced in 9.4, and the PG atomics
> in 9.5, so perhaps the first implementation of the WAL insert slots
> would have used it. Still that looks quite promising. At the same time
> we may be able to do something for insertingAt to make the locks held
> more consistent, and just remove WALInsertLocks, even if this makes me
> wonder about torn reads and about how we may break things if we rely

If I understand you correctly, atomics prevents torn reads by its
definition on cache management and bus arbitration level. It
should be faster than LWlocks but as I said in the previous mail,
on some platforms, if any, it will fallbacks to individual
spinlocks. (atomics.c)

> on something else than LW_EXCLUSIVE compared to now. To keep things
> more simple I' would still favor using WALInsertLocks for this patch,
> that looks more consistent, and also because there is no noticeable
> difference.

Ok, the patch looks fine. So there's nothing for me but to accept
the current shape since the discussion about performance seems
not to be settled with out performance measurement with machines
with many cores.

> > All except the above point looks good for me. Maybe it is better
> > that XLOG_INCLUDE_ORIGIN stuff is in a separate patch.
> 
> I have kept that grouped bas XLOG_INCLUDE_ORIGIN is the only
> XLogInsert flag present on HEAD. Could the patch be marked as "ready
> for committer" then?

Ok, that is not so siginificant point. Well, I'd like to wait for
a couple of days for anyone wants to comment, then mark this
'ready for committer'.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-07 Thread Michael Paquier
On Tue, Nov 1, 2016 at 8:31 PM, Kyotaro HORIGUCHI
 wrote:
> Could you let me struggle a bit more to avoid LWLocks in
> GetProgressRecPtr?

Be my guest :)

> I considered two alternatives for updating logic of progressAt
> more seriously. One is, as Amit suggested, replacing progressAt
> within the SpinLock section in
> ReserverXLogInsertLocation. Another is using pg_atomic_u64 for
> progressAt. The attached two patches rouhgly implement the aboves
> respectively. (But I've not tested them. The patches are to show
> the two alternatives concretely.)

Okay.

> I found that the former requires to take insertpos_lck also on
> reading. I have to admit that this is too bad. (Even I saw no
> degradation by pgbench on my poor environment. It marks 118tr/s
> by 10 threads and that doesn't seem make any stress on xlog
> logics...)

Interesting...

> For the latter, it is free from locks and doesn't reduce parallel
> degree but I'm not sure it is proper to use it there and I'm not
> sure about actual overheads.  In the worst case, it makes another
> SpinLock section for every call to pg_atmoic_* functions.

The WAL insert slots have been introduced in 9.4, and the PG atomics
in 9.5, so perhaps the first implementation of the WAL insert slots
would have used it. Still that looks quite promising. At the same time
we may be able to do something for insertingAt to make the locks held
more consistent, and just remove WALInsertLocks, even if this makes me
wonder about torn reads and about how we may break things if we rely
on something else than LW_EXCLUSIVE compared to now. To keep things
more simple I' would still favor using WALInsertLocks for this patch,
that looks more consistent, and also because there is no noticeable
difference.

> All except the above point looks good for me. Maybe it is better
> that XLOG_INCLUDE_ORIGIN stuff is in a separate patch.

I have kept that grouped bas XLOG_INCLUDE_ORIGIN is the only
XLogInsert flag present on HEAD. Could the patch be marked as "ready
for committer" then?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-01 Thread Kyotaro HORIGUCHI
Thanks for merging. It still applies on the current master with
some displacements.

At Wed, 5 Oct 2016 15:18:53 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-10-05 Thread Michael Paquier
(Squashing replies)

On Fri, Sep 30, 2016 at 6:13 PM, Michael Paquier
 wrote:
> On Fri, Sep 30, 2016 at 2:51 PM, Michael Paquier
>  wrote:
>> On Fri, Sep 30, 2016 at 2:05 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>>>  wrote in 
>>> <20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
 I don't see no problem in setting progressAt in XLogInsertRecord.
 But I doubt GetProgressRecPtr is harmful, especially when
>>>
>>> But I suspect that GetProgressRecPtr could be harmful.
>>
>> Well, you can maximize its effects by doing NUM_XLOGINSERT_LOCKS ==
>> nproc and reducing checkpoint_timeout. That's what I did but..
>
> Note: I am moving this patch to next CF.

And I am back on it more seriously... And I am taking back what I said upthread.

I looked at the v12 that Horiguchi-san has written, and that seems
correct to me. So I have squashed everything into a single patch,
including the log entry that gets logged with log_checkpoints. Testing
with archive_timeout to 10s, checkpoint_timeout to 30s, sometimes
triggering manual activity with CREATE TABLE/whatever and manual
pg_switch_xlog(), I am able to see that checkpoints can be correctly
skipped or generated.

There was as well a compilation error with ereport(). Not sure how I
missed that... Likely too many patches handled these days.

I have also updated the description of archive_timeout that increasing
checkpoint_timeout would reduce unnecessary checkpoints on a idle
system. With this patch, on an idle system checkpoints are just
skipped as they should.

How does that look?
-- 
Michael


hs-checkpoints-v14.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-30 Thread Michael Paquier
On Fri, Sep 30, 2016 at 2:51 PM, Michael Paquier
 wrote:
> On Fri, Sep 30, 2016 at 2:05 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>>  wrote in 
>> <20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
>>> I don't see no problem in setting progressAt in XLogInsertRecord.
>>> But I doubt GetProgressRecPtr is harmful, especially when
>>
>> But I suspect that GetProgressRecPtr could be harmful.
>
> Well, you can maximize its effects by doing NUM_XLOGINSERT_LOCKS ==
> nproc and reducing checkpoint_timeout. That's what I did but..

Note: I am moving this patch to next CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread Michael Paquier
On Fri, Sep 30, 2016 at 2:05 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
>> I don't see no problem in setting progressAt in XLogInsertRecord.
>> But I doubt GetProgressRecPtr is harmful, especially when
>
> But I suspect that GetProgressRecPtr could be harmful.

Well, you can maximize its effects by doing NUM_XLOGINSERT_LOCKS ==
nproc and reducing checkpoint_timeout. That's what I did but..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread Kyotaro HORIGUCHI
Sorry, it wrote wrong thing.

At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
> Sorry, I might have torn off this thread somehow..
> 
> At Thu, 29 Sep 2016 11:26:29 -0400, David Steele  wrote 
> in <30095aea-3910-dbb7-1790-a579fb93f...@pgmasters.net>
> > On 9/28/16 10:32 PM, Michael Paquier wrote:
> > > On Thu, Sep 29, 2016 at 7:45 AM, David Steele 
> > > wrote:
> > >>
> > >> In general I agree with the other comments that this could end up
> > >> being
> > >> a problem.  On the other hand, since the additional locks are only
> > >> taken
> > >> at checkpoint or archive_timeout it may not be that big a deal.
> > >
> > > Yes, I did some tests on my laptop a couple of months back, that has 4
> > > cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
> > > contention and performing a bunch of INSERT using 4 clients on 4
> > > different relations I could not catch a difference.. Autovacuum was
> > > disabled to eliminate any noise. I tried checkpoint_segments at 30s to
> > > see its effects, as well as larger values to see the impact with the
> > > standby snapshot taken by the bgwriter. Other thoughts are welcome.
> > 
> > I don't have any better ideas than that.
> 
> I don't see no problem in setting progressAt in XLogInsertRecord.
> But I doubt GetProgressRecPtr is harmful, especially when

But I suspect that GetProgressRecPtr could be harmful.

> NUM_XLOGINSERT_LOCKS is *large*. So reducing the number seems
> rather alleviates the impact. But it actually doesn't seem so
> harmful up to 8. (Even though I don't like the locking in
> GetProgressRecPtr..)
> 
> Currently possiblly harmful calling of GetProgressRecPtr is that
> in BackgroundWriterMain. It should be called with ther interval
> BgWriterDelay, and anytime pgwriter recieved SIGUSR1. But I don't
> see the issuer of SIGUSR1 of bgwriter..
> 
> 
> > >> +++ b/src/backend/postmaster/checkpointer.c
> > >> +   /* OK, it's time to switch */
> > >> +   elog(LOG, "Request XLog Switch");
> > >>
> > >> LOG level seems a bit much here, perhaps DEBUG1?
> > >
> > > That's from Horiguchi-san's patch, and those would be definitely
> > > better as DEBUG1 by looking at it. Now and in order to keep things
> > > simple I think that we had better discard this patch for now. I was
> > > planning to come back to this thing anyway once we are done with the
> > > first problem.
> > 
> > I still see this:
> > 
> > +++ b/src/backend/postmaster/checkpointer.c
> > /* OK, it's time to switch */
> > +   elog(LOG, "Request XLog Switch");
> > 
> > > Well for now attached are two patches, that could just be squashed
> > > into one.
> 
> Mmmm. Sorry, this was for my quite private instant debug, spilt
> outside.. But I don't mind to leave it with DEBUG2 if it seems
> useful.
> 
> > Yes, I think that makes sense.
> > 
> > More importantly, there is a regression.  With your new patch the
> > xlogs are switching on archive_timeout again even with no changes.
> > The v12 worked fine.
> 
> As Michael mentioned in this or another thread, it is another
> issue that he wants to solve separately. I personally doubt that
> this patch (v11 and v13) can be evaluated alone without it, but
> we can review this with the excessive switching problem, perhaps?
> 
> > The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> > as I can see the patch does not work correctly without these changes.
> > Am I missing something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread Kyotaro HORIGUCHI
Sorry, I might have torn off this thread somehow..

At Thu, 29 Sep 2016 11:26:29 -0400, David Steele  wrote in 
<30095aea-3910-dbb7-1790-a579fb93f...@pgmasters.net>
> On 9/28/16 10:32 PM, Michael Paquier wrote:
> > On Thu, Sep 29, 2016 at 7:45 AM, David Steele 
> > wrote:
> >>
> >> In general I agree with the other comments that this could end up
> >> being
> >> a problem.  On the other hand, since the additional locks are only
> >> taken
> >> at checkpoint or archive_timeout it may not be that big a deal.
> >
> > Yes, I did some tests on my laptop a couple of months back, that has 4
> > cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
> > contention and performing a bunch of INSERT using 4 clients on 4
> > different relations I could not catch a difference.. Autovacuum was
> > disabled to eliminate any noise. I tried checkpoint_segments at 30s to
> > see its effects, as well as larger values to see the impact with the
> > standby snapshot taken by the bgwriter. Other thoughts are welcome.
> 
> I don't have any better ideas than that.

I don't see no problem in setting progressAt in XLogInsertRecord.
But I doubt GetProgressRecPtr is harmful, especially when
NUM_XLOGINSERT_LOCKS is *large*. So reducing the number seems
rather alleviates the impact. But it actually doesn't seem so
harmful up to 8. (Even though I don't like the locking in
GetProgressRecPtr..)

Currently possiblly harmful calling of GetProgressRecPtr is that
in BackgroundWriterMain. It should be called with ther interval
BgWriterDelay, and anytime pgwriter recieved SIGUSR1. But I don't
see the issuer of SIGUSR1 of bgwriter..


> >> +++ b/src/backend/postmaster/checkpointer.c
> >> +   /* OK, it's time to switch */
> >> +   elog(LOG, "Request XLog Switch");
> >>
> >> LOG level seems a bit much here, perhaps DEBUG1?
> >
> > That's from Horiguchi-san's patch, and those would be definitely
> > better as DEBUG1 by looking at it. Now and in order to keep things
> > simple I think that we had better discard this patch for now. I was
> > planning to come back to this thing anyway once we are done with the
> > first problem.
> 
> I still see this:
> 
> +++ b/src/backend/postmaster/checkpointer.c
>   /* OK, it's time to switch */
> + elog(LOG, "Request XLog Switch");
> 
> > Well for now attached are two patches, that could just be squashed
> > into one.

Mmmm. Sorry, this was for my quite private instant debug, spilt
outside.. But I don't mind to leave it with DEBUG2 if it seems
useful.

> Yes, I think that makes sense.
> 
> More importantly, there is a regression.  With your new patch the
> xlogs are switching on archive_timeout again even with no changes.
> The v12 worked fine.

As Michael mentioned in this or another thread, it is another
issue that he wants to solve separately. I personally doubt that
this patch (v11 and v13) can be evaluated alone without it, but
we can review this with the excessive switching problem, perhaps?

> The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> as I can see the patch does not work correctly without these changes.
> Am I missing something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread David Steele

On 9/28/16 10:32 PM, Michael Paquier wrote:

On Thu, Sep 29, 2016 at 7:45 AM, David Steele  wrote:


In general I agree with the other comments that this could end up being
a problem.  On the other hand, since the additional locks are only taken
at checkpoint or archive_timeout it may not be that big a deal.


Yes, I did some tests on my laptop a couple of months back, that has 4
cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
contention and performing a bunch of INSERT using 4 clients on 4
different relations I could not catch a difference.. Autovacuum was
disabled to eliminate any noise. I tried checkpoint_segments at 30s to
see its effects, as well as larger values to see the impact with the
standby snapshot taken by the bgwriter. Other thoughts are welcome.


I don't have any better ideas than that.


+++ b/src/backend/postmaster/checkpointer.c
+   /* OK, it's time to switch */
+   elog(LOG, "Request XLog Switch");

LOG level seems a bit much here, perhaps DEBUG1?


That's from Horiguchi-san's patch, and those would be definitely
better as DEBUG1 by looking at it. Now and in order to keep things
simple I think that we had better discard this patch for now. I was
planning to come back to this thing anyway once we are done with the
first problem.


I still see this:

+++ b/src/backend/postmaster/checkpointer.c
/* OK, it's time to switch */
+   elog(LOG, "Request XLog Switch");


Well for now attached are two patches, that could just be squashed into one.


Yes, I think that makes sense.

More importantly, there is a regression.  With your new patch the xlogs 
are switching on archive_timeout again even with no changes.  The v12 
worked fine.


The differences are all in 0002-hs-checkpoints-v12-2.patch and as far as 
I can see the patch does not work correctly without these changes.  Am I 
missing something?


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-28 Thread Michael Paquier
On Thu, Sep 29, 2016 at 7:45 AM, David Steele  wrote:
> OK, I've done functional testing and this patch seems to work as
> specified (including the caveat noted above).  Some comments:

Thanks!

> * [PATCH 1/3] hs-checkpoints-v12-1
>
> +++ b/src/backend/access/transam/xlog.c
> +* Taking a lock is as well necessary to prevent potential torn reads
> +* on some platforms.
>
> How about, "Taking a lock is also necessary..."
>
> +   LWLockAcquire([i].l.lock, LW_EXCLUSIVE);
>
> That's a lot of exclusive locks and that would seem to have performance
> implications.  It seems to me this is going to be a hard one to
> benchmark because the regression (if any) would only be seen under heavy
> load on a very large system.
>
> In general I agree with the other comments that this could end up being
> a problem.  On the other hand, since the additional locks are only taken
> at checkpoint or archive_timeout it may not be that big a deal.

Yes, I did some tests on my laptop a couple of months back, that has 4
cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
contention and performing a bunch of INSERT using 4 clients on 4
different relations I could not catch a difference.. Autovacuum was
disabled to eliminate any noise. I tried checkpoint_segments at 30s to
see its effects, as well as larger values to see the impact with the
standby snapshot taken by the bgwriter. Other thoughts are welcome.

> +++ b/src/backend/access/transam/xloginsert.c * Should this record
> include the replication origin if one is set up?
>
> Outdated comment from XLogIncludeOrigin().

Fixed. I added as well some comments on top of XLogSetFlags to mention
what are the flags that can be used. I didn't think that it was
necessary to add an assertion here. Also, I noticed that the comment
on top of XLogInsertRecord mentioned those flags but was incorrect.

> * [PATCH 2/3] hs-checkpoints-v12-2
>
> +++ b/src/backend/postmaster/checkpointer.c
> +   /* OK, it's time to switch */
> +   elog(LOG, "Request XLog Switch");
>
> LOG level seems a bit much here, perhaps DEBUG1?

That's from Horiguchi-san's patch, and those would be definitely
better as DEBUG1 by looking at it. Now and in order to keep things
simple I think that we had better discard this patch for now. I was
planning to come back to this thing anyway once we are done with the
first problem.

> * [PATCH 3/3] hs-checkpoints-v12-3
>
> +* switch segment only when any substantial progress have 
> made from
> +* reasons will cause last_xlog_switch_lsn stay behind but it 
> doesn't
>
> How about, "Switch segment only when substantial progress has been made
> after the last segment was switched by a timeout.  Segment switching for
> other reasons..."
>
> +++ b/src/backend/access/transam/xlog.c
> +   elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn 
> %X/%X,
> ckpt %X/%X",
> +   elog(LOG, "Checkpoint is skipped");
> +   elog(LOG, "snapshot taken by checkpoint %X/%X",
>
> Same for the above, seems like it would just be noise for most users.
>
> +++ b/src/backend/postmaster/bgwriter.c
> +   elog(LOG, "snapshot taken by bgwriter %X/%X",
>
> Ditto.

The original patch was developed to ease debugging, and I chose LOG to
not be polluted with a bunch of DEBUG1 entries :)

Now we can do something, as follows:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8450,6 +8450,8 @@ CreateCheckPoint(int flags)
{
if (progress_lsn == ControlFile->checkPoint)
{
+   if (log_checkpoints)
+   ereport(LOG, "checkpoint skipped");
WALInsertLockRelease();
LWLockRelease(CheckpointLock);
END_CRIT_SECTION();
Letting users know that the checkpoint has been skipped sounds like a
good idea. Perhaps that's better if squashed with the first patch.

> I don't see any unintended consequences in this patch but it doesn't
> mean there aren't any.  I'm definitely concerned by the exclusive locks
> but it may turn out they do not actually represent a bottleneck.

That's a hard to see a difference. Perhaps I didn't try hard enough..

Well for now attached are two patches, that could just be squashed into one.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f95fdb8..e87caa6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8450,6 +8450,8 @@ CreateCheckPoint(int flags)
 	{
 		if (progress_lsn == ControlFile->checkPoint)
 		{
+			if (log_checkpoints)
+ereport(LOG, "checkpoint skipped");
 			WALInsertLockRelease();
 			LWLockRelease(CheckpointLock);
 			END_CRIT_SECTION();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b019bc1..ac40731 100644
--- a/src/backend/access/heap/heapam.c
+++ 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-28 Thread David Steele
On 9/28/16 3:35 AM, Michael Paquier wrote:
> On Wed, Sep 28, 2016 at 6:12 AM, David Steele  wrote:
>> I tried the attached patch set and noticed an interesting behavior. With
>> archive_timeout=5 whenever I made a change I would get a WAL segment within
>> a few seconds as expected then another one would follow a few minutes later.
> 
> That's intentional. We may be able to make XLOG_SWITCH records as not
> updating the progress LSN, but I wanted to tackle that as a separate
> patch once we got the basics done correctly, which is still what I
> think this patch is doing. I should have been more precise upthread:
> this patch makes the handling of checkpoint skip logic correct for
> only standby snapshots, not segment switches, and puts the infra to
> handle other things.

OK, I've done functional testing and this patch seems to work as
specified (including the caveat noted above).  Some comments:

* [PATCH 1/3] hs-checkpoints-v12-1

+++ b/src/backend/access/transam/xlog.c
+* Taking a lock is as well necessary to prevent potential torn reads
+* on some platforms.

How about, "Taking a lock is also necessary..."

+   LWLockAcquire([i].l.lock, LW_EXCLUSIVE);

That's a lot of exclusive locks and that would seem to have performance
implications.  It seems to me this is going to be a hard one to
benchmark because the regression (if any) would only be seen under heavy
load on a very large system.

In general I agree with the other comments that this could end up being
a problem.  On the other hand, since the additional locks are only taken
at checkpoint or archive_timeout it may not be that big a deal.

+++ b/src/backend/access/transam/xloginsert.c * Should this record
include the replication origin if one is set up?

Outdated comment from XLogIncludeOrigin().

* [PATCH 2/3] hs-checkpoints-v12-2

+++ b/src/backend/postmaster/checkpointer.c
+   /* OK, it's time to switch */
+   elog(LOG, "Request XLog Switch");

LOG level seems a bit much here, perhaps DEBUG1?

* [PATCH 3/3] hs-checkpoints-v12-3

+* switch segment only when any substantial progress have made 
from
+* reasons will cause last_xlog_switch_lsn stay behind but it 
doesn't

How about, "Switch segment only when substantial progress has been made
after the last segment was switched by a timeout.  Segment switching for
other reasons..."

+++ b/src/backend/access/transam/xlog.c
+   elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn 
%X/%X,
ckpt %X/%X",
+   elog(LOG, "Checkpoint is skipped");
+   elog(LOG, "snapshot taken by checkpoint %X/%X",

Same for the above, seems like it would just be noise for most users.

+++ b/src/backend/postmaster/bgwriter.c
+   elog(LOG, "snapshot taken by bgwriter %X/%X",

Ditto.

I don't see any unintended consequences in this patch but it doesn't
mean there aren't any.  I'm definitely concerned by the exclusive locks
but it may turn out they do not actually represent a bottleneck.

This does seem like the kind of patch that should get committed very
early in the release cycle to allow maximum time for regression testing.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-28 Thread Michael Paquier
On Wed, Sep 28, 2016 at 6:12 AM, David Steele  wrote:
> I tried the attached patch set and noticed an interesting behavior. With
> archive_timeout=5 whenever I made a change I would get a WAL segment within
> a few seconds as expected then another one would follow a few minutes later.

That's intentional. We may be able to make XLOG_SWITCH records as not
updating the progress LSN, but I wanted to tackle that as a separate
patch once we got the basics done correctly, which is still what I
think this patch is doing. I should have been more precise upthread:
this patch makes the handling of checkpoint skip logic correct for
only standby snapshots, not segment switches, and puts the infra to
handle other things.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-28 Thread Michael Paquier
On Tue, Sep 27, 2016 at 7:16 PM, Kyotaro HORIGUCHI
 wrote:
> I apologize in advance that the comments in this message might
> one of the ideas discarded in the past thread.. I might not grasp
> the discussion completely X(

No problem.

> At Wed, 18 May 2016 17:57:49 -0400, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-27 Thread David Steele

On 9/27/16 6:16 AM, Kyotaro HORIGUCHI wrote:


I apologize in advance that the comments in this message might
one of the ideas discarded in the past thread.. I might not grasp
the discussion completely X(

The attached patches are rebased to the master and additional one
mentioned below.


I tried the attached patch set and noticed an interesting behavior. 
With archive_timeout=5 whenever I made a change I would get a WAL 
segment within a few seconds as expected then another one would follow a 
few minutes later.


Database init:
16M Sep 27 20:05 00010001
16M Sep 27 20:09 00010002

Create test table:
16M Sep 27 20:13 00010003
16M Sep 27 20:15 00010004

Insert row into test table:
16M Sep 27 20:46 00010005
16M Sep 27 20:49 00010006

The cluster was completely idle with no sessions connected in between 
those three commands.  Is it possible this is caused by:


+* switch segment only when any substantial progress have made 
from
+* the last segment switching by timeout. Segment switching by 
other
+* reasons will cause last_xlog_switch_lsn stay behind but it 
doesn't
+* matter since it just causes possible one excessive segment 
switch.
 */

I would like to give Michael a chance to respond to the updated patches 
before delving deeper.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-27 Thread Kyotaro HORIGUCHI
Hi,

I apologize in advance that the comments in this message might
one of the ideas discarded in the past thread.. I might not grasp
the discussion completely X(

The attached patches are rebased to the master and additional one
mentioned below.

At Wed, 18 May 2016 17:57:49 -0400, Michael Paquier  
wrote in 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-27 Thread Michael Paquier
On Thu, May 19, 2016 at 6:57 AM, Michael Paquier
 wrote:
> I am adding that to the commit fest of September.

And a lot of activity has happened here since. Attached are refreshed
patches based on da6c4f6. v11 still applies correctly but it's always
better to avoid hunks when applying them.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 289d240..0fd2e2b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8448,8 +8448,12 @@ CreateCheckPoint(int flags)
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
 	{
+		elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt %X/%X",
+			 (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
+			 (uint32) (ControlFile->checkPoint >> 32), (uint32) ControlFile->checkPoint);
 		if (progress_lsn == ControlFile->checkPoint)
 		{
+			elog(LOG, "Checkpoint is skipped");
 			WALInsertLockRelease();
 			LWLockRelease(CheckpointLock);
 			END_CRIT_SECTION();
@@ -8616,7 +8620,11 @@ CreateCheckPoint(int flags)
 	 * recovery we don't need to write running xact data.
 	 */
 	if (!shutdown && XLogStandbyInfoActive())
-		LogStandbySnapshot();
+	{
+		XLogRecPtr lsn = LogStandbySnapshot();
+		elog(LOG, "snapshot taken by checkpoint %X/%X",
+			 (uint32) (lsn >> 32), (uint32) lsn);
+	}
 
 	START_CRIT_SECTION();
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 3a791eb..7637a1d 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -331,7 +331,9 @@ BackgroundWriterMain(void)
 GetLastCheckpointRecPtr() < current_progress_lsn &&
 last_progress_lsn < current_progress_lsn)
 			{
-(void) LogStandbySnapshot();
+XLogRecPtr lsn = LogStandbySnapshot();
+elog(LOG, "snapshot taken by bgwriter %X/%X",
+	 (uint32) (lsn >> 32), (uint32) lsn);
 last_snapshot_ts = now;
 last_progress_lsn = current_progress_lsn;
 			}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b019bc1..ac40731 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2507,7 +2507,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			heaptup->t_len - SizeofHeapTupleHeader);
 
 		/* filtering by origin on a row level is much more efficient */
-		XLogIncludeOrigin();
+		XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
 		recptr = XLogInsert(RM_HEAP_ID, info);
 
@@ -2846,7 +2846,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
 			/* filtering by origin on a row level is much more efficient */
-			XLogIncludeOrigin();
+			XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
@@ -3308,7 +3308,7 @@ l1:
 		}
 
 		/* filtering by origin on a row level is much more efficient */
-		XLogIncludeOrigin();
+		XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);
 
@@ -6035,7 +6035,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 		XLogBeginInsert();
 
 		/* We want the same filtering on this as on a plain insert */
-		XLogIncludeOrigin();
+		XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
 		XLogRegisterData((char *) , SizeOfHeapConfirm);
 		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
@@ -7703,7 +7703,7 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	}
 
 	/* filtering by origin on a row level is much more efficient */
-	XLogIncludeOrigin();
+	XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
 	recptr = XLogInsert(RM_HEAP_ID, info);
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e11b229..9130816 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5232,7 +5232,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 		XLogRegisterData((char *) (_origin), sizeof(xl_xact_origin));
 
 	/* we allow filtering by xacts */
-	XLogIncludeOrigin();
+	XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 
 	return XLogInsert(RM_XACT_ID, info);
 }
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c1b9a97..289d240 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -442,11 +442,30 @@ typedef struct XLogwrtResult
  * the WAL record is just copied to the page and the lock is released. But
  * to avoid the deadlock-scenario explained above, the indicator is always
  * updated before sleeping while holding an insertion lock.
+ *
+ * The progressAt values indicate the insertion progress used to determine
+ * WAL insertion activity since a previous checkpoint, which is aimed at
+ * finding out if a checkpoint should be skipped or not or if standby
+ * activity should be logged. Progress position is basically updated
+ * for all types of records, for the time being only snapshot logging
+ * is out of this scope to properly 

[HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-05-18 Thread Michael Paquier
Hi all,

A couple of months back is has been reported to pgsql-bugs that WAL
segments were always switched with a low value of archive_timeout even
if a system is completely idle:
http://www.postgresql.org/message-id/20151016203031.3019.72...@wrigleys.postgresql.org
In short, a closer look at the problem has showed up that the logic in
charge of checking if a checkpoint should be skipped or not is
currently broken, because it completely ignores standby snapshots in
its calculation of the WAL activity. So a checkpoint always occurs
after checkpoint_timeout on an idle system since hot_standby has been
introduced as wal_level. This did not get better from 9.4, since
standby snapshots are logged every 15s by the background writer
process. In 9.6, since wal_level = 'archive' and 'hot_standby'
actually has the same meaning, the skip logic that worked with
wal_level = 'archive' does not do its job anymore.

One solution that has been discussed is to track the progress of WAL
activity when doing record insertion by being able to mark some
records as not updating the progress of WAL. Standby snapshot records
enter in this category, making the checkpoint skip logic more robust.

Attached is a patch implementing a solution for it, by adding in
WALInsertLock a new field that gets updated for each record to track
the LSN progress. This allows to reliably skip the generation of
standby snapshots in the bgwriter or checkpoints on an idle system.
Per discussion with Andres at PGcon, we decided that this is an
optimization, only for 9.7~ because this has been broken for a long
time. I have also changed XLogIncludeOrigin() to use a more generic
routine to set of status flags for a record being inserted:
XLogSetFlags(). This routine can use two flags:
- INCLUDE_ORIGIN to decide if the origin should be logged or not
- NO_PROGRESS to decide at insertion if a record should update the LSN
progress or not.
Andres mentioned me that we'd want to have something similar to
XLogIncludeOrigin, but while hacking I noticed that grouping both
things under the same umbrella made more sense.

I am adding that to the commit fest of September.

Regards,
-- 
Michael


hs-checkpoints-v11.patch
Description: invalid/octet-stream


hs-checkpoints-v11-2.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers