Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-04-20 Thread Michael Paquier
On Mon, Apr 4, 2016 at 3:22 PM, Michael Paquier
 wrote:
> The approach introducing the concept of WAL progress addresses the
> problem of unnecessary checkpoints and of unnecessary standby
> snapshots. If we take the problem only from the angle of RUNNING_XACTS
> the current logic used to check if a checkpoint should be skipped or
> not is not addressed.

This discussion has been stalling for a while regarding the main
patches... Attached are rebased versions based on the approach with
XLogInsertExtended(). And I have added as well a couple of days ago an
entry on the 9.6 open item page, not as a 9.6 open item, but as an
older bug to keep at least track of this problem.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d60c123..5ae6f49 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8357,8 +8357,12 @@ CreateCheckPoint(int flags)
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
 	{
+		elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt %X/%X",
+			 (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
+			 (uint32) (ControlFile->checkPoint >> 32), (uint32) ControlFile->checkPoint);
 		if (progress_lsn == ControlFile->checkPoint)
 		{
+			elog(LOG, "Checkpoint is skipped");
 			WALInsertLockRelease();
 			LWLockRelease(CheckpointLock);
 			END_CRIT_SECTION();
@@ -8525,7 +8529,11 @@ CreateCheckPoint(int flags)
 	 * recovery we don't need to write running xact data.
 	 */
 	if (!shutdown && XLogStandbyInfoActive())
-		LogStandbySnapshot();
+	{
+		XLogRecPtr lsn = LogStandbySnapshot();
+		elog(LOG, "snapshot taken by checkpoint %X/%X",
+			 (uint32) (lsn >> 32), (uint32) lsn);
+	}
 
 	START_CRIT_SECTION();
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 79cfd7b..082e589 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -333,7 +333,9 @@ BackgroundWriterMain(void)
 GetLastCheckpointRecPtr() < current_progress_lsn &&
 last_progress_lsn < current_progress_lsn)
 			{
-(void) LogStandbySnapshot();
+XLogRecPtr lsn = LogStandbySnapshot();
+elog(LOG, "snapshot taken by bgwriter %X/%X",
+	 (uint32) (lsn >> 32), (uint32) lsn);
 last_snapshot_ts = now;
 last_progress_lsn = current_progress_lsn;
 			}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9644db..d60c123 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -439,11 +439,30 @@ typedef struct XLogwrtResult
  * the WAL record is just copied to the page and the lock is released. But
  * to avoid the deadlock-scenario explained above, the indicator is always
  * updated before sleeping while holding an insertion lock.
+ *
+ * The progressAt values indicate the insertion progress used to determine
+ * WAL insertion activity since a previous checkpoint, which is aimed at
+ * finding out if a checkpoint should be skipped or not or if standby
+ * activity should be logged. Progress position is basically updated
+ * for all types of records, for the time being only snapshot logging
+ * is out of this scope to properly skip their logging on idle systems.
+ * Tracking the WAL activity directly in WALInsertLock has the advantage
+ * to not rely on taking an exclusive lock on all the WAL insertion locks,
+ * hence reducing the impact of the activity lookup. This takes also
+ * advantage to avoid 8-byte torn reads on some platforms by using the
+ * fact that each insert lock is located on the same cache line.
+ * XXX: There is still room for more improvements here, particularly
+ * WAL operations related to unlogged relations (INIT_FORKNUM) should not
+ * update the progress LSN as those relations are reset during crash
+ * recovery so enforcing buffers of such relations to be flushed for
+ * example in the case of a load only on unlogged relations is a waste
+ * of disk write.
  */
 typedef struct
 {
 	LWLock		lock;
 	XLogRecPtr	insertingAt;
+	XLogRecPtr	progressAt;
 } WALInsertLock;
 
 /*
@@ -881,6 +900,9 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
  * record is always inserted.
  *
+ * 'flags' gives more in-depth control on the record being inserted. As of
+ * now, this controls if the progress LSN positions are updated.
+ *
  * The first XLogRecData in the chain must be for the record header, and its
  * data must be MAXALIGNed.  XLogInsertRecord fills in the xl_prev and
  * xl_crc fields in the header, the rest of the header must already be filled
@@ -893,7 +915,9 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
  * WAL rule "write the log before the data".)
  */
 XLogRecPtr
-XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
+XLogInsertRecord(XLogRecData 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-04-04 Thread Michael Paquier
On Mon, Apr 4, 2016 at 6:41 AM, Simon Riggs  wrote:
> I'm not very excited about this patch. Too much code for so little benefit
> and fragile too.
>
> I'm not even sure what definition of "meaningful progress" is useful. If we
> commit this, a similar bug could be filed for many similar WAL record types.

Up to now, "progress" means the addition of WAL records by backends,
"meaningful" refers to records that impact the decision of skipping
checkpoints or not.

> Since we zero out WAL files specifically to allow them to be compressed in
> the archive, this patch saves a few bytes in the archive. Seems like we
> could fake up some WAL files if needed for restore with an external tool, if
> we really care.

This assumes that most deployments compress the segments when
archiving them for simplicity. I would believe that the reverse is
true, most of archive command processes do not compress them. And I am
pretty sure that some cloud deployments have as well pricing models
depending on the number of accesses done on an instance.

> Since we agree it wouldn't be backpatched, its pretty much saying we don't
> care.

Seeing the flow of this thread, that's not completely true. Andres has
been arguing for no backpatch, Amit and I not, because the checkpoint
skip logic is incorrect. The logic to decide is a checkpoint should be
skipped or not is still broken on back-branches for wal_level =
hot_standby. That's related to the former complain reported here.

> A promising approach would be to just skip logging the RUNNING_XACTS record
> if there are no xacts running, which is the case we care about here (no
> progress => no xacts). HS startup can only happen at a checkpoint, so if
> there is no WAL file there is no checkpoint, so we don't need the
> RUNNING_XACTS record to be written. That's a short patch.

I am not sure if that's actually simpler... Do you mean here that we
should save the latest transaction ID at the moment a snapshot is
taken and skip the next snapshot if TXID has not progressed? Say in
LogStandbySnapshot() when a snapshot is taken, we allocate the last
XID with ReadNewTransactionId() in XlogCtlData, then compare it the
next time a standby snapshot is taken. With such a logic, you need to
do something like that:
- Save the last transaction XID when RUNNING_XACTS has been logged in shmem
- Save in shmem as well if the previous RUNNING_XACTS has overflowed
its subxids or not. If it has overflowed, we should log RUNNING_XACTS
next time LogStandbySnapshot is logged to prevent the case of pending
snapshots at recovery.

AFAIK, standby snapshots with no xacts running are still useful at
recovery for two things to accelerate a hot standby initialization:
- If the previous snapshot overflowed its subxids, the next empty
snapshot can help out clear the situation. That's the case where the
bgwriter snapshot is helpful actually, because there is no need to
wait for the next checkpoint record to be replayed to initialize a hot
standby.
- An empty snapshot after checkpoint is mandatory, even if it is
empty, no? Knowing that, the next checkpoint cannot be skipped because
the logic mentioned above in xlog.c to decide if a checkpoint should
be skipped or not ignores the fact that a standby snapshot has been
logged. It bases its logic on the sole presence of a checkpoint
record, so this will never work. I mean this check of course:
if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
  CHECKPOINT_FORCE)) == 0)
{
if (prevPtr == ControlFile->checkPointCopy.redo &&
prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
{
WALInsertLockRelease();
LWLockRelease(CheckpointLock);
END_CRIT_SECTION();
return;
}
}
The approach introducing the concept of WAL progress addresses the
problem of unnecessary checkpoints and of unnecessary standby
snapshots. If we take the problem only from the angle of RUNNING_XACTS
the current logic used to check if a checkpoint should be skipped or
not is not addressed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-04-03 Thread Simon Riggs
On 3 April 2016 at 21:32, Simon Riggs  wrote:

> On 14 March 2016 at 17:46, David Steele  wrote:
>
>> On 2/24/16 12:40 AM, Michael Paquier wrote:
>>
>> This has the merit to be clear, thanks for the input. Whatever the
>>> approach taken at the end we have two candidates:
>>> - Extend XLogInsert() with an extra argument for flags (Andres)
>>> - Introduce XLogInsertExtended with this extra argument and let
>>> XLogInsert() in peace (Robert and I).
>>> Actually, I lied, there was still something I could do for this
>>> thread: attached are two patches implementing both approaches as
>>> respectively a-1 and a-2. Patch b is the set of logs I used for the
>>> tests to show with a low checkpoint_timeout that checkpoints are
>>> getting correctly skipped on an idle system.
>>>
>>
>> Unfortunately neither A nor B apply anymore.
>>
>> However, since the patches can still be read through I wonder if Robert
>> or Andres would care to opine on whether A or B is better now that they can
>> see the full implementation?
>>
>> For my 2c I'm happy with XLogInsertExtended() since it seems to be a rare
>> use case where flags are required.  This can always be refactored in the
>> future if/when the use of flags spreads.
>>
>
> XLogInsertExtended() is the one I would commit, if.
>

I'm not very excited about this patch. Too much code for so little benefit
and fragile too.

I'm not even sure what definition of "meaningful progress" is useful. If we
commit this, a similar bug could be filed for many similar WAL record types.

Since we zero out WAL files specifically to allow them to be compressed in
the archive, this patch saves a few bytes in the archive. Seems like we
could fake up some WAL files if needed for restore with an external tool,
if we really care.

Since we agree it wouldn't be backpatched, its pretty much saying we don't
care.


A promising approach would be to just skip logging the RUNNING_XACTS record
if there are no xacts running, which is the case we care about here (no
progress => no xacts). HS startup can only happen at a checkpoint, so if
there is no WAL file there is no checkpoint, so we don't need the
RUNNING_XACTS record to be written. That's a short patch.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-04-03 Thread Simon Riggs
On 14 March 2016 at 17:46, David Steele  wrote:

> On 2/24/16 12:40 AM, Michael Paquier wrote:
>
> This has the merit to be clear, thanks for the input. Whatever the
>> approach taken at the end we have two candidates:
>> - Extend XLogInsert() with an extra argument for flags (Andres)
>> - Introduce XLogInsertExtended with this extra argument and let
>> XLogInsert() in peace (Robert and I).
>> Actually, I lied, there was still something I could do for this
>> thread: attached are two patches implementing both approaches as
>> respectively a-1 and a-2. Patch b is the set of logs I used for the
>> tests to show with a low checkpoint_timeout that checkpoints are
>> getting correctly skipped on an idle system.
>>
>
> Unfortunately neither A nor B apply anymore.
>
> However, since the patches can still be read through I wonder if Robert or
> Andres would care to opine on whether A or B is better now that they can
> see the full implementation?
>
> For my 2c I'm happy with XLogInsertExtended() since it seems to be a rare
> use case where flags are required.  This can always be refactored in the
> future if/when the use of flags spreads.
>

XLogInsertExtended() is the one I would commit, if.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-28 Thread Michael Paquier
On Wed, Feb 24, 2016 at 2:40 PM, Michael Paquier
 wrote:
> This has the merit to be clear, thanks for the input. Whatever the
> approach taken at the end we have two candidates:
> - Extend XLogInsert() with an extra argument for flags (Andres)
> - Introduce XLogInsertExtended with this extra argument and let
> XLogInsert() in peace (Robert and I).
> Actually, I lied, there was still something I could do for this
> thread: attached are two patches implementing both approaches as
> respectively a-1 and a-2. Patch b is the set of logs I used for the
> tests to show with a low checkpoint_timeout that checkpoints are
> getting correctly skipped on an idle system.
>
> Let's see what happens next.

FWIW, I just made a couple of 5-min pgbench runs on my laptop (scale
100, 24 clients using 4 threads) with checkpoint_timeout = 30s and a
small value of shared_buffers to minimize the work of each checkpoint,
and I am seeing similar spikes with variations that can be assimilated
to noise. See for example the graph attached. Trying to put more
contention on the extra locks taken before entering the checkpoint
section to scan the progress LSN looks kind of difficult :) But that's
not really a surprise.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-23 Thread Robert Haas
On Fri, Feb 19, 2016 at 6:32 AM, Michael Paquier
 wrote:
>> To be honest, my heart still balances for the Extended() interface.
>> This reduces the risk of conflicts with back-patching with 9.5.
>
> Andres, others, what else can I do to make this thread move on? I can
> produce any version of this patch depending on committer's
> requirements.

FWIW, I don't expect to have time to review this in the level of
detail that would make me confident to commit it any time soon.  I've
said my piece on what I think the final patch should look like, and I
hope that argument was persuasive, but I don't have anything further
to add to what I already said.  I hope some other committer has some
cycles to look at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-21 Thread Michael Paquier
On Fri, Feb 19, 2016 at 10:33 PM, Amit Kapila wrote:
> On Fri, Feb 12, 2016 at 6:57 PM, Michael Paquier wrote:
> You doesn't seem to have taken care of below typo in your patch as
> pointed out by me earlier.
>
> + * to not rely on taking an exclusive lock an all the WAL insertion locks,
>
> /an all/on all

Sorry about that. Hopefully fixed now.

> Does this in anyway take care of the case when there is an activity
> on unlogged tables?
> I think avoiding to perform checkpoints in an idle system is introduced
> in commit 4d14fe0048cf80052a3ba2053560f8aab1bb1b22 whereas
> unlogged relation is introduced by commit
> 53dbc27c62d8e1b6c5253feba04a5094cb8fe046 which is much later.
> Now, I think one might consider it okay to not do anything for
> unlogged tables as it is not done previously and this patch is
> anyway improving the current situation, but I feel if we agree
> that checkpoints will get skipped if the only operations that
> are happening in the system are on unlogged relations, then
> it is better to add it in comments as an improvement even if we
> don't want to address it as part of this patch.

Nope, nothing is done, just to not complicate the patch more than
necessary. I agree though that we had better not update the progress
LSN when logging something related to INIT_FORKNUM, there is little
point to run non-forced checkpoints in this case as on recovery
unlogged relations are wiped out.

> + elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt
> %X/%X",
> + (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
> + (uint32) (ControlFile->checkPoint >> 32), (uint32)
> ControlFile->checkPoint);
>
> Are you proposing to have the newly introduced elog messages
> as part of commit or are these just for debugging purpose? If you
> are proposing for commit, then I think it is worth to justify the need
> of same and we should discuss what is the appropriate log level,
> otherwise, I think you can have these in an additional patch just for
> verification as the main patch is now very close to being
> ready for committer.

I have never intended to have those logs being part of the patch that
should be committed, and putting them at LOG level avoided to have a
bunch of garbage in the logs during the tests (got lazy to grep on the
entries for those tests). I am no:t against adding a comment, here is
one I just came up with:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -448,6 +448,12 @@ typedef struct XLogwrtResult
  * hence reducing the impact of the activity lookup. This takes also
  * advantage to avoid 8-byte torn reads on some platforms by using the
  * fact that each insert lock is located on the same cache line.
+ * XXX: There is still room for more improvements here, particularly
+ * WAL operations related to unlogged relations (INIT_FORKNUM) should not
+ * update the progress LSN as those relations are reset during crash
+ * recovery so enforcing buffers of such relations to be flushed for
+ * example in the case of a load only on unlogged relations is a waste
+ * of disk write.
  */

> Also, I think it is worth to once take the performance data for
> write tests (using pgbench 30 minute run or some other way) with
> minimum checkpoint_timeout (i.e 30s) to see if the additional locking
> has any impact on performance.  I think taking locks at intervals
> of 15s or 30s should not matter much, but it is better to be safe.

I don't think performance will be impacted, but there are no reasons
to not do any measurements either. I'll try to get some graphs
tomorrow with runs on my laptop, mainly looking for any effects of
this patch on TPS when checkpoints show up.

For now attached is an updated patch, with a second patch (b)
including the logs for testing.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 05e49d7..3ab2bc9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8373,8 +8373,12 @@ CreateCheckPoint(int flags)
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
 	{
+		elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt %X/%X",
+			 (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
+			 (uint32) (ControlFile->checkPoint >> 32), (uint32) ControlFile->checkPoint);
 		if (progress_lsn == ControlFile->checkPoint)
 		{
+			elog(LOG, "Checkpoint is skipped");
 			WALInsertLockRelease();
 			LWLockRelease(CheckpointLock);
 			END_CRIT_SECTION();
@@ -8541,7 +8545,11 @@ CreateCheckPoint(int flags)
 	 * recovery we don't need to write running xact data.
 	 */
 	if (!shutdown && XLogStandbyInfoActive())
-		LogStandbySnapshot();
+	{
+		XLogRecPtr lsn = LogStandbySnapshot();
+		elog(LOG, "snapshot taken by checkpoint %X/%X",
+			 (uint32) (lsn >> 32), (uint32) lsn);
+	}
 
 	START_CRIT_SECTION();
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-19 Thread Amit Kapila
On Fri, Feb 12, 2016 at 6:57 PM, Michael Paquier 
wrote:
>
>
> OK, here is attached a new version that I hope addresses all the
> points raised until now. The following things are changed:
> - Extend XLogInsert with a new uint8 argument to have flags. As of now
> there is one flag: XLOG_INSERT_NO_PROGRESS that can be set to not
> update the progress. By default, the progress LSN is updated.
> - Add extra check in bgwriter to not log a snapshot to be sure that no
> snapshots are logged when there is no activity since last snapshot
> taken, and since last checkpoint.
>

You doesn't seem to have taken care of below typo in your patch as
pointed out by me earlier.

+ * to not rely on taking an exclusive lock an all the WAL insertion locks,

/an all/on all

Does this in anyway take care of the case when there is an activity
on unlogged tables?
I think avoiding to perform checkpoints in an idle system is introduced
in commit 4d14fe0048cf80052a3ba2053560f8aab1bb1b22 whereas
unlogged relation is introduced by commit
53dbc27c62d8e1b6c5253feba04a5094cb8fe046 which is much later.
Now, I think one might consider it okay to not do anything for
unlogged tables as it is not done previously and this patch is
anyway improving the current situation, but I feel if we agree
that checkpoints will get skipped if the only operations that
are happening in the system are on unlogged relations, then
it is better to add it in comments as an improvement even if we
don't want to address it as part of this patch.

+ elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt
%X/%X",
+ (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
+ (uint32) (ControlFile->checkPoint >> 32), (uint32)
ControlFile->checkPoint);

Are you proposing to have the newly intorduced elog messages
as part of commit or are these just for debugging purpose? If you
are proposing for commit, then I think it is worth to justify the need
of same and we should discuss what is the appropriate log level,
otherwise, I think you can have these in an additional patch just for
verification as the main patch is now very close to being
ready for committer.

Also, I think it is worth to once take the performance data for
write tests (using pgbench 30 minute run or some other way) with
minimum checkpoint_timeout (i.e 30s) to see if the additional locking
has any impact on performance.  I think taking locks at intervals
of 15s or 30s should not matter much, but it is better to be safe.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-18 Thread Michael Paquier
On Sat, Feb 13, 2016 at 2:26 PM, Michael Paquier
 wrote:
> On Sat, Feb 13, 2016 at 1:01 PM, Robert Haas  wrote:
>> On Fri, Feb 12, 2016 at 12:56 PM, Andres Freund  wrote:
>>> On 2016-02-12 12:37:35 -0500, Robert Haas wrote:
 On Mon, Feb 8, 2016 at 4:18 AM, Andres Freund  wrote:
 > I'm not really a fan. I'd rather change existing callers to add a
 > 'flags' bitmask argument. Right now there can't really be XLogInserts()
 > in extension code, so that's pretty ok to change.

 Yeah, but to what benefit?  You're just turning a smaller patch into a
 bigger one and requiring churning a bunch of code that wouldn't
 otherwise need to be touched.  I think Michael has a good point.
>>>
>>> It has the advantage of not ending up with an extra interface, that
>>> we're otherwise never going to get rid of? If not now, when would we
>>> remove it? Sure it touches a few more lines, but that's entirely trivial
>>> mechanical changes, so what?
>
> Note: the patch has grown from 15kB to 46kB by switching to the
> extended interface to the addition of an argument in XLogInsert().
>
>> I don't feel that there's only one right way to do this, but I think
>> Michael's position is both reasonable and similar to what we have done
>> in previous cases of this sort.
>
> To be honest, my heart still balances for the Extended() interface.
> This reduces the risk of conflicts with back-patching with 9.5.

Andres, others, what else can I do to make this thread move on? I can
produce any version of this patch depending on committer's
requirements.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-12 Thread Michael Paquier
On Thu, Feb 11, 2016 at 5:46 PM, Michael Paquier
 wrote:
> On Thu, Feb 11, 2016 at 5:20 PM, Andres Freund  wrote:
>> On 2016-02-11 09:25:30 +0530, Amit Kapila wrote:
>>> On Wed, Feb 10, 2016 at 1:42 PM, Michael Paquier 
>>> wrote:
>>> >
>>> > On Wed, Feb 10, 2016 at 5:00 PM, Amit Kapila 
>>> wrote:
>>> > > Okay, but isn't it better that we remove the snapshot taken
>>> > > at checkpoint time in the main branch or till where this code is
>>> > > getting back-patched.   Do you see any need of same after
>>> > > having the logging of snapshot in bgwriter?
>>> >
>>> > But this one is necessary as well to allow hot standby faster to
>>> > initialize, no? Particularly in the case where a bgwriter snapshot
>>> > would have been taken just before the checkpoint, there may be up to
>>> > 15s until the next one.
>>> >
>>>
>>> It could be helpful if checkpoint is done at shorter intervals, otherwise
>>> we anyway log it at 15s interval and if need faster initialisation
>>> of hot-standby, then it is better to reduce the log snapshot interval
>>> in bgwriter.
>>
>> No. By emitting the first snapshot directly after the determination of
>> the redo pointer, we can get into STANDBY_SNAPSHOT_READY more
>> quickly. Especially if some of the snapshots are overflowed. During
>> startup 15s can be a *long* time; but on the other hand there's not much
>> benefit at logging snapshots more frequently when the system is up.
>> I don't think we should tinker with the frequency/logging points, while
>> fixing the issue here.
>
> Agreement from here on this point. Andres, what's still remaining on
> my side is to know how to we detect in XLoginsert() how we update the
> LSN progress. I was willing to add XLogInsertExtended() with a
> complementary int/bool flag and let XLogInsert() alone as you don't
> like much doing this decision making using just the record type as I
> did in one of the patch upthread, however you did not like this idea
> either. What exactly do you have in mind?

OK, here is attached a new version that I hope addresses all the
points raised until now. The following things are changed:
- Extend XLogInsert with a new uint8 argument to have flags. As of now
there is one flag: XLOG_INSERT_NO_PROGRESS that can be set to not
update the progress. By default, the progress LSN is updated.
- Add extra check in bgwriter to not log a snapshot to be sure that no
snapshots are logged when there is no activity since last snapshot
taken, and since last checkpoint.
-- 
Michael
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index c740952..22ebba5 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -634,7 +634,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 		XLogRegisterData((char *) , SizeOfBrinCreateIdx);
 		XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
 
-		recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
+		recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX, 0);
 
 		page = BufferGetPage(meta);
 		PageSetLSN(page, recptr);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index d0ca485..98fcc2c 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -199,7 +199,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 			XLogRegisterBuffer(0, oldbuf, REGBUF_STANDARD);
 			XLogRegisterBufData(0, (char *) newtup, newsz);
 
-			recptr = XLogInsert(RM_BRIN_ID, info);
+			recptr = XLogInsert(RM_BRIN_ID, info, 0);
 
 			PageSetLSN(oldpage, recptr);
 		}
@@ -294,7 +294,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 			/* old page */
 			XLogRegisterBuffer(2, oldbuf, REGBUF_STANDARD);
 
-			recptr = XLogInsert(RM_BRIN_ID, info);
+			recptr = XLogInsert(RM_BRIN_ID, info, 0);
 
 			PageSetLSN(oldpage, recptr);
 			PageSetLSN(newpage, recptr);
@@ -444,7 +444,7 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
 
 		XLogRegisterBuffer(1, revmapbuf, 0);
 
-		recptr = XLogInsert(RM_BRIN_ID, info);
+		recptr = XLogInsert(RM_BRIN_ID, info, 0);
 
 		PageSetLSN(page, recptr);
 		PageSetLSN(BufferGetPage(revmapbuf), recptr);
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index b2c273d..d30345f 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -487,7 +487,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 
 		XLogRegisterBuffer(1, buf, REGBUF_WILL_INIT);
 
-		recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_REVMAP_EXTEND);
+		recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_REVMAP_EXTEND, 0);
 		PageSetLSN(metapage, recptr);
 		PageSetLSN(page, recptr);
 	}
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 06ba9cb..db2f959 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -415,7 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-12 Thread Andres Freund
On 2016-02-12 12:37:35 -0500, Robert Haas wrote:
> On Mon, Feb 8, 2016 at 4:18 AM, Andres Freund  wrote:
> > I'm not really a fan. I'd rather change existing callers to add a
> > 'flags' bitmask argument. Right now there can't really be XLogInserts()
> > in extension code, so that's pretty ok to change.
> 
> Yeah, but to what benefit?  You're just turning a smaller patch into a
> bigger one and requiring churning a bunch of code that wouldn't
> otherwise need to be touched.  I think Michael has a good point.

It has the advantage of not ending up with an extra interface, that
we're otherwise never going to get rid of? If not now, when would we
remove it? Sure it touches a few more lines, but that's entirely trivial
mechanical changes, so what?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-12 Thread Robert Haas
On Mon, Feb 8, 2016 at 4:18 AM, Andres Freund  wrote:
> I'm not really a fan. I'd rather change existing callers to add a
> 'flags' bitmask argument. Right now there can't really be XLogInserts()
> in extension code, so that's pretty ok to change.

Yeah, but to what benefit?  You're just turning a smaller patch into a
bigger one and requiring churning a bunch of code that wouldn't
otherwise need to be touched.  I think Michael has a good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-12 Thread Michael Paquier
On Sat, Feb 13, 2016 at 1:01 PM, Robert Haas  wrote:
> On Fri, Feb 12, 2016 at 12:56 PM, Andres Freund  wrote:
>> On 2016-02-12 12:37:35 -0500, Robert Haas wrote:
>>> On Mon, Feb 8, 2016 at 4:18 AM, Andres Freund  wrote:
>>> > I'm not really a fan. I'd rather change existing callers to add a
>>> > 'flags' bitmask argument. Right now there can't really be XLogInserts()
>>> > in extension code, so that's pretty ok to change.
>>>
>>> Yeah, but to what benefit?  You're just turning a smaller patch into a
>>> bigger one and requiring churning a bunch of code that wouldn't
>>> otherwise need to be touched.  I think Michael has a good point.
>>
>> It has the advantage of not ending up with an extra interface, that
>> we're otherwise never going to get rid of? If not now, when would we
>> remove it? Sure it touches a few more lines, but that's entirely trivial
>> mechanical changes, so what?

Note: the patch has grown from 15kB to 46kB by switching to the
extended interface to the addition of an argument in XLogInsert().

> I don't feel that there's only one right way to do this, but I think
> Michael's position is both reasonable and similar to what we have done
> in previous cases of this sort.

To be honest, my heart still balances for the Extended() interface.
This reduces the risk of conflicts with back-patching with 9.5.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-12 Thread Robert Haas
On Fri, Feb 12, 2016 at 12:56 PM, Andres Freund  wrote:
> On 2016-02-12 12:37:35 -0500, Robert Haas wrote:
>> On Mon, Feb 8, 2016 at 4:18 AM, Andres Freund  wrote:
>> > I'm not really a fan. I'd rather change existing callers to add a
>> > 'flags' bitmask argument. Right now there can't really be XLogInserts()
>> > in extension code, so that's pretty ok to change.
>>
>> Yeah, but to what benefit?  You're just turning a smaller patch into a
>> bigger one and requiring churning a bunch of code that wouldn't
>> otherwise need to be touched.  I think Michael has a good point.
>
> It has the advantage of not ending up with an extra interface, that
> we're otherwise never going to get rid of? If not now, when would we
> remove it? Sure it touches a few more lines, but that's entirely trivial
> mechanical changes, so what?

I don't think that it's a bad thing to have a simple interface for
simple use cases and a complex one for more complex cases.  I don't
feel any need to convert every use of ReadBuffer() in the source code
to ReadBufferExtended(), for example.  Nor do I feel like we should
have changed every call to LockAcquire() instead of introducing
LockAcquireExtended().  This case seems analogous, and there are a few
advantages.  One is that it avoids creating meaningless conflicts when
back-patching.   Another is that when a function gets an extra
argument, that often turns out to be something that happens more than
once.  It's nice not to keep having to whack around every call in the
tree every time the call signature changes.  Also, finding the small
number of callers that need non-default behavior is simplified,
because you can just grep for the Extended version of the function and
there won't be many hits.

I don't feel that there's only one right way to do this, but I think
Michael's position is both reasonable and similar to what we have done
in previous cases of this sort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-11 Thread Michael Paquier
On Thu, Feb 11, 2016 at 5:20 PM, Andres Freund  wrote:
> On 2016-02-11 09:25:30 +0530, Amit Kapila wrote:
>> On Wed, Feb 10, 2016 at 1:42 PM, Michael Paquier 
>> wrote:
>> >
>> > On Wed, Feb 10, 2016 at 5:00 PM, Amit Kapila 
>> wrote:
>> > > Okay, but isn't it better that we remove the snapshot taken
>> > > at checkpoint time in the main branch or till where this code is
>> > > getting back-patched.   Do you see any need of same after
>> > > having the logging of snapshot in bgwriter?
>> >
>> > But this one is necessary as well to allow hot standby faster to
>> > initialize, no? Particularly in the case where a bgwriter snapshot
>> > would have been taken just before the checkpoint, there may be up to
>> > 15s until the next one.
>> >
>>
>> It could be helpful if checkpoint is done at shorter intervals, otherwise
>> we anyway log it at 15s interval and if need faster initialisation
>> of hot-standby, then it is better to reduce the log snapshot interval
>> in bgwriter.
>
> No. By emitting the first snapshot directly after the determination of
> the redo pointer, we can get into STANDBY_SNAPSHOT_READY more
> quickly. Especially if some of the snapshots are overflowed. During
> startup 15s can be a *long* time; but on the other hand there's not much
> benefit at logging snapshots more frequently when the system is up.
> I don't think we should tinker with the frequency/logging points, while
> fixing the issue here.

Agreement from here on this point. Andres, what's still remaining on
my side is to know how to we detect in XLoginsert() how we update the
LSN progress. I was willing to add XLogInsertExtended() with a
complementary int/bool flag and let XLogInsert() alone as you don't
like much doing this decision making using just the record type as I
did in one of the patch upthread, however you did not like this idea
either. What exactly do you have in mind?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-11 Thread Andres Freund
On 2016-02-11 09:25:30 +0530, Amit Kapila wrote:
> On Wed, Feb 10, 2016 at 1:42 PM, Michael Paquier 
> wrote:
> >
> > On Wed, Feb 10, 2016 at 5:00 PM, Amit Kapila 
> wrote:
> > > Okay, but isn't it better that we remove the snapshot taken
> > > at checkpoint time in the main branch or till where this code is
> > > getting back-patched.   Do you see any need of same after
> > > having the logging of snapshot in bgwriter?
> >
> > But this one is necessary as well to allow hot standby faster to
> > initialize, no? Particularly in the case where a bgwriter snapshot
> > would have been taken just before the checkpoint, there may be up to
> > 15s until the next one.
> >
> 
> It could be helpful if checkpoint is done at shorter intervals, otherwise
> we anyway log it at 15s interval and if need faster initialisation
> of hot-standby, then it is better to reduce the log snapshot interval
> in bgwriter.

No. By emitting the first snapshot directly after the determination of
the redo pointer, we can get into STANDBY_SNAPSHOT_READY more
quickly. Especially if some of the snapshots are overflowed. During
startup 15s can be a *long* time; but on the other hand there's not much
benefit at logging snapshots more frequently when the system is up.

I don't think we should tinker with the frequency/logging points, while
fixing the issue here.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-10 Thread Amit Kapila
On Wed, Feb 10, 2016 at 1:42 PM, Michael Paquier 
wrote:
>
> On Wed, Feb 10, 2016 at 5:00 PM, Amit Kapila 
wrote:
> > Okay, but isn't it better that we remove the snapshot taken
> > at checkpoint time in the main branch or till where this code is
> > getting back-patched.   Do you see any need of same after
> > having the logging of snapshot in bgwriter?
>
> But this one is necessary as well to allow hot standby faster to
> initialize, no? Particularly in the case where a bgwriter snapshot
> would have been taken just before the checkpoint, there may be up to
> 15s until the next one.
>

It could be helpful if checkpoint is done at shorter intervals, otherwise
we anyway log it at 15s interval and if need faster initialisation
of hot-standby, then it is better to reduce the log snapshot interval
in bgwriter.  I think if go by my reasoning then it should have been
removed when bgwriter is changed to log snapshot, so I think you
can take this suggestion if you also feel the same, anyway this
is not a correctness issue.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-10 Thread Amit Kapila
On Wed, Feb 10, 2016 at 1:01 PM, Michael Paquier 
wrote:

>
>
> On Wed, Feb 10, 2016 at 4:11 PM, Amit Kapila 
> wrote:
>
>>
>>
>>
>>>
>>> >> > - last_snapshot_lsn != GetXLogInsertRecPtr())
>>> >> > +
>>> >> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
>>> >> >
>>> >> > How the above check in patch suppose to work?
>>> >> > I think it could so happen once bgwriter logs snapshot, both
>>> checkpoint
>>> >> > and progressAt won't get updated any further and I think this check
>>> will
>>> >> > allow to log snapshots in such a case as well.
>>> >>
>>> >> The only purpose of this check is to do the following thing: if no WAL
>>> >> activity has happened since last checkpoint, there is no need to log a
>>> >> standby snapshot from the perspective of the bgwriter. In case the
>>> >> system is idle, we want to skip logging that and avoid unnecessary
>>> >> checkpoints because those records would have been generated. If the
>>> >> system is not idle, or to put it in other words there has been at
>>> >> least one record since the last checkpoint, we would log a standby
>>> >> snapshot, enforcing as well a checkpoint to happen the next time
>>> >> CreateCheckpoint() is gone through, and a standby snapshot is logged
>>> >> as well after the checkpoint contents are flushed. I am not sure I
>>> >> understand what you are getting at...
>>> >
>>> > Let me try to say again, suppose ControlFile->checkPoint is at
>>> > 100 and latest value of progressAt returned by GetProgressRecPtr
>>> > is 105, so the first time the above check happens, it will allow
>>> > to log standby snapshot which is okay, now assume there is no
>>> > activity, neither there is any checkpoint and again after
>>> > LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
>>> > gets executed, it will pass and log the standby snapshot which is
>>> > *not* okay, because we don't want to log standby snapshot when
>>> > there is no activity.  Am I missing something?
>>>
>>> Ah, right. Sorry for not getting your point. OK now I got it. So
>>> basically what the code does not is that if there is just one record after
>>> a checkpoint we would log every 15s a standby snapshot until the next
>>> checkpoint even if nothing happens, but we can save a bunch of standby
>>> records. So your point is that we need to save the result of
>>> GetProgressRecPtr() at each loop in the bgwriter when a standby snapshot is
>>> taken, say in a static XLogRecPtr called last_progress_lsn. And then at the
>>> next loop we compare it on the current result of GetProgressRecPtr(), so
>>> the condition to check if a standby snapshot should be generated or not in
>>> the bgwriter becomes that:
>>> (now >= timeout &&
>>>  GetLastCheckpointRecPtr() < current_progress_ptr &&
>>>  last_progress_ptr < current_progress_ptr)
>>>
>>
>> Why do you think including checkpoint related check is
>> required, how about something like below:
>>
>> (now >= timeout &&
>>  last_progress_ptr < current_progress_ptr)
>>
>
> We need as well to be sure that no standby snapshots are logged just after
> a checkpoint in this code path when last_progress_ptr is referring to an
> LSN position *before* the last checkpoint LSN. There is already one
> snapshot in CreateCheckpoint() that is logged, there is no need of an extra
> one, explaining why the check on progress since last checkpoint is
> necessary to me.
>

Okay, but isn't it better that we remove the snapshot taken
at checkpoint time in the main branch or till where this code is
getting back-patched.   Do you see any need of same after
having the logging of snapshot in bgwriter?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-10 Thread Michael Paquier
On Wed, Feb 10, 2016 at 5:00 PM, Amit Kapila  wrote:
> Okay, but isn't it better that we remove the snapshot taken
> at checkpoint time in the main branch or till where this code is
> getting back-patched.   Do you see any need of same after
> having the logging of snapshot in bgwriter?

But this one is necessary as well to allow hot standby faster to
initialize, no? Particularly in the case where a bgwriter snapshot
would have been taken just before the checkpoint, there may be up to
15s until the next one.

And AFAIK, this code would not get a back-patch, as stated by Andres
upthread :( I would think that it is better to actually get a
backpatch, but well...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-09 Thread Michael Paquier
On Tue, Feb 9, 2016 at 2:24 PM, Amit Kapila  wrote:
> Do you see any benefit in allowing checkpoints for such cases considering
> bgwriter will anyway take care of logging standby snapshot for such
> cases?

Well, the idea is to improve the system responsiveness. Imagine that
the call to GetProgressRecPtr() is done within the exclusive lock
portion, but that while scanning the WAL insert lock entries another
backend is waiting to take a lock on them, and this backend is trying
to insert the first record that updates the progress LSN since the
last checkpoint. In this case the checkpoint would be skipped.
However, imagine that such a record is able to get through it while
scanning the progress values in the WAL insert locks, in which case a
checkpoint would be generated. In any case, I think that we should try
to get exclusive lock areas as small as possible if we have ways to do
so.

> I don't think there is any reasonable benefit to improve the situation of
> idle-system check for checkpoint other than just including
> standby snapshot WAL record.  OTOH as checkpoint is not so seldom
> operation, so allowing such a change should be okay, but I don't see
> the need for same.

I am not sure I understand your point here...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-09 Thread Amit Kapila
On Tue, Feb 9, 2016 at 6:08 PM, Michael Paquier 
wrote:
>
> On Tue, Feb 9, 2016 at 2:24 PM, Amit Kapila 
wrote:
> > Do you see any benefit in allowing checkpoints for such cases
considering
> > bgwriter will anyway take care of logging standby snapshot for such
> > cases?
>
> Well, the idea is to improve the system responsiveness. Imagine that
> the call to GetProgressRecPtr() is done within the exclusive lock
> portion, but that while scanning the WAL insert lock entries another
> backend is waiting to take a lock on them, and this backend is trying
> to insert the first record that updates the progress LSN since the
> last checkpoint. In this case the checkpoint would be skipped.
> However, imagine that such a record is able to get through it while
> scanning the progress values in the WAL insert locks, in which case a
> checkpoint would be generated.

Such case was not covered without your patch and I don't see the
need of same especially at the cost of taking locks.

>  In any case, I think that we should try
> to get exclusive lock areas as small as possible if we have ways to do
> so.
>

Sure, but not when you are already going to take lock for longer
duration.


- last_snapshot_lsn != GetXLogInsertRecPtr())
+
GetLastCheckpointRecPtr() < GetProgressRecPtr())

How the above check in patch suppose to work?
I think it could so happen once bgwriter logs snapshot, both checkpoint
and progressAt won't get updated any further and I think this check will
allow to log snapshots in such a case as well.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-09 Thread Michael Paquier
On Tue, Feb 9, 2016 at 10:42 PM, Amit Kapila wrote:
> On Tue, Feb 9, 2016 at 6:08 PM, Michael Paquier wrote:
>> Well, the idea is to improve the system responsiveness. Imagine that
>> the call to GetProgressRecPtr() is done within the exclusive lock
>> portion, but that while scanning the WAL insert lock entries another
>> backend is waiting to take a lock on them, and this backend is trying
>> to insert the first record that updates the progress LSN since the
>> last checkpoint. In this case the checkpoint would be skipped.
>> However, imagine that such a record is able to get through it while
>> scanning the progress values in the WAL insert locks, in which case a
>> checkpoint would be generated.
>
> Such case was not covered without your patch and I don't see the
> need of same especially at the cost of taking locks.

In this code path that's quite cheap, and it clearly improves the
decision-making when wal_level >= hs which is now rather broken (say
not optimized much).

>>  In any case, I think that we should try
>> to get exclusive lock areas as small as possible if we have ways to do
>> so.
>>
>
> Sure, but not when you are already going to take lock for longer
> duration.

Why would an exclusive lock be taken for a longer duration in the
checkpoint portion? Do you mean that the global time to take exclusive
locks gets increased? For each individual WAL insert lock, yes that's
the case, but that's still far cheaper to have this evaluation logic
here than in ReserveXLogInsertLocation().

> - last_snapshot_lsn != GetXLogInsertRecPtr())
> +
> GetLastCheckpointRecPtr() < GetProgressRecPtr())
>
> How the above check in patch suppose to work?
> I think it could so happen once bgwriter logs snapshot, both checkpoint
> and progressAt won't get updated any further and I think this check will
> allow to log snapshots in such a case as well.

The only purpose of this check is to do the following thing: if no WAL
activity has happened since last checkpoint, there is no need to log a
standby snapshot from the perspective of the bgwriter. In case the
system is idle, we want to skip logging that and avoid unnecessary
checkpoints because those records would have been generated. If the
system is not idle, or to put it in other words there has been at
least one record since the last checkpoint, we would log a standby
snapshot, enforcing as well a checkpoint to happen the next time
CreateCheckpoint() is gone through, and a standby snapshot is logged
as well after the checkpoint contents are flushed. I am not sure I
understand what you are getting at...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-09 Thread Amit Kapila
On Wed, Feb 10, 2016 at 7:17 AM, Michael Paquier 
wrote:
>
> On Tue, Feb 9, 2016 at 10:42 PM, Amit Kapila wrote:
> > On Tue, Feb 9, 2016 at 6:08 PM, Michael Paquier wrote:
> >> Well, the idea is to improve the system responsiveness. Imagine that
> >> the call to GetProgressRecPtr() is done within the exclusive lock
> >> portion, but that while scanning the WAL insert lock entries another
> >> backend is waiting to take a lock on them, and this backend is trying
> >> to insert the first record that updates the progress LSN since the
> >> last checkpoint. In this case the checkpoint would be skipped.
> >> However, imagine that such a record is able to get through it while
> >> scanning the progress values in the WAL insert locks, in which case a
> >> checkpoint would be generated.
> >
> > Such case was not covered without your patch and I don't see the
> > need of same especially at the cost of taking locks.
>
> In this code path that's quite cheap, and it clearly improves the
> decision-making when wal_level >= hs which is now rather broken (say
> not optimized much).
>
> >>  In any case, I think that we should try
> >> to get exclusive lock areas as small as possible if we have ways to do
> >> so.
> >>
> >
> > Sure, but not when you are already going to take lock for longer
> > duration.
>
> Why would an exclusive lock be taken for a longer duration in the
> checkpoint portion?

Consider below code:

/*

+ * Get progress before acquiring insert locks to shorten the locked

+ * section waiting ahead.

+ */

+ progress_lsn = GetProgressRecPtr();

+

+ /*

  * We must block concurrent insertions while examining insert state to

  * determine the checkpoint REDO pointer.

  */

  WALInsertLockAcquireExclusive();


In GetProgressRecPtr(), first it take WALInsertLocks one-by-one to
retrieve the latest value of progressAt and then again it will take
all the WALInsertLocks in WALInsertLockAcquireExclusive() and then
do some work.  Now I think it is okay to retrieve the latest of progressAt
after WALInsertLockAcquireExclusive() as you don't need to take the
locks again.

>
> > - last_snapshot_lsn != GetXLogInsertRecPtr())
> > +
> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
> >
> > How the above check in patch suppose to work?
> > I think it could so happen once bgwriter logs snapshot, both checkpoint
> > and progressAt won't get updated any further and I think this check will
> > allow to log snapshots in such a case as well.
>
> The only purpose of this check is to do the following thing: if no WAL
> activity has happened since last checkpoint, there is no need to log a
> standby snapshot from the perspective of the bgwriter. In case the
> system is idle, we want to skip logging that and avoid unnecessary
> checkpoints because those records would have been generated. If the
> system is not idle, or to put it in other words there has been at
> least one record since the last checkpoint, we would log a standby
> snapshot, enforcing as well a checkpoint to happen the next time
> CreateCheckpoint() is gone through, and a standby snapshot is logged
> as well after the checkpoint contents are flushed. I am not sure I
> understand what you are getting at...
>

Let me try to say again, suppose ControlFile->checkPoint is at
100 and latest value of progressAt returned by GetProgressRecPtr
is 105, so the first time the above check happens, it will allow
to log standby snapshot which is okay, now assume there is no
activity, neither there is any checkpoint and again after
LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
gets executed, it will pass and log the standby snapshot which is
*not* okay, because we don't want to log standby snapshot when
there is no activity.  Am I missing something?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-09 Thread Amit Kapila
On Wed, Feb 10, 2016 at 12:16 PM, Michael Paquier  wrote:

>
>
> On Wed, Feb 10, 2016 at 12:41 PM, Amit Kapila 
> wrote:
> > On Wed, Feb 10, 2016 at 7:17 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >>
> >
> > Consider below code:
> >
> > /*
> > + * Get progress before acquiring insert locks to shorten the locked
> > + * section waiting ahead.
> > + */
> > + progress_lsn = GetProgressRecPtr();
> > +
> > + /*
> >   * We must block concurrent insertions while examining insert state to
> >   * determine the checkpoint REDO pointer.
> >   */
> >   WALInsertLockAcquireExclusive();
> >
> > In GetProgressRecPtr(), first it take WALInsertLocks one-by-one to
> > retrieve the latest value of progressAt and then again it will take
> > all the WALInsertLocks in WALInsertLockAcquireExclusive() and then
> > do some work.  Now I think it is okay to retrieve the latest of
> progressAt
> > after WALInsertLockAcquireExclusive() as you don't need to take the
> > locks again.
>
> Sure, that would be OK. Now I am not on board if this means to have the
> checkpoint take an exclusive locks for a bit longer duration if that's not
> actually necessary.
>
>
Taking and releasing locks again and again (which is done in patch)
matters much more than adding few instructions under it and I think
if you would have written the code in-a-way as in patch in some of
the critical path, it would have been regressed badly, but because
checkpoint doesn't happen often, reproducing regression is difficult.
Anyway, there is no-point in too much argument, I think you have
understood what I wanted to say and if you think the way code is
currently written is better, then lets leave as it is for somebody else
to give suggestion on it.


>
> >> > - last_snapshot_lsn != GetXLogInsertRecPtr())
> >> > +
> >> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
> >> >
> >> > How the above check in patch suppose to work?
> >> > I think it could so happen once bgwriter logs snapshot, both
> checkpoint
> >> > and progressAt won't get updated any further and I think this check
> will
> >> > allow to log snapshots in such a case as well.
> >>
> >> The only purpose of this check is to do the following thing: if no WAL
> >> activity has happened since last checkpoint, there is no need to log a
> >> standby snapshot from the perspective of the bgwriter. In case the
> >> system is idle, we want to skip logging that and avoid unnecessary
> >> checkpoints because those records would have been generated. If the
> >> system is not idle, or to put it in other words there has been at
> >> least one record since the last checkpoint, we would log a standby
> >> snapshot, enforcing as well a checkpoint to happen the next time
> >> CreateCheckpoint() is gone through, and a standby snapshot is logged
> >> as well after the checkpoint contents are flushed. I am not sure I
> >> understand what you are getting at...
> >
> > Let me try to say again, suppose ControlFile->checkPoint is at
> > 100 and latest value of progressAt returned by GetProgressRecPtr
> > is 105, so the first time the above check happens, it will allow
> > to log standby snapshot which is okay, now assume there is no
> > activity, neither there is any checkpoint and again after
> > LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
> > gets executed, it will pass and log the standby snapshot which is
> > *not* okay, because we don't want to log standby snapshot when
> > there is no activity.  Am I missing something?
>
> Ah, right. Sorry for not getting your point. OK now I got it. So basically
> what the code does not is that if there is just one record after a
> checkpoint we would log every 15s a standby snapshot until the next
> checkpoint even if nothing happens, but we can save a bunch of standby
> records. So your point is that we need to save the result of
> GetProgressRecPtr() at each loop in the bgwriter when a standby snapshot is
> taken, say in a static XLogRecPtr called last_progress_lsn. And then at the
> next loop we compare it on the current result of GetProgressRecPtr(), so
> the condition to check if a standby snapshot should be generated or not in
> the bgwriter becomes that:
> (now >= timeout &&
>  GetLastCheckpointRecPtr() < current_progress_ptr &&
>  last_progress_ptr < current_progress_ptr)
>

Why do you think including checkpoint related check is
required, how about something like below:

(now >= timeout &&
 last_progress_ptr < current_progress_ptr)



> That's your point?
>

Yes.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-09 Thread Michael Paquier
On Wed, Feb 10, 2016 at 12:41 PM, Amit Kapila 
wrote:
> On Wed, Feb 10, 2016 at 7:17 AM, Michael Paquier <
michael.paqu...@gmail.com>
> wrote:
>>
>> On Tue, Feb 9, 2016 at 10:42 PM, Amit Kapila wrote:
>> > On Tue, Feb 9, 2016 at 6:08 PM, Michael Paquier wrote:
>> >> Well, the idea is to improve the system responsiveness. Imagine that
>> >> the call to GetProgressRecPtr() is done within the exclusive lock
>> >> portion, but that while scanning the WAL insert lock entries another
>> >> backend is waiting to take a lock on them, and this backend is trying
>> >> to insert the first record that updates the progress LSN since the
>> >> last checkpoint. In this case the checkpoint would be skipped.
>> >> However, imagine that such a record is able to get through it while
>> >> scanning the progress values in the WAL insert locks, in which case a
>> >> checkpoint would be generated.
>> >
>> > Such case was not covered without your patch and I don't see the
>> > need of same especially at the cost of taking locks.
>>
>> In this code path that's quite cheap, and it clearly improves the
>> decision-making when wal_level >= hs which is now rather broken (say
>> not optimized much).
>>
>> >>  In any case, I think that we should try
>> >> to get exclusive lock areas as small as possible if we have ways to do
>> >> so.
>> >>
>> >
>> > Sure, but not when you are already going to take lock for longer
>> > duration.
>>
>> Why would an exclusive lock be taken for a longer duration in the
>> checkpoint portion?
>
> Consider below code:
>
> /*
> + * Get progress before acquiring insert locks to shorten the locked
> + * section waiting ahead.
> + */
> + progress_lsn = GetProgressRecPtr();
> +
> + /*
>   * We must block concurrent insertions while examining insert state to
>   * determine the checkpoint REDO pointer.
>   */
>   WALInsertLockAcquireExclusive();
>
> In GetProgressRecPtr(), first it take WALInsertLocks one-by-one to
> retrieve the latest value of progressAt and then again it will take
> all the WALInsertLocks in WALInsertLockAcquireExclusive() and then
> do some work.  Now I think it is okay to retrieve the latest of progressAt
> after WALInsertLockAcquireExclusive() as you don't need to take the
> locks again.

Sure, that would be OK. Now I am not on board if this means to have the
checkpoint take an exclusive locks for a bit longer duration if that's not
actually necessary.

>> > - last_snapshot_lsn != GetXLogInsertRecPtr())
>> > +
>> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
>> >
>> > How the above check in patch suppose to work?
>> > I think it could so happen once bgwriter logs snapshot, both checkpoint
>> > and progressAt won't get updated any further and I think this check
will
>> > allow to log snapshots in such a case as well.
>>
>> The only purpose of this check is to do the following thing: if no WAL
>> activity has happened since last checkpoint, there is no need to log a
>> standby snapshot from the perspective of the bgwriter. In case the
>> system is idle, we want to skip logging that and avoid unnecessary
>> checkpoints because those records would have been generated. If the
>> system is not idle, or to put it in other words there has been at
>> least one record since the last checkpoint, we would log a standby
>> snapshot, enforcing as well a checkpoint to happen the next time
>> CreateCheckpoint() is gone through, and a standby snapshot is logged
>> as well after the checkpoint contents are flushed. I am not sure I
>> understand what you are getting at...
>
> Let me try to say again, suppose ControlFile->checkPoint is at
> 100 and latest value of progressAt returned by GetProgressRecPtr
> is 105, so the first time the above check happens, it will allow
> to log standby snapshot which is okay, now assume there is no
> activity, neither there is any checkpoint and again after
> LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
> gets executed, it will pass and log the standby snapshot which is
> *not* okay, because we don't want to log standby snapshot when
> there is no activity.  Am I missing something?

Ah, right. Sorry for not getting your point. OK now I got it. So basically
what the code does not is that if there is just one record after a
checkpoint we would log every 15s a standby snapshot until the next
checkpoint even if nothing happens, but we can save a bunch of standby
records. So your point is that we need to save the result of
GetProgressRecPtr() at each loop in the bgwriter when a standby snapshot is
taken, say in a static XLogRecPtr called last_progress_lsn. And then at the
next loop we compare it on the current result of GetProgressRecPtr(), so
the condition to check if a standby snapshot should be generated or not in
the bgwriter becomes that:
(now >= timeout &&
 GetLastCheckpointRecPtr() < current_progress_ptr &&
 last_progress_ptr < current_progress_ptr)
That's your point?
-- 
Michael


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-09 Thread Michael Paquier
On Wed, Feb 10, 2016 at 4:11 PM, Amit Kapila 
wrote:

> On Wed, Feb 10, 2016 at 12:16 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>>
>>
>> On Wed, Feb 10, 2016 at 12:41 PM, Amit Kapila 
>> wrote:
>> > On Wed, Feb 10, 2016 at 7:17 AM, Michael Paquier <
>> michael.paqu...@gmail.com>
>> > wrote:
>> >>
>> >
>> > Consider below code:
>> >
>> > /*
>> > + * Get progress before acquiring insert locks to shorten the locked
>> > + * section waiting ahead.
>> > + */
>> > + progress_lsn = GetProgressRecPtr();
>> > +
>> > + /*
>> >   * We must block concurrent insertions while examining insert state to
>> >   * determine the checkpoint REDO pointer.
>> >   */
>> >   WALInsertLockAcquireExclusive();
>> >
>> > In GetProgressRecPtr(), first it take WALInsertLocks one-by-one to
>> > retrieve the latest value of progressAt and then again it will take
>> > all the WALInsertLocks in WALInsertLockAcquireExclusive() and then
>> > do some work.  Now I think it is okay to retrieve the latest of
>> progressAt
>> > after WALInsertLockAcquireExclusive() as you don't need to take the
>> > locks again.
>>
>> Sure, that would be OK. Now I am not on board if this means to have the
>> checkpoint take an exclusive locks for a bit longer duration if that's not
>> actually necessary.
>>
>>
> Taking and releasing locks again and again (which is done in patch)
> matters much more than adding few instructions under it and I think
> if you would have written the code in-a-way as in patch in some of
> the critical path, it would have been regressed badly, but because
> checkpoint doesn't happen often, reproducing regression is difficult.
> Anyway, there is no-point in too much argument, I think you have
> understood what I wanted to say and if you think the way code is
> currently written is better, then lets leave as it is for somebody else
> to give suggestion on it.
>

Okay.


>
>
>
>>
>> >> > - last_snapshot_lsn != GetXLogInsertRecPtr())
>> >> > +
>> >> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
>> >> >
>> >> > How the above check in patch suppose to work?
>> >> > I think it could so happen once bgwriter logs snapshot, both
>> checkpoint
>> >> > and progressAt won't get updated any further and I think this check
>> will
>> >> > allow to log snapshots in such a case as well.
>> >>
>> >> The only purpose of this check is to do the following thing: if no WAL
>> >> activity has happened since last checkpoint, there is no need to log a
>> >> standby snapshot from the perspective of the bgwriter. In case the
>> >> system is idle, we want to skip logging that and avoid unnecessary
>> >> checkpoints because those records would have been generated. If the
>> >> system is not idle, or to put it in other words there has been at
>> >> least one record since the last checkpoint, we would log a standby
>> >> snapshot, enforcing as well a checkpoint to happen the next time
>> >> CreateCheckpoint() is gone through, and a standby snapshot is logged
>> >> as well after the checkpoint contents are flushed. I am not sure I
>> >> understand what you are getting at...
>> >
>> > Let me try to say again, suppose ControlFile->checkPoint is at
>> > 100 and latest value of progressAt returned by GetProgressRecPtr
>> > is 105, so the first time the above check happens, it will allow
>> > to log standby snapshot which is okay, now assume there is no
>> > activity, neither there is any checkpoint and again after
>> > LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
>> > gets executed, it will pass and log the standby snapshot which is
>> > *not* okay, because we don't want to log standby snapshot when
>> > there is no activity.  Am I missing something?
>>
>> Ah, right. Sorry for not getting your point. OK now I got it. So
>> basically what the code does not is that if there is just one record after
>> a checkpoint we would log every 15s a standby snapshot until the next
>> checkpoint even if nothing happens, but we can save a bunch of standby
>> records. So your point is that we need to save the result of
>> GetProgressRecPtr() at each loop in the bgwriter when a standby snapshot is
>> taken, say in a static XLogRecPtr called last_progress_lsn. And then at the
>> next loop we compare it on the current result of GetProgressRecPtr(), so
>> the condition to check if a standby snapshot should be generated or not in
>> the bgwriter becomes that:
>> (now >= timeout &&
>>  GetLastCheckpointRecPtr() < current_progress_ptr &&
>>  last_progress_ptr < current_progress_ptr)
>>
>
> Why do you think including checkpoint related check is
> required, how about something like below:
>
> (now >= timeout &&
>  last_progress_ptr < current_progress_ptr)
>

We need as well to be sure that no standby snapshots are logged just after
a checkpoint in this code path when last_progress_ptr is referring to an
LSN position *before* the last checkpoint LSN. There is already one
snapshot in CreateCheckpoint() that is 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Andres Freund
On 2016-02-08 15:58:49 +0900, Michael Paquier wrote:
> On Sun, Feb 7, 2016 at 2:49 AM, Andres Freund  wrote:
> > On 2016-02-06 22:03:15 +0900, Michael Paquier wrote:
> >> The patch attached will apply on master, on 9.5 there is one minor
> >> conflict. For older versions we will need another reworked patch.
> >
> > FWIW, I don't think we should backpatch this. It'd look noticeably
> > different in the back branches, and this isn't really a critical
> > issue. I think it makes sense to see this as an optimization.
> 
> The original bug report of this thread is based on 9.4, which is the
> first version where the standby snapshot in the bgwriter has been
> introduced. Knowing that most of the systems in the world are actually
> let idle. If those are running Postgres, I'd like to think that it is
> a waste. Now it is true that this is not a critical issue.

Unconvinced. For one, the equivalent behaviour for checkpoints has
existed since at least 9.0. Secondly, the problem only really appears if
you use archiving, configure a nondefault archive timeout, and that
timeout is significantly smaller than checkpoint timeout. And then the
cost is partially filled segment every now and then.  That just doesn't
stand into relation into adding some nontrivial code into backbranches.

> >> + if (holdingAllLocks)
> >> + {
> >> + int i;
> >> +
> >> + for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
> >> + WALInsertLocks[i].l.progressAt = StartPos;
> >
> > Why update all?
> 
> For consistency. I agree that updating one, say the first one is
> enough. I have added a comment explaining that as well.

We don't set valid values in WALInsertLockAcquireExclusive for all locks
either. I don't see consistency to what this is going to buy us.

>  /*
> + * XLogInsert
> + *
> + * A shorthand for XLogInsertExtended, to update the progress of WAL
> + * activity by default.
> + */
> +XLogRecPtr
> +XLogInsert(RmgrId rmid, uint8 info)
> +{
> + return XLogInsertExtended(rmid, info, true);
> +}
> +
> +/*
> + * XLogInsertExtended

I'm not really a fan. I'd rather change existing callers to add a
'flags' bitmask argument. Right now there can't really be XLogInserts()
in extension code, so that's pretty ok to change.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Michael Paquier
On Mon, Feb 8, 2016 at 6:18 PM, Andres Freund  wrote:
> On 2016-02-08 15:58:49 +0900, Michael Paquier wrote:
>> On Sun, Feb 7, 2016 at 2:49 AM, Andres Freund  wrote:
>>  /*
>> + * XLogInsert
>> + *
>> + * A shorthand for XLogInsertExtended, to update the progress of WAL
>> + * activity by default.
>> + */
>> +XLogRecPtr
>> +XLogInsert(RmgrId rmid, uint8 info)
>> +{
>> + return XLogInsertExtended(rmid, info, true);
>> +}
>> +
>> +/*
>> + * XLogInsertExtended
>
> I'm not really a fan. I'd rather change existing callers to add a
> 'flags' bitmask argument. Right now there can't really be XLogInserts()
> in extension code, so that's pretty ok to change.

So, you mean to extend directly XLogInsert() with an int flag that has
only one possible value for now, and update *all* the calls of
XLogInsert(). Well, I am detecting 218 calls of XLogInsert() in the
code, so that's not a small change. Honestly, I'd really rather have
the Extended() routine with default taking 0 for the integer flag, 0
meaning in the case of the progress that it gets updated. We have
similar routines for RangeVar, some in bufmgr.c and some other places.
So, which way do you want to take? I don't expect to win this argument
if that's a 1vs1 against a committer, just I would like to be sure
that I am not doing useless things. Time matters.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Amit Kapila
On Tue, Feb 9, 2016 at 5:37 AM, Michael Paquier 
wrote:
>
> On Mon, Feb 8, 2016 at 11:24 PM, Amit Kapila 
wrote:
> > On Mon, Feb 8, 2016 at 12:28 PM, Michael Paquier <
michael.paqu...@gmail.com>
> > wrote:
> >>
> >>
> >> >>   /*
> >> >> +  * Fetch the progress position before taking any WAL insert
lock.
> >> >> This
> >> >> +  * is normally an operation that does not take long, but
leaving
> >> >> this
> >> >> +  * lookup out of the section taken an exclusive lock saves a
> >> >> couple
> >> >> +  * of instructions.
> >> >> +  */
> >> >> + progress_lsn = GetProgressRecPtr();
> >> >
> >> > too long for my taste. How about:
> >> > /* get progress, before acuiring insert locks to shorten locked
section
> >> > */
> >>
> >> Check.
> >>
> >
> > What is the need of holding locks one-by-one during checkpoint when
> > we anyway are going to take lock on all the insertion slots.
>
> A couple of records can slip in while scanning the progress LSN
> through all the locks.
>

Do you see any benefit in allowing checkpoints for such cases considering
bgwriter will anyway take care of logging standby snapshot for such
cases?
I don't think there is any reasonable benefit by improving the situation of
idle-system check for checkpoint other than just including
standbysnapshot WAL record.  OTOH as checkpoint is not so seldom
operation, so allowing such a change should be okay, but I don't see
the need for same.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Amit Kapila
On Mon, Feb 8, 2016 at 12:28 PM, Michael Paquier 
wrote:
>
>
> >>   /*
> >> +  * Fetch the progress position before taking any WAL insert
lock. This
> >> +  * is normally an operation that does not take long, but leaving
this
> >> +  * lookup out of the section taken an exclusive lock saves a
couple
> >> +  * of instructions.
> >> +  */
> >> + progress_lsn = GetProgressRecPtr();
> >
> > too long for my taste. How about:
> > /* get progress, before acuiring insert locks to shorten locked section
*/
>
> Check.
>

What is the need of holding locks one-by-one during checkpoint when
we anyway are going to take lock on all the insertion slots.

+ * to not rely on taking an exclusive lock an all the WAL insertion locks,

/an all/on all

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Michael Paquier
On Mon, Feb 8, 2016 at 11:24 PM, Amit Kapila  wrote:
> On Mon, Feb 8, 2016 at 12:28 PM, Michael Paquier 
> wrote:
>>
>>
>> >>   /*
>> >> +  * Fetch the progress position before taking any WAL insert lock.
>> >> This
>> >> +  * is normally an operation that does not take long, but leaving
>> >> this
>> >> +  * lookup out of the section taken an exclusive lock saves a
>> >> couple
>> >> +  * of instructions.
>> >> +  */
>> >> + progress_lsn = GetProgressRecPtr();
>> >
>> > too long for my taste. How about:
>> > /* get progress, before acuiring insert locks to shorten locked section
>> > */
>>
>> Check.
>>
>
> What is the need of holding locks one-by-one during checkpoint when
> we anyway are going to take lock on all the insertion slots.

A couple of records can slip in while scanning the progress LSN
through all the locks.

> + * to not rely on taking an exclusive lock an all the WAL insertion locks,
>
> /an all/on all

Nice catch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-07 Thread Michael Paquier
On Sun, Feb 7, 2016 at 2:49 AM, Andres Freund  wrote:
> On 2016-02-06 22:03:15 +0900, Michael Paquier wrote:
>> The patch attached will apply on master, on 9.5 there is one minor
>> conflict. For older versions we will need another reworked patch.
>
> FWIW, I don't think we should backpatch this. It'd look noticeably
> different in the back branches, and this isn't really a critical
> issue. I think it makes sense to see this as an optimization.

The original bug report of this thread is based on 9.4, which is the
first version where the standby snapshot in the bgwriter has been
introduced. Knowing that most of the systems in the world are actually
let idle. If those are running Postgres, I'd like to think that it is
a waste. Now it is true that this is not a critical issue.

>> + /*
>> +  * Update the progress LSN positions. At least one WAL insertion lock
>> +  * is already taken appropriately before doing that, and it is just 
>> more
>> +  * simple to do that here where WAL record data and type is at hand.
>> +  * The progress is set at the start position of the record tracked that
>> +  * is being added, making easier checkpoint progress tracking as the
>> +  * control file already saves the start LSN position of the last
>> +  * checkpoint run.
>> +  */
>> + if (!isStandbySnapshot)
>> + {
>
> I don't like isStandbySnapshot much, it seems better to do this more
> generally, via a flag passed down by the inserter.

Hm, OK. I think that the best way to deal with that is to introduce
XLogInsertExtended which wraps the existing XLogInsert(). By default
the flag is true. This will reduce the footprint of this patch on the
code base and will ease future backpatches on those code paths at
least down to 9.5 where WAL refactoring has been introduced.

>> + if (holdingAllLocks)
>> + {
>> + int i;
>> +
>> + for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
>> + WALInsertLocks[i].l.progressAt = StartPos;
>
> Why update all?

For consistency. I agree that updating one, say the first one is
enough. I have added a comment explaining that as well.

>>  /*
>> + * GetProgressRecPtr -- Returns the newest WAL activity position, aimed
>> + * at the last significant WAL activity, or in other words any activity
>> + * not referring to standby logging as of now. Finding the last activity
>> + * position is done by scanning each WAL insertion lock by taking directly
>> + * the light-weight lock associated to it.
>> + */
>> +XLogRecPtr
>> +GetProgressRecPtr(void)
>> +{
>> + XLogRecPtr  res = InvalidXLogRecPtr;
>> + int i;
>> +
>> + /*
>> +  * Look at the latest LSN position referring to the activity done by
>> +  * WAL insertion.
>> +  */
>> + for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
>> + {
>> + XLogRecPtr  progress_lsn;
>> +
>> + LWLockAcquire([i].l.lock, LW_EXCLUSIVE);
>> + progress_lsn = WALInsertLocks[i].l.progressAt;
>> + LWLockRelease([i].l.lock);
>
> Add a comment explaining that we a) need a lock because of the potential
> for "torn reads" on some platforms. b) need an exclusive one, because
> the insert lock logic currently only expects exclusive locks.

Check.

>>   /*
>> +  * Fetch the progress position before taking any WAL insert lock. This
>> +  * is normally an operation that does not take long, but leaving this
>> +  * lookup out of the section taken an exclusive lock saves a couple
>> +  * of instructions.
>> +  */
>> + progress_lsn = GetProgressRecPtr();
>
> too long for my taste. How about:
> /* get progress, before acuiring insert locks to shorten locked section */

Check.

> Looks much better now.

Thanks. Let's wrap that! An updated patch is attached.
-- 
Michael


hs-checkpoints-v5.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-06 Thread Michael Paquier
On Sun, Feb 7, 2016 at 2:49 AM, Andres Freund  wrote:
> On 2016-02-06 22:03:15 +0900, Michael Paquier wrote:
>> + /*
>> +  * Update the progress LSN positions. At least one WAL insertion lock
>> +  * is already taken appropriately before doing that, and it is just 
>> more
>> +  * simple to do that here where WAL record data and type is at hand.
>> +  * The progress is set at the start position of the record tracked that
>> +  * is being added, making easier checkpoint progress tracking as the
>> +  * control file already saves the start LSN position of the last
>> +  * checkpoint run.
>> +  */
>> + if (!isStandbySnapshot)
>> + {
>
> I don't like isStandbySnapshot much, it seems better to do this more
> generally, via a flag passed down by the inserter.

Instead of updating every single call of XLogInsert() in the system,
what do you think about introducing a new routine XLogInsertExtended()
that would have this optional flag? This would wrap the existing
XLogInsert() and pass the flag to XLogInsertRecord().
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-06 Thread Andres Freund
On 2016-02-06 22:03:15 +0900, Michael Paquier wrote:
> The patch attached will apply on master, on 9.5 there is one minor
> conflict. For older versions we will need another reworked patch.

FWIW, I don't think we should backpatch this. It'd look noticeably
different in the back branches, and this isn't really a critical
issue. I think it makes sense to see this as an optimization.

> + /*
> +  * Update the progress LSN positions. At least one WAL insertion lock
> +  * is already taken appropriately before doing that, and it is just more
> +  * simple to do that here where WAL record data and type is at hand.
> +  * The progress is set at the start position of the record tracked that
> +  * is being added, making easier checkpoint progress tracking as the
> +  * control file already saves the start LSN position of the last
> +  * checkpoint run.
> +  */
> + if (!isStandbySnapshot)
> + {

I don't like isStandbySnapshot much, it seems better to do this more
generally, via a flag passed down by the inserter.

> + if (holdingAllLocks)
> + {
> + int i;
> +
> + for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
> + WALInsertLocks[i].l.progressAt = StartPos;

Why update all?

>  /*
> + * GetProgressRecPtr -- Returns the newest WAL activity position, aimed
> + * at the last significant WAL activity, or in other words any activity
> + * not referring to standby logging as of now. Finding the last activity
> + * position is done by scanning each WAL insertion lock by taking directly
> + * the light-weight lock associated to it.
> + */
> +XLogRecPtr
> +GetProgressRecPtr(void)
> +{
> + XLogRecPtr  res = InvalidXLogRecPtr;
> + int i;
> +
> + /*
> +  * Look at the latest LSN position referring to the activity done by
> +  * WAL insertion.
> +  */
> + for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
> + {
> + XLogRecPtr  progress_lsn;
> +
> + LWLockAcquire([i].l.lock, LW_EXCLUSIVE);
> + progress_lsn = WALInsertLocks[i].l.progressAt;
> + LWLockRelease([i].l.lock);

Add a comment explaining that we a) need a lock because of the potential
for "torn reads" on some platforms. b) need an exclusive one, because
the insert lock logic currently only expects exclusive locks.

>   /*
> +  * Fetch the progress position before taking any WAL insert lock. This
> +  * is normally an operation that does not take long, but leaving this
> +  * lookup out of the section taken an exclusive lock saves a couple
> +  * of instructions.
> +  */
> + progress_lsn = GetProgressRecPtr();

too long for my taste. How about:
/* get progress, before acuiring insert locks to shorten locked section */


Looks much better now.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-06 Thread Michael Paquier
On Thu, Feb 4, 2016 at 6:38 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 4:10 PM, Andres Freund  wrote:
>> On 2016-02-04 18:21:41 +0530, Amit Kapila wrote:
>>> I think generally it is good idea, but one thing worth a thought is that
>>> by doing so, we need to acquire all WAL Insertion locks every
>>> LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for
>>> every slot, do you think it is matter of concern in any way for write
>>> workloads or it won't effect as we need to do this periodically?
>>
>> Michael and I just had an in-person discussion, and one of the topics
>> was that. The plan was basically to adapt the patch to:
>> 1) Store the progress lsn inside the wal insert lock
>> 2) Change the HasActivity API to return an the last LSN at which there
>>was activity, instead of a boolean.
>> 3) Individually acquire each insert locks's lwlock to get it's progress
>>LSN, but not the exclusive insert lock. We need the lwllock to avoid
>>a torn 8byte read on some platforms.
>
> 4) Switch the condition to decide if a checkpoint should be skipped
> using the last activity position compared with ProcLastRecPtr in
> CreateCheckpoint to see if any activity has occurred since the
> checkpoint record was inserted, and do not care anymore if the
> previous record and current record are on different segments. This
> would basically work.

OK, attached is a patch that I believe addresses those issues. The
patch still has a couple of LOG entries that we had better remove in
the version that gets pushed, but they have proved to be useful for me
when testing the patch with a low checkpoint_timeout value to see if
checkpoints are properly skipped on an idle system. I found myself
adding another routine called GetLastCheckpointRecPtr() for the
bgwriter because ControlFile is not declared externally even if it is
in shared memory. I think that's better this way.

The original bug report referred to a low archive_timeout causing
standby snapshots to be logged when segments are switched. So we may
want at the end to not update the progress LSN for segment switch
records as well, but I have let that out of the patch for the time
being to address the primary concern of unnecessary checkpoints for
wal_level >= hs. We could address it with a later patch (planning to
do so), let's keep a step-by-step approach.

The patch attached will apply on master, on 9.5 there is one minor
conflict. For older versions we will need another reworked patch. I am
fine to produce those once we are fine with the shape of what gets
into master and 9.5. 9.4 uses WAL insert locks so things are rather
similar. For ~9.3, I think that we are going to need a single variable
in XLogCtl or similar to track the progress, but I have not looked
into that in details yet.
-- 
Michael


hs-checkpoints-v4.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-04 Thread Amit Kapila
On Wed, Feb 3, 2016 at 3:08 PM, Andres Freund  wrote:
>
> On 2016-02-02 10:12:25 +0530, Amit Kapila wrote:
> > @@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
> >   if ((flags & (CHECKPOINT_IS_SHUTDOWN |
> > CHECKPOINT_END_OF_RECOVERY |
> >CHECKPOINT_FORCE)) == 0)
> >   {
> > - if
> > (prevPtr == ControlFile->checkPointCopy.redo &&
> > + if (GetProgressRecPtr() == prevPtr &&
> >
> > prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
> >   {
> >
> > I think such a check won't consider all the WAL-activity happened
> > during checkpoint (after checkpoint start, till it finishes) which was
> > not the intention of this code without your patch.
>
> Precisely.
>
> > I think both this and previous patch (hs-checkpoints-v1 ) approach
> > won't fix the issue in all kind of scenario's.
>
> Agreed.
>
> > Let me try to explain what I think should fix this issue based on
> > discussion above, suggestions by Andres and some of my own
> > thoughts:
>
> > Have a new variable in WALInsertLock that would indicate the last
> > insertion position (last_insert_pos) we write to after holding that
lock.
> > Ensure that we don't update last_insert_pos while logging standby
> > snapshot (simple way is to pass a flag in XLogInsert or distinguish
> > it based on type of record (XLOG_RUNNING_XACTS) or if you can
> > think of a more smarter way).  Now both during checkpoint and
> > in bgwriter, to find out whether there is any activity since last
> > time, we need to check all the WALInsertLocks for latest insert position
> > (by referring last_insert_pos) and compare it with the latest position
> > we have noted during last such check and decide whether to proceed
> > or not based on comparison result.  If you think it is not adequate to
> > store last_insert_pos in WALInsertLock, then we can think of having
> > it in PGPROC.
>
> Yes, that's pretty much what I was thinking of.
>

I think generally it is good idea, but one thing worth a thought is that
by doing so, we need to acquire all WAL Insertion locks every
LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for
every slot, do you think it is matter of concern in any way for write
workloads or it won't effect as we need to do this periodically?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-04 Thread Amit Kapila
On Thu, Feb 4, 2016 at 6:40 PM, Andres Freund  wrote:
>
> On 2016-02-04 18:21:41 +0530, Amit Kapila wrote:
> > I think generally it is good idea, but one thing worth a thought is that
> > by doing so, we need to acquire all WAL Insertion locks every
> > LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for
> > every slot, do you think it is matter of concern in any way for write
> > workloads or it won't effect as we need to do this periodically?
>
> Michael and I just had an in-person discussion, and one of the topics
> was that. The plan was basically to adapt the patch to:
> 1) Store the progress lsn inside the wal insert lock
> 2) Change the HasActivity API to return an the last LSN at which there
>was activity, instead of a boolean.
> 2) Individually acquire each insert locks's lwlock to get it's progress
>LSN, but not the exclusive insert lock. We need the lwllock to avoid
>a torn 8byte read on some platforms.
>
> I think that mostly should address your concerns?
>

Yes, this sounds better and in-anycase we can do some benchmarks
to verify the same once patch is in shape.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-04 Thread Andres Freund
On 2016-02-04 18:21:41 +0530, Amit Kapila wrote:
> I think generally it is good idea, but one thing worth a thought is that
> by doing so, we need to acquire all WAL Insertion locks every
> LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for
> every slot, do you think it is matter of concern in any way for write
> workloads or it won't effect as we need to do this periodically?

Michael and I just had an in-person discussion, and one of the topics
was that. The plan was basically to adapt the patch to:
1) Store the progress lsn inside the wal insert lock
2) Change the HasActivity API to return an the last LSN at which there
   was activity, instead of a boolean.
2) Individually acquire each insert locks's lwlock to get it's progress
   LSN, but not the exclusive insert lock. We need the lwllock to avoid
   a torn 8byte read on some platforms.

I think that mostly should address your concerns?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-04 Thread Michael Paquier
On Thu, Feb 4, 2016 at 4:10 PM, Andres Freund  wrote:
> On 2016-02-04 18:21:41 +0530, Amit Kapila wrote:
>> I think generally it is good idea, but one thing worth a thought is that
>> by doing so, we need to acquire all WAL Insertion locks every
>> LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for
>> every slot, do you think it is matter of concern in any way for write
>> workloads or it won't effect as we need to do this periodically?
>
> Michael and I just had an in-person discussion, and one of the topics
> was that. The plan was basically to adapt the patch to:
> 1) Store the progress lsn inside the wal insert lock
> 2) Change the HasActivity API to return an the last LSN at which there
>was activity, instead of a boolean.
> 3) Individually acquire each insert locks's lwlock to get it's progress
>LSN, but not the exclusive insert lock. We need the lwllock to avoid
>a torn 8byte read on some platforms.

4) Switch the condition to decide if a checkpoint should be skipped
using the last activity position compared with ProcLastRecPtr in
CreateCheckpoint to see if any activity has occurred since the
checkpoint record was inserted, and do not care anymore if the
previous record and current record are on different segments. This
would basically work.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-03 Thread Michael Paquier
On Tue, Feb 2, 2016 at 1:42 PM, Amit Kapila  wrote:
> On Sat, Jan 30, 2016 at 7:38 PM, Michael Paquier 
> wrote:
>>
>> On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote:
>> > Well, to put it short, I am just trying to find a way to make the
>> > backend skip unnecessary checkpoints on an idle system, which results
>> > in the following WAL pattern if system is completely idle:
>> > CHECKPOINT_ONLINE
>> > RUNNING_XACTS
>> > RUNNING_XACTS
>> > [etc..]
>> >
>> > The thing is that I am lost with the meaning of this condition to
>> > decide if a checkpoint should be skipped or not:
>> > if (prevPtr == ControlFile->checkPointCopy.redo &&
>> > prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>> > {
>> > WALInsertLockRelease();
>> > LWLockRelease(CheckpointLock);
>> > return;
>> > }
>> > As at least one standby snapshot is logged before the checkpoint
>> > record, the redo position is never going to match the previous insert
>> > LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
>> > Skipping such unnecessary checkpoints is what you would like to
>> > address, no? Because that's what I would like to do here first. And
>> > once we got that right, we could think about addressing the case where
>> > WAL segments are forcibly archived for idle systems.
>>
>> I have put a bit more of brain power into that, and finished with the
>> patch attached. A new field called chkpProgressLSN is attached to
>> XLogCtl, which is to the current insert position of the checkpoint
>> when wal_level <= archive, or the LSN position of the standby snapshot
>> taken after a checkpoint. The bgwriter code is modified as well so as
>> it uses this progress LSN and compares it with the current insert LSN
>> to determine if a standby snapshot should be logged or not. On an
>> instance of Postgres completely idle, no useless checkpoints, and no
>> useless standby snapshots are generated anymore.
>>
>
>
> @@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
>   if ((flags & (CHECKPOINT_IS_SHUTDOWN |
> CHECKPOINT_END_OF_RECOVERY |
>CHECKPOINT_FORCE)) == 0)
>   {
> - if
> (prevPtr == ControlFile->checkPointCopy.redo &&
> + if (GetProgressRecPtr() == prevPtr &&
>
> prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>   {
>
> I think such a check won't consider all the WAL-activity happened
> during checkpoint (after checkpoint start, till it finishes) which was
> not the intention of this code without your patch.

Yes, that's true, in v2 this progress LSN can be updated in two ways:
- At the position where checkpoint has begun
- At the position where standby snapshot was taken after a checkpoint has run.
Honestly, even if we log the snapshot for example before
CheckpointGuts(), that's not going to make it... The main issue here
is that there will be a snapshot after this checkpoint, so the
existing condition is a rather broken concept already when wal_level
>= hot_standby because the redo pointer is never going to match the
previous LSN pointer. Another idea would be to ensure that the
snapshot is logged just after the redo pointer, this would be
reliable.

> I think both this and previous patch (hs-checkpoints-v1) approach
> won't fix the issue in all kind of scenario's.
>
> Let me try to explain what I think should fix this issue based on
> discussion above, suggestions by Andres and some of my own
> thoughts:
>
> Have a new variable in WALInsertLock that would indicate the last
> insertion position (last_insert_pos) we write to after holding that lock.
> Ensure that we don't update last_insert_pos while logging standby
> snapshot (simple way is to pass a flag in XLogInsert or distinguish
> it based on type of record (XLOG_RUNNING_XACTS) or if you can
> think of a more smarter way).

Simplest way is in XLogInsertRecord, the record data is available
there, there is for example some logic related to segment switches.

> Now both during checkpoint and
> in bgwriter, to find out whether there is any activity since last
> time, we need to check all the WALInsertLocks for latest insert position
> (by referring last_insert_pos) and compare it with the latest position
> we have noted during last such check and decide whether to proceed
> or not based on comparison result.  If you think it is not adequate to
> store last_insert_pos in WALInsertLock, then we can think of having
> it in PGPROC.

So the progress check is used when deciding if a checkpoint should be
skipped or not, and when deciding if a standby snapshot should be
taken by the bgwriter? When bgwriter logs a snapshot, it will update
as well the last LSN found (which would be a single variable in for
example XLogCtlData, updated from the data taken from the WAL insert
slots). But there is a problem here: if there is no activity since the
last bgwriter snapshot, the next checkpoint would be skipped as well.
It seems to me that this is not 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-03 Thread Andres Freund
On 2016-02-02 10:12:25 +0530, Amit Kapila wrote:
> @@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
>   if ((flags & (CHECKPOINT_IS_SHUTDOWN |
> CHECKPOINT_END_OF_RECOVERY |
>CHECKPOINT_FORCE)) == 0)
>   {
> - if
> (prevPtr == ControlFile->checkPointCopy.redo &&
> + if (GetProgressRecPtr() == prevPtr &&
> 
> prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>   {
> 
> I think such a check won't consider all the WAL-activity happened
> during checkpoint (after checkpoint start, till it finishes) which was
> not the intention of this code without your patch.

Precisely.

> I think both this and previous patch (hs-checkpoints-v1 ) approach
> won't fix the issue in all kind of scenario's.

Agreed.

> Let me try to explain what I think should fix this issue based on
> discussion above, suggestions by Andres and some of my own
> thoughts:

> Have a new variable in WALInsertLock that would indicate the last
> insertion position (last_insert_pos) we write to after holding that lock.
> Ensure that we don't update last_insert_pos while logging standby
> snapshot (simple way is to pass a flag in XLogInsert or distinguish
> it based on type of record (XLOG_RUNNING_XACTS) or if you can
> think of a more smarter way).  Now both during checkpoint and
> in bgwriter, to find out whether there is any activity since last
> time, we need to check all the WALInsertLocks for latest insert position
> (by referring last_insert_pos) and compare it with the latest position
> we have noted during last such check and decide whether to proceed
> or not based on comparison result.  If you think it is not adequate to
> store last_insert_pos in WALInsertLock, then we can think of having
> it in PGPROC.

Yes, that's pretty much what I was thinking of.

> Yet another idea that occurs to me this morning is that why not
> have a variable in shared memory in XLogCtlInsert or XLogCtl
> similar to CurrBytePos/PrevBytePos which will be updated on
> each XLOGInsert apart from the XLOGInsert for standby snapshots
> and use that in a patch somewhat close to what you have in
> hs-checkpoints-v1.

That'll require holding locks longer...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-01 Thread Amit Kapila
On Sat, Jan 30, 2016 at 7:38 PM, Michael Paquier 
wrote:
>
> On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote:
> > Well, to put it short, I am just trying to find a way to make the
> > backend skip unnecessary checkpoints on an idle system, which results
> > in the following WAL pattern if system is completely idle:
> > CHECKPOINT_ONLINE
> > RUNNING_XACTS
> > RUNNING_XACTS
> > [etc..]
> >
> > The thing is that I am lost with the meaning of this condition to
> > decide if a checkpoint should be skipped or not:
> > if (prevPtr == ControlFile->checkPointCopy.redo &&
> > prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
> > {
> > WALInsertLockRelease();
> > LWLockRelease(CheckpointLock);
> > return;
> > }
> > As at least one standby snapshot is logged before the checkpoint
> > record, the redo position is never going to match the previous insert
> > LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
> > Skipping such unnecessary checkpoints is what you would like to
> > address, no? Because that's what I would like to do here first. And
> > once we got that right, we could think about addressing the case where
> > WAL segments are forcibly archived for idle systems.
>
> I have put a bit more of brain power into that, and finished with the
> patch attached. A new field called chkpProgressLSN is attached to
> XLogCtl, which is to the current insert position of the checkpoint
> when wal_level <= archive, or the LSN position of the standby snapshot
> taken after a checkpoint. The bgwriter code is modified as well so as
> it uses this progress LSN and compares it with the current insert LSN
> to determine if a standby snapshot should be logged or not. On an
> instance of Postgres completely idle, no useless checkpoints, and no
> useless standby snapshots are generated anymore.
>


@@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
  if ((flags & (CHECKPOINT_IS_SHUTDOWN |
CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
  {
- if
(prevPtr == ControlFile->checkPointCopy.redo &&
+ if (GetProgressRecPtr() == prevPtr &&

prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
  {

I think such a check won't consider all the WAL-activity happened
during checkpoint (after checkpoint start, till it finishes) which was
not the intention of this code without your patch.

I think both this and previous patch (hs-checkpoints-v1 ) approach
won't fix the issue in all kind of scenario's.

Let me try to explain what I think should fix this issue based on
discussion above, suggestions by Andres and some of my own
thoughts:

Have a new variable in WALInsertLock that would indicate the last
insertion position (last_insert_pos) we write to after holding that lock.
Ensure that we don't update last_insert_pos while logging standby
snapshot (simple way is to pass a flag in XLogInsert or distinguish
it based on type of record (XLOG_RUNNING_XACTS) or if you can
think of a more smarter way).  Now both during checkpoint and
in bgwriter, to find out whether there is any activity since last
time, we need to check all the WALInsertLocks for latest insert position
(by referring last_insert_pos) and compare it with the latest position
we have noted during last such check and decide whether to proceed
or not based on comparison result.  If you think it is not adequate to
store last_insert_pos in WALInsertLock, then we can think of having
it in PGPROC.

Yet another idea that occurs to me this morning is that why not
have a variable in shared memory in XLogCtlInsert or XLogCtl
similar to CurrBytePos/PrevBytePos which will be updated on
each XLOGInsert apart from the XLOGInsert for standby snapshots
and use that in a patch somewhat close to what you have in
hs-checkpoints-v1.

One related point is don't we need to bother about activity on
unlogged relations for checkpoints to occur, considering the
case when the only activity happened after last checkpoint
is on unlogged relations?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-31 Thread Michael Paquier
On Sat, Jan 30, 2016 at 11:08 PM, Michael Paquier
 wrote:
> On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote:
>> Well, to put it short, I am just trying to find a way to make the
>> backend skip unnecessary checkpoints on an idle system, which results
>> in the following WAL pattern if system is completely idle:
>> CHECKPOINT_ONLINE
>> RUNNING_XACTS
>> RUNNING_XACTS
>> [etc..]
>>
>> The thing is that I am lost with the meaning of this condition to
>> decide if a checkpoint should be skipped or not:
>> if (prevPtr == ControlFile->checkPointCopy.redo &&
>> prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>> {
>> WALInsertLockRelease();
>> LWLockRelease(CheckpointLock);
>> return;
>> }
>> As at least one standby snapshot is logged before the checkpoint
>> record, the redo position is never going to match the previous insert
>> LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
>> Skipping such unnecessary checkpoints is what you would like to
>> address, no? Because that's what I would like to do here first. And
>> once we got that right, we could think about addressing the case where
>> WAL segments are forcibly archived for idle systems.
>
> I have put a bit more of brain power into that, and finished with the
> patch attached. A new field called chkpProgressLSN is attached to
> XLogCtl, which is to the current insert position of the checkpoint
> when wal_level <= archive, or the LSN position of the standby snapshot
> taken after a checkpoint. The bgwriter code is modified as well so as
> it uses this progress LSN and compares it with the current insert LSN
> to determine if a standby snapshot should be logged or not. On an
> instance of Postgres completely idle, no useless checkpoints, and no
> useless standby snapshots are generated anymore.

Attached is an additional patch that I have used for my tests (should
have sent that yesterday). This just shows up a couple of logs in the
bgwriter and around checkpoint to see in more details their activity
with not that much noise.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ccef3f0..e51b97e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8259,12 +8259,20 @@ CreateCheckPoint(int flags)
 	 * better if you don't need to keep two WAL segments around to recover the
 	 * checkpoint.
 	 */
+	elog(LOG, "prevPtr %X/%X, curInsert %X/%X, progress %X/%X, redo %X/%X",
+		 (uint32) (prevPtr >> 32), (uint32) prevPtr,
+		 (uint32) (curInsert >> 32), (uint32) curInsert,
+		 (uint32) (GetProgressRecPtr() >> 32), (uint32) GetProgressRecPtr(),
+		 (uint32) (ControlFile->checkPointCopy.redo >> 32),
+		 (uint32) ControlFile->checkPointCopy.redo);
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
 	{
+		elog(LOG, "Not a forced or shutdown checkpoint");
 		if (GetProgressRecPtr() == prevPtr &&
 			prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
 		{
+			elog(LOG, "Checkpoint is skipped");
 			WALInsertLockRelease();
 			LWLockRelease(CheckpointLock);
 			END_CRIT_SECTION();
@@ -8437,7 +8445,11 @@ CreateCheckPoint(int flags)
 	 * snapshots need to be logged, and skip them on idle systems.
 	 */
 	if (!shutdown && XLogStandbyInfoActive())
+	{
 		progress_lsn = LogStandbySnapshot();
+		elog(LOG, "snapshot taken by checkpoint %X/%X",
+			 (uint32) (progress_lsn >> 32), (uint32) progress_lsn);
+	}
 
 	START_CRIT_SECTION();
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 22374b0..d55e446 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -308,17 +308,25 @@ BackgroundWriterMain(void)
 		{
 			TimestampTz timeout = 0;
 			TimestampTz now = GetCurrentTimestamp();
+			XLogRecPtr	progress_lsn = GetProgressRecPtr();
+			XLogRecPtr	insert_lsn = GetInsertRecPtr();
 
 			timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
   LOG_SNAPSHOT_INTERVAL_MS);
 
+			elog(LOG, "bgwriter progress_lsn %X/%X, insert_lsn %X/%X",
+ (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
+ (uint32) (insert_lsn >> 32), (uint32) insert_lsn);
 			/*
 			 * only log if enough time has passed.
 			 */
 			if (now >= timeout &&
-GetProgressRecPtr() != GetInsertRecPtr())
+progress_lsn != insert_lsn)
 			{
-(void) LogStandbySnapshot();
+XLogRecPtr lsn;
+lsn = LogStandbySnapshot();
+elog(LOG, "snapshot taken by bgwriter %X/%X",
+	 (uint32) (lsn >> 32), (uint32) lsn);
 last_snapshot_ts = now;
 			}
 		}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a2846c4..ccef3f0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -527,6 +527,8 @@ typedef struct XLogCtlData
 	TransactionId ckptXid;
 	XLogRecPtr	asyncXactLSN;	/* LSN of newest 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-30 Thread Michael Paquier
On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote:
> Well, to put it short, I am just trying to find a way to make the
> backend skip unnecessary checkpoints on an idle system, which results
> in the following WAL pattern if system is completely idle:
> CHECKPOINT_ONLINE
> RUNNING_XACTS
> RUNNING_XACTS
> [etc..]
>
> The thing is that I am lost with the meaning of this condition to
> decide if a checkpoint should be skipped or not:
> if (prevPtr == ControlFile->checkPointCopy.redo &&
> prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
> {
> WALInsertLockRelease();
> LWLockRelease(CheckpointLock);
> return;
> }
> As at least one standby snapshot is logged before the checkpoint
> record, the redo position is never going to match the previous insert
> LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
> Skipping such unnecessary checkpoints is what you would like to
> address, no? Because that's what I would like to do here first. And
> once we got that right, we could think about addressing the case where
> WAL segments are forcibly archived for idle systems.

I have put a bit more of brain power into that, and finished with the
patch attached. A new field called chkpProgressLSN is attached to
XLogCtl, which is to the current insert position of the checkpoint
when wal_level <= archive, or the LSN position of the standby snapshot
taken after a checkpoint. The bgwriter code is modified as well so as
it uses this progress LSN and compares it with the current insert LSN
to determine if a standby snapshot should be logged or not. On an
instance of Postgres completely idle, no useless checkpoints, and no
useless standby snapshots are generated anymore.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a2846c4..ccef3f0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -527,6 +527,8 @@ typedef struct XLogCtlData
 	TransactionId ckptXid;
 	XLogRecPtr	asyncXactLSN;	/* LSN of newest async commit/abort */
 	XLogRecPtr	replicationSlotMinLSN;	/* oldest LSN needed by any slot */
+	XLogRecPtr	ckptProgressLSN;	/* LSN tracking progress of checkpoint
+	 * and standby snapshots */
 
 	XLogSegNo	lastRemovedSegNo;		/* latest removed/recycled XLOG
 		 * segment */
@@ -6314,6 +6316,7 @@ StartupXLOG(void)
 	 checkPoint.newestCommitTsXid);
 	XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 	XLogCtl->ckptXid = checkPoint.nextXid;
+	XLogCtl->ckptProgressLSN = checkPoint.redo;
 
 	/*
 	 * Initialize replication slots, before there's a chance to remove
@@ -7869,6 +7872,22 @@ GetFlushRecPtr(void)
 }
 
 /*
+ * GetProgressRecPtr -- Returns the reference position of last checkpoint,
+ * taking into account standby snapshots generated by checkpoints.
+ */
+XLogRecPtr
+GetProgressRecPtr(void)
+{
+	XLogRecPtr	progress_lsn;
+
+	SpinLockAcquire(>info_lck);
+	progress_lsn = XLogCtl->ckptProgressLSN;
+	SpinLockRelease(>info_lck);
+
+	return progress_lsn;
+}
+
+/*
  * Get the time of the last xlog segment switch
  */
 pg_time_t
@@ -8133,6 +8152,7 @@ CreateCheckPoint(int flags)
 	XLogRecPtr	PriorRedoPtr;
 	XLogRecPtr	curInsert;
 	XLogRecPtr	prevPtr;
+	XLogRecPtr	progress_lsn;
 	VirtualTransactionId *vxids;
 	int			nvxids;
 
@@ -8214,10 +8234,13 @@ CreateCheckPoint(int flags)
 
 	/*
 	 * We must block concurrent insertions while examining insert state to
-	 * determine the checkpoint REDO pointer.
+	 * determine the checkpoint REDO pointer. The progress LSN of this
+	 * to-be-created checkpoint is for now initialized as the current insert
+	 * position in WAL. This will get updated later on depending on if
+	 * a standby snapshot is logged.
 	 */
 	WALInsertLockAcquireExclusive();
-	curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
+	curInsert = progress_lsn = XLogBytePosToRecPtr(Insert->CurrBytePos);
 	prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos);
 
 	/*
@@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
 	{
-		if (prevPtr == ControlFile->checkPointCopy.redo &&
+		if (GetProgressRecPtr() == prevPtr &&
 			prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
 		{
 			WALInsertLockRelease();
@@ -8406,9 +8429,15 @@ CreateCheckPoint(int flags)
 	 *
 	 * If we are shutting down, or Startup process is completing crash
 	 * recovery we don't need to write running xact data.
+	 *
+	 * progress_lsn is aimed at tracking the WAL progress that happens
+	 * since this checkpoint. If a standby snapshot is logged, its record
+	 * postion need to be taken into account in the progress LSN position
+	 * that is used to evaluate if checkpoints are necessary or standby
+	 * snapshots need to be logged, and skip them on idle systems.
 	 */
 	if (!shutdown && XLogStandbyInfoActive())
-		LogStandbySnapshot();
+		progress_lsn = LogStandbySnapshot();
 
 	

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-29 Thread Michael Paquier
On Fri, Jan 29, 2016 at 5:13 PM, Andres Freund  wrote:
> On 2016-01-28 16:40:13 +0900, Michael Paquier wrote:
>> OK, so as a first step and after thinking about the whole for a while,
>> I have finished with the patch attached. This patch is aimed at
>> avoiding unnecessary checkpoints on idle systems when wal_level >=
>> hot_standby by centralizing the check to look at if there has some WAL
>> activity since the last checkpoint.
>
> That's not what I suggested.
>
>>  /*
>> + * XLOGHasActivity -- Check if XLOG had some significant activity or
>> + * if it is idle lately. This is primarily used to check if there has
>> + * been some WAL activity since the last checkpoint that occurred on
>> + * system to control the generaton of XLOG record related to standbys.
>> + */
>> +bool
>> +XLOGHasActivity(void)
>> +{
>> + XLogCtlInsert *Insert = >Insert;
>> + XLogRecPtr  redo_lsn = ControlFile->checkPointCopy.redo;
>> + uint64  prev_bytepos;
>> +
>> + /* Check if any activity has happened since last checkpoint */
>> + SpinLockAcquire(>insertpos_lck);
>> + prev_bytepos = Insert->PrevBytePos;
>> + SpinLockRelease(>insertpos_lck);
>> +
>> + return XLogBytePosToRecPtr(prev_bytepos) == redo_lsn;
>> +}
>>
>
> How should this actually should work reliably, given we *want* to have
> included a standby snapshot after the last checkpoint?
>
> In CreateCheckPoint() we have
> /*
>  * Here we update the shared RedoRecPtr for future XLogInsert calls; 
> this
>  * must be done while holding all the insertion locks.
>  *
> RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
>
> computing the next redo rec ptr and then
> if (!shutdown && XLogStandbyInfoActive())
> LogStandbySnapshot();
> before the final
> XLogRegisterData((char *) (), sizeof(checkPoint));
> recptr = XLogInsert(RM_XLOG_ID,
> shutdown ? 
> XLOG_CHECKPOINT_SHUTDOWN :
> XLOG_CHECKPOINT_ONLINE);
>
> so the above condition doesn't really something we want to rely on. Am I
> missing what you're trying to do?

Well, to put it short, I am just trying to find a way to make the
backend skip unnecessary checkpoints on an idle system, which results
in the following WAL pattern if system is completely idle:
CHECKPOINT_ONLINE
RUNNING_XACTS
RUNNING_XACTS
[etc..]

The thing is that I am lost with the meaning of this condition to
decide if a checkpoint should be skipped or not:
if (prevPtr == ControlFile->checkPointCopy.redo &&
prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
{
WALInsertLockRelease();
LWLockRelease(CheckpointLock);
return;
}
As at least one standby snapshot is logged before the checkpoint
record, the redo position is never going to match the previous insert
LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
Skipping such unnecessary checkpoints is what you would like to
address, no? Because that's what I would like to do here first. And
once we got that right, we could think about addressing the case where
WAL segments are forcibly archived for idle systems.

Perhaps I simply do not grab the meaning of what you defined as
"relevant LSN" upthread. As I understand it, it would be a LSN
position marker in shared memory that we could use for some
decision-making regarding if a checkpoint or a standby snapshot record
should be generated. I have for example played with an array in
XLogCtl->Insert made of NUM_XLOGINSERT_LOCKS elements, each one
protected by one insert lock that tracked the last insert LSN of a
checkpoint record (I did that in XLogInsertRecord because XLogRecData
makes that easy), then I used that to do this decision making in
CreateCheckpoint() and in the bgwriter to decide if a standby snapshot
should be logged or not.
But then I noticed that it would be actually easier and cleaner to do
this decision making using directly the checkpoint redo LSN and
compare that with the previous insert LSN, then use that for the
bgwriter code path, resulting in the WIP I just sent previously.

Is what I am trying to do here biased with what you have in mind? Do
we have a different meaning of the problem in mind? What are your
ideas regarding this relevant LSN?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-29 Thread Andres Freund
On 2016-01-28 16:40:13 +0900, Michael Paquier wrote:
> OK, so as a first step and after thinking about the whole for a while,
> I have finished with the patch attached. This patch is aimed at
> avoiding unnecessary checkpoints on idle systems when wal_level >=
> hot_standby by centralizing the check to look at if there has some WAL
> activity since the last checkpoint.

That's not what I suggested.

>  /*
> + * XLOGHasActivity -- Check if XLOG had some significant activity or
> + * if it is idle lately. This is primarily used to check if there has
> + * been some WAL activity since the last checkpoint that occurred on
> + * system to control the generaton of XLOG record related to standbys.
> + */
> +bool
> +XLOGHasActivity(void)
> +{
> + XLogCtlInsert *Insert = >Insert;
> + XLogRecPtr  redo_lsn = ControlFile->checkPointCopy.redo;
> + uint64  prev_bytepos;
> +
> + /* Check if any activity has happened since last checkpoint */
> + SpinLockAcquire(>insertpos_lck);
> + prev_bytepos = Insert->PrevBytePos;
> + SpinLockRelease(>insertpos_lck);
> +
> + return XLogBytePosToRecPtr(prev_bytepos) == redo_lsn;
> +}
>

How should this actually should work reliably, given we *want* to have
included a standby snapshot after the last checkpoint?

In CreateCheckPoint() we have
/*
 * Here we update the shared RedoRecPtr for future XLogInsert calls; 
this
 * must be done while holding all the insertion locks.
 *
RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;

computing the next redo rec ptr and then
if (!shutdown && XLogStandbyInfoActive())
LogStandbySnapshot();
before the final
XLogRegisterData((char *) (), sizeof(checkPoint));
recptr = XLogInsert(RM_XLOG_ID,
shutdown ? 
XLOG_CHECKPOINT_SHUTDOWN :
XLOG_CHECKPOINT_ONLINE);

so the above condition doesn't really something we want to rely on. Am I
missing what you're trying to do?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-27 Thread Michael Paquier
On Tue, Jan 19, 2016 at 5:10 PM, Amit Kapila  wrote:
> On Tue, Jan 19, 2016 at 12:41 PM, Michael Paquier
>  wrote:
>>
>> On Mon, Jan 18, 2016 at 10:19 PM, Amit Kapila 
>> wrote:
>> > On Mon, Jan 18, 2016 at 10:54 AM, Michael Paquier
>> >  wrote:
>> >> Yes, the thing is that, as mentioned at the beginning of the thread, a
>> >> low value of archive_timeout causes a segment to be forcibly switched
>> >> at the end of the timeout defined by this parameter. In combination
>> >> with the standby snapshot taken in bgwriter since 9.4, this is causing
>> >> segments to be always switched even if a system is completely idle.
>> >> Before that, in 9.3 and older versions, we would have a segment
>> >> forcibly switched on an idle system only once per checkpoint.
>> >>
>> >
>> > Okay, so this will fix the case where the system is idle, but what I
>> > am slightly worried is that it should not miss to log the standby
>> > snapshot
>> > due to this check when there is actually some activity in the system.
>> > Why is not possible to have a case such that the segment is forcibly
>> > switched due to reason other than checkpoint (user has requested the
>> > same) and the current insert LSN is at beginning of a new segment
>> > due to write activity? If that is possible, which to me theoretically
>> > seems
>> > possible, then I think bgwriter will miss to log the standby snapshot.
>>
>> Yes, with segments forcibly switched by users this check would make
>> the bgwriter not log in a snapshot if the last action done by server
>> was XLOG_SWITCH. Based on the patch I sent, the only alternative would
>> be to not forcedSegSwitchLSN in those code paths. Perhaps that's fine
>> enough for back-branches..
>>
>
> Yeah, that can work, but I think the code won't look too clean.  I think
> lets try out something on lines of what is suggested by Andres and
> see how it turns out.

OK, so as a first step and after thinking about the whole for a while,
I have finished with the patch attached. This patch is aimed at
avoiding unnecessary checkpoints on idle systems when wal_level >=
hot_standby by centralizing the check to look at if there has some WAL
activity since the last checkpoint. This way, the system does not
generate anymore standby-related records if no activity has occurred.
I think that this is a good step forward, still it does not address
yet the problem of segments forcibly switched, particularly when
archive_timeout has a low value.

In order to address the other problem with switched segments, I think
that we could extend the routine XLOGHasActivity that the patch
attached introduces with some more fine-grained checks. After looking
at the code I think as well that we had better save into shared memory
the last LSN position that was forcibly switched and make use of that
in the bgwriter.

So, thoughts about the attached?
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a2846c4..51fd9e9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7869,6 +7869,27 @@ GetFlushRecPtr(void)
 }
 
 /*
+ * XLOGHasActivity -- Check if XLOG had some significant activity or
+ * if it is idle lately. This is primarily used to check if there has
+ * been some WAL activity since the last checkpoint that occurred on
+ * system to control the generaton of XLOG record related to standbys.
+ */
+bool
+XLOGHasActivity(void)
+{
+	XLogCtlInsert *Insert = >Insert;
+	XLogRecPtr	redo_lsn = ControlFile->checkPointCopy.redo;
+	uint64		prev_bytepos;
+
+	/* Check if any activity has happened since last checkpoint */
+	SpinLockAcquire(>insertpos_lck);
+	prev_bytepos = Insert->PrevBytePos;
+	SpinLockRelease(>insertpos_lck);
+
+	return XLogBytePosToRecPtr(prev_bytepos) == redo_lsn;
+}
+
+/*
  * Get the time of the last xlog segment switch
  */
 pg_time_t
@@ -8239,7 +8260,7 @@ CreateCheckPoint(int flags)
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
 	{
-		if (prevPtr == ControlFile->checkPointCopy.redo &&
+		if (XLOGHasActivity() &&
 			prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
 		{
 			WALInsertLockRelease();
@@ -8404,10 +8425,10 @@ CreateCheckPoint(int flags)
 	 * allows us to reconstruct the state of running transactions during
 	 * archive recovery, if required. Skip, if this info disabled.
 	 *
-	 * If we are shutting down, or Startup process is completing crash
-	 * recovery we don't need to write running xact data.
+	 * If we are shutting down, are on a non-idle system or Startup process
+	 * is completing crash recovery we don't need to write running xact data.
 	 */
-	if (!shutdown && XLogStandbyInfoActive())
+	if (!shutdown && XLOGHasActivity() && XLogStandbyInfoActive())
 		LogStandbySnapshot();
 
 	START_CRIT_SECTION();
diff --git a/src/backend/postmaster/bgwriter.c 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-19 Thread Amit Kapila
On Tue, Jan 19, 2016 at 12:41 PM, Michael Paquier 
wrote:
>
> On Mon, Jan 18, 2016 at 10:19 PM, Amit Kapila 
wrote:
> > On Mon, Jan 18, 2016 at 10:54 AM, Michael Paquier
> >  wrote:
> >> Yes, the thing is that, as mentioned at the beginning of the thread, a
> >> low value of archive_timeout causes a segment to be forcibly switched
> >> at the end of the timeout defined by this parameter. In combination
> >> with the standby snapshot taken in bgwriter since 9.4, this is causing
> >> segments to be always switched even if a system is completely idle.
> >> Before that, in 9.3 and older versions, we would have a segment
> >> forcibly switched on an idle system only once per checkpoint.
> >>
> >
> > Okay, so this will fix the case where the system is idle, but what I
> > am slightly worried is that it should not miss to log the standby
snapshot
> > due to this check when there is actually some activity in the system.
> > Why is not possible to have a case such that the segment is forcibly
> > switched due to reason other than checkpoint (user has requested the
> > same) and the current insert LSN is at beginning of a new segment
> > due to write activity? If that is possible, which to me theoretically
seems
> > possible, then I think bgwriter will miss to log the standby snapshot.
>
> Yes, with segments forcibly switched by users this check would make
> the bgwriter not log in a snapshot if the last action done by server
> was XLOG_SWITCH. Based on the patch I sent, the only alternative would
> be to not forcedSegSwitchLSN in those code paths. Perhaps that's fine
> enough for back-branches..
>

Yeah, that can work, but I think the code won't look too clean.  I think
lets try out something on lines of what is suggested by Andres and
see how it turns out.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Amit Kapila
On Mon, Jan 18, 2016 at 10:54 AM, Michael Paquier 
wrote:
>
> On Sun, Jan 17, 2016 at 1:37 PM, Amit Kapila 
wrote:
> > On Sat, Jan 16, 2016 at 6:37 PM, Michael Paquier <
michael.paqu...@gmail.com>
> > wrote:
> >>
> >>
> >>
> >
> > So here if I understand correctly the check related to the last segment
> > forcibly switched is based on the fact that if it is forcibly switched,
then
> > we don't need to log this record? What is the reason of such an
> > assumption?
>
> Yes, the thing is that, as mentioned at the beginning of the thread, a
> low value of archive_timeout causes a segment to be forcibly switched
> at the end of the timeout defined by this parameter. In combination
> with the standby snapshot taken in bgwriter since 9.4, this is causing
> segments to be always switched even if a system is completely idle.
> Before that, in 9.3 and older versions, we would have a segment
> forcibly switched on an idle system only once per checkpoint.
>

Okay, so this will fix the case where the system is idle, but what I
am slightly worried is that it should not miss to log the standby snapshot
due to this check when there is actually some activity in the system.
Why is not possible to have a case such that the segment is forcibly
switched due to reason other than checkpoint (user has requested the
same) and the current insert LSN is at beginning of a new segment
due to write activity? If that is possible, which to me theoretically seems
possible, then I think bgwriter will miss to log the standby snapshot.

>
> The
> documentation is already clear about that actually. The whole point of
> this patch is to fix this regression, aka switch to a new segment only
> once per checkpoint.
>
> > It is not very clear by reading the comments you have
> > added in patch, may be you can expand comments slightly to explain
> > the reason of same.
>
> OK. Here are the comments that this patch is adding:
> - * only log if enough time has passed and some xlog record
has
> - * been inserted.
> + * Only log if enough time has passed and some xlog record
has
> + * been inserted on a new segment. On an idle system where
> + * segments can be archived in a fast pace with for example a
> + * low archive_command setting, avoid as well logging a new
> + * standby snapshot if the current insert position is still
> + * at the beginning of the segment that has just been
switched.
> + *
> + * It is possible that GetXLogLastSwitchPtr points to the
last
> + * position of previous segment or to the first position of
the
> + * new segment after the switch, hence take both cases into
> + * account when deciding if a standby snapshot should be
taken.
> + * (see comments on top of RequestXLogSwitch for more
details).
>   */
> It makes mention of the problem that it is trying to fix, perhaps you
> mean that this should mention the fact that on an idle system standby
> snapshots should happen at worst once per checkpoint?
>

I mean to say that in below part of comment, explanation about the
the check related to insert position is quite clear whereas why it is
okay to avoid logging standby snapshot when the segment is not
forcibly switched is not apparent.

* avoid as well logging a new
* standby snapshot if the current insert position is still
* at the beginning of the segment that has just been switched.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Andres Freund
On 2015-12-21 16:26:25 +0900, Michael Paquier wrote:
> On Sun, Dec 20, 2015 at 10:14 PM, Michael Paquier
>  wrote:
> > Speaking of which, this patch was registered in this CF, I am moving
> > it to the next as a bug fix.
> 
> I found a stupid bug in my previous patch: when issuing XLOG_SWITCH it
> is possible that the return LSN pointer is on the new segment that has
> been forcibly archived if RequestXLogSwitch is called multiple times
> and that subsequent calls are not necessary. Patch updated.

I find this patch rather unsatisfactory. Yes, it kinda solves the
problem of archive timeout, but it leaves the bigger and longer standing
problems of unneccessary checkpoints with wal_level=hs in place. It's
also somewhat fragile in my opinion.

I think we rather want a per backend (or perhaps per wal insertion lock)
flag that says 'last relevant record inserted at', and a way to not set
that during insertion. Then during a checkpoint or the relevant bgwriter
code, we look wether anything relevant happened in any backend since the
last time we performed a checkpoint/logged a running xacts snapshot.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Amit Kapila
On Mon, Jan 18, 2016 at 7:08 PM, Andres Freund  wrote:
>
> On 2015-12-21 16:26:25 +0900, Michael Paquier wrote:
> > On Sun, Dec 20, 2015 at 10:14 PM, Michael Paquier
> >  wrote:
> > > Speaking of which, this patch was registered in this CF, I am moving
> > > it to the next as a bug fix.
> >
> > I found a stupid bug in my previous patch: when issuing XLOG_SWITCH it
> > is possible that the return LSN pointer is on the new segment that has
> > been forcibly archived if RequestXLogSwitch is called multiple times
> > and that subsequent calls are not necessary. Patch updated.
>
> I find this patch rather unsatisfactory. Yes, it kinda solves the
> problem of archive timeout, but it leaves the bigger and longer standing
> problems of unneccessary checkpoints with wal_level=hs in place. It's
> also somewhat fragile in my opinion.
>
> I think we rather want a per backend (or perhaps per wal insertion lock)
> flag that says 'last relevant record inserted at', and a way to not set
> that during insertion. Then during a checkpoint or the relevant bgwriter
> code, we look wether anything relevant happened in any backend since the
> last time we performed a checkpoint/logged a running xacts snapshot.
>

Sounds to be a more robust way of dealing with this problem.  Michael,
if you don't disagree with above proposal, then we can mark this patch
as Waiting on Author?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 1:28 PM, Amit Kapila  wrote:
> On Mon, Jan 18, 2016 at 7:08 PM, Andres Freund  wrote:
>> I find this patch rather unsatisfactory. Yes, it kinda solves the
>> problem of archive timeout, but it leaves the bigger and longer standing
>> problems of unneccessary checkpoints with wal_level=hs in place. It's
>> also somewhat fragile in my opinion.

Check.

>> I think we rather want a per backend (or perhaps per wal insertion lock)
>> flag that says 'last relevant record inserted at', and a way to not set
>> that during insertion. Then during a checkpoint or the relevant bgwriter
>> code, we look wether anything relevant happened in any backend since the
>> last time we performed a checkpoint/logged a running xacts snapshot.

And in this case, the last relevant record would be caused by a forced
segment switch or a checkpoint record, right? Doing that per WAL
insertion lock seems more scalable to me. I haven't looked at the code
yet though to see how that would work out.

> Sounds to be a more robust way of dealing with this problem.  Michael,
> if you don't disagree with above proposal, then we can mark this patch
> as Waiting on Author?

Yeah let's do so. I'll think more about this thing.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Michael Paquier
On Mon, Jan 18, 2016 at 10:19 PM, Amit Kapila  wrote:
> On Mon, Jan 18, 2016 at 10:54 AM, Michael Paquier
>  wrote:
>>
>> On Sun, Jan 17, 2016 at 1:37 PM, Amit Kapila 
>> wrote:
>> > On Sat, Jan 16, 2016 at 6:37 PM, Michael Paquier
>> > 
>> > wrote:
>> >>
>> >>
>> >>
>> >
>> > So here if I understand correctly the check related to the last segment
>> > forcibly switched is based on the fact that if it is forcibly switched,
>> > then
>> > we don't need to log this record? What is the reason of such an
>> > assumption?
>>
>> Yes, the thing is that, as mentioned at the beginning of the thread, a
>> low value of archive_timeout causes a segment to be forcibly switched
>> at the end of the timeout defined by this parameter. In combination
>> with the standby snapshot taken in bgwriter since 9.4, this is causing
>> segments to be always switched even if a system is completely idle.
>> Before that, in 9.3 and older versions, we would have a segment
>> forcibly switched on an idle system only once per checkpoint.
>>
>
> Okay, so this will fix the case where the system is idle, but what I
> am slightly worried is that it should not miss to log the standby snapshot
> due to this check when there is actually some activity in the system.
> Why is not possible to have a case such that the segment is forcibly
> switched due to reason other than checkpoint (user has requested the
> same) and the current insert LSN is at beginning of a new segment
> due to write activity? If that is possible, which to me theoretically seems
> possible, then I think bgwriter will miss to log the standby snapshot.

Yes, with segments forcibly switched by users this check would make
the bgwriter not log in a snapshot if the last action done by server
was XLOG_SWITCH. Based on the patch I sent, the only alternative would
be to not forcedSegSwitchLSN in those code paths. Perhaps that's fine
enough for back-branches..

>> The
>> documentation is already clear about that actually. The whole point of
>> this patch is to fix this regression, aka switch to a new segment only
>> once per checkpoint.
>>
>> > It is not very clear by reading the comments you have
>> > added in patch, may be you can expand comments slightly to explain
>> > the reason of same.
>>
>> OK. Here are the comments that this patch is adding:
>> - * only log if enough time has passed and some xlog record
>> has
>> - * been inserted.
>> + * Only log if enough time has passed and some xlog record
>> has
>> + * been inserted on a new segment. On an idle system where
>> + * segments can be archived in a fast pace with for example a
>> + * low archive_command setting, avoid as well logging a new
>> + * standby snapshot if the current insert position is still
>> + * at the beginning of the segment that has just been
>> switched.
>> + *
>> + * It is possible that GetXLogLastSwitchPtr points to the
>> last
>> + * position of previous segment or to the first position of
>> the
>> + * new segment after the switch, hence take both cases into
>> + * account when deciding if a standby snapshot should be
>> taken.
>> + * (see comments on top of RequestXLogSwitch for more
>> details).
>>   */
>> It makes mention of the problem that it is trying to fix, perhaps you
>> mean that this should mention the fact that on an idle system standby
>> snapshots should happen at worst once per checkpoint?
>>
>
> I mean to say that in below part of comment, explanation about the
> the check related to insert position is quite clear whereas why it is
> okay to avoid logging standby snapshot when the segment is not
> forcibly switched is not apparent.
>
> * avoid as well logging a new
> * standby snapshot if the current insert position is still
> * at the beginning of the segment that has just been switched.

Perhaps: "avoid as well logging a new standby snapshot if the current
insert position is at the beginning of a segment that has been
*forcibly* switched at checkpoint or by archive_timeout"?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-17 Thread Michael Paquier
On Sun, Jan 17, 2016 at 1:37 PM, Amit Kapila  wrote:
> On Sat, Jan 16, 2016 at 6:37 PM, Michael Paquier 
> wrote:
>>
>> On Sat, Jan 16, 2016 at 9:07 PM, Amit Kapila 
>> wrote:
>> > On Sat, Jan 16, 2016 at 5:08 PM, Michael Paquier
>> > 
>> > wrote:
>> >>
>> >> On Sat, Jan 16, 2016 at 7:10 PM, Amit Kapila 
>> >> wrote:
>> >> > Won't this be a problem if the checkpoint occurs after a long time
>> >> > and
>> >> > in
>> >> > the mean time there is some activity in the server?
>> >>
>> >> Why? If there is some activity on the server, the snapshot will be
>> >> immediately taken at the next iteration without caring about the
>> >> checkpoint.
>> >>
>> >
>> > + (insert_lsn % XLOG_SEG_SIZE) != SizeOfXLogLongPHD))
>> >
>> > Do you mean to intend that it is protected by above check in the
>> > patch?
>>
>> Yep, in combination with is_switch_current to check if the last
>> completed segment was forcibly switched.
>>
>> > Isn't it possible that so much WAL is inserted between bgwriter cycles,
>> > that when it checks the location of WAL, it founds it to be at the
>> > beginning
>> > of a new segment?
>>
>> Er, no. Even if the insert LSN is at the beginning of a new segment,
>> this would take a standby snapshot if the last segment was not
>> forcibly switched.
>>
>
> So here if I understand correctly the check related to the last segment
> forcibly switched is based on the fact that if it is forcibly switched, then
> we don't need to log this record? What is the reason of such an
> assumption?

Yes, the thing is that, as mentioned at the beginning of the thread, a
low value of archive_timeout causes a segment to be forcibly switched
at the end of the timeout defined by this parameter. In combination
with the standby snapshot taken in bgwriter since 9.4, this is causing
segments to be always switched even if a system is completely idle.
Before that, in 9.3 and older versions, we would have a segment
forcibly switched on an idle system only once per checkpoint. The
documentation is already clear about that actually. The whole point of
this patch is to fix this regression, aka switch to a new segment only
once per checkpoint.

> It is not very clear by reading the comments you have
> added in patch, may be you can expand comments slightly to explain
> the reason of same.

OK. Here are the comments that this patch is adding:
- * only log if enough time has passed and some xlog record has
- * been inserted.
+ * Only log if enough time has passed and some xlog record has
+ * been inserted on a new segment. On an idle system where
+ * segments can be archived in a fast pace with for example a
+ * low archive_command setting, avoid as well logging a new
+ * standby snapshot if the current insert position is still
+ * at the beginning of the segment that has just been switched.
+ *
+ * It is possible that GetXLogLastSwitchPtr points to the last
+ * position of previous segment or to the first position of the
+ * new segment after the switch, hence take both cases into
+ * account when deciding if a standby snapshot should be taken.
+ * (see comments on top of RequestXLogSwitch for more details).
  */
It makes mention of the problem that it is trying to fix, perhaps you
mean that this should mention the fact that on an idle system standby
snapshots should happen at worst once per checkpoint?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-16 Thread Amit Kapila
On Sat, Jan 16, 2016 at 5:08 PM, Michael Paquier 
wrote:
>
> On Sat, Jan 16, 2016 at 7:10 PM, Amit Kapila 
wrote:
> > On Sun, Dec 20, 2015 at 6:44 PM, Michael Paquier <
michael.paqu...@gmail.com>
> > wrote:
> >> On Sun, Nov 8, 2015 at 9:50 PM, Michael Paquier
> >>  wrote:
> >> > On Sat, Nov 7, 2015 at 3:54 PM, Michael Paquier wrote:
> >> >> I thought about something like that at some point by saving a
minimum
> >> >> activity pointer in XLogCtl, updated each time a segment was
forcibly
> >> >> switched or after inserting a checkpoint record. Then the bgwriter
> >> >> looked at if the current insert position matched this minimum
activity
> >> >> pointer, skipping LogStandbySnapshot if both positions match. Does
> >> >> this match your line of thoughts?
> >> >
> >> > Looking at the code, it occurred to me that the LSN position saved
for
> >> > a XLOG_SWITCH record is the last position of current segment, so we
> >> > would still need to check if the current insert LSN matches the
> >> > beginning of a new segment and if the last segment was forcibly
> >> > switched by saving RecPtr of RequestXLogSwitch in XLogCtl for
example.
> >> > Thoughts?
> >>
> >> I haven't given up on this patch yet, and putting again my head on
> >> this problem I have finished with the patch attached, which checks if
> >> the current insert LSN position is at the beginning of a segment that
> >> has just been switched to decide if a standby snapshot should be
> >> logged or not. This allows bringing back an idle system to the pre-9.3
> >> state where a segment would be archived in the case of a low
> >> archive_timeout only when a checkpoint has been issued on the system.
> >>
> >
> > Won't this be a problem if the checkpoint occurs after a long time and
in
> > the mean time there is some activity in the server?
>
> Why? If there is some activity on the server, the snapshot will be
> immediately taken at the next iteration without caring about the
> checkpoint.
>

+ (insert_lsn % XLOG_SEG_SIZE) != SizeOfXLogLongPHD))

Do you mean to intend that it is protected by above check in the
patch?

Isn't it possible that so much WAL is inserted between bgwriter cycles,
that when it checks the location of WAL, it founds it to be at the beginning
of a new segment?

> > Another idea to solve this issue could be to see if there is any
progress
> > in the server by checking buffers dirtied/written (we can refer that
> > information using pgBufferUsage) since last time we log this record in
> > bgwriter.
>
> Yes, that may be an idea worth considering, but I really think that we
> had better measure that at WAL level..
>

I thought this is quite close to the previous patch you proposed where
Andres wanted some measurement in terms of progress since last
checkpoint. I understand as per current code your patch can work, but
what if some more similar WAL records needs to be added?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-16 Thread Michael Paquier
On Sat, Jan 16, 2016 at 7:10 PM, Amit Kapila  wrote:
> On Sun, Dec 20, 2015 at 6:44 PM, Michael Paquier 
> wrote:
>> On Sun, Nov 8, 2015 at 9:50 PM, Michael Paquier
>>  wrote:
>> > On Sat, Nov 7, 2015 at 3:54 PM, Michael Paquier wrote:
>> >> I thought about something like that at some point by saving a minimum
>> >> activity pointer in XLogCtl, updated each time a segment was forcibly
>> >> switched or after inserting a checkpoint record. Then the bgwriter
>> >> looked at if the current insert position matched this minimum activity
>> >> pointer, skipping LogStandbySnapshot if both positions match. Does
>> >> this match your line of thoughts?
>> >
>> > Looking at the code, it occurred to me that the LSN position saved for
>> > a XLOG_SWITCH record is the last position of current segment, so we
>> > would still need to check if the current insert LSN matches the
>> > beginning of a new segment and if the last segment was forcibly
>> > switched by saving RecPtr of RequestXLogSwitch in XLogCtl for example.
>> > Thoughts?
>>
>> I haven't given up on this patch yet, and putting again my head on
>> this problem I have finished with the patch attached, which checks if
>> the current insert LSN position is at the beginning of a segment that
>> has just been switched to decide if a standby snapshot should be
>> logged or not. This allows bringing back an idle system to the pre-9.3
>> state where a segment would be archived in the case of a low
>> archive_timeout only when a checkpoint has been issued on the system.
>>
>
> Won't this be a problem if the checkpoint occurs after a long time and in
> the mean time there is some activity in the server?

Why? If there is some activity on the server, the snapshot will be
immediately taken at the next iteration without caring about the
checkpoint.

> Another idea to solve this issue could be to see if there is any progress
> in the server by checking buffers dirtied/written (we can refer that
> information using pgBufferUsage) since last time we log this record in
> bgwriter.

Yes, that may be an idea worth considering, but I really think that we
had better measure that at WAL level..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-16 Thread Michael Paquier
On Sat, Jan 16, 2016 at 9:07 PM, Amit Kapila  wrote:
> On Sat, Jan 16, 2016 at 5:08 PM, Michael Paquier 
> wrote:
>>
>> On Sat, Jan 16, 2016 at 7:10 PM, Amit Kapila 
>> wrote:
>> > On Sun, Dec 20, 2015 at 6:44 PM, Michael Paquier
>> >  wrote:
>> >> > Looking at the code, it occurred to me that the LSN position saved
>> >> > for
>> >> > a XLOG_SWITCH record is the last position of current segment, so we
>> >> > would still need to check if the current insert LSN matches the
>> >> > beginning of a new segment and if the last segment was forcibly
>> >> > switched by saving RecPtr of RequestXLogSwitch in XLogCtl for
>> >> > example.
>> >> > Thoughts?
>> >>
>> >> I haven't given up on this patch yet, and putting again my head on
>> >> this problem I have finished with the patch attached, which checks if
>> >> the current insert LSN position is at the beginning of a segment that
>> >> has just been switched to decide if a standby snapshot should be
>> >> logged or not. This allows bringing back an idle system to the pre-9.3
>> >> state where a segment would be archived in the case of a low
>> >> archive_timeout only when a checkpoint has been issued on the system.
>> >>
>> >
>> > Won't this be a problem if the checkpoint occurs after a long time and
>> > in
>> > the mean time there is some activity in the server?
>>
>> Why? If there is some activity on the server, the snapshot will be
>> immediately taken at the next iteration without caring about the
>> checkpoint.
>>
>
> + (insert_lsn % XLOG_SEG_SIZE) != SizeOfXLogLongPHD))
>
> Do you mean to intend that it is protected by above check in the
> patch?

Yep, in combination with is_switch_current to check if the last
completed segment was forcibly switched.

> Isn't it possible that so much WAL is inserted between bgwriter cycles,
> that when it checks the location of WAL, it founds it to be at the beginning
> of a new segment?

Er, no. Even if the insert LSN is at the beginning of a new segment,
this would take a standby snapshot if the last segment was not
forcibly switched.

>> > Another idea to solve this issue could be to see if there is any
>> > progress
>> > in the server by checking buffers dirtied/written (we can refer that
>> > information using pgBufferUsage) since last time we log this record in
>> > bgwriter.
>>
>> Yes, that may be an idea worth considering, but I really think that we
>> had better measure that at WAL level..
>
> I thought this is quite close to the previous patch you proposed where
> Andres wanted some measurement in terms of progress since last
> checkpoint. I understand as per current code your patch can work, but
> what if some more similar WAL records needs to be added?

A record like this one for the bgwriter? We could still rely on the
same check tracking the last-forced-segment, no?

It seems to me that tracking progress here is not really only a matter
of the number of shared buffers dirtied or written, we would need as
well to track XLOG_STANDBY_LOCK and provide for them fresh snapshots.
Imagine for example a read-only load on the master with some exclusive
locks taken on relations from time to time (perhaps unlikely so but
who knows?).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-16 Thread Amit Kapila
On Sat, Jan 16, 2016 at 6:37 PM, Michael Paquier 
wrote:
>
> On Sat, Jan 16, 2016 at 9:07 PM, Amit Kapila 
wrote:
> > On Sat, Jan 16, 2016 at 5:08 PM, Michael Paquier <
michael.paqu...@gmail.com>
> > wrote:
> >>
> >> On Sat, Jan 16, 2016 at 7:10 PM, Amit Kapila 
> >> wrote:
> >> > Won't this be a problem if the checkpoint occurs after a long time
and
> >> > in
> >> > the mean time there is some activity in the server?
> >>
> >> Why? If there is some activity on the server, the snapshot will be
> >> immediately taken at the next iteration without caring about the
> >> checkpoint.
> >>
> >
> > + (insert_lsn % XLOG_SEG_SIZE) != SizeOfXLogLongPHD))
> >
> > Do you mean to intend that it is protected by above check in the
> > patch?
>
> Yep, in combination with is_switch_current to check if the last
> completed segment was forcibly switched.
>
> > Isn't it possible that so much WAL is inserted between bgwriter cycles,
> > that when it checks the location of WAL, it founds it to be at the
beginning
> > of a new segment?
>
> Er, no. Even if the insert LSN is at the beginning of a new segment,
> this would take a standby snapshot if the last segment was not
> forcibly switched.
>

So here if I understand correctly the check related to the last segment
forcibly switched is based on the fact that if it is forcibly switched, then
we don't need to log this record?  What is the reason of such an
assumption?   It is not very clear by reading the comments you have
added in patch, may be you can expand comments slightly to explain
the reason of same.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-16 Thread Amit Kapila
On Sun, Dec 20, 2015 at 6:44 PM, Michael Paquier 
wrote:
> On Sun, Nov 8, 2015 at 9:50 PM, Michael Paquier
>  wrote:
> > On Sat, Nov 7, 2015 at 3:54 PM, Michael Paquier wrote:
> >> I thought about something like that at some point by saving a minimum
> >> activity pointer in XLogCtl, updated each time a segment was forcibly
> >> switched or after inserting a checkpoint record. Then the bgwriter
> >> looked at if the current insert position matched this minimum activity
> >> pointer, skipping LogStandbySnapshot if both positions match. Does
> >> this match your line of thoughts?
> >
> > Looking at the code, it occurred to me that the LSN position saved for
> > a XLOG_SWITCH record is the last position of current segment, so we
> > would still need to check if the current insert LSN matches the
> > beginning of a new segment and if the last segment was forcibly
> > switched by saving RecPtr of RequestXLogSwitch in XLogCtl for example.
> > Thoughts?
>
> I haven't given up on this patch yet, and putting again my head on
> this problem I have finished with the patch attached, which checks if
> the current insert LSN position is at the beginning of a segment that
> has just been switched to decide if a standby snapshot should be
> logged or not. This allows bringing back an idle system to the pre-9.3
> state where a segment would be archived in the case of a low
> archive_timeout only when a checkpoint has been issued on the system.
>

Won't this be a problem if the checkpoint occurs after a long time and in
the mean time there is some activity in the server?

Another idea to solve this issue could be to see if there is any progress
in the server by checking buffers dirtied/written (we can refer that
information using pgBufferUsage) since last time we log this record in
bgwriter.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-12-20 Thread Michael Paquier
On Sun, Nov 8, 2015 at 9:50 PM, Michael Paquier
 wrote:
> On Sat, Nov 7, 2015 at 3:54 PM, Michael Paquier wrote:
>> I thought about something like that at some point by saving a minimum
>> activity pointer in XLogCtl, updated each time a segment was forcibly
>> switched or after inserting a checkpoint record. Then the bgwriter
>> looked at if the current insert position matched this minimum activity
>> pointer, skipping LogStandbySnapshot if both positions match. Does
>> this match your line of thoughts?
>
> Looking at the code, it occurred to me that the LSN position saved for
> a XLOG_SWITCH record is the last position of current segment, so we
> would still need to check if the current insert LSN matches the
> beginning of a new segment and if the last segment was forcibly
> switched by saving RecPtr of RequestXLogSwitch in XLogCtl for example.
> Thoughts?

I haven't given up on this patch yet, and putting again my head on
this problem I have finished with the patch attached, which checks if
the current insert LSN position is at the beginning of a segment that
has just been switched to decide if a standby snapshot should be
logged or not. This allows bringing back an idle system to the pre-9.3
state where a segment would be archived in the case of a low
archive_timeout only when a checkpoint has been issued on the system.
In order to achieve this idea I have added a field on XLogCtl that
saves the last time a segment has been forcibly switched after
XLOG_SWITCH.

Honestly I am failing to see why we should track the progress since
last checkpoint as mentioned upthread, and the current behavior is
certainly a regression.

Speaking of which, this patch was registered in this CF, I am moving
it to the next as a bug fix.
Regards,
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 147fd53..6608666 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -526,6 +526,8 @@ typedef struct XLogCtlData
 	XLogRecPtr	RedoRecPtr;		/* a recent copy of Insert->RedoRecPtr */
 	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
 	TransactionId ckptXid;
+	XLogRecPtr	forcedSegSwitchLSN;	/* LSN position of last forced segment
+	 * switch */
 	XLogRecPtr	asyncXactLSN;	/* LSN of newest async commit/abort */
 	XLogRecPtr	replicationSlotMinLSN;	/* oldest LSN needed by any slot */
 
@@ -6315,6 +6317,7 @@ StartupXLOG(void)
 	 checkPoint.newestCommitTs);
 	XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 	XLogCtl->ckptXid = checkPoint.nextXid;
+	XLogCtl->forcedSegSwitchLSN = InvalidXLogRecPtr;
 
 	/*
 	 * Initialize replication slots, before there's a chance to remove
@@ -8988,6 +8991,10 @@ RequestXLogSwitch(void)
 	XLogBeginInsert();
 	RecPtr = XLogInsert(RM_XLOG_ID, XLOG_SWITCH);
 
+	SpinLockAcquire(>info_lck);
+	XLogCtl->forcedSegSwitchLSN = RecPtr;
+	SpinLockRelease(>info_lck);
+
 	return RecPtr;
 }
 
@@ -10628,6 +10635,21 @@ GetXLogWriteRecPtr(void)
 }
 
 /*
+ * Get last WAL position where an XLOG segment has been forcibly switched.
+ */
+XLogRecPtr
+GetXLogLastSwitchPtr(void)
+{
+	XLogRecPtr	last_switch_lsn;
+
+	SpinLockAcquire(>info_lck);
+	last_switch_lsn = XLogCtl->forcedSegSwitchLSN;
+	SpinLockRelease(>info_lck);
+
+	return last_switch_lsn;
+}
+
+/*
  * Returns the redo pointer of the last checkpoint or restartpoint. This is
  * the oldest point in WAL that we still need, if we have to restart recovery.
  */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 65465d6..ddd6efc 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -315,14 +315,28 @@ BackgroundWriterMain(void)
   LOG_SNAPSHOT_INTERVAL_MS);
 
 			/*
-			 * only log if enough time has passed and some xlog record has
-			 * been inserted.
+			 * Only log if enough time has passed and some xlog record has
+			 * been inserted on a new segment. On an idle system where
+			 * segments can be archived in a fast pace with for example a
+			 * low archive_command setting, avoid as well logging a new
+			 * standby snapshot if the current insert position is still
+			 * at the beginning of the segment that has just been switched.
 			 */
-			if (now >= timeout &&
-last_snapshot_lsn != GetXLogInsertRecPtr())
+			if (now >= timeout)
 			{
-last_snapshot_lsn = LogStandbySnapshot();
-last_snapshot_ts = now;
+XLogRecPtr	insert_lsn = GetXLogInsertRecPtr();
+XLogRecPtr	last_forced_switch_lsn = GetXLogLastSwitchPtr();
+XLogSegNo	insert_segno;
+
+XLByteToSeg(insert_lsn, insert_segno);
+
+if (last_snapshot_lsn != insert_lsn &&
+	!XLByteInPrevSeg(last_forced_switch_lsn, insert_segno) &&
+	(insert_lsn % XLOG_SEG_SIZE) != SizeOfXLogLongPHD)
+{
+	last_snapshot_lsn = LogStandbySnapshot();
+	last_snapshot_ts = now;
+}
 			}
 		}
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-12-20 Thread Michael Paquier
On Sun, Dec 20, 2015 at 10:14 PM, Michael Paquier
 wrote:
> Speaking of which, this patch was registered in this CF, I am moving
> it to the next as a bug fix.

I found a stupid bug in my previous patch: when issuing XLOG_SWITCH it
is possible that the return LSN pointer is on the new segment that has
been forcibly archived if RequestXLogSwitch is called multiple times
and that subsequent calls are not necessary. Patch updated.
-- 
Michael


archive-timeout-v5.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-08 Thread Michael Paquier
On Sat, Nov 7, 2015 at 3:54 PM, Michael Paquier wrote:
> I thought about something like that at some point by saving a minimum
> activity pointer in XLogCtl, updated each time a segment was forcibly
> switched or after inserting a checkpoint record. Then the bgwriter
> looked at if the current insert position matched this minimum activity
> pointer, skipping LogStandbySnapshot if both positions match. Does
> this match your line of thoughts?

Looking at the code, it occurred to me that the LSN position saved for
a XLOG_SWITCH record is the last position of current segment, so we
would still need to check if the current insert LSN matches the
beginning of a new segment and if the last segment was forcibly
switched by saving RecPtr of RequestXLogSwitch in XLogCtl for example.
Thoughts?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-06 Thread Michael Paquier
On Sat, Nov 7, 2015 at 1:52 AM, Andres Freund  wrote:
> On 2015-11-06 11:42:56 -0500, Robert Haas wrote:
>> On Fri, Nov 6, 2015 at 2:47 AM, Michael Paquier
>>  wrote:
>> > I have as well thought a bit about adding a space-related constraint
>> > on the standby snapshot generated by the bgwriter, so as to not rely
>> > entirely on the interval of 15s. I finished with the attached that
>> > uses a check based on CheckPointSegments / 8 to be sure that at least
>> > this number of segments has been generated since the last checkpoint
>> > before logging a new snapshot. I guess that's less brittle than the
>> > last patch. Thoughts?
>>
>> I can't see why that would be a good idea.  My understanding is that
>> the logical decoding code needs to get those messages pretty
>> regularly, and I don't see why that need would be reduced on systems
>> where CheckPointSegments is large.
>
> Precisely.
>
>
> What I'm thinking of right now is a marker somewhere in shared memory,
> that tells whether anything worthwhile has happened since the last
> determination of the redo pointer. Where standby snapshots don't
> count. That seems like it'd be to maintain going forward than doing
> precise size calculations like CreateCheckPoint() already does, and
> would additionally need to handle its own standby snapshot, not to speak
> of the background ones.

I thought about something like that at some point by saving a minimum
activity pointer in XLogCtl, updated each time a segment was forcibly
switched or after inserting a checkpoint record. Then the bgwriter
looked at if the current insert position matched this minimum activity
pointer, skipping LogStandbySnapshot if both positions match. Does
this match your line of thoughts?

> Seems like it'd be doable in ReserveXLogInsertLocation().
> Whether it's actually worthwhile I'm not all that sure tho.

I am not sure doing extra work there is a good idea. There are already
5 instructions within the spinlock section, and this code path is
already high in many profiles for busy systems.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-06 Thread Andres Freund
On November 6, 2015 6:21:50 PM GMT+01:00, Robert Haas  
wrote:
>On Fri, Nov 6, 2015 at 11:52 AM, Andres Freund 
>wrote:
>> Seems like it'd be doable in ReserveXLogInsertLocation().
>>
>> Whether it's actually worthwhile I'm not all that sure tho.
>
>Why not?

Adds another instruction in one of the hottest spinlock protected sections of 
PG. Probably won't be significant, but...

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-06 Thread Robert Haas
On Fri, Nov 6, 2015 at 11:52 AM, Andres Freund  wrote:
> On 2015-11-06 11:42:56 -0500, Robert Haas wrote:
>> On Fri, Nov 6, 2015 at 2:47 AM, Michael Paquier
>>  wrote:
>> > I have as well thought a bit about adding a space-related constraint
>> > on the standby snapshot generated by the bgwriter, so as to not rely
>> > entirely on the interval of 15s. I finished with the attached that
>> > uses a check based on CheckPointSegments / 8 to be sure that at least
>> > this number of segments has been generated since the last checkpoint
>> > before logging a new snapshot. I guess that's less brittle than the
>> > last patch. Thoughts?
>>
>> I can't see why that would be a good idea.  My understanding is that
>> the logical decoding code needs to get those messages pretty
>> regularly, and I don't see why that need would be reduced on systems
>> where CheckPointSegments is large.
>
> Precisely.
>
> What I'm thinking of right now is a marker somewhere in shared memory,
> that tells whether anything worthwhile has happened since the last
> determination of the redo pointer. Where standby snapshots don't
> count. That seems like it'd be to maintain going forward than doing
> precise size calculations like CreateCheckPoint() already does, and
> would additionally need to handle its own standby snapshot, not to speak
> of the background ones.

Good idea.

> Seems like it'd be doable in ReserveXLogInsertLocation().
>
> Whether it's actually worthwhile I'm not all that sure tho.

Why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-06 Thread Robert Haas
On Fri, Nov 6, 2015 at 2:47 AM, Michael Paquier
 wrote:
> I have as well thought a bit about adding a space-related constraint
> on the standby snapshot generated by the bgwriter, so as to not rely
> entirely on the interval of 15s. I finished with the attached that
> uses a check based on CheckPointSegments / 8 to be sure that at least
> this number of segments has been generated since the last checkpoint
> before logging a new snapshot. I guess that's less brittle than the
> last patch. Thoughts?

I can't see why that would be a good idea.  My understanding is that
the logical decoding code needs to get those messages pretty
regularly, and I don't see why that need would be reduced on systems
where CheckPointSegments is large.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-06 Thread Andres Freund
On 2015-11-06 11:42:56 -0500, Robert Haas wrote:
> On Fri, Nov 6, 2015 at 2:47 AM, Michael Paquier
>  wrote:
> > I have as well thought a bit about adding a space-related constraint
> > on the standby snapshot generated by the bgwriter, so as to not rely
> > entirely on the interval of 15s. I finished with the attached that
> > uses a check based on CheckPointSegments / 8 to be sure that at least
> > this number of segments has been generated since the last checkpoint
> > before logging a new snapshot. I guess that's less brittle than the
> > last patch. Thoughts?
> 
> I can't see why that would be a good idea.  My understanding is that
> the logical decoding code needs to get those messages pretty
> regularly, and I don't see why that need would be reduced on systems
> where CheckPointSegments is large.

Precisely.


What I'm thinking of right now is a marker somewhere in shared memory,
that tells whether anything worthwhile has happened since the last
determination of the redo pointer. Where standby snapshots don't
count. That seems like it'd be to maintain going forward than doing
precise size calculations like CreateCheckPoint() already does, and
would additionally need to handle its own standby snapshot, not to speak
of the background ones.

Seems like it'd be doable in ReserveXLogInsertLocation().

Whether it's actually worthwhile I'm not all that sure tho.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-06 Thread Robert Haas
On Fri, Nov 6, 2015 at 12:26 PM, Andres Freund  wrote:
> On November 6, 2015 6:21:50 PM GMT+01:00, Robert Haas  
> wrote:
>>On Fri, Nov 6, 2015 at 11:52 AM, Andres Freund 
>>wrote:
>>> Seems like it'd be doable in ReserveXLogInsertLocation().
>>>
>>> Whether it's actually worthwhile I'm not all that sure tho.
>>
>>Why not?
>
> Adds another instruction in one of the hottest spinlock protected sections of 
> PG. Probably won't be significant, but...

Oh.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-05 Thread Michael Paquier
On Thu, Nov 5, 2015 at 3:00 PM, Michael Paquier
 wrote:
> On Wed, Nov 4, 2015 at 7:33 PM, Andres Freund wrote:
>> As soon as a single checkpoint ever happened the early-return logic in
>> CreateCheckPoint() will fail to take the LogStandbySnapshot() in
>> CreateCheckPoint() into account. The test is:
>> if (curInsert == ControlFile->checkPoint +
>> MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
>> ControlFile->checkPoint == ControlFile->checkPointCopy.redo)
>> which obviously doesn't work if there's been a WAL record logged after
>> the redo pointer has been determined etc.
>
> Yes. If segment switches are enforced at a pace faster than
> checkpoint_timeout, this check considers that a checkpoint needs to
> happen because a SWITCH_XLOG record is in-between. I am a bit
> surprised that this should happen actually. The segment switch
> triggers a checkpoint record, and vice-versa, even for idle systems.
> Shouldn't we make this check a bit smarter then?

Ah, the documentation clearly explains that setting archive_timeout
will enforce a segment switch if any activity occurred, including a
checkpoint:
http://www.postgresql.org/docs/devel/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING
I missed that, sorry.

I have as well thought a bit about adding a space-related constraint
on the standby snapshot generated by the bgwriter, so as to not rely
entirely on the interval of 15s. I finished with the attached that
uses a check based on CheckPointSegments / 8 to be sure that at least
this number of segments has been generated since the last checkpoint
before logging a new snapshot. I guess that's less brittle than the
last patch. Thoughts?
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 08d1682..3f410b7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -526,6 +526,8 @@ typedef struct XLogCtlData
 	XLogwrtRqst LogwrtRqst;
 	XLogRecPtr	RedoRecPtr;		/* a recent copy of Insert->RedoRecPtr */
 	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
+	XLogRecPtr	ckptPtrRec;		/* LSN position of latest checkpoint
+ * record */
 	TransactionId ckptXid;
 	XLogRecPtr	asyncXactLSN;	/* LSN of newest async commit/abort */
 	XLogRecPtr	replicationSlotMinLSN;	/* oldest LSN needed by any slot */
@@ -6316,6 +6318,7 @@ StartupXLOG(void)
 	 checkPoint.newestCommitTs);
 	XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 	XLogCtl->ckptXid = checkPoint.nextXid;
+	XLogCtl->ckptPtrRec = checkPointLoc;
 
 	/*
 	 * Initialize replication slots, before there's a chance to remove
@@ -8476,10 +8479,11 @@ CreateCheckPoint(int flags)
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
 
-	/* Update shared-memory copy of checkpoint XID/epoch */
+	/* Update shared-memory copy of checkpoint XID/epoch/LSN */
 	SpinLockAcquire(>info_lck);
 	XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 	XLogCtl->ckptXid = checkPoint.nextXid;
+	XLogCtl->ckptPtrRec = recptr;
 	SpinLockRelease(>info_lck);
 
 	/*
@@ -9278,6 +9282,7 @@ xlog_redo(XLogReaderState *record)
 		SpinLockAcquire(>info_lck);
 		XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 		XLogCtl->ckptXid = checkPoint.nextXid;
+		XLogCtl->ckptPtrRec = lsn;
 		SpinLockRelease(>info_lck);
 
 		/*
@@ -9328,6 +9333,7 @@ xlog_redo(XLogReaderState *record)
 		SpinLockAcquire(>info_lck);
 		XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 		XLogCtl->ckptXid = checkPoint.nextXid;
+		XLogCtl->ckptPtrRec = lsn;
 		SpinLockRelease(>info_lck);
 
 		/* TLI should not change in an on-line checkpoint */
@@ -10623,6 +10629,21 @@ GetXLogWriteRecPtr(void)
 }
 
 /*
+ * Get latest WAL checkpoint pointer
+ */
+XLogRecPtr
+GetXLogCheckpointRecPtr(void)
+{
+	XLogRecPtr	chkptr;
+
+	SpinLockAcquire(>info_lck);
+	chkptr = XLogCtl->ckptPtrRec;
+	SpinLockRelease(>info_lck);
+
+	return chkptr;
+}
+
+/*
  * Returns the redo pointer of the last checkpoint or restartpoint. This is
  * the oldest point in WAL that we still need, if we have to restart recovery.
  */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 65465d6..0706871 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -78,6 +78,12 @@ int			BgWriterDelay = 200;
 #define LOG_SNAPSHOT_INTERVAL_MS 15000
 
 /*
+ * Minimal number of WAL segments that need to be generated before logging
+ * a standby snapshot.
+ */
+#define LOG_SNAPSHOT_MIN_SEGS		(CheckPointSegments / 8)
+
+/*
  * LSN and timestamp at which we last issued a LogStandbySnapshot(), to avoid
  * doing so too often or repeatedly if there has been no other write activity
  * in the system.
@@ -310,16 +316,19 @@ BackgroundWriterMain(void)
 		{
 			TimestampTz timeout = 0;
 			TimestampTz now = GetCurrentTimestamp();
+			XLogRecPtr	checkpoint_lsn = GetXLogCheckpointRecPtr();
+			XLogRecPtr	insert_lsn = GetXLogInsertRecPtr();
 
 			timeout = 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-04 Thread Michael Paquier
On Wed, Nov 4, 2015 at 7:33 PM, Andres Freund  wrote:
> On 2015-11-04 16:01:28 +0900, Michael Paquier wrote:
>> On Wed, Nov 4, 2015 at 8:39 AM, Andres Freund  wrote:
>> > On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier wrote:
>> >>On a completely idle system, I don't think we should log any standby
>> >>records. This is what ~9.3 does.
>> >
>> > Are you sure? I think it'll around checkpoints, no? I thought Heikki had 
>> > fixed that, but looking sound that doesn't seem to be the case.
>>
>> Er, yes, sorry. I should have used clearer words: I meant idle system
>> with something running nothing including internal checkpoints.
>
> Uh, but you'll always have checkpoints happen on wal_level =
> hot_standby, even in 9.3?  Maybe I'm not parsing your sentence right.

Reading again my previous sentence I cannot get the meaning of it
myself :) Well, I just meant that in ~9.3 LogStandbySnapshot() is
called at each checkpoint, checkpoints occurring after
checkpoint_timeout even if the system is idle.

> As soon as a single checkpoint ever happened the early-return logic in
> CreateCheckPoint() will fail to take the LogStandbySnapshot() in
> CreateCheckPoint() into account. The test is:
> if (curInsert == ControlFile->checkPoint +
> MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
> ControlFile->checkPoint == ControlFile->checkPointCopy.redo)
> which obviously doesn't work if there's been a WAL record logged after
> the redo pointer has been determined etc.

Yes. If segment switches are enforced at a pace faster than
checkpoint_timeout, this check considers that a checkpoint needs to
happen because a SWITCH_XLOG record is in-between. I am a bit
surprised that this should happen actually. The segment switch
triggers a checkpoint record, and vice-versa, even for idle systems.
Shouldn't we make this check a bit smarter then?

> The reason that a single checkpoint is needed to "jumpstart" the
> pointless checkpoints is that otherwise we'll never have issued a
> LogStandbySnapshot() and thus the above code block works if we started
> from a proper shutdown checkpoint.
>
> Independent of the idle issue, it seems to me that the location of the
> LogStandbySnapshot() is actually rather suboptimal - it really should
> really be before the CheckPointGuts(), not afterwards. As closer it's to
> the redo pointer of the checkpoint a hot standby node starts up from,
> the sooner that node can reach consistency.  There's no difference for
> the first time a node starts from a basebackup (since we gotta replay
> that checkpoint anyway before we're consistent), but if we start from a
> restartpoint...

Agreed. LogStandbySnapshot() is called after CheckPointGuts() since
its introduction in efc16ea5. This may save time. This would surely be
a master-only optimization though.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-04 Thread Andres Freund
On 2015-11-04 16:01:28 +0900, Michael Paquier wrote:
> On Wed, Nov 4, 2015 at 8:39 AM, Andres Freund  wrote:
> > On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier wrote:
> >>On a completely idle system, I don't think we should log any standby
> >>records. This is what ~9.3 does.
> >
> > Are you sure? I think it'll around checkpoints, no? I thought Heikki had 
> > fixed that, but looking sound that doesn't seem to be the case.
> 
> Er, yes, sorry. I should have used clearer words: I meant idle system
> with something running nothing including internal checkpoints.

Uh, but you'll always have checkpoints happen on wal_level =
hot_standby, even in 9.3?  Maybe I'm not parsing your sentence right.

As soon as a single checkpoint ever happened the early-return logic in
CreateCheckPoint() will fail to take the LogStandbySnapshot() in
CreateCheckPoint() into account. The test is:
if (curInsert == ControlFile->checkPoint +
MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
ControlFile->checkPoint == ControlFile->checkPointCopy.redo)
which obviously doesn't work if there's been a WAL record logged after
the redo pointer has been determined etc.

The reason that a single checkpoint is needed to "jumpstart" the
pointless checkpoints is that otherwise we'll never have issued a
LogStandbySnapshot() and thus the above code block works if we started
from a proper shutdown checkpoint.


Independent of the idle issue, it seems to me that the location of the
LogStandbySnapshot() is actually rather suboptimal - it really should
really be before the CheckPointGuts(), not afterwards. As closer it's to
the redo pointer of the checkpoint a hot standby node starts up from,
the sooner that node can reach consistency.  There's no difference for
the first time a node starts from a basebackup (since we gotta replay
that checkpoint anyway before we're consistent), but if we start from a
restartpoint...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-03 Thread Robert Haas
On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes  wrote:
> If a transaction holding locks aborts on an otherwise idle server, perhaps it 
> will take a very long time for a log-shipping standby to realize this.  But I 
> have hard time believing that anyone who cares about that would be using 
> log-shipping (rather than streaming) anyway.

I'm sure other people here understand this better than me, but I
wonder if it wouldn't make more sense to somehow log this data only if
something material has changed in the data being logged.  This seems
to be trying to log something only if something else has been written
to WAL, which I'm not sure is the right test.

Also, this check here:

+   if (last_snapshot_lsn != insert_lsn &&
+   checkpoint_lsn != insert_lsn &&
+   checkpoint_lsn != previous_lsn)

...seems like it will fire if there have been 0 or 1 WAL records since
the last checkpoint, regardless of what they are.  I'm not sure that's
the right test, and it'll break again the minute we have a third thing
we want to log only if the system is non-idle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-03 Thread Andres Freund
On 2015-11-03 10:23:35 -0500, Robert Haas wrote:
> On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes  wrote:
> > If a transaction holding locks aborts on an otherwise idle server, perhaps 
> > it will take a very long time for a log-shipping standby to realize this.  
> > But I have hard time believing that anyone who cares about that would be 
> > using log-shipping (rather than streaming) anyway.
> 
> I'm sure other people here understand this better than me, but I
> wonder if it wouldn't make more sense to somehow log this data only if
> something material has changed in the data being logged.

Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
comparing the snapshot and such, especially in the back
branches.

We could maybe add something that we only log a snapshot if XXX
megabytes have been logged or something. But I don't know which number
to pick here - and if there's other write activity the price of a
snapshot record really isn't high.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-03 Thread Andres Freund
On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier 
 wrote:
>On Wed, Nov 4, 2015 at 12:43 AM, Andres Freund 
>wrote:
>> On 2015-11-03 10:23:35 -0500, Robert Haas wrote:
>>> On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes 
>wrote:
>>> > If a transaction holding locks aborts on an otherwise idle server,
>perhaps it will take a very long time for a log-shipping standby to
>realize this.  But I have hard time believing that anyone who cares
>about that would be using log-shipping (rather than streaming) anyway.
>>>
>>> I'm sure other people here understand this better than me, but I
>>> wonder if it wouldn't make more sense to somehow log this data only
>if
>>> something material has changed in the data being logged.
>>
>> Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
>> comparing the snapshot and such, especially in the back
>> branches.
>
>Well, I guess that's why I thought it would be more simple to check if
>we are at the beginning of a segment at first sight. This has no
>chance to break if anything else like that is being added in the
>future as it doesn't depend on the record types, though new similar
>records added on a timely manner would need a similar check. Perhaps
>this could be coupled by a check on the last XLOG_SWITCH_XLOG record
>instead of checkpoint activity though.
>
>> We could maybe add something that we only log a snapshot if XXX
>> megabytes have been logged or something. But I don't know which
>number
>> to pick here - and if there's other write activity the price of a
>> snapshot record really isn't high.
>
>On a completely idle system, I don't think we should log any standby
>records. This is what ~9.3 does.

Are you sure? I think it'll around checkpoints, no? I thought Heikki had fixed 
that, but looking sound that doesn't seem to be the case.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-03 Thread Michael Paquier
On Wed, Nov 4, 2015 at 12:43 AM, Andres Freund  wrote:
> On 2015-11-03 10:23:35 -0500, Robert Haas wrote:
>> On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes  wrote:
>> > If a transaction holding locks aborts on an otherwise idle server, perhaps 
>> > it will take a very long time for a log-shipping standby to realize this.  
>> > But I have hard time believing that anyone who cares about that would be 
>> > using log-shipping (rather than streaming) anyway.
>>
>> I'm sure other people here understand this better than me, but I
>> wonder if it wouldn't make more sense to somehow log this data only if
>> something material has changed in the data being logged.
>
> Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
> comparing the snapshot and such, especially in the back
> branches.

Well, I guess that's why I thought it would be more simple to check if
we are at the beginning of a segment at first sight. This has no
chance to break if anything else like that is being added in the
future as it doesn't depend on the record types, though new similar
records added on a timely manner would need a similar check. Perhaps
this could be coupled by a check on the last XLOG_SWITCH_XLOG record
instead of checkpoint activity though.

> We could maybe add something that we only log a snapshot if XXX
> megabytes have been logged or something. But I don't know which number
> to pick here - and if there's other write activity the price of a
> snapshot record really isn't high.

On a completely idle system, I don't think we should log any standby
records. This is what ~9.3 does.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-03 Thread Michael Paquier
On Wed, Nov 4, 2015 at 12:43 AM, Andres Freund  wrote:
> On 2015-11-03 10:23:35 -0500, Robert Haas wrote:
>> On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes  wrote:
>> > If a transaction holding locks aborts on an otherwise idle server, perhaps 
>> > it will take a very long time for a log-shipping standby to realize this.  
>> > But I have hard time believing that anyone who cares about that would be 
>> > using log-shipping (rather than streaming) anyway.
>>
>> I'm sure other people here understand this better than me, but I
>> wonder if it wouldn't make more sense to somehow log this data only if
>> something material has changed in the data being logged.
>
> Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
> comparing the snapshot and such, especially in the back
> branches.
>
> We could maybe add something that we only log a snapshot if XXX
> megabytes have been logged or something. But I don't know which number
> to pick here - and if there's other write activity the price of a
> snapshot record really isn't high.

My first guess on the matter is that we would like to have an extra
condition that depends on max_wal_size with at least a minimum number
of segments generated since the last standby snapshot, perhaps
max_wal_size / 16, but this coefficient is clearly a rule of thumb.
With the default configuration of 1GB, that would be waiting for 4
segments to be generated before logging in a standby snapshot.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-03 Thread Michael Paquier
On Wed, Nov 4, 2015 at 8:39 AM, Andres Freund  wrote:
> On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier wrote:
>>On a completely idle system, I don't think we should log any standby
>>records. This is what ~9.3 does.
>
> Are you sure? I think it'll around checkpoints, no? I thought Heikki had 
> fixed that, but looking sound that doesn't seem to be the case.

Er, yes, sorry. I should have used clearer words: I meant idle system
with something running nothing including internal checkpoints. An
instance indeed generates a XLOG_RUNNING_XACTS record before a
checkpoint record on an idle system.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-01 Thread Michael Paquier
On Mon, Nov 2, 2015 at 2:58 PM, Jeff Janes  wrote:
> Simple patch, applies and makes cleanly, does what it says and says what it 
> does.
>
> If a transaction holding locks aborts on an otherwise idle server, perhaps it 
> will take a very long time for a log-shipping standby to realize this.  But I 
> have hard time believing that anyone who cares about that would be using 
> log-shipping (rather than streaming) anyway.
>
> Marking it ready for committer.
>
> The new status of this patch is: Ready for Committer

Thanks! That was deadly fast.

Just wondering: shouldn't we keep the discussion around this patch on
-bugs instead? Not saying you are wrong, Jeff, I am just not sure what
would be the best practice regarding patches related to bugs. I would
think that it is at least necessary to keep the person who reported
the bug in CC to let him know the progress though.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2015-11-01 Thread Jeff Janes
On Sun, Nov 1, 2015 at 11:09 PM, Michael Paquier
 wrote:
> On Mon, Nov 2, 2015 at 2:58 PM, Jeff Janes  wrote:
>> Simple patch, applies and makes cleanly, does what it says and says what it 
>> does.
>>
>> If a transaction holding locks aborts on an otherwise idle server, perhaps 
>> it will take a very long time for a log-shipping standby to realize this.  
>> But I have hard time believing that anyone who cares about that would be 
>> using log-shipping (rather than streaming) anyway.
>>
>> Marking it ready for committer.
>>
>> The new status of this patch is: Ready for Committer
>
> Thanks! That was deadly fast.
>
> Just wondering: shouldn't we keep the discussion around this patch on
> -bugs instead? Not saying you are wrong, Jeff, I am just not sure what
> would be the best practice regarding patches related to bugs. I would
> think that it is at least necessary to keep the person who reported
> the bug in CC to let him know the progress though.

I wasn't sure about -bugs vs -hackers either, but in this case I used
the review form built into the commit-fest app, and the app is what
sent the email.  As far as I know I can't change its destination or
its CC list, even if I had thought ahead to do so.

I think the bug reporter should certainly be CCed when the bug is
closed, I don't know about intermediate steps in the "sausage making"
process.  Something to think about for a bug-tracker we might
implement in the future.  I think most bugs are summarily handled by
committers, so don't go through the commitfest process at all.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers