Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Robert Haas
On Wed, Oct 5, 2011 at 1:19 AM, Fujii Masao masao.fu...@gmail.com wrote:
 While the system is idle, we skip duplicate checkpoints for some
 reasons. But when wal_level is set to hot_standby, I found that
 checkpoints are wrongly duplicated even while the system is idle.
 The cause is that XLOG_RUNNING_XACTS WAL record always
 follows CHECKPOINT one when wal_level is set to hot_standby.
 So the subsequent checkpoint wrongly thinks that there is inserted
 record (i.e., XLOG_RUNNING_XACTS record) since the start of the
 last checkpoint, the system is not idle, and this checkpoint cannot
 be skipped. Is this intentional behavior? Or a bug?

If we can eliminate it in some cases where it isn't necessary, that
seems like a good thing, but I'm not sure I have a good handle on when
it is or isn't necessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Simon Riggs
On Wed, Oct 5, 2011 at 6:19 AM, Fujii Masao masao.fu...@gmail.com wrote:

 While the system is idle, we skip duplicate checkpoints for some
 reasons. But when wal_level is set to hot_standby, I found that
 checkpoints are wrongly duplicated even while the system is idle.
 The cause is that XLOG_RUNNING_XACTS WAL record always
 follows CHECKPOINT one when wal_level is set to hot_standby.
 So the subsequent checkpoint wrongly thinks that there is inserted
 record (i.e., XLOG_RUNNING_XACTS record) since the start of the
 last checkpoint, the system is not idle, and this checkpoint cannot
 be skipped. Is this intentional behavior? Or a bug?

I think it is avoidable behaviour, but not a bug.

Thinking some more about this, IMHO it is possible to improve the
situation greatly by returning to look at the true purpose of
checkpoints. Checkpoints exist to minimise the time taken during crash
recovery, and as starting points for backups/archive recoveries.

The current idea is that if there has been no activity then we skip
checkpoint. But all it takes is a single WAL record and off we go with
another checkpoint. If there hasn't been much WAL activity, there is
not much point in having another checkpoint record since there is
little if any time to be saved in recovery.

So why not avoid checkpoints until we have written at least 1 WAL file
worth of data? That way checkpoint records are always in different
files, so we are safer with regard to primary and secondary checkpoint
records. That would mean in some cases that dirty data would stay in
shared buffers for days or weeks? No, because the bgwriter would clean
it - but even if it did, so what? Recovery will still be incredibly
quick, which is the whole point.

Testing whether we're in a different segment is easy and much simpler
than trying to wriggle around trying to directly fix the problem you
mention. Patch attached.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


spaced_checkpoints.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 The current idea is that if there has been no activity then we skip
 checkpoint. But all it takes is a single WAL record and off we go with
 another checkpoint. If there hasn't been much WAL activity, there is
 not much point in having another checkpoint record since there is
 little if any time to be saved in recovery.

 So why not avoid checkpoints until we have written at least 1 WAL file
 worth of data?

+1, but I think you need to compare to the last checkpoint's REDO
pointer, not to the position of the checkpoint record itself.
Otherwise, the argument falls down if there was a lot of activity
during the last checkpoint (which is not unlikely in these days of
spread checkpoints).

Also I think the comment needs more extensive revision than you gave it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Robert Haas
On Thu, Oct 6, 2011 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The current idea is that if there has been no activity then we skip
 checkpoint. But all it takes is a single WAL record and off we go with
 another checkpoint. If there hasn't been much WAL activity, there is
 not much point in having another checkpoint record since there is
 little if any time to be saved in recovery.

 So why not avoid checkpoints until we have written at least 1 WAL file
 worth of data?

 +1, but I think you need to compare to the last checkpoint's REDO
 pointer, not to the position of the checkpoint record itself.
 Otherwise, the argument falls down if there was a lot of activity
 during the last checkpoint (which is not unlikely in these days of
 spread checkpoints).

 Also I think the comment needs more extensive revision than you gave it.

If we go with this approach, we presumably also need to update the
documentation, especially for checkpoint_timeout (which will no longer
be a hard limit on the time between checkpoints).

I'm not entirely sure I understand the rationale, though.  I mean, if
very little has happened since the last checkpoint, then the
checkpoint will be very cheap.  In the totally degenerate case Fujii
Masao is reporting, where absolutely nothing has happened, it should
be basically free.  We'll loop through a whole bunch of things, decide
there's nothing to fsync, and call it a day.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm not entirely sure I understand the rationale, though.  I mean, if
 very little has happened since the last checkpoint, then the
 checkpoint will be very cheap.  In the totally degenerate case Fujii
 Masao is reporting, where absolutely nothing has happened, it should
 be basically free.  We'll loop through a whole bunch of things, decide
 there's nothing to fsync, and call it a day.

I think the point is that a totally idle database should not continue to
emit WAL, not even at a slow rate.  There are also power-consumption
objections to allowing the checkpoint process to fire up to no purpose.

The larger issue is that we should not only be concerned with optimizing
for high load.  Doing nothing when there's nothing to be done is an
important part of good data-center citizenship.  See also the ongoing
work to avoid unnecessary process wakeups.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Robert Haas
On Thu, Oct 6, 2011 at 12:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm not entirely sure I understand the rationale, though.  I mean, if
 very little has happened since the last checkpoint, then the
 checkpoint will be very cheap.  In the totally degenerate case Fujii
 Masao is reporting, where absolutely nothing has happened, it should
 be basically free.  We'll loop through a whole bunch of things, decide
 there's nothing to fsync, and call it a day.

 I think the point is that a totally idle database should not continue to
 emit WAL, not even at a slow rate.  There are also power-consumption
 objections to allowing the checkpoint process to fire up to no purpose.

Hmm, OK.  I still think it's a little funny to say that
checkpoint_timeout will force a checkpoint every N minutes except when
it doesn't, but maybe there's no real harm in that as long as we
document it properly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 
 I think the point is that a totally idle database should not
 continue to emit WAL, not even at a slow rate.  There are also
 power-consumption objections to allowing the checkpoint process
 to fire up to no purpose.
 
 Hmm, OK.  I still think it's a little funny to say that
 checkpoint_timeout will force a checkpoint every N minutes except
 when it doesn't, but maybe there's no real harm in that as long as
 we document it properly.
 
What will be the best way to determine, from looking only at a
standby, whether replication is healthy and progressing?  Going back
to the 8.1 warm standby days we have run pg_controldata and compared
the Time of latest checkpoint to current time; I'm hoping there's
a better way now, so that we can drop that kludge.  If not, I would
like a way to kick out a checkpoint from the master at some finite
and configurable interval for monitoring purposes.  I'm all for
having a nice, sharp fillet knife, but if that's not available,
please don't take away this rock I chipped at until it had an
edge.
 
Most likely there's something there which I've missed, but it's
really nice to be able to tell the difference between a quiet master
and broken replication.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Simon Riggs
On Thu, Oct 6, 2011 at 5:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The current idea is that if there has been no activity then we skip
 checkpoint. But all it takes is a single WAL record and off we go with
 another checkpoint. If there hasn't been much WAL activity, there is
 not much point in having another checkpoint record since there is
 little if any time to be saved in recovery.

 So why not avoid checkpoints until we have written at least 1 WAL file
 worth of data?

 +1, but I think you need to compare to the last checkpoint's REDO
 pointer, not to the position of the checkpoint record itself.
 Otherwise, the argument falls down if there was a lot of activity
 during the last checkpoint (which is not unlikely in these days of
 spread checkpoints).

Agreed, that's better.

 Also I think the comment needs more extensive revision than you gave it.

OK, will do.

New version attached. Docs later.

Do we want this backpatched? If so, suggest just 9.1 and 9.0?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


spaced_checkpoints.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Do we want this backpatched? If so, suggest just 9.1 and 9.0?

-1 for backpatching; it's more an improvement than a bug fix.

In any case, I think we still need to respond to the point Kevin made
about how to tell an idle master from broken replication.  Right now,
you will get at least a few bytes of data every checkpoint_timeout
seconds.  If we change this, you won't.

I'm inclined to think that the way to deal with that is not to force out
useless WAL data, but to add some sort of explicit I'm alive heartbeat
signal to the walsender/walreceiver protocol.  The hard part of that is
to figure out how to expose it where you can see it on the slave side
--- or do we have a status view that could handle that?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Oct 6, 2011 at 12:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think the point is that a totally idle database should not continue to
 emit WAL, not even at a slow rate.  There are also power-consumption
 objections to allowing the checkpoint process to fire up to no purpose.

 Hmm, OK.  I still think it's a little funny to say that
 checkpoint_timeout will force a checkpoint every N minutes except when
 it doesn't, but maybe there's no real harm in that as long as we
 document it properly.

Well ... if we think that it's sane to only checkpoint once per WAL
segment, maybe we should just take out checkpoint_timeout.

We'd need some other mechanism to address replication use-cases, but see
my comments to Simon's followup patch just now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Heikki Linnakangas

On 06.10.2011 20:58, Tom Lane wrote:

Robert Haasrobertmh...@gmail.com  writes:

On Thu, Oct 6, 2011 at 12:44 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

I think the point is that a totally idle database should not continue to
emit WAL, not even at a slow rate.  There are also power-consumption
objections to allowing the checkpoint process to fire up to no purpose.



Hmm, OK.  I still think it's a little funny to say that
checkpoint_timeout will force a checkpoint every N minutes except when
it doesn't, but maybe there's no real harm in that as long as we
document it properly.


Well ... if we think that it's sane to only checkpoint once per WAL
segment, maybe we should just take out checkpoint_timeout.


Huh? Surely not, in my mind checkpoint_timeout is the primary way of 
controlling checkpoints, and checkpoint_segments you just set high 
enough so that you never reach it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Simon Riggs
On Thu, Oct 6, 2011 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 Do we want this backpatched? If so, suggest just 9.1 and 9.0?

 -1 for backpatching; it's more an improvement than a bug fix.

OK, works for me.

 In any case, I think we still need to respond to the point Kevin made
 about how to tell an idle master from broken replication.  Right now,
 you will get at least a few bytes of data every checkpoint_timeout
 seconds.  If we change this, you won't.

 I'm inclined to think that the way to deal with that is not to force out
 useless WAL data, but to add some sort of explicit I'm alive heartbeat
 signal to the walsender/walreceiver protocol.  The hard part of that is
 to figure out how to expose it where you can see it on the slave side
 --- or do we have a status view that could handle that?

Different but related issue and yes, am on it, and yes, the way you just said.

I foresee a function that tells you the delay based on a protocol
message of 'k' for keepalive.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Robert Haas
On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 Do we want this backpatched? If so, suggest just 9.1 and 9.0?

 -1 for backpatching; it's more an improvement than a bug fix.

 In any case, I think we still need to respond to the point Kevin made
 about how to tell an idle master from broken replication.  Right now,
 you will get at least a few bytes of data every checkpoint_timeout
 seconds.  If we change this, you won't.

 I'm inclined to think that the way to deal with that is not to force out
 useless WAL data, but to add some sort of explicit I'm alive heartbeat
 signal to the walsender/walreceiver protocol.  The hard part of that is
 to figure out how to expose it where you can see it on the slave side
 --- or do we have a status view that could handle that?

As of 9.1, we already have something very much like this, in the
opposite direction.  See wal_receiver_status_interval and
replication_timeout.  I bet we could adapt that slightly to work in
the other direction, too.  But that'll only work with streaming
replication - do we care about the WAL shipping case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm inclined to think that the way to deal with that is not to force out
 useless WAL data, but to add some sort of explicit I'm alive heartbeat
 signal to the walsender/walreceiver protocol.  The hard part of that is
 to figure out how to expose it where you can see it on the slave side
 --- or do we have a status view that could handle that?

 As of 9.1, we already have something very much like this, in the
 opposite direction.  See wal_receiver_status_interval and
 replication_timeout.  I bet we could adapt that slightly to work in
 the other direction, too.  But that'll only work with streaming
 replication - do we care about the WAL shipping case?

I can't get excited about the WAL-shipping case. The artifact that we'll
generate a checkpoint record every few minutes does not create enough
WAL volume for WAL-shipping to reliably generate a heartbeat signal.
It'd be way too long between filling up segments if that were the only
WAL data being generated.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Robert Haas
On Thu, Oct 6, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm inclined to think that the way to deal with that is not to force out
 useless WAL data, but to add some sort of explicit I'm alive heartbeat
 signal to the walsender/walreceiver protocol.  The hard part of that is
 to figure out how to expose it where you can see it on the slave side
 --- or do we have a status view that could handle that?

 As of 9.1, we already have something very much like this, in the
 opposite direction.  See wal_receiver_status_interval and
 replication_timeout.  I bet we could adapt that slightly to work in
 the other direction, too.  But that'll only work with streaming
 replication - do we care about the WAL shipping case?

 I can't get excited about the WAL-shipping case. The artifact that we'll
 generate a checkpoint record every few minutes does not create enough
 WAL volume for WAL-shipping to reliably generate a heartbeat signal.
 It'd be way too long between filling up segments if that were the only
 WAL data being generated.

Depends how you set archive_timeout, I think.

Anyway, I'm not saying we *have* to care about that case.  I'm just
asking whether we do, before we go start changing things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 I foresee a function that tells you the delay based on a protocol
 message of 'k' for keepalive.
 
If the delay you mention is basically a ping time or something
similar, that would answer the need I've been on about.  We need to
know, based on access to the replica, that the replication system is
alive and well even with an idle master; as it can otherwise be hard
to distinguish an idle master from a failed replication system
(broken connection, misconfigured replication, etc.).
 
Right now there is a periodic checkpoint which flows through the WAL
and affects the pg_controldata report on the replica -- we've been
using that for monitoring.  Any sort of heartbeat or ping which
provides sign-of-life on the connection, accessible on the replica,
should do for our purposes. If it only works over streaming
replication on a hot standby, that's OK -- we plan to be running
everything that way before 9.2 comes out, as long as we can
materialize traditional WAL files on the receiving end from the SR
stream.
 
-1 on changing the checkpoint behavior before 9.2, though.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints are duplicated even while the system is idle

2011-10-06 Thread Simon Riggs
On Thu, Oct 6, 2011 at 7:18 PM, Robert Haas robertmh...@gmail.com wrote:

 As of 9.1, we already have something very much like this, in the
 opposite direction.

Yes Robert, I wrote it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers