Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Wed, Oct 5, 2011 at 1:19 AM, Fujii Masao masao.fu...@gmail.com wrote: While the system is idle, we skip duplicate checkpoints for some reasons. But when wal_level is set to hot_standby, I found that checkpoints are wrongly duplicated even while the system is idle. The cause is that XLOG_RUNNING_XACTS WAL record always follows CHECKPOINT one when wal_level is set to hot_standby. So the subsequent checkpoint wrongly thinks that there is inserted record (i.e., XLOG_RUNNING_XACTS record) since the start of the last checkpoint, the system is not idle, and this checkpoint cannot be skipped. Is this intentional behavior? Or a bug? If we can eliminate it in some cases where it isn't necessary, that seems like a good thing, but I'm not sure I have a good handle on when it is or isn't necessary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Wed, Oct 5, 2011 at 6:19 AM, Fujii Masao masao.fu...@gmail.com wrote: While the system is idle, we skip duplicate checkpoints for some reasons. But when wal_level is set to hot_standby, I found that checkpoints are wrongly duplicated even while the system is idle. The cause is that XLOG_RUNNING_XACTS WAL record always follows CHECKPOINT one when wal_level is set to hot_standby. So the subsequent checkpoint wrongly thinks that there is inserted record (i.e., XLOG_RUNNING_XACTS record) since the start of the last checkpoint, the system is not idle, and this checkpoint cannot be skipped. Is this intentional behavior? Or a bug? I think it is avoidable behaviour, but not a bug. Thinking some more about this, IMHO it is possible to improve the situation greatly by returning to look at the true purpose of checkpoints. Checkpoints exist to minimise the time taken during crash recovery, and as starting points for backups/archive recoveries. The current idea is that if there has been no activity then we skip checkpoint. But all it takes is a single WAL record and off we go with another checkpoint. If there hasn't been much WAL activity, there is not much point in having another checkpoint record since there is little if any time to be saved in recovery. So why not avoid checkpoints until we have written at least 1 WAL file worth of data? That way checkpoint records are always in different files, so we are safer with regard to primary and secondary checkpoint records. That would mean in some cases that dirty data would stay in shared buffers for days or weeks? No, because the bgwriter would clean it - but even if it did, so what? Recovery will still be incredibly quick, which is the whole point. Testing whether we're in a different segment is easy and much simpler than trying to wriggle around trying to directly fix the problem you mention. Patch attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services spaced_checkpoints.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
Simon Riggs si...@2ndquadrant.com writes: The current idea is that if there has been no activity then we skip checkpoint. But all it takes is a single WAL record and off we go with another checkpoint. If there hasn't been much WAL activity, there is not much point in having another checkpoint record since there is little if any time to be saved in recovery. So why not avoid checkpoints until we have written at least 1 WAL file worth of data? +1, but I think you need to compare to the last checkpoint's REDO pointer, not to the position of the checkpoint record itself. Otherwise, the argument falls down if there was a lot of activity during the last checkpoint (which is not unlikely in these days of spread checkpoints). Also I think the comment needs more extensive revision than you gave it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Thu, Oct 6, 2011 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: The current idea is that if there has been no activity then we skip checkpoint. But all it takes is a single WAL record and off we go with another checkpoint. If there hasn't been much WAL activity, there is not much point in having another checkpoint record since there is little if any time to be saved in recovery. So why not avoid checkpoints until we have written at least 1 WAL file worth of data? +1, but I think you need to compare to the last checkpoint's REDO pointer, not to the position of the checkpoint record itself. Otherwise, the argument falls down if there was a lot of activity during the last checkpoint (which is not unlikely in these days of spread checkpoints). Also I think the comment needs more extensive revision than you gave it. If we go with this approach, we presumably also need to update the documentation, especially for checkpoint_timeout (which will no longer be a hard limit on the time between checkpoints). I'm not entirely sure I understand the rationale, though. I mean, if very little has happened since the last checkpoint, then the checkpoint will be very cheap. In the totally degenerate case Fujii Masao is reporting, where absolutely nothing has happened, it should be basically free. We'll loop through a whole bunch of things, decide there's nothing to fsync, and call it a day. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
Robert Haas robertmh...@gmail.com writes: I'm not entirely sure I understand the rationale, though. I mean, if very little has happened since the last checkpoint, then the checkpoint will be very cheap. In the totally degenerate case Fujii Masao is reporting, where absolutely nothing has happened, it should be basically free. We'll loop through a whole bunch of things, decide there's nothing to fsync, and call it a day. I think the point is that a totally idle database should not continue to emit WAL, not even at a slow rate. There are also power-consumption objections to allowing the checkpoint process to fire up to no purpose. The larger issue is that we should not only be concerned with optimizing for high load. Doing nothing when there's nothing to be done is an important part of good data-center citizenship. See also the ongoing work to avoid unnecessary process wakeups. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Thu, Oct 6, 2011 at 12:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm not entirely sure I understand the rationale, though. I mean, if very little has happened since the last checkpoint, then the checkpoint will be very cheap. In the totally degenerate case Fujii Masao is reporting, where absolutely nothing has happened, it should be basically free. We'll loop through a whole bunch of things, decide there's nothing to fsync, and call it a day. I think the point is that a totally idle database should not continue to emit WAL, not even at a slow rate. There are also power-consumption objections to allowing the checkpoint process to fire up to no purpose. Hmm, OK. I still think it's a little funny to say that checkpoint_timeout will force a checkpoint every N minutes except when it doesn't, but maybe there's no real harm in that as long as we document it properly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
Robert Haas robertmh...@gmail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: I think the point is that a totally idle database should not continue to emit WAL, not even at a slow rate. There are also power-consumption objections to allowing the checkpoint process to fire up to no purpose. Hmm, OK. I still think it's a little funny to say that checkpoint_timeout will force a checkpoint every N minutes except when it doesn't, but maybe there's no real harm in that as long as we document it properly. What will be the best way to determine, from looking only at a standby, whether replication is healthy and progressing? Going back to the 8.1 warm standby days we have run pg_controldata and compared the Time of latest checkpoint to current time; I'm hoping there's a better way now, so that we can drop that kludge. If not, I would like a way to kick out a checkpoint from the master at some finite and configurable interval for monitoring purposes. I'm all for having a nice, sharp fillet knife, but if that's not available, please don't take away this rock I chipped at until it had an edge. Most likely there's something there which I've missed, but it's really nice to be able to tell the difference between a quiet master and broken replication. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Thu, Oct 6, 2011 at 5:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: The current idea is that if there has been no activity then we skip checkpoint. But all it takes is a single WAL record and off we go with another checkpoint. If there hasn't been much WAL activity, there is not much point in having another checkpoint record since there is little if any time to be saved in recovery. So why not avoid checkpoints until we have written at least 1 WAL file worth of data? +1, but I think you need to compare to the last checkpoint's REDO pointer, not to the position of the checkpoint record itself. Otherwise, the argument falls down if there was a lot of activity during the last checkpoint (which is not unlikely in these days of spread checkpoints). Agreed, that's better. Also I think the comment needs more extensive revision than you gave it. OK, will do. New version attached. Docs later. Do we want this backpatched? If so, suggest just 9.1 and 9.0? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services spaced_checkpoints.v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
Simon Riggs si...@2ndquadrant.com writes: Do we want this backpatched? If so, suggest just 9.1 and 9.0? -1 for backpatching; it's more an improvement than a bug fix. In any case, I think we still need to respond to the point Kevin made about how to tell an idle master from broken replication. Right now, you will get at least a few bytes of data every checkpoint_timeout seconds. If we change this, you won't. I'm inclined to think that the way to deal with that is not to force out useless WAL data, but to add some sort of explicit I'm alive heartbeat signal to the walsender/walreceiver protocol. The hard part of that is to figure out how to expose it where you can see it on the slave side --- or do we have a status view that could handle that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
Robert Haas robertmh...@gmail.com writes: On Thu, Oct 6, 2011 at 12:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think the point is that a totally idle database should not continue to emit WAL, not even at a slow rate. There are also power-consumption objections to allowing the checkpoint process to fire up to no purpose. Hmm, OK. I still think it's a little funny to say that checkpoint_timeout will force a checkpoint every N minutes except when it doesn't, but maybe there's no real harm in that as long as we document it properly. Well ... if we think that it's sane to only checkpoint once per WAL segment, maybe we should just take out checkpoint_timeout. We'd need some other mechanism to address replication use-cases, but see my comments to Simon's followup patch just now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On 06.10.2011 20:58, Tom Lane wrote: Robert Haasrobertmh...@gmail.com writes: On Thu, Oct 6, 2011 at 12:44 PM, Tom Lanet...@sss.pgh.pa.us wrote: I think the point is that a totally idle database should not continue to emit WAL, not even at a slow rate. There are also power-consumption objections to allowing the checkpoint process to fire up to no purpose. Hmm, OK. I still think it's a little funny to say that checkpoint_timeout will force a checkpoint every N minutes except when it doesn't, but maybe there's no real harm in that as long as we document it properly. Well ... if we think that it's sane to only checkpoint once per WAL segment, maybe we should just take out checkpoint_timeout. Huh? Surely not, in my mind checkpoint_timeout is the primary way of controlling checkpoints, and checkpoint_segments you just set high enough so that you never reach it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Thu, Oct 6, 2011 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Do we want this backpatched? If so, suggest just 9.1 and 9.0? -1 for backpatching; it's more an improvement than a bug fix. OK, works for me. In any case, I think we still need to respond to the point Kevin made about how to tell an idle master from broken replication. Right now, you will get at least a few bytes of data every checkpoint_timeout seconds. If we change this, you won't. I'm inclined to think that the way to deal with that is not to force out useless WAL data, but to add some sort of explicit I'm alive heartbeat signal to the walsender/walreceiver protocol. The hard part of that is to figure out how to expose it where you can see it on the slave side --- or do we have a status view that could handle that? Different but related issue and yes, am on it, and yes, the way you just said. I foresee a function that tells you the delay based on a protocol message of 'k' for keepalive. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Do we want this backpatched? If so, suggest just 9.1 and 9.0? -1 for backpatching; it's more an improvement than a bug fix. In any case, I think we still need to respond to the point Kevin made about how to tell an idle master from broken replication. Right now, you will get at least a few bytes of data every checkpoint_timeout seconds. If we change this, you won't. I'm inclined to think that the way to deal with that is not to force out useless WAL data, but to add some sort of explicit I'm alive heartbeat signal to the walsender/walreceiver protocol. The hard part of that is to figure out how to expose it where you can see it on the slave side --- or do we have a status view that could handle that? As of 9.1, we already have something very much like this, in the opposite direction. See wal_receiver_status_interval and replication_timeout. I bet we could adapt that slightly to work in the other direction, too. But that'll only work with streaming replication - do we care about the WAL shipping case? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
Robert Haas robertmh...@gmail.com writes: On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm inclined to think that the way to deal with that is not to force out useless WAL data, but to add some sort of explicit I'm alive heartbeat signal to the walsender/walreceiver protocol. The hard part of that is to figure out how to expose it where you can see it on the slave side --- or do we have a status view that could handle that? As of 9.1, we already have something very much like this, in the opposite direction. See wal_receiver_status_interval and replication_timeout. I bet we could adapt that slightly to work in the other direction, too. But that'll only work with streaming replication - do we care about the WAL shipping case? I can't get excited about the WAL-shipping case. The artifact that we'll generate a checkpoint record every few minutes does not create enough WAL volume for WAL-shipping to reliably generate a heartbeat signal. It'd be way too long between filling up segments if that were the only WAL data being generated. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Thu, Oct 6, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm inclined to think that the way to deal with that is not to force out useless WAL data, but to add some sort of explicit I'm alive heartbeat signal to the walsender/walreceiver protocol. The hard part of that is to figure out how to expose it where you can see it on the slave side --- or do we have a status view that could handle that? As of 9.1, we already have something very much like this, in the opposite direction. See wal_receiver_status_interval and replication_timeout. I bet we could adapt that slightly to work in the other direction, too. But that'll only work with streaming replication - do we care about the WAL shipping case? I can't get excited about the WAL-shipping case. The artifact that we'll generate a checkpoint record every few minutes does not create enough WAL volume for WAL-shipping to reliably generate a heartbeat signal. It'd be way too long between filling up segments if that were the only WAL data being generated. Depends how you set archive_timeout, I think. Anyway, I'm not saying we *have* to care about that case. I'm just asking whether we do, before we go start changing things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
Simon Riggs si...@2ndquadrant.com wrote: I foresee a function that tells you the delay based on a protocol message of 'k' for keepalive. If the delay you mention is basically a ping time or something similar, that would answer the need I've been on about. We need to know, based on access to the replica, that the replication system is alive and well even with an idle master; as it can otherwise be hard to distinguish an idle master from a failed replication system (broken connection, misconfigured replication, etc.). Right now there is a periodic checkpoint which flows through the WAL and affects the pg_controldata report on the replica -- we've been using that for monitoring. Any sort of heartbeat or ping which provides sign-of-life on the connection, accessible on the replica, should do for our purposes. If it only works over streaming replication on a hot standby, that's OK -- we plan to be running everything that way before 9.2 comes out, as long as we can materialize traditional WAL files on the receiving end from the SR stream. -1 on changing the checkpoint behavior before 9.2, though. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoints are duplicated even while the system is idle
On Thu, Oct 6, 2011 at 7:18 PM, Robert Haas robertmh...@gmail.com wrote: As of 9.1, we already have something very much like this, in the opposite direction. Yes Robert, I wrote it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers