Re: [HACKERS] recovery_target_time and standby_mode
On Thu, Nov 13, 2014 at 2:45 AM, Josh Berkus j...@agliodbs.com wrote: On 11/12/2014 10:06 AM, Robert Haas wrote: hat *appears* to be happening is that the pause_at_recovery_target, followed by the restart, on the replica causes it to advance one commit on timeline 1. But *not all the time*; this doesn't happen in my pgbench-based tests. There's a workaround for the user (they just restore the replica to 5 minutes earlier), but I'm thinking this is a minor bug somewhere. I'm not sure what's going on here, but keep in mind that when you restart the replica, it's going to back up to the most recent restartpoint and begin replication from there, not from the point it was at when you shut down. Except that in the problem case, it appears to be going *forwards*. What would cause that? Unfortunately, I have no idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On 11/07/2014 02:03 PM, Josh Berkus wrote: But, like I said, there's a serviceable workaround. Some update on this. We've seen a problem in production with this setup which I can't reproduce as a test case, but which may jog Heikki's memory for something to fix. 1. Recover master to 2014-11-10 12:10:00 2. Recover replica to 2014-11-10 12:10:00, with pause_at_recovery_target 3. reconfigure recovery.conf for streaming replication and restart the replica 4. get a fatal error for replication, because the replica is ahead of the master on timeline1 What *appears* to be happening is that the pause_at_recovery_target, followed by the restart, on the replica causes it to advance one commit on timeline 1. But *not all the time*; this doesn't happen in my pgbench-based tests. There's a workaround for the user (they just restore the replica to 5 minutes earlier), but I'm thinking this is a minor bug somewhere. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Wed, Nov 12, 2014 at 12:12 PM, Josh Berkus j...@agliodbs.com wrote: On 11/07/2014 02:03 PM, Josh Berkus wrote: But, like I said, there's a serviceable workaround. Some update on this. We've seen a problem in production with this setup which I can't reproduce as a test case, but which may jog Heikki's memory for something to fix. 1. Recover master to 2014-11-10 12:10:00 2. Recover replica to 2014-11-10 12:10:00, with pause_at_recovery_target 3. reconfigure recovery.conf for streaming replication and restart the replica 4. get a fatal error for replication, because the replica is ahead of the master on timeline1 What *appears* to be happening is that the pause_at_recovery_target, followed by the restart, on the replica causes it to advance one commit on timeline 1. But *not all the time*; this doesn't happen in my pgbench-based tests. There's a workaround for the user (they just restore the replica to 5 minutes earlier), but I'm thinking this is a minor bug somewhere. I'm not sure what's going on here, but keep in mind that when you restart the replica, it's going to back up to the most recent restartpoint and begin replication from there, not from the point it was at when you shut down. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On 11/12/2014 10:06 AM, Robert Haas wrote: hat *appears* to be happening is that the pause_at_recovery_target, followed by the restart, on the replica causes it to advance one commit on timeline 1. But *not all the time*; this doesn't happen in my pgbench-based tests. There's a workaround for the user (they just restore the replica to 5 minutes earlier), but I'm thinking this is a minor bug somewhere. I'm not sure what's going on here, but keep in mind that when you restart the replica, it's going to back up to the most recent restartpoint and begin replication from there, not from the point it was at when you shut down. Except that in the problem case, it appears to be going *forwards*. What would cause that? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Wed, Nov 5, 2014 at 9:15 PM, Josh Berkus j...@agliodbs.com wrote: What I'm pointing out is that you can't actually do that. You think you can, but you can't. I do think that. You haven't explained why I'm wrong; just asserted than I am. Which doesn't really get us anywhere. However, if you do happen to want to do what you're describing, it seems like pause_at_recovery_target might get you there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On 11/07/2014 08:12 AM, Robert Haas wrote: On Wed, Nov 5, 2014 at 9:15 PM, Josh Berkus j...@agliodbs.com wrote: What I'm pointing out is that you can't actually do that. You think you can, but you can't. I do think that. You haven't explained why I'm wrong; just asserted than I am. Which doesn't really get us anywhere. TIAS. I've already posted the steps I took and the result. You're asserting that I'm wrong without even testing it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
All, The point of this thread was to determine: Is the current interaction of recovery_target_time and standby_mode (that is, that recovery_target_time causes standby_mode to be ignorned) the correct behavior? If Yes, then we have a tech bug and a doc bug: Tech Bug: if the user sets both recovery_target_time and standby_mode in recovery.conf, they should get an error. Doc Bug: we should make it clear that these are exclusive options. If No, then we have a tech bug to change the current functionality, and a decision about backporting. So far, I've seen one opinion (yes) on whether our current behavior is correct or not. For my part, I would like to have a different interacton, but I think that's a future feature rather than a bug, as long as we do the stuff in the Yes column. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Fri, Nov 7, 2014 at 1:35 PM, Josh Berkus j...@agliodbs.com wrote: On 11/07/2014 08:12 AM, Robert Haas wrote: On Wed, Nov 5, 2014 at 9:15 PM, Josh Berkus j...@agliodbs.com wrote: What I'm pointing out is that you can't actually do that. You think you can, but you can't. I do think that. You haven't explained why I'm wrong; just asserted than I am. Which doesn't really get us anywhere. TIAS. I've already posted the steps I took and the result. You're asserting that I'm wrong without even testing it. You posted the steps you originally took; namely, setting recovery_target_time = 'SOME-PAST-TIMESTAMP' and standby_mode = 'on'. Several people then suggested that you could accomplish your originally stated goal - namely restore a master *and replica* to a point in time before Bad Stuff happened, and then have a working master-replica pair - by just connecting the new standby to the master directly, without using recovery_target_time. As long as primary_conninfo and restore_command are both set, the standby should be able to fetch older segments from the archive and then seamlessly switch to fetching new segments from the new master. If you tried that and it didn't work, I don't see a description of the outcome anywhere on this thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Fri, Nov 7, 2014 at 1:40 PM, Josh Berkus j...@agliodbs.com wrote: Is the current interaction of recovery_target_time and standby_mode (that is, that recovery_target_time causes standby_mode to be ignorned) the correct behavior? I think this summary of the behavior is probably not correct in detail. For example, if the recovery target isn't reached by the time the standby reaches the end of archived WAL, I think standby_mode will affect what happens next. I do think that the documentation of standby_mode ought to mention that recovery_target_whatever can cause the server to promote even when standby_mode is on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On 11/07/2014 12:02 PM, Robert Haas wrote: Several people then suggested that you could accomplish your originally stated goal - namely restore a master *and replica* to a point in time before Bad Stuff happened, and then have a working master-replica pair - by just connecting the new standby to the master directly, without using recovery_target_time. As long as primary_conninfo and restore_command are both set, the standby should be able to fetch older segments from the archive and then seamlessly switch to fetching new segments from the new master. If you tried that and it didn't work, I don't see a description of the outcome anywhere on this thread. Aha! I went ahead and tested that, which I did not ever expect to work since our documentation says it won't. The problem is that the replica continues to catch up from the archive *past* the point where the master is caught up. It doesn't switch to the streaming replication connection until it runs out of archives (which is as designed, for a variety of good reasons). And since the replica is now beyond the master's timeline, replication fails with end of wal reached on timeline 1 320/478ff780; new timeline 2 forked timeline 1 before current recovery point 320/47e0. In order for this to work, the archive would need to stop before recovery_target_time. On 11/07/2014 12:07 PM, Robert Haas wrote: On Fri, Nov 7, 2014 at 1:40 PM, Josh Berkus j...@agliodbs.com wrote: Is the current interaction of recovery_target_time and standby_mode (that is, that recovery_target_time causes standby_mode to be ignorned) the correct behavior? I think this summary of the behavior is probably not correct in detail. For example, if the recovery target isn't reached by the time the standby reaches the end of archived WAL, I think standby_mode will affect what happens next. Oh, yeah, good point. So just a doc patch then. Presumably when the recovery_target_time is reached the replica promotes even if it's streaming? Will test. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Fri, Nov 7, 2014 at 4:00 PM, Josh Berkus j...@agliodbs.com wrote: On 11/07/2014 12:02 PM, Robert Haas wrote: Several people then suggested that you could accomplish your originally stated goal - namely restore a master *and replica* to a point in time before Bad Stuff happened, and then have a working master-replica pair - by just connecting the new standby to the master directly, without using recovery_target_time. As long as primary_conninfo and restore_command are both set, the standby should be able to fetch older segments from the archive and then seamlessly switch to fetching new segments from the new master. If you tried that and it didn't work, I don't see a description of the outcome anywhere on this thread. Aha! I went ahead and tested that, which I did not ever expect to work since our documentation says it won't. The problem is that the replica continues to catch up from the archive *past* the point where the master is caught up. It doesn't switch to the streaming replication connection until it runs out of archives (which is as designed, for a variety of good reasons). And since the replica is now beyond the master's timeline, replication fails with end of wal reached on timeline 1 320/478ff780; new timeline 2 forked timeline 1 before current recovery point 320/47e0. In order for this to work, the archive would need to stop before recovery_target_time. Yeah, good point. I didn't think of the case where you've rewound the master but not the archive. That will indeed require some special handling, but it also seems like a somewhat unusual setup, because if the master is trying to archive back to that same archive, archiving will fail, with all the usual problems that entails. Or maybe the master is archiving there but on a different timeline, but in that case why can the standby follow the timeline switch when connecting directly to the master, but not via the archive? My brain hurts. Is the current interaction of recovery_target_time and standby_mode (that is, that recovery_target_time causes standby_mode to be ignorned) the correct behavior? I think this summary of the behavior is probably not correct in detail. For example, if the recovery target isn't reached by the time the standby reaches the end of archived WAL, I think standby_mode will affect what happens next. Oh, yeah, good point. So just a doc patch then. Sounds like a plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On 11/07/2014 01:30 PM, Robert Haas wrote: On Fri, Nov 7, 2014 at 4:00 PM, Josh Berkus j...@agliodbs.com wrote: In order for this to work, the archive would need to stop before recovery_target_time. Yeah, good point. I didn't think of the case where you've rewound the master but not the archive. That will indeed require some special handling, but it also seems like a somewhat unusual setup, because if the master is trying to archive back to that same archive, archiving will fail, with all the usual problems that entails. Or maybe the master is archiving there but on a different timeline, but in that case why can the standby follow the timeline switch when connecting directly to the master, but not via the archive? My brain hurts. I'm not surprised this issue hasn't come up before. We manage replication and archiving for many clients, and this is the first time I've had this question. The reason this user wants to do things this way is that their archive storage is higher bandwidth (fiber) than their local network, so it's faster to restore several servers in parallel from the archive than it is to restore the master and then take basebackups. But, like I said, there's a serviceable workaround. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote: When the recovery_target_time is reached, switch to streaming replication and stay a standby. Then shouldn't he just not specify a recovert_target at all? That's the default behaviour for standby_mode on, the whole point of recovery_target is to specify when to stop recovery and leave standby mode, no? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On 11/05/2014 05:00 PM, Greg Stark wrote: On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote: When the recovery_target_time is reached, switch to streaming replication and stay a standby. Then shouldn't he just not specify a recovert_target at all? That's the default behaviour for standby_mode on, the whole point of recovery_target is to specify when to stop recovery and leave standby mode, no? Their goal was to restore a master *and replica* to a point in time before Bad Stuff happened, and then have a working master-replica pair. Like I said, this is probably working as intended, we just need to clarify it in the docs. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Thu, Nov 6, 2014 at 10:00 AM, Greg Stark st...@mit.edu wrote: On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote: When the recovery_target_time is reached, switch to streaming replication and stay a standby. Then shouldn't he just not specify a recovert_target at all? That's the default behaviour for standby_mode on, the whole point of recovery_target is to specify when to stop recovery and leave standby mode, no? Agreed with Greg, once a target recovery is switched the node gets out of recovery. What the user should have done here is not specify recovery_target_time in the standby's recovery.conf such as it follows the master through streaming. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Thu, Nov 6, 2014 at 10:41 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Nov 6, 2014 at 10:00 AM, Greg Stark st...@mit.edu wrote: On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote: When the recovery_target_time is reached, switch to streaming replication and stay a standby. Then shouldn't he just not specify a recovert_target at all? That's the default behaviour for standby_mode on, the whole point of recovery_target is to specify when to stop recovery and leave standby mode, no? Agreed with Greg, once a target recovery is switched the node gets out of recovery. What the user should have done here is not specify recovery_target_time in the standby's recovery.conf such as it follows the master through streaming. Just adding: ... On the new timeline that master is bumping to. If the standby already replayed of the point where WAL forked on master, then the standby should be rewinded. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On 11/05/2014 05:41 PM, Michael Paquier wrote: On Thu, Nov 6, 2014 at 10:00 AM, Greg Stark st...@mit.edu wrote: On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote: When the recovery_target_time is reached, switch to streaming replication and stay a standby. Then shouldn't he just not specify a recovert_target at all? That's the default behaviour for standby_mode on, the whole point of recovery_target is to specify when to stop recovery and leave standby mode, no? Agreed with Greg, once a target recovery is switched the node gets out of recovery. What the user should have done here is not specify recovery_target_time in the standby's recovery.conf such as it follows the master through streaming. What I'm pointing out is that you can't actually do that. You think you can, but you can't. Instead, what you need to do is: 1) Recover to target_time. 2) Pause 3) shut down the replica 4) replace recovery.conf with one which streams 5) restart replica This is consistent behavior and makes sense when you think about it. So I think what we need to do is clarify in the documentation covering recovery_target and standby_mode that they are exclusive. Hmmm. You know, I think this means we do have a bug. If recovery_target_time and standby_mode are exclusive, we should error if the user attempts to set them both. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers