Re: [HACKERS] recovery_target_time and standby_mode

2014-11-13 Thread Robert Haas
On Thu, Nov 13, 2014 at 2:45 AM, Josh Berkus j...@agliodbs.com wrote:
 On 11/12/2014 10:06 AM, Robert Haas wrote:
 hat *appears* to be happening is that the pause_at_recovery_target,
  followed by the restart, on the replica causes it to advance one commit
  on timeline 1.  But *not all the time*; this doesn't happen in my
  pgbench-based tests.
 
  There's a workaround for the user (they just restore the replica to 5
  minutes earlier), but I'm thinking this is a minor bug somewhere.
 I'm not sure what's going on here, but keep in mind that when you
 restart the replica, it's going to back up to the most recent
 restartpoint and begin replication from there, not from the point it
 was at when you shut down.

 Except that in the problem case, it appears to be going *forwards*.
 What would cause that?

Unfortunately, I have no idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-12 Thread Josh Berkus
On 11/07/2014 02:03 PM, Josh Berkus wrote:
 But, like I said, there's a serviceable workaround.

Some update on this.  We've seen a problem in production with this setup
which I can't reproduce as a test case, but which may jog Heikki's
memory for something to fix.

1. Recover master to 2014-11-10 12:10:00
2. Recover replica to 2014-11-10 12:10:00,
   with pause_at_recovery_target
3. reconfigure recovery.conf for streaming replication
   and restart the replica
4. get a fatal error for replication, because
   the replica is ahead of the master on timeline1

What *appears* to be happening is that the pause_at_recovery_target,
followed by the restart, on the replica causes it to advance one commit
on timeline 1.  But *not all the time*; this doesn't happen in my
pgbench-based tests.

There's a workaround for the user (they just restore the replica to 5
minutes earlier), but I'm thinking this is a minor bug somewhere.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-12 Thread Robert Haas
On Wed, Nov 12, 2014 at 12:12 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/07/2014 02:03 PM, Josh Berkus wrote:
 But, like I said, there's a serviceable workaround.

 Some update on this.  We've seen a problem in production with this setup
 which I can't reproduce as a test case, but which may jog Heikki's
 memory for something to fix.

 1. Recover master to 2014-11-10 12:10:00
 2. Recover replica to 2014-11-10 12:10:00,
with pause_at_recovery_target
 3. reconfigure recovery.conf for streaming replication
and restart the replica
 4. get a fatal error for replication, because
the replica is ahead of the master on timeline1

 What *appears* to be happening is that the pause_at_recovery_target,
 followed by the restart, on the replica causes it to advance one commit
 on timeline 1.  But *not all the time*; this doesn't happen in my
 pgbench-based tests.

 There's a workaround for the user (they just restore the replica to 5
 minutes earlier), but I'm thinking this is a minor bug somewhere.

I'm not sure what's going on here, but keep in mind that when you
restart the replica, it's going to back up to the most recent
restartpoint and begin replication from there, not from the point it
was at when you shut down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-12 Thread Josh Berkus
On 11/12/2014 10:06 AM, Robert Haas wrote:
 hat *appears* to be happening is that the pause_at_recovery_target,
  followed by the restart, on the replica causes it to advance one commit
  on timeline 1.  But *not all the time*; this doesn't happen in my
  pgbench-based tests.
 
  There's a workaround for the user (they just restore the replica to 5
  minutes earlier), but I'm thinking this is a minor bug somewhere.
 I'm not sure what's going on here, but keep in mind that when you
 restart the replica, it's going to back up to the most recent
 restartpoint and begin replication from there, not from the point it
 was at when you shut down.

Except that in the problem case, it appears to be going *forwards*.
What would cause that?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Robert Haas
On Wed, Nov 5, 2014 at 9:15 PM, Josh Berkus j...@agliodbs.com wrote:
 What I'm pointing out is that you can't actually do that.  You think you
 can, but you can't.

I do think that.  You haven't explained why I'm wrong; just asserted
than I am.  Which doesn't really get us anywhere.

However, if you do happen to want to do what you're describing, it
seems like pause_at_recovery_target might get you there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Josh Berkus
On 11/07/2014 08:12 AM, Robert Haas wrote:
 On Wed, Nov 5, 2014 at 9:15 PM, Josh Berkus j...@agliodbs.com wrote:
 What I'm pointing out is that you can't actually do that.  You think you
 can, but you can't.
 
 I do think that.  You haven't explained why I'm wrong; just asserted
 than I am.  Which doesn't really get us anywhere.

TIAS.  I've already posted the steps I took and the result.  You're
asserting that I'm wrong without even testing it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Josh Berkus
All,

The point of this thread was to determine:

Is the current interaction of recovery_target_time and standby_mode
(that is, that recovery_target_time causes standby_mode to be ignorned)
the correct behavior?

If Yes, then we have a tech bug and a doc bug:
Tech Bug: if the user sets both recovery_target_time and standby_mode in
recovery.conf, they should get an error.
Doc Bug: we should make it clear that these are exclusive options.

If No, then we have a tech bug to change the current functionality, and
a decision about backporting.

So far, I've seen one opinion (yes) on whether our current behavior is
correct or not.  For my part, I would like to have a different
interacton, but I think that's a future feature rather than a bug, as
long as we do the stuff in the Yes column.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Robert Haas
On Fri, Nov 7, 2014 at 1:35 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/07/2014 08:12 AM, Robert Haas wrote:
 On Wed, Nov 5, 2014 at 9:15 PM, Josh Berkus j...@agliodbs.com wrote:
 What I'm pointing out is that you can't actually do that.  You think you
 can, but you can't.

 I do think that.  You haven't explained why I'm wrong; just asserted
 than I am.  Which doesn't really get us anywhere.

 TIAS.  I've already posted the steps I took and the result.  You're
 asserting that I'm wrong without even testing it.

You posted the steps you originally took; namely, setting
recovery_target_time = 'SOME-PAST-TIMESTAMP' and standby_mode = 'on'.
Several people then suggested that you could accomplish your
originally stated goal - namely restore a master *and replica* to a
point in time before Bad Stuff happened, and then have a working
master-replica pair - by just connecting the new standby to the
master directly, without using recovery_target_time.  As long as
primary_conninfo and restore_command are both set, the standby should
be able to fetch older segments from the archive and then seamlessly
switch to fetching new segments from the new master.  If you tried
that and it didn't work, I don't see a description of the outcome
anywhere on this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Robert Haas
On Fri, Nov 7, 2014 at 1:40 PM, Josh Berkus j...@agliodbs.com wrote:
 Is the current interaction of recovery_target_time and standby_mode
 (that is, that recovery_target_time causes standby_mode to be ignorned)
 the correct behavior?

I think this summary of the behavior is probably not correct in
detail.  For example, if the recovery target isn't reached by the time
the standby reaches the end of archived WAL, I think standby_mode will
affect what happens next.

I do think that the documentation of standby_mode ought to mention
that recovery_target_whatever can cause the server to promote even
when standby_mode is on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Josh Berkus
On 11/07/2014 12:02 PM, Robert Haas wrote:
 Several people then suggested that you could accomplish your
 originally stated goal - namely restore a master *and replica* to a
 point in time before Bad Stuff happened, and then have a working
 master-replica pair - by just connecting the new standby to the
 master directly, without using recovery_target_time.  As long as
 primary_conninfo and restore_command are both set, the standby should
 be able to fetch older segments from the archive and then seamlessly
 switch to fetching new segments from the new master.  If you tried
 that and it didn't work, I don't see a description of the outcome
 anywhere on this thread.

Aha!

I went ahead and tested that, which I did not ever expect to work since
our documentation says it won't. The problem is that the replica
continues to catch up from the archive *past* the point where the master
is caught up.  It doesn't switch to the streaming replication connection
until it runs out of archives (which is as designed, for a variety of
good reasons).

And since the replica is now beyond the master's timeline, replication
fails with end of wal reached on timeline 1 320/478ff780; new timeline
2 forked timeline 1 before current recovery point 320/47e0.

In order for this to work, the archive would need to stop before
recovery_target_time.

On 11/07/2014 12:07 PM, Robert Haas wrote: On Fri, Nov 7, 2014 at 1:40
PM, Josh Berkus j...@agliodbs.com wrote:
 Is the current interaction of recovery_target_time and standby_mode
 (that is, that recovery_target_time causes standby_mode to be ignorned)
 the correct behavior?

 I think this summary of the behavior is probably not correct in
 detail.  For example, if the recovery target isn't reached by the time
 the standby reaches the end of archived WAL, I think standby_mode will
 affect what happens next.

Oh, yeah, good point.  So just a doc patch then.

Presumably when the recovery_target_time is reached the replica promotes
even if it's streaming?  Will test.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Robert Haas
On Fri, Nov 7, 2014 at 4:00 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/07/2014 12:02 PM, Robert Haas wrote:
 Several people then suggested that you could accomplish your
 originally stated goal - namely restore a master *and replica* to a
 point in time before Bad Stuff happened, and then have a working
 master-replica pair - by just connecting the new standby to the
 master directly, without using recovery_target_time.  As long as
 primary_conninfo and restore_command are both set, the standby should
 be able to fetch older segments from the archive and then seamlessly
 switch to fetching new segments from the new master.  If you tried
 that and it didn't work, I don't see a description of the outcome
 anywhere on this thread.

 Aha!

 I went ahead and tested that, which I did not ever expect to work since
 our documentation says it won't. The problem is that the replica
 continues to catch up from the archive *past* the point where the master
 is caught up.  It doesn't switch to the streaming replication connection
 until it runs out of archives (which is as designed, for a variety of
 good reasons).

 And since the replica is now beyond the master's timeline, replication
 fails with end of wal reached on timeline 1 320/478ff780; new timeline
 2 forked timeline 1 before current recovery point 320/47e0.

 In order for this to work, the archive would need to stop before
 recovery_target_time.

Yeah, good point.  I didn't think of the case where you've rewound the
master but not the archive.  That will indeed require some special
handling, but it also seems like a somewhat unusual setup, because if
the master is trying to archive back to that same archive, archiving
will fail, with all the usual problems that entails.  Or maybe the
master is archiving there but on a different timeline, but in that
case why can the standby follow the timeline switch when connecting
directly to the master, but not via the archive?  My brain hurts.

 Is the current interaction of recovery_target_time and standby_mode
 (that is, that recovery_target_time causes standby_mode to be ignorned)
 the correct behavior?

 I think this summary of the behavior is probably not correct in
 detail.  For example, if the recovery target isn't reached by the time
 the standby reaches the end of archived WAL, I think standby_mode will
 affect what happens next.

 Oh, yeah, good point.  So just a doc patch then.

Sounds like a plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-07 Thread Josh Berkus
On 11/07/2014 01:30 PM, Robert Haas wrote:
 On Fri, Nov 7, 2014 at 4:00 PM, Josh Berkus j...@agliodbs.com wrote:
 In order for this to work, the archive would need to stop before
 recovery_target_time.
 
 Yeah, good point.  I didn't think of the case where you've rewound the
 master but not the archive.  That will indeed require some special
 handling, but it also seems like a somewhat unusual setup, because if
 the master is trying to archive back to that same archive, archiving
 will fail, with all the usual problems that entails.  Or maybe the
 master is archiving there but on a different timeline, but in that
 case why can the standby follow the timeline switch when connecting
 directly to the master, but not via the archive?  My brain hurts.

I'm not surprised this issue hasn't come up before.  We manage
replication and archiving for many clients, and this is the first time
I've had this question.  The reason this user wants to do things this
way is that their archive storage is higher bandwidth (fiber) than their
local network, so it's faster to restore several servers in parallel
from the archive than it is to restore the master and then take basebackups.

But, like I said, there's a serviceable workaround.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-05 Thread Greg Stark
On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote:
 When the recovery_target_time is reached, switch to streaming
 replication and stay a standby.

Then shouldn't he just not specify a recovert_target at all? That's
the default behaviour for standby_mode on, the whole point of
recovery_target is to specify when to stop recovery and leave standby
mode, no?

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-05 Thread Josh Berkus
On 11/05/2014 05:00 PM, Greg Stark wrote:
 On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote:
 When the recovery_target_time is reached, switch to streaming
 replication and stay a standby.
 
 Then shouldn't he just not specify a recovert_target at all? That's
 the default behaviour for standby_mode on, the whole point of
 recovery_target is to specify when to stop recovery and leave standby
 mode, no?
 

Their goal was to restore a master *and replica* to a point in time
before Bad Stuff happened, and then have a working master-replica pair.

Like I said, this is probably working as intended, we just need to
clarify it in the docs.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-05 Thread Michael Paquier
On Thu, Nov 6, 2014 at 10:00 AM, Greg Stark st...@mit.edu wrote:
 On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote:
 When the recovery_target_time is reached, switch to streaming
 replication and stay a standby.

 Then shouldn't he just not specify a recovert_target at all? That's
 the default behaviour for standby_mode on, the whole point of
 recovery_target is to specify when to stop recovery and leave standby
 mode, no?
Agreed with Greg, once a target recovery is switched the node gets out
of recovery. What the user should have done here is not specify
recovery_target_time in the standby's recovery.conf such as it follows
the master through streaming.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-05 Thread Michael Paquier
On Thu, Nov 6, 2014 at 10:41 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Nov 6, 2014 at 10:00 AM, Greg Stark st...@mit.edu wrote:
 On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote:
 When the recovery_target_time is reached, switch to streaming
 replication and stay a standby.

 Then shouldn't he just not specify a recovert_target at all? That's
 the default behaviour for standby_mode on, the whole point of
 recovery_target is to specify when to stop recovery and leave standby
 mode, no?
 Agreed with Greg, once a target recovery is switched the node gets out
 of recovery. What the user should have done here is not specify
 recovery_target_time in the standby's recovery.conf such as it follows
 the master through streaming.
Just adding: ... On the new timeline that master is bumping to. If the
standby already replayed of the point where WAL forked on master, then
the standby should be rewinded.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_time and standby_mode

2014-11-05 Thread Josh Berkus
On 11/05/2014 05:41 PM, Michael Paquier wrote:
 On Thu, Nov 6, 2014 at 10:00 AM, Greg Stark st...@mit.edu wrote:
 On Thu, Nov 6, 2014 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote:
 When the recovery_target_time is reached, switch to streaming
 replication and stay a standby.

 Then shouldn't he just not specify a recovert_target at all? That's
 the default behaviour for standby_mode on, the whole point of
 recovery_target is to specify when to stop recovery and leave standby
 mode, no?
 Agreed with Greg, once a target recovery is switched the node gets out
 of recovery. What the user should have done here is not specify
 recovery_target_time in the standby's recovery.conf such as it follows
 the master through streaming.

What I'm pointing out is that you can't actually do that.  You think you
can, but you can't.

Instead, what you need to do is:

1) Recover to target_time.
2) Pause
3) shut down the replica
4) replace recovery.conf with one which streams
5) restart replica

This is consistent behavior and makes sense when you think about it.  So
I think what we need to do is clarify in the documentation covering
recovery_target and standby_mode that they are exclusive.

Hmmm.  You know, I think this means we do have a bug.  If
recovery_target_time and standby_mode are exclusive, we should error if
the user attempts to set them both.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers