Re: [HACKERS] Streaming-only Remastering

2012-06-17 Thread Josh Berkus
Simon,

 The major limitation was solved by repmgr close to 2 years ago now.
 So while you're correct that the patch to fix that assumed that
 archiving worked as well, it has been possible to operate happily
 without it.

repmgr is not able to remaster using only streaming replication.  It
also requires an SSH connection, as well as a bunch of other
administrative setup (and compiling from source on most platforms, a not
at all insignificant obstacle).  So you haven't solved the problem,
you've just provided a somewhat less awkward packaged workaround.

It's certainly possible to devise all kinds of workarounds for the
problem; I have a few myself in Bash and Python.  What I want is to stop
using workarounds.

Without the requirement for archiving, PostgreSQL binary replication is
almost ideally simple to set up and administer.  Turn settings on in
server A and server B, run pg_basebackup and you're replicating.  It's
like 4 steps, all but one of which can be scripted through puppet.
However, the moment you add log-shipping to the mix things get an order
of magnitude more complicated, repmgr or not.
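
A minimal sketch of the streaming-only setup described above, assuming
the 9.1/9.2-era setting names.  The host name, user, data directory and
the function name are placeholders for illustration, not anything taken
from this thread.

```python
# Sketch only: renders the settings for a streaming-only standby pair.
# Host, user, paths and values are placeholders (assumptions).

def streaming_setup(master_host, repl_user, data_dir):
    """Return the config fragments and the base backup command as strings."""
    master_conf = "\n".join([
        "wal_level = hot_standby",   # write enough WAL for a hot standby
        "max_wal_senders = 3",       # allow replication connections
    ])
    standby_conf = "hot_standby = on"  # allow read-only queries on the standby
    recovery_conf = "\n".join([
        "standby_mode = 'on'",
        "primary_conninfo = 'host=%s user=%s'" % (master_host, repl_user),
    ])
    # -x includes the WAL needed to make the copied data directory consistent
    basebackup = "pg_basebackup -h %s -U %s -D %s -x" % (
        master_host, repl_user, data_dir)
    return {
        "master postgresql.conf": master_conf,
        "standby postgresql.conf": standby_conf,
        "standby recovery.conf": recovery_conf,
        "base backup command": basebackup,
    }


if __name__ == "__main__":
    setup = streaming_setup("server-a", "replicator", "/var/lib/pgsql/data")
    for name, text in setup.items():
        print("# " + name)
        print(text)
        print()
```

The master also needs a replication entry in pg_hba.conf for the
standby's address, which is not rendered here.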

There's really only two things standing in the way of binary replication
being completely developer-friendly.  Remastering is the big one, and
the separate recovery.conf is the small one.  We can fix both.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming-only Remastering

2012-06-17 Thread Josh Berkus

 Instead of using re-synchronization (e.g. repmgr in its relation to
 rsync), I intend to proxy and also inspect the streaming replication
 traffic and then quiesce all standbys and figure out what node is
 farthest ahead.  Once I figure out the node that is farthest ahead, if
 it is not a node that is eligible for promotion to the master, I need
 to exchange its changes to nodes that are eligible for promotion[0],
 and then promote one of those, repointing all other standbys to that
 node. This must all take place nominally within a second or thirty.
 Conceptually it is simple, but mechanically it's somewhat intense,
 especially in relation to the inconvenience of doing this incorrectly.

So you're suggesting that it would be great to be able to
double-remaster?  i.e. given OM = Original Master, 1S = standby furthest
ahead, NM = desired new master, to do:

1S <--- OM ---> NM

OM dies, then:

1S ---> NM

until NM is caught up, then

1S <--- NM

Yes?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [HACKERS] Streaming-only Remastering

2012-06-17 Thread Daniel Farina
On Sun, Jun 17, 2012 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:

 Instead of using re-synchronization (e.g. repmgr in its relation to
 rsync), I intend to proxy and also inspect the streaming replication
 traffic and then quiesce all standbys and figure out what node is
 farthest ahead.  Once I figure out the node that is farthest ahead, if
 it is not a node that is eligible for promotion to the master, I need
 to exchange its changes to nodes that are eligible for promotion[0],
 and then promote one of those, repointing all other standbys to that
 node. This must all take place nominally within a second or thirty.
 Conceptually it is simple, but mechanically it's somewhat intense,
 especially in relation to the inconvenience of doing this incorrectly.

 So you're suggesting that it would be great to be able to
 double-remaster?  i.e. given OM = Original Master, 1S = standby furthest
 ahead, NM = desired new master, to do:

Yeah. Although it seems like it would degenerate to single-remastering
applied a couple times, no?

-- 
fdr



Re: [HACKERS] Streaming-only Remastering

2012-06-16 Thread Daniel Farina
On Fri, Jun 15, 2012 at 3:53 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 June 2012 19:47, Joshua Berkus j...@agliodbs.com wrote:

 So currently we have a major limitation in binary replication, where it is 
 not possible to remaster your system (that is, designate the most 
 caught-up standby as the new master) based on streaming replication only.  
 This is a major limitation because the requirement to copy physical logs 
 over scp (or similar methods), manage and expire them more than doubles the 
 administrative overhead of managing replication.  This becomes even more of 
 a problem if you're doing cascading replication.

 The major limitation was solved by repmgr close to 2 years ago now.
 So while you're correct that the patch to fix that assumed that
 archiving worked as well, it has been possible to operate happily
 without it.

Remastering has been one of the biggest thorns in my side over the last
year.  I don't think it's a trivially mechanized issue yet, but I
do need to get there, and probably a few alterations in Postgres would
help, although I have not itemized what they are (rather, I was
intending to work around problems with what I have today).  But since
it is apropos to this discussion, here's what I've been thinking along
these lines:

Instead of using re-synchronization (e.g. repmgr in its relation to
rsync), I intend to proxy and also inspect the streaming replication
traffic and then quiesce all standbys and figure out what node is
farthest ahead.  Once I figure out the node that is farthest ahead, if
it is not a node that is eligible for promotion to the master, I need
to exchange its changes to nodes that are eligible for promotion[0],
and then promote one of those, repointing all other standbys to that
node. This must all take place nominally within a second or thirty.
Conceptually it is simple, but mechanically it's somewhat intense,
especially in relation to the inconvenience of doing this incorrectly.

I surmise someone could come up with supporting mechanisms to make it
less burdensome to write.
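
A minimal sketch of the selection step described above, assuming each
quiesced standby reports its WAL location as text (for example from
pg_last_xlog_receive_location()).  The node names, the eligibility flag
and the step strings are hypothetical; this only plans the sequence and
does not talk to any server.

```python
# Sketch only: choose a promotion candidate among quiesced standbys.
# Node names, the 'eligible' flag and the step wording are assumptions.

def parse_lsn(text):
    """Turn a WAL location such as '0/3000158' into a comparable tuple."""
    high, low = text.split("/")
    return (int(high, 16), int(low, 16))


def plan_remaster(standbys):
    """standbys: dict of name -> (wal_location, eligible_for_promotion).

    Returns an ordered list of human-readable steps.
    """
    if not standbys:
        raise ValueError("no standbys to choose from")
    eligible = [name for name, (_, ok) in standbys.items() if ok]
    if not eligible:
        raise ValueError("no standby is eligible for promotion")

    def position(name):
        return parse_lsn(standbys[name][0])

    # Sorting first makes ties resolve the same way on every run.
    farthest = max(sorted(standbys), key=position)
    candidate = max(sorted(eligible), key=position)

    steps = []
    if position(farthest) > position(candidate):
        # The node that is farthest ahead cannot be promoted, so its
        # extra WAL has to reach the candidate first.
        steps.append("replay WAL from %s to %s" % (farthest, candidate))
    steps.append("promote %s" % candidate)
    for name in sorted(standbys):
        if name != candidate:
            steps.append("repoint %s to %s" % (name, candidate))
    return steps
```

With a small reporting standby at 0/3000200 that is not eligible and two
eligible standbys behind it, the plan starts by moving the reporting
node's WAL to the more advanced eligible standby, then promotes that
standby and repoints the others.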

One snarl is the interaction with the archive and restore commands:
Postgres might, for example, have been in the middle of downloading and
replaying a WAL segment even when I wish to be quiesced, and there's
not a great way to stop it[1].

Ideally, I could replace those archive/dearchive commands with
software that speaks the streaming replication protocol and just have
less code involved overall.  I think that is technically possible
today, but maybe could be made easier, in particular being able to
more easily chunk and align the WAL stream into units of some kind
from the streaming protocol.  Maybe it's already possible, but it will
take a little thinking.  I had already written off getting this level
of cohesion in the next year (intending a detailed mix of
archive_command and streaming protocol software), but it's not
something that leaves me close to satisfied by any measure.

Furthermore, some use cases demand that, whatever the user's setting
with regard to syncrep, Postgres not make progress
unless it has synchronously replicated to a special piece of proxy
software.  This is useful if one wants to offload the exact location
and storage strategy for crash recovery to another piece of software.
That's the obvious next step after a cohesive delegation of
(de-)archiving.

So, all in all, Postgres has no great way to cohesively delegate all
WAL-persistence and WAL-restoration and I don't know if the streaming
protocol + sync rep facilities can completely conveniently subsume all
those use cases (but I think it probably can without enormous
modification).  I think it should learn what it needs to learn to make
that happen.  It might even allow the existing shell-command based
(de-)archiver to live as a contrib.


[0]: Use case: when a small standby used for some reporting happens to
be the farthest ahead.

[1]: Details: a simple touched file to no-op the restore_command is
unsatisfying, because the restore_command may have already been
started by postgres, so now you have to make your restore_command
coordinate with your streaming replication proxy software to be safe,
or wait long enough for a single segment to replay so that one can be
assured that the system is quiesced.  I see this as an anti-feature of
the current file-based archiving strategy.
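
A minimal sketch of the touched-file approach from [1], assuming a
wrapper script is configured as the restore_command.  The flag-file
convention, the paths and the function name are hypothetical.

```python
# Sketch only: a restore_command wrapper that honours a "pause" flag file.
# The flag-file convention and all paths are assumptions.
import os
import shutil
import sys


def restore_segment(archive_dir, filename, dest_path, pause_flag):
    """Copy one WAL segment out of the archive; return a process exit code.

    A non-zero code tells the server the segment is not available.
    """
    if os.path.exists(pause_flag):
        return 1  # quiesce requested: report that nothing is available
    source = os.path.join(archive_dir, filename)
    if not os.path.exists(source):
        return 1
    shutil.copy(source, dest_path)
    return 0


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical invocation:
    #   restore_command = 'python restore_wrapper.py /archive %f %p /tmp/pause'
    sys.exit(restore_segment(*sys.argv[1:5]))
```

The flag is checked only when the wrapper starts, so a copy that is
already in progress when the file is touched still completes and gets
replayed, which is the race the footnote complains about.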

-- 
fdr



Re: [HACKERS] Streaming-only Remastering

2012-06-15 Thread Josh Berkus
On 6/10/12 11:47 AM, Joshua Berkus wrote:
 So currently we have a major limitation in binary replication, where it is 
 not possible to remaster your system (that is, designate the most caught-up 
 standby as the new master) based on streaming replication only.  This is a 
 major limitation because the requirement to copy physical logs over scp (or 
 similar methods), manage and expire them more than doubles the administrative 
 overhead of managing replication.  This becomes even more of a problem if 
 you're doing cascading replication.
 
 Therefore I think this is a high priority for 9.3.
 
 As far as I can tell, the change required for remastering over streaming is 
 relatively small; we just need to add a new record type to the streaming 
 protocol, and then start writing the timeline change to that.  Are there 
 other steps required which I'm not seeing?

*sound of crickets chirping*

Is there other work involved which isn't immediately apparent?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Streaming-only Remastering

2012-06-15 Thread Simon Riggs
On 10 June 2012 19:47, Joshua Berkus j...@agliodbs.com wrote:

 So currently we have a major limitation in binary replication, where it is 
 not possible to remaster your system (that is, designate the most caught-up 
 standby as the new master) based on streaming replication only.  This is a 
 major limitation because the requirement to copy physical logs over scp (or 
 similar methods), manage and expire them more than doubles the administrative 
 overhead of managing replication.  This becomes even more of a problem if 
 you're doing cascading replication.

The major limitation was solved by repmgr close to 2 years ago now.
So while you're correct that the patch to fix that assumed that
archiving worked as well, it has been possible to operate happily
without it.

http://www.repmgr.org

New versions for 9.2 will be out soon.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Streaming-only Remastering

2012-06-15 Thread Magnus Hagander
On Sat, Jun 16, 2012 at 6:53 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 June 2012 19:47, Joshua Berkus j...@agliodbs.com wrote:

 So currently we have a major limitation in binary replication, where it is 
 not possible to remaster your system (that is, designate the most 
 caught-up standby as the new master) based on streaming replication only.  
 This is a major limitation because the requirement to copy physical logs 
 over scp (or similar methods), manage and expire them more than doubles the 
 administrative overhead of managing replication.  This becomes even more of 
 a problem if you're doing cascading replication.

 The major limitation was solved by repmgr close to 2 years ago now.

It was solved for limited (but important) cases.

For example, repmgr does (afaik, maybe I missed a major update at some
point?) still require you to have set up ssh with trusted keys between
the servers. There are many use cases where that's not an acceptable
solution. One of the more obvious ones being when you're on Windows.

repmgr hasn't really *solved* it, it has provided a well working workaround...

IIRC repmgr is also GPLv3, which means that some companies just won't
look at it... Not many, but some. And it's a license that's
incompatible with PostgreSQL itself.


 So while you're correct that the patch to fix that assumed that
 archiving worked as well, it has been possible to operate happily
 without it.

 http://www.repmgr.org

 New versions for 9.2 will be out soon.

That's certainly good, but that doesn't actually solve the problem
either. It updates the good workaround.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Streaming-only Remastering

2012-06-10 Thread Rob Wultsch
On Sun, Jun 10, 2012 at 11:47 AM, Joshua Berkus j...@agliodbs.com wrote:
 So currently we have a major limitation in binary replication, where it is 
 not possible to remaster your system (that is, designate the most caught-up 
 standby as the new master) based on streaming replication only.  This is a 
 major limitation because the requirement to copy physical logs over scp (or 
 similar methods), manage and expire them more than doubles the administrative 
 overhead of managing replication.  This becomes even more of a problem if 
 you're doing cascading replication.

 Therefore I think this is a high priority for 9.3.

 As far as I can tell, the change required for remastering over streaming is 
 relatively small; we just need to add a new record type to the streaming 
 protocol, and then start writing the timeline change to that.  Are there 
 other steps required which I'm not seeing?


Problem that may exist and is likely out of scope:
It is possible for a master with multiple slave servers to have slaves
which have not read all of the logs off of the master. It is annoying
to have to rebuild a replica because it was 1kb behind in reading logs
from the master. If the new master could deliver the last bit of the
old master's logs that would be very nice.

-- 
Rob Wultsch
wult...@gmail.com
