On Fri, Jun 15, 2012 at 3:53 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 10 June 2012 19:47, Joshua Berkus <j...@agliodbs.com> wrote:
>> So currently we have a major limitation in binary replication, where it is 
>> not possible to "remaster" your system (that is, designate the most 
>> caught-up standby as the new master) based on streaming replication only.  
>> This is a major limitation because the requirement to copy physical logs 
>> over scp (or similar methods), manage and expire them more than doubles the 
>> administrative overhead of managing replication.  This becomes even more of 
>> a problem if you're doing cascading replication.
> The "major limitation" was solved by repmgr close to 2 years ago now.
> So while you're correct that the patch to fix that assumed that
> archiving worked as well, it has been possible to operate happily
> without it.

Remastering has been one of the biggest thorns in my side over the last
year.  I don't think it's a trivially mechanized task yet, but I do
need to get there, and a few alterations to Postgres would probably
help, although I have not itemized what they are (rather, I was
intending to work around problems with what I have today).  But since
it is apropos to this discussion, here's what I've been thinking along
these lines:

Instead of using re-synchronization (e.g. repmgr in its relation to
rsync), I intend to proxy and inspect the streaming replication
traffic, then quiesce all standbys and figure out which node is
farthest ahead.  Once I have found that node, if it is not eligible
for promotion to master, I need to transfer its changes to nodes that
are eligible for promotion[0], then promote one of those and repoint
all other standbys to it.  This must all take place nominally within a
second or thirty.  Conceptually it is simple, but mechanically it's
somewhat intense, especially given the inconvenience of getting it
wrong.
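A minimal sketch of the selection step, assuming the proxy has already
collected each standby's highest WAL position as an LSN string in the
usual 'X/Y' form (for example from pg_last_xlog_receive_location() on
each standby).  The Standby type and plan_remaster() are hypothetical
names for illustration only.  LSNs are parsed to integers because a
plain string comparison gets cases like 0/9000000 vs. 0/10000000 wrong.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's textual 'X/Y' form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class Standby:
    name: str         # hypothetical node identifier
    wal_lsn: str      # highest WAL position the node holds, as 'X/Y'
    promotable: bool  # False for e.g. a small reporting standby


def plan_remaster(nodes: List[Standby]) -> Tuple[Standby, Standby, Optional[Standby]]:
    """Return (farthest, candidate, donor).

    farthest:  the node holding the most WAL
    candidate: the most caught-up node that is eligible for promotion
    donor:     the node whose extra WAL must be shipped to the candidate
               before promotion, or None if no catch-up is needed
    """
    farthest = max(nodes, key=lambda n: parse_lsn(n.wal_lsn))
    eligible = [n for n in nodes if n.promotable]
    if not eligible:
        raise RuntimeError("no standby is eligible for promotion")
    candidate = max(eligible, key=lambda n: parse_lsn(n.wal_lsn))
    if parse_lsn(candidate.wal_lsn) < parse_lsn(farthest.wal_lsn):
        return farthest, candidate, farthest
    return farthest, candidate, None
```

With a reporting node at 0/30001A0 and promotable nodes at 0/3000060
and 0/3000148, the plan is to ship the reporting node's extra WAL to
the 0/3000148 node and promote that one, which is exactly the case in
footnote [0].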

I surmise someone could come up with supporting mechanisms to make it
less burdensome to write.

One snarl is the interaction with the archive and restore commands:
Postgres might, for example, be in the middle of downloading and
replaying a WAL segment even when I wish it to be quiesced, and there's
not a great way to stop it[1].

Ideally, I could replace those archive/restore commands with software
that speaks the streaming replication protocol and just have less code
involved overall.  I think that is technically possible today, but it
could be made easier, in particular by making it simpler to chunk the
WAL stream coming from the streaming protocol and align it into units
of some kind.  Maybe that's already possible, but it will take a little
thinking.  I had already written off getting this level of cohesion in
the next year (intending a mix of archive_command and
streaming-protocol software), but that is not something that leaves me
close to satisfied by any measure.
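One way to picture the "chunk and align" step: cut each piece of
streamed WAL (a start LSN plus bytes) at segment boundaries, so every
piece maps onto exactly one segment file.  This sketch assumes the
default 16 MB segment size and the segment-file naming scheme used by
PostgreSQL 9.3 and later (earlier releases skip the segment ending in
FF); split_into_segments() is a hypothetical helper.

```python
from typing import Iterator, Tuple

WAL_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size (16 MB)


def segment_file_name(timeline: int, segno: int, seg_size: int = WAL_SEG_SIZE) -> str:
    """WAL segment file name, following the 9.3-and-later naming scheme."""
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def split_into_segments(start_lsn: int, data: bytes, timeline: int,
                        seg_size: int = WAL_SEG_SIZE) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (segment file name, offset within segment, bytes) pieces.

    No piece ever crosses a segment boundary, so each can be written
    straight into the corresponding segment file at the given offset.
    """
    pos = 0
    while pos < len(data):
        segno, offset = divmod(start_lsn + pos, seg_size)
        take = min(seg_size - offset, len(data) - pos)
        yield segment_file_name(timeline, segno, seg_size), offset, data[pos:pos + take]
        pos += take
```

A chunk starting four bytes before the end of segment 0 on timeline 1
yields a 4-byte piece for 000000010000000000000000 followed by the
remainder at offset 0 of 000000010000000000000001.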

Furthermore, some use cases demand that, regardless of the user's
synchronous replication setting, Postgres not make progress unless it
has synchronously replicated to a special piece of proxy software.
This is useful if one wants to offload the exact location and storage
strategy for crash recovery to another piece of software.
That's the obvious next step after a cohesive delegation of

So, all in all, Postgres has no great way to cohesively delegate all
WAL persistence and WAL restoration, and I don't know whether the
streaming protocol plus the sync rep facilities can conveniently
subsume all of those use cases (though I think they probably can
without enormous modification).  I think Postgres should learn what it
needs to make that happen.  It might even allow the existing
shell-command-based (de-)archiver to live on as a contrib module.
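To make the proxy idea concrete: in the streaming protocol, a standby
(or a proxy posing as one) tells the master how far it has written,
flushed and applied WAL via a Standby Status Update ('r') message, and
synchronous replication waits on those positions.  A proxy that only
advances its reported flush position after it has durably stored the
WAL therefore gates commits on its own storage.  This sketch assumes
the message layout documented for PostgreSQL 9.3 and later (body only,
without the enclosing CopyData framing); AckGate is a hypothetical
name, and reporting the durable position as "applied" is an assumption
for a proxy that applies nothing.

```python
import struct
import time

# Seconds between the Unix epoch and PostgreSQL's epoch (2000-01-01 00:00 UTC).
PG_EPOCH_OFFSET = 946684800


def standby_status_update(write_lsn: int, flush_lsn: int, apply_lsn: int,
                          reply_requested: bool = False) -> bytes:
    """Build the body of a Standby Status Update ('r') message."""
    now_us = int((time.time() - PG_EPOCH_OFFSET) * 1_000_000)
    return struct.pack(">cQQQqB", b"r", write_lsn, flush_lsn, apply_lsn,
                       now_us, 1 if reply_requested else 0)


class AckGate:
    """Hypothetical proxy bookkeeping: acknowledge flush only for durable WAL."""

    def __init__(self) -> None:
        self.received = 0  # end LSN of WAL received from the master
        self.durable = 0   # end LSN of WAL the proxy has durably stored

    def on_wal_received(self, end_lsn: int) -> None:
        self.received = max(self.received, end_lsn)

    def on_durably_stored(self, end_lsn: int) -> None:
        # Never claim durability beyond what has actually been received.
        self.durable = max(self.durable, min(end_lsn, self.received))

    def reply(self) -> bytes:
        # Assumption: the proxy applies nothing, so it reports its durable
        # position as both flushed and applied.
        return standby_status_update(self.received, self.durable, self.durable)
```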

[0]: Use case: a small standby used for some reporting happens to be
the farthest ahead.

[1]: Details: simply touching a file to make the restore_command a
no-op is unsatisfying, because Postgres may already have started the
restore_command, so now you have to make your restore_command
coordinate with your streaming replication proxy software to be safe,
or wait "long enough" for a single segment to replay so that one can be
assured the system is quiesced.  I see this as an anti-feature of the
current file-based archiving strategy.
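The coordination described in [1] can be sketched with a lock around
the pause-file check, so that quiescing both blocks new restores and
waits out any that already started.  Everything here (the file names,
restore_segment() and quiesce()) is hypothetical, and flock() assumes a
Unix host.  Each restore holds a shared lock while it checks the pause
file and copies; quiescing creates the pause file and then takes an
exclusive lock, which is granted only once every in-flight restore has
released its shared lock.

```python
import fcntl
import os
import shutil

# Hypothetical file names used only by this sketch.
PAUSE_FLAG = "restore.paused"
LOCK_FILE = "restore.lock"


def restore_segment(segment: str, dest: str, archive_dir: str, control_dir: str) -> int:
    """Body of a restore_command wrapper; returns the exit status for Postgres."""
    with open(os.path.join(control_dir, LOCK_FILE), "a") as lock:
        # Shared lock: restores may overlap each other, but not a quiesce.
        fcntl.flock(lock, fcntl.LOCK_SH)
        # Checking the flag only while holding the lock closes the race in
        # which a restore starts just before the flag is created.
        if os.path.exists(os.path.join(control_dir, PAUSE_FLAG)):
            return 1  # non-zero: report the segment as unavailable
        src = os.path.join(archive_dir, segment)
        if not os.path.exists(src):
            return 1
        shutil.copyfile(src, dest)
        return 0
    # The lock is released when the lock file is closed on leaving "with".


def quiesce(control_dir: str) -> None:
    """Refuse new restores, then wait for any in-flight restore to finish."""
    open(os.path.join(control_dir, PAUSE_FLAG), "w").close()
    with open(os.path.join(control_dir, LOCK_FILE), "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until shared holders release
```

This only guarantees that no restore_command is still copying; Postgres
may still be replaying the last segment it fetched, so that part still
has to be observed separately (for example by watching the replay
position stop moving).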


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)