On Fri, Jun 15, 2012 at 3:53 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 10 June 2012 19:47, Joshua Berkus <j...@agliodbs.com> wrote:
>
>> So currently we have a major limitation in binary replication, where it is
>> not possible to "remaster" your system (that is, designate the most
>> caught-up standby as the new master) based on streaming replication only.
>> This is a major limitation because the requirement to copy physical logs
>> over scp (or similar methods), manage and expire them more than doubles the
>> administrative overhead of managing replication. This becomes even more of
>> a problem if you're doing cascading replication.
>
> The "major limitation" was solved by repmgr close to 2 years ago now.
> So while you're correct that the patch to fix that assumed that
> archiving worked as well, it has been possible to operate happily
> without it.

Remastering has been one of the biggest thorns in my side over the last year. I don't think it's a trivially mechanized issue yet, but I do need to get there, and probably a few alterations in Postgres would help, although I have not itemized what they are (rather, I was intending to work around problems with what I have today).

But since it is apropos to this discussion, here's what I've been thinking along these lines:

Instead of using re-synchronization (e.g. repmgr in its relation to rsync), I intend to proxy and also inspect the streaming replication traffic, then quiesce all standbys and figure out which node is farthest ahead. Once I know which node is farthest ahead, if it is not a node that is eligible for promotion to master, I need to ship its changes to the nodes that are eligible for promotion[0], then promote one of those and repoint all other standbys to that node. This must all take place nominally within a second or thirty.

Conceptually it is simple, but mechanically it's somewhat intense, especially given how inconvenient it is to get this wrong. I surmise someone could come up with supporting mechanisms to make it less burdensome to write.

One snarl is the interaction with the archive and restore commands: Postgres might, for example, be in the middle of downloading and replaying a WAL segment at the moment I want it quiesced, and there's not a great way to stop it[1]. Ideally, I could replace those archive/dearchive commands with software that speaks the streaming replication protocol and just have less code involved overall. I think that is technically possible today, but it could maybe be made easier, in particular by making it easier to chunk and align the WAL stream from the streaming protocol into units of some kind. Maybe it's already possible, but it will take a little thinking.

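To make the selection step above concrete, here is a minimal Python sketch of "find the farthest-ahead standby, then pick a promotion target". It is only an illustration under stated assumptions: each standby has already been quiesced and its replay position collected as a textual WAL location of the form X/Y (for example from pg_last_xlog_replay_location()); the Standby type, the eligible flag, and the function names are hypothetical and not part of Postgres or repmgr. It covers the ordering decision only, not the actual shipping of WAL or the repointing of standbys.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def parse_lsn(text: str) -> int:
    """Convert a textual WAL location such as '16/B374D848' to an integer.

    Only used for ordering positions; both halves are hexadecimal.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class Standby:
    name: str
    replay_lsn: str  # textual location captured after the node was quiesced
    eligible: bool   # hypothetical flag: may this node become the master?


def plan_remaster(standbys: List[Standby]) -> Tuple[Standby, Optional[Standby]]:
    """Return (promotion target, catch-up source or None).

    The target is the most advanced eligible standby. If an ineligible
    standby is even farther ahead, it is returned as the node whose
    changes must reach the target before promotion.
    """
    candidates = [s for s in standbys if s.eligible]
    if not candidates:
        raise ValueError("no standby is eligible for promotion")
    farthest = max(standbys, key=lambda s: parse_lsn(s.replay_lsn))
    target = max(candidates, key=lambda s: parse_lsn(s.replay_lsn))
    if parse_lsn(farthest.replay_lsn) > parse_lsn(target.replay_lsn):
        return target, farthest
    return target, None
```

With a small reporting standby that is ahead but not eligible (the case in [0]), plan_remaster returns the best eligible node as the target and the reporting node as the source to catch up from; when an eligible node is already farthest ahead, the second element is None.
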
I had already written off getting this level of cohesion in the next year (intending instead a detailed mix of archive_command and streaming protocol software), but that is not something that leaves me close to satisfied by any measure.

Furthermore, some use cases demand that, regardless of the user's syncrep setting, Postgres not make progress unless it has synchronously replicated to a special piece of proxy software. This is useful if one wants to offload the exact location and storage strategy for crash recovery to another piece of software. That's the obvious next step after a cohesive delegation of (de-)archiving.

So, all in all, Postgres has no great way to cohesively delegate all WAL persistence and WAL restoration, and I don't know whether the streaming protocol + sync rep facilities can completely and conveniently subsume all those use cases (but I think they probably can without enormous modification). I think Postgres should learn what it needs to learn to make that happen. It might even allow the existing shell-command based (de-)archiver to live as a contrib.

[0]: Use case: when a small standby used for some reporting happens to be the farthest ahead.

[1]: Details: a simple touched file to no-op the restore_command is unsatisfying, because the restore_command may have already been started by postgres. So now you have to either make your restore_command coordinate with your streaming replication proxy software to be safe, or wait "long enough" for a single segment to replay so that one can be assured the system is quiesced. I see this as an anti-feature of the current file-based archiving strategy.

--
fdr