Re: [HACKERS] Simplifying replication

Josh Berkus Tue, 19 Oct 2010 09:17:01 -0700

Dimitri, Greg,

> I want to say a big big +1 here. The way replication and PITR setup are
> implemented now are a very good prototype, it's time to consolidate and
> get to something usable by normal people, as opposed to PostgreSQL full
> time geeks.

Well, one thing to be addressed is separating the PITR functionality
from replication. PITR needs a lot of features -- timelines, recovery
stop points, etc. -- which replication doesn't need or want. I think
that focussing on streaming replication functionality and ignoring the
archive logs case is probably the best way to logically separate these
two. Presumably anyone who needs archive logs as well will be a
professional DBA.

> I could prepare a patch given some advice on the replication protocol
> integration. For one, is streaming a base backup something that
> walsender should care about?

Yeah, I thought there was a prototype for this somewhere. From a user
perspective, using a 2nd pgport connection for the initial clone is
fine. I don't know if we want to worry about it otherwise from a
resource management perspective; presumably the cloning process is going
to be a pretty big performance hit on the master.

> BTW, do we have a clear idea of how to implement pg_ping, and should it
> reports current WAL location(s) of a standby?


pg_ping?

> That needs a way to define a group of standby. There's nothing there
> that makes them know about each other.

Let me clarify. I meant that if I try to make a *single* standby point
to a new master, and that new master was behind the standby when it
failed over, then the attempt to remaster should fail with an error.
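
To illustrate the check I have in mind, here is a rough sketch in
Python. It is not a proposed implementation: the function names and the
RemasterError exception are made up for illustration, and how each side
would learn the other's WAL location is an open question. The only thing
borrowed from PostgreSQL is the textual "X/Y" hex form of a WAL
location.

```python
class RemasterError(Exception):
    """Raised when a standby must not follow the proposed new master."""


def parse_lsn(text):
    """Turn a WAL location such as '0/3000060' into a comparable tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def check_remaster(standby_lsn, new_master_lsn_at_failover):
    """Refuse to remaster if the new master was behind this standby.

    Both arguments are hypothetical inputs; nothing in the current
    code hands them to us.
    """
    if parse_lsn(new_master_lsn_at_failover) < parse_lsn(standby_lsn):
        raise RemasterError(
            "new master was at %s at failover, standby is already at %s"
            % (new_master_lsn_at_failover, standby_lsn)
        )
```

Calling check_remaster("0/3000060", "0/3000000") raises RemasterError,
while swapping the two arguments returns quietly.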


I do *not* want to get into standby groups.  That way lies madness.  ;-)

> Now say we have pg_ping (or another tool) returning the current recv,
> applied and synced LSNs, it would be possible for any standby to figure
> out which other ones must be shot in case you failover here. The
> failover command could list those other standby in the group that you're
> behind of, and with a force command allow you to still failover to this
> one. Now you have to STONITH the one listed, but that's your problem
> after all.

The LSN isn't enough; as others have pointed out, we have a fairly
serious failure case if a standby comes up as a master, accepts
transactions, and then we try to remaster a 2nd standby which was
actually ahead of the first standby at the time of master failure. I
haven't seen a solution posted to that yet; maybe I missed it?
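
To put numbers on that failure case, here is a small Python sketch. The
WAL locations are invented, and has_diverged() assumes the new master
recorded the location at which it was promoted, which nothing in this
thread guarantees. It restates the problem rather than solving it.

```python
def parse_lsn(text):
    """Turn a WAL location such as '0/3000060' into a comparable tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def lsn_only_check(standby_lsn, master_current_lsn):
    """The naive test: is the standby at or behind the master right now?"""
    return parse_lsn(standby_lsn) <= parse_lsn(master_current_lsn)


def has_diverged(standby_lsn, master_promotion_lsn):
    """True if the standby holds WAL from the old master that the new
    master had not received when it was promoted (hypothetical input)."""
    return parse_lsn(standby_lsn) > parse_lsn(master_promotion_lsn)


# Made-up WAL locations for the scenario described above.
promoted_at = "0/3000000"     # first standby when it became master
second_standby = "0/3000060"  # 2nd standby, slightly ahead at failure
master_now = "0/5000000"      # new master after accepting transactions

print(lsn_only_check(second_standby, master_now))  # True
print(has_diverged(second_standby, promoted_at))   # True
```

Both lines print True: the LSN-only comparison lets the 2nd standby
attach even though it holds WAL the new master never saw.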


> Sorry, next time I'll make sure to bash Robert too. I don't have any
> problems with the basic ideas you're proposing, just concerns about when
> the right time to get into that whole giant subject is and who is going
> to work on.

If not now, when? The 2nd CommitFest is almost complete. If we're
going to make any substantial changes, we need to have patches for the
3rd commitfest. And I didn't see anyone discussing simplification until
I brought it up.

I don't realistically think that we're going to get 100% simplification
for 9.1. But it would be nice to at least get some components, which
means getting agreement on how things should work, at least roughly.


--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com