Re: [HACKERS] Design for In-Core Logical Replication

Craig Ringer Wed, 20 Jul 2016 18:36:49 -0700

On 21 July 2016 at 01:20, Simon Riggs <[email protected]> wrote:

> On 20 July 2016 at 17:52, Rod Taylor <[email protected]> wrote:
>
>
>> I think it's important for communication channels to be defined
>> separately from the subscriptions.
>>
>
> I agree and believe it will be that way.
>
> Craig is working on allowing Replication Slots to failover between nodes,
> to provide exactly that requested behaviour.
>
>


First, I'd like to emphasise that logical replication has been stalled for
ages now because we can no longer make forward progress on core features
needed until we have in-core logical replication (they're dismissed as
irrelevant, no in-core users, etc) - but we have also had difficulty
getting logical replication into core. To break this impasse we really need
logical replication in core and need to focus on getting the minimum viable
feature in place, not trying to make it do everything all at once.
Point-to-point replication with no forwarding should be just fine for the
first release. Let's not bog this down with extra "must have" features that
aren't actually crucial.

That said:

I had a patch in for 9.6 to provide the foundations for logical
replication to follow physical failover, but it got pulled at the last
minute. It'll be submitted for 10.0 along with some other enhancements to
make it usable without hacky extensions, most notably support for using a
physical replication slot and hot standby feedback to pin a master's
catalog_xmin where it's needed by slots on a physical replica.

That's for when we're combining physical and logical replication though,
e.g. "node A" is a master/standby pair, and "node B" is also a
master/standby pair.

For non-star logical topologies, which is what I think you might've been
referring to, it's necessary to have:

- Node identity
- Which nodes we want to receive data from
- How we connect to each node

all of which are separate things. Who's out there, what we want from them,
and how to get it.
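
To make that three-way split concrete, here's a rough sketch in Python. All
the names in it (Node, Connection, Subscription, the DSN string, the helper
function) are made up for illustration; none of this is pglogical's or the
proposed in-core catalog design. It only shows that once "what we want" is
kept apart from "how we connect", you can ask which wanted origins would
have to be forwarded by somebody else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Node identity: who is out there."""
    name: str


@dataclass(frozen=True)
class Connection:
    """How we connect: a transport path to exactly one peer."""
    peer: Node
    dsn: str  # hypothetical connection string


@dataclass
class Subscription:
    """What we want: the origin nodes whose changes we wish to receive."""
    wanted_origins: set


def needs_forwarding(sub, connections):
    """Return wanted origins we have no direct connection to.

    Changes from these nodes can only arrive if some connected peer
    forwards them.
    """
    direct_peers = {conn.peer for conn in connections}
    return sub.wanted_origins - direct_peers


# Node A wants data from B and C but only connects to B.
b, c = Node("B"), Node("C")
wants = Subscription(wanted_origins={b, c})
links = [Connection(peer=b, dsn="host=b.example dbname=app")]
print(needs_forwarding(wants, links))
```

Here C is wanted but not directly connected, so B would have to forward it.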

pglogical doesn't really separate the latter two much at this point.
Subscriptions identify both the node to connect to and the data we want to
receive from a node; there's no selective data forwarding from one node to
another. Though there's room for that in pglogical's hooks/filters by using
filtering by replication origin, it just doesn't do it yet.

It sounds like that's what you're getting at. Wanting to be able to say
"node A wants to get data from node B and node C" separately to "node A
connects to node B to receive data", with the replication system somehow
working out that that means data written from C to B should be forwarded to
A.

Right?

If so, it's not always easy to figure that out. If you create connections
to both B and C, we then have to automagically work out that we should stop
forwarding data from C over our connection to B.

The plan with pglogical has been to allow connections to specify forwarding
options, so the connection explicitly says what nodes it wants to get data
from. It's the users' job to ensure that they don't declare connections that
overlap. This is simpler to implement, but harder to administer.
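
The sort of check an admin (or a tool) would need to run before declaring
connections can be sketched like this. The dict layout and the function name
are assumptions for the example, not pglogical's interface: each connection
is keyed by the peer it connects to and lists the origin nodes it is asked
to deliver.

```python
from itertools import combinations


def overlapping_origins(connections):
    """Find pairs of connections that would deliver the same origin twice.

    connections: dict mapping peer name -> set of origin node names that
    the connection to that peer is configured to deliver.
    Returns a list of (peer1, peer2, shared_origins) tuples.
    """
    conflicts = []
    for p1, p2 in combinations(sorted(connections), 2):
        shared = connections[p1] & connections[p2]
        if shared:
            conflicts.append((p1, p2, shared))
    return conflicts


# A=>B forwards C's changes, and A=>C delivers them as well: overlap on C.
bad = {"B": {"B", "C"}, "C": {"C"}}
# Point-to-point only, no forwarding: nothing overlaps.
good = {"B": {"B"}, "C": {"C"}}

print(overlapping_origins(bad))
print(overlapping_origins(good))
```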

One challenge with either approach is ensuring a consistent switchover. If
you have a single connection A=>B receiving data from [B,C], then you
switch to two connections A=>B and A=>C with neither forwarding, you must
ensure that the switchover occurs in such a way that no data is
replicated twice or skipped. That's made easier by the fact that we have
replication origins and we can actually safely receive from both at the
same time then discard from one of them, even use upstream filtering to
avoid sending it over the wire twice. But it does take care and caution.
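
As a toy model of the "receive from both, discard from one" idea: if every
change carries its origin node and that origin's LSN, the receiver can keep
a per-origin high-water mark and drop anything at or below it. This is a
simplification written for this mail, not the real replication origin API.
The Applier class is hypothetical, LSNs are plain integers, and it assumes
each path delivers a given origin's changes in LSN order.

```python
class Applier:
    """Applies changes once per (origin, LSN), whichever path they came by."""

    def __init__(self):
        self.progress = {}  # origin name -> highest LSN applied so far
        self.applied = []   # (origin, lsn, payload) in apply order

    def receive(self, origin, lsn, payload):
        """Apply the change unless this origin is already past that LSN.

        Returns True if applied, False if discarded as a duplicate.
        """
        if lsn <= self.progress.get(origin, 0):
            return False
        self.applied.append((origin, lsn, payload))
        self.progress[origin] = lsn
        return True


a = Applier()
# Before switchover: C's changes arrive forwarded over A=>B.
a.receive("C", 1, "row 1")
a.receive("C", 2, "row 2")
# During switchover the new A=>C connection replays from LSN 1.
a.receive("C", 1, "row 1")  # discarded
a.receive("C", 2, "row 2")  # discarded
a.receive("C", 3, "row 3")  # applied
print(a.applied)
```

Nothing is applied twice and nothing is skipped, provided the direct
connection starts at or before the point the forwarded stream reached.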

Note that none of this is actually for logical _failover_, where we lose a
node. For that we need some extra help in the form of placeholder slots
maintained on other peers. This can be done at the application /
replication system level without the need for new core features, but it
might not be something we can support in the first iteration.

I'm not sure how Petr's current design for in-core replication addresses
this, if it does, or whether it's presently focused only on point-to-point
replication like pglogical. As far as I'm concerned so long as it does
direct point-to-point replication with no forwarding that's good enough for
a first cut feature, so long as the UI, catalog and schema design leaves
room for adding more later.



>> I also suspect multiple publications will be normal even if only 2 nodes.
>> Old slow moving data almost always got different treatment than fast-moving
>> data; even if only defining which set needs to hit the other node first and
>> which set can trickle through later.
>>
>
> Agreed
>
>
Yes, especially since we can currently only stream transactions one by one
in commit order after commit.

Even once we have interleaved xact streaming, though, there will still be
plenty of times we want to receive different sets of data from the same
node at different priorities/rates. Small data we want to receive quickly,
vs big data we receive when we get the chance to catch up. Of course it's
necessary to define non-overlapping replication sets for this.

That's something we can already do in pglogical. I'm not sure if Petr is
targeting replication set support as part of the first release of the
in-core version of logical replication; they're necessary to do things like
this.
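
For what "non-overlapping" means in practice, a small sketch: given some
replication sets, report any table that appears in more than one of them.
The set names, table names and function are invented for the example and
don't correspond to any pglogical catalog or function.

```python
from collections import defaultdict


def tables_in_multiple_sets(replication_sets):
    """Return {table: [set names]} for tables assigned to more than one set.

    replication_sets: dict mapping set name -> set of table names.
    """
    membership = defaultdict(list)
    for set_name, tables in replication_sets.items():
        for table in tables:
            membership[table].append(set_name)
    return {
        table: sorted(names)
        for table, names in membership.items()
        if len(names) > 1
    }


sets = {
    "fast": {"orders", "payments"},   # small, wanted quickly
    "slow": {"audit_log", "orders"},  # big, can trickle through later
}
print(tables_in_multiple_sets(sets))
```

Here "orders" is in both sets, so it would be sent at both priorities.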

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
