Re: [HACKERS] Replication identifiers, take 3

Andres Freund Fri, 26 Sep 2014 07:23:33 -0700

On 2014-09-26 09:53:09 -0400, Robert Haas wrote:
> On Fri, Sep 26, 2014 at 5:05 AM, Andres Freund <[email protected]> wrote:
> >> Let me try to summarize the information requirements for each of these
> >> things.  For #1, you need to know, after crash recovery, for each
> >> standby, the last commit LSN which the client has confirmed via a
> >> feedback message.
> >
> > I'm not sure I understand what you mean here. This is all happening on
> > the *standby*. The standby needs to know, after crash recovery, the
> > latest commit LSN from the primary that it has successfully replayed.
> 
> Ah, sorry, you're right: so, you need to know, after crash recovery,
> for each machine you are replicating *from*, the last transaction (in
> terms of LSN) from that server that you successfully replayed.


Precisely.
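Here is a minimal sketch of that per-upstream bookkeeping, in C: one entry per node we replicate from, holding the last replayed commit LSN. ProgressTable, OriginId, progress_advance() and progress_get() are hypothetical names (XLogRecPtr only mirrors PostgreSQL's 64-bit typedef). Making the table crash-safe, e.g. by persisting it or rebuilding it during recovery, is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins: a 64-bit WAL position and a hypothetical 16-bit origin id. */
typedef uint64_t XLogRecPtr;
typedef uint16_t OriginId;

#define MAX_ORIGINS 16          /* hypothetical fixed capacity */

/* One entry per upstream server we replicate *from*. */
typedef struct OriginProgress
{
    OriginId   origin;          /* which upstream node */
    XLogRecPtr remote_lsn;      /* last commit LSN from it we replayed */
} OriginProgress;

typedef struct ProgressTable
{
    OriginProgress entries[MAX_ORIGINS];
    size_t         count;
} ProgressTable;

/* Note that the commit at remote_lsn from origin has been replayed.
 * Returns 0 on success, -1 if the table is full. */
static int
progress_advance(ProgressTable *t, OriginId origin, XLogRecPtr remote_lsn)
{
    for (size_t i = 0; i < t->count; i++)
    {
        if (t->entries[i].origin == origin)
        {
            /* Progress for one upstream only ever moves forward. */
            if (remote_lsn > t->entries[i].remote_lsn)
                t->entries[i].remote_lsn = remote_lsn;
            return 0;
        }
    }
    if (t->count == MAX_ORIGINS)
        return -1;
    t->entries[t->count].origin = origin;
    t->entries[t->count].remote_lsn = remote_lsn;
    t->count++;
    return 0;
}

/* Where should replay from origin resume? 0 means "never seen". */
static XLogRecPtr
progress_get(const ProgressTable *t, OriginId origin)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].origin == origin)
            return t->entries[i].remote_lsn;
    return 0;
}
```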

> > I don't think a solution which logs the change of origin will be
> > simpler. When the origin is in every record, you can filter without keeping
> > track of any state. That's different if you can switch the origin per
> > tx. At the very least you need an in-memory entry for the origin.
> 
> But again, that complexity pertains only to logical decoding.

> Somebody who wants to tweak the WAL format for an UPDATE in the future
> doesn't need to understand how this works, or care.

I agree that that's a worthy goal. But I don't see how what I propose
doesn't already achieve it. This isn't happening at the level of
individual rmgrs/AMs: the origin goes into the two padding bytes that
have followed 'xl_rmid' in struct XLogRecord for a long time.
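To illustrate why no rmgr or AM has to know about this, consider a simplified model of the 9.4-era header layout, with PostgreSQL's typedefs replaced by <stdint.h> stand-ins. The second struct and its xl_origin field are hypothetical. The static assertions check that a 16-bit origin fits into the existing padding without moving xl_prev or changing the header size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL typedefs (assumed sizes). */
typedef uint32_t TransactionId;
typedef uint8_t  RmgrId;
typedef uint64_t XLogRecPtr;
typedef uint32_t pg_crc32;

/* Header as it stands: two padding bytes follow xl_rmid. */
typedef struct XLogRecordPadded
{
    uint32_t      xl_tot_len;   /* total length of entire record */
    TransactionId xl_xid;       /* transaction id, may be invalid */
    uint32_t      xl_len;       /* length of rmgr data */
    uint8_t       xl_info;      /* flag bits */
    RmgrId        xl_rmid;      /* resource manager for this record */
    /* 2 bytes of padding here */
    XLogRecPtr    xl_prev;      /* pointer to previous record */
    pg_crc32      xl_crc;       /* CRC for this record */
} XLogRecordPadded;

/* Hypothetical variant: the padding carries a 16-bit origin id. */
typedef struct XLogRecordWithOrigin
{
    uint32_t      xl_tot_len;
    TransactionId xl_xid;
    uint32_t      xl_len;
    uint8_t       xl_info;
    RmgrId        xl_rmid;
    uint16_t      xl_origin;    /* hypothetical replication identifier */
    XLogRecPtr    xl_prev;
    pg_crc32      xl_crc;
} XLogRecordWithOrigin;

/* The origin occupies exactly the old padding: nothing after it moves. */
_Static_assert(sizeof(XLogRecordPadded) == sizeof(XLogRecordWithOrigin),
               "header size unchanged");
_Static_assert(offsetof(XLogRecordPadded, xl_prev) ==
               offsetof(XLogRecordWithOrigin, xl_prev),
               "xl_prev does not move");
```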

There's also the significant advantage that not basing this on the xid
allows it to work correctly with records not tied to a
transaction. There's not that much of that happening yet, but I've
several features in mind:

* Separately replicating 2PC commits. 2PC commits don't have an xid
  anymore... With some tooling on top, replicating 2PC in two phases
  allows for very cool stuff, like optionally synchronous multimaster
  replication.
* I have a pending patch that allows sending 'messages' through logical
  decoding, yielding a really fast and persistent queue. For that it's
  useful to have transactional *and* nontransactional messages.
* Sanely replicating CONCURRENTLY stuff gets harder if you tie things to
  the xid.

The absolutely, super, uber most convincing reason is:
It's trivial to build tools to analyze how much WAL traffic is generated
by which replication stream and how much originates locally. A
pg_xlogdump --stats=replication_identifier wouldn't be hard ;)
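If every record carries a 16-bit origin, the core of such a tool reduces to per-origin accounting. The following is a sketch only. WalRecordInfo, OriginStats, origin_stats_add() and the convention that origin 0 means "generated locally" are assumptions for illustration and are not pg_xlogdump's actual code or options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal view of a decoded WAL record. */
typedef struct WalRecordInfo
{
    uint16_t origin;            /* 0 = generated locally (assumption) */
    uint32_t tot_len;           /* total record length in bytes */
} WalRecordInfo;

#define NUM_ORIGIN_SLOTS 65536  /* one slot per possible 16-bit origin */

typedef struct OriginStats
{
    uint64_t count[NUM_ORIGIN_SLOTS];   /* records per origin */
    uint64_t bytes[NUM_ORIGIN_SLOTS];   /* WAL bytes per origin */
} OriginStats;

/* Accumulate per-origin record counts and byte totals. */
static void
origin_stats_add(OriginStats *stats, const WalRecordInfo *recs, size_t nrecs)
{
    for (size_t i = 0; i < nrecs; i++)
    {
        stats->count[recs[i].origin]++;
        stats->bytes[recs[i].origin] += recs[i].tot_len;
    }
}
```

A reporting tool would then print the non-zero slots and map each id back to its external name.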

> You know me: I've
> been a huge advocate of logical decoding.  But just like row-level
> security or BRIN indexes or any other feature, I think it needs to be
> designed in a way that minimizes the impact it has on the rest of the
> system.

Totally agreed. And that always will take some arguing...

> I simply don't believe your contention that this isn't adding
> any complexity to the code path for regular DML operations.  It's
> entirely possible we could need bit space in those records in the
> future for something that actually pertains to those operations; if
> you've burned it for logical decoding, it'll be difficult to claw it
> back.  And what if Tom gets around, some day, to doing that pluggable
> heap AM work?  Then every heap AM has got to allow for those bits, and
> maybe that doesn't happen to be free for them.

As explained above, this isn't happening at the level of individual AMs.

> Admittedly, these are hypothetical scenarios, but I don't think
> they're particularly far-fetched.  And as a fringe benefit, if you do
> it the way that I'm proposing, you can use an OID instead of a 16-bit
> thing that we picked to be 16 bits because that happens to be 100% of
> the available bit-space.  Yeah, there's some complexity on decoding,
> but it's minimal: one more piece of fixed-size state to track per XID.
> That's trivial compared to what you've already got.

But it forces you to track the xids/transactions. With my proposal you
can ignore transactions *entirely* unless they manipulate the
catalog. For concurrent OLTP workloads that's quite the advantage.
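The following sketch makes the difference concrete by contrasting the two approaches. When the origin is stamped on every record, the filter is a pure function of the record, including records without an xid. When the origin is logged once per transaction, the decoder must keep an xid -> origin entry for every open transaction and drop it at commit or abort. All names are hypothetical, and the fixed-size array stands in for whatever hash table a real decoder would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint16_t OriginId;

/* Hypothetical decoded-record view. */
typedef struct Rec
{
    TransactionId xid;      /* 0 (invalid) for non-transactional records */
    OriginId      origin;   /* stamped on every record (per-record scheme) */
} Rec;

/* Per-record scheme: the decision needs nothing but the record itself. */
static bool
skip_record_per_record(const Rec *r, OriginId skip_origin)
{
    return r->origin == skip_origin;
}

/* Per-xid scheme: origin logged once per transaction, so the decoder
 * must remember it for every open xid. */
#define MAX_OPEN_XACTS 64

typedef struct XidOrigin
{
    TransactionId xid;
    OriginId      origin;
} XidOrigin;

typedef struct XidOriginMap
{
    XidOrigin entries[MAX_OPEN_XACTS];
    size_t    count;
} XidOriginMap;

/* Remember the origin for xid; returns -1 if too many open xacts. */
static int
xid_map_set(XidOriginMap *m, TransactionId xid, OriginId origin)
{
    for (size_t i = 0; i < m->count; i++)
    {
        if (m->entries[i].xid == xid)
        {
            m->entries[i].origin = origin;
            return 0;
        }
    }
    if (m->count == MAX_OPEN_XACTS)
        return -1;
    m->entries[m->count++] = (XidOrigin) { xid, origin };
    return 0;
}

/* Must be called at commit/abort, or the map grows without bound. */
static void
xid_map_forget(XidOriginMap *m, TransactionId xid)
{
    for (size_t i = 0; i < m->count; i++)
    {
        if (m->entries[i].xid == xid)
        {
            m->entries[i] = m->entries[--m->count];   /* swap-remove */
            return;
        }
    }
}

static bool
skip_record_per_xid(const XidOriginMap *m, TransactionId xid,
                    OriginId skip_origin)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->entries[i].xid == xid)
            return m->entries[i].origin == skip_origin;
    return false;           /* no origin recorded: treat as local */
}
```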

> >> What's the point of the short-to-long mappings in the first place?  Is
> >> that only required because of the possibility that there might be
> >> multiple replication solutions in play on the same node?
> >
> > In my original proposal, 2+ years back, I only used short numeric
> > ids. And people didn't like it because it requires coordination between
> > the replication solutions and possibly between servers. Using a string
> > identifier like in the above makes it easy to build unique names and
> > allows every solution to add the information it needs into replication
> > identifiers.
> 
> I get that, but what I'm asking is why those mappings can't be managed
> on a per-replication-solution basis.  I think that's just because
> there's a limited namespace and so coordination is needed between
> multiple replication solutions that might possibly be running on the
> same system.  But I want to confirm if that's actually what you're
> thinking.

Yes, that, and the fact that such a mapping needs to work across all
databases, are the primary reasons. As it's currently impossible to create
further shared relations, you'd have to invent something living in the
data directory at the filesystem level... Brr.

I think it'd also be much worse for debugging if there were no way to
map such an internal identifier back to the replication solution in some
form.
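The following sketches such a mapping under a few assumptions. External names are strings chosen by each replication solution, the node hands out short ids, and the reverse lookup makes an id seen in WAL debuggable. IdentMap, ident_get_or_create() and ident_name() are hypothetical. A real version would have to be stored node-wide (shared across databases) and persisted, which this in-memory array ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_IDENTS   8          /* hypothetical capacity */
#define MAX_NAME_LEN 64

typedef uint16_t OriginId;      /* 0 reserved as "invalid/local" (assumption) */

/* Hypothetical node-wide (not per-database) mapping table. */
typedef struct IdentMap
{
    char   names[MAX_IDENTS][MAX_NAME_LEN];
    size_t count;
} IdentMap;

/* Return the short id for name, allocating one if needed; 0 on failure. */
static OriginId
ident_get_or_create(IdentMap *map, const char *name)
{
    if (strlen(name) >= MAX_NAME_LEN)
        return 0;
    for (size_t i = 0; i < map->count; i++)
        if (strcmp(map->names[i], name) == 0)
            return (OriginId) (i + 1);
    if (map->count == MAX_IDENTS)
        return 0;
    strcpy(map->names[map->count], name);   /* length checked above */
    map->count++;
    return (OriginId) map->count;
}

/* Reverse lookup for debugging: short id -> external name, or NULL. */
static const char *
ident_name(const IdentMap *map, OriginId id)
{
    if (id == 0 || id > map->count)
        return NULL;
    return map->names[id - 1];
}
```

Because a single node-wide table hands out all the ids, different replication solutions never have to agree on numbers among themselves.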

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])