Hi Chris!

On Wednesday, June 20, 2012 07:06:28 PM Christopher Browne wrote:
> On Wed, Jun 20, 2012 at 11:50 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > On Wednesday, June 20, 2012 05:34:42 PM Kevin Grittner wrote:
> >> Simon Riggs <si...@2ndquadrant.com> wrote:
> >> > This is not transaction metadata, it is WAL record metadata
> >> > required for multi-master replication, see later point.
> >> > 
> >> > We need to add information to every WAL record that is used as the
> >> > source for generating LCRs.
> >> 
> >> If the origin ID of a transaction doesn't count as transaction
> >> metadata (i.e., data about the transaction), what does?  It may be a
> >> metadata element about which you have special concerns, but it is
> >> transaction metadata.  You don't plan on supporting individual WAL
> >> records within a transaction containing different values for origin
> >> ID, do you?  If not, why is it something to store in every WAL
> >> record rather than once per transaction?  That's not intended to be
> >> a rhetorical question.
> > 
> > It's definitely possible to store it per transaction (see the discussion
> > around http://archives.postgresql.org/message-
> > id/201206201605.43634.and...@2ndquadrant.com); it just makes filtering
> > by the originating node considerably more complex. With our proposal you
> > can do it without any complexity involved, at a low level. Storing it per
> > transaction means you can only stream the data out to other nodes *after*
> > fully reassembling the transaction. That's a pity, especially if we go
> > for a design where the decoding happens in a proxy instance.
> 
> I guess I'm not seeing the purpose of having the origin node id in the
> WAL stream either.
> 
> We have it in the Slony sl_log_* stream, however there is a crucial
> difference, in that sl_log_* is expressly a shared structure.  In
> contrast, WAL isn't directly sharable; you don't mix together multiple
> WAL streams.
> 
> It seems as though the point in time at which you need to know the
> origin ID is the moment at which you're deciding to read data from the
> WAL files, and knowing which stream you are reading from is an
> assertion that might be satisfied by looking at configuration that
> doesn't need to be in the WAL stream itself.  It might be *nice* for
> the WAL stream to be self-identifying, but that doesn't seem to be
> forcibly necessary.
> 
> The case where it *would* be needful is if you are in the process of
> assembling together updates coming in from multiple masters, and need
> to know:
>    - This INSERT was replicated from node #1, so should be ignored
>      downstream
>    - That INSERT was replicated from node #2, so should be ignored
>      downstream
>    - This UPDATE came from the local node, so needs to be passed to
>      downstream users

That is exactly the point. And you want to do that in an efficient manner
without too much logic; that's why something simple like the record header is
so appealing.
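
To make that concrete, here is a standalone sketch of the filtering logic
(all names are made up for illustration, this is not the actual patch): with
the origin in every record header, the decision to forward or skip can be
made per record as the stream is read, without reassembling transactions
first.

/*
 * Illustrative only: a per-record origin id in the header lets the
 * low-level reader drop foreign-origin records immediately.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t NodeId;        /* hypothetical 16-bit origin node id */

typedef struct RecordHeader
{
    uint32_t    xid;            /* transaction the record belongs to */
    NodeId      origin_id;      /* node the change originated on */
    uint8_t     rmgr_id;        /* resource manager */
    uint8_t     info;           /* record type bits */
} RecordHeader;

/* Forward a record to a downstream node unless it originated there. */
static bool
should_forward(const RecordHeader *rec, NodeId downstream)
{
    return rec->origin_id != downstream;
}

int
main(void)
{
    RecordHeader stream[] = {
        {1001, 1, 10, 0},       /* INSERT replicated from node #1 */
        {1002, 2, 10, 0},       /* INSERT replicated from node #2 */
        {1003, 0, 10, 0},       /* UPDATE from the local node (#0) */
    };
    NodeId      downstream = 1;

    for (size_t i = 0; i < sizeof(stream) / sizeof(stream[0]); i++)
        printf("xid %u: %s\n", (unsigned) stream[i].xid,
               should_forward(&stream[i], downstream) ? "forward" : "skip");
    return 0;
}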

> > I also have to admit that I am very hesitant to start developing some
> > generic "transaction metadata" framework atm. That seems to be a good
> > way to spend a lot of time in discussion and disagreement. Imo that's
> > something for later.

> Well, I see a use for there being at least 3 sorts of LCR records:
> a) Capturing literal SQL that is to be replayed downstream
> b) Capturing tuple updates in a binary form that can be turned readily
> into heap updates on a replica.
> c) Capturing tuple data in some reasonably portable and readily
> re-writable form
I think we should provide the utilities to do all of those. a) is a 
consequence of being able to do c). 
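
Just to illustrate the distinction (these names are invented, not from any
patch), the three kinds could be modelled as a tagged union; a) then falls
out of c) simply by rendering the portable column/value form back into SQL
text:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the three LCR payload kinds discussed above. */
typedef enum LCRKind
{
    LCR_SQL_TEXT,       /* a) literal SQL to be replayed downstream */
    LCR_BINARY_TUPLE,   /* b) binary tuple image for direct heap apply */
    LCR_PORTABLE_TUPLE  /* c) portable, re-writable column/value form */
} LCRKind;

typedef struct LCRChange
{
    LCRKind     kind;
    uint32_t    xid;            /* originating transaction */
    uint16_t    origin_id;      /* originating node */
    union
    {
        char       *sql;        /* LCR_SQL_TEXT */
        struct
        {
            size_t      len;    /* LCR_BINARY_TUPLE: raw tuple bytes */
            uint8_t    *data;
        }           raw;
        struct
        {
            int         natts;  /* LCR_PORTABLE_TUPLE: column names */
            char      **names;  /*   and textual values */
            char      **values;
        }           cols;
    }           payload;
} LCRChange;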

That doesn't really have anything to do with this subthread, though. The part
you quoted above was my response to the suggestion to add some generic
framework for attaching metadata to individual transactions on the generating
side. We quite possibly will end up needing that, but I personally don't think
we should be designing that part atm.

> b) Capturing tuple updates in a binary form that can be turned readily
> into heap updates on a replica.
>     Unfortunately, this form is likely not to play well when
> replicating across platforms or Postgres versions, so I suspect that
> this performance optimization should be implemented as a *last*
> resort, rather than first.  Michael Jackson had some "rules of
> optimization" that said "don't do it", and, for the expert, "don't do
> it YET..."
Well, apply is a bottleneck. Besides field experience, I/we have benchmarked
it, and it's rather plausible that it is one. And I don't think we can
magically make that faster in pg in general, so my plan is to remove the
biggest cost factor I can see.
And yes, it will have restrictions...
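
For what it's worth, the shape of the trade-off looks roughly like this
(apply_via_sql() and apply_binary_tuple() are hypothetical stand-ins, not
existing functions):

#include <stddef.h>

typedef struct Change Change;   /* decoded change; details elided */

/* Render SQL text and run it through parse/plan/execute for every row:
 * portable across versions and platforms, but pays the full executor
 * overhead per change. */
static void apply_via_sql(const Change *c)      { (void) c; }

/* Form a heap tuple directly from the binary image and insert it: skips
 * parsing and planning entirely, which is where most of the apply cost
 * goes, at the price of requiring matching tuple layouts. */
static void apply_binary_tuple(const Change *c) { (void) c; }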

Regards,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
