Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

Steve Singer Sat, 16 Jun 2012 12:04:00 -0700

On 12-06-15 04:03 PM, Robert Haas wrote:

On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund<and...@2ndquadrant.com>  wrote:

I don't plan to throw in loads of conflict resolution smarts. The aim is to get
to the place where all the infrastructure is there so that a MM solution can
be built by basically plugging in a conflict resolution mechanism. Maybe
providing a very simple one.
I think without in-core support its really, really hard to build a sensible MM
implementation. Which doesn't mean it has to live entirely in core.

Of course, several people have already done it, perhaps most notably Bucardo.


Anyway, it would be good to get opinions from more people here.  I am
sure I am not the only person with an opinion on the appropriateness
of trying to build a multi-master replication solution in core or,
indeed, the only person with an opinion on any of these other issues.


This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change records that can be replayed on other databases, without relying on triggers and log tables, would be good to have.

I think we want something flexible enough that people can write consumers of the LCRs to do conflict resolution for multi-master, but I am not sure that the conflict resolution support actually belongs in core.

Most of the complexity of slony (both in terms of lines of code, and issues people encounter using it) comes not from the log triggers or replay of the logged data but comes from the configuration of the cluster.

Controlling things like

* Which tables replicate from a node to which other nodes

* How do you change the cluster configuration on a running system (adding nodes, removing nodes, moving the origin of a table, adding tables to replication etc...)

This is the harder part of the problem. I think we need to first get the infrastructure committed (that the current patch set deals with) for capturing, transporting and translating the LCRs into the system before we get too caught up in the configuration aspects. I think we will have a hard time agreeing on behaviours for some of that other stuff that are both flexible enough for enough use cases and simple enough for administrators. I'd like to see in-core support for a lot of that stuff but I'm not holding my breath.

It is not good for those other opinions to be saved for a later date.

Hm. Yes, you could do that. But I have to say I don't really see a point.
Maybe the fact that I do envision multimaster systems at some point is
clouding my judgement though as its far less easy in that case.

Why?  I don't think that particularly changes anything.

It also complicates the wal format as you now need to specify whether you
transport a full or a primary-key only tuple...

Why?  If the schemas are in sync, the target knows what the PK is
perfectly well.  If not, you're probably in trouble anyway.

I think though that we do not want to enforce that mode of operation for
tightly coupled instances. For those I was thinking of using command triggers
to synchronize the catalogs.
One of the big screwups of the current replication solutions is exactly that
you cannot sensibly do DDL which is not a big problem if you have a huge
system with loads of different databases and very knowledgeable people et al.
but at the beginning it really sucks. I have no problem with making one of the
nodes the "schema master" in that case.
Also I would like to avoid the overhead of the proxy instance for use-cases
where you really want one node replicated as fully as possible with the slight
exception of being able to have summing tables, different indexes et al.

In my view, a logical replication solution is precisely one in which
the catalogs don't need to be in sync.  If the catalogs have to be in
sync, it's not logical replication.  ISTM that what you're talking
about is sort of a hybrid between physical replication (pages) and
logical replication (tuples) - you want to ship around raw binary
tuple data, but not entire pages.  The problem with that is it's going
to be tough to make robust.  Users could easily end up with answers
that are total nonsense, or probably even crash the server.


I see three catalogs in play here.
1. The catalog on the origin

2. The catalog on the proxy system (this is the catalog used to translate the WAL records to LCRs). The proxy system will need essentially the same pgsql binaries (same architecture, important compile flags etc..) as the origin

3. The catalog on the destination system(s).

The catalog 2 must be in sync with catalog 1, catalog 3 shouldn't need to be in-sync with catalog 1. I think catalogs 2 and 3 are combined in the current patch set (though I haven't yet looked at the code closely). I think the performance optimizations Andres has implemented to update tuples through low-level functions should be left for later and that we should be generating SQL in the apply cache so we don't start assuming much about catalog 3.
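
To make the "generate SQL" idea concrete, here is a rough sketch of what turning an already-decoded change record into a parameterized SQL statement could look like. This is not code from the patch set: the LCR class, its field names, and the lcr_to_sql helper are all made up for illustration, and the %s placeholders simply assume a driver that uses that parameter style. The point is only that the destination is addressed by table and column names, so nothing about catalog 3's internal layout is assumed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class LCR:
    """Hypothetical logical change record; the field names are illustrative only."""
    op: str                                              # "INSERT", "UPDATE" or "DELETE"
    table: str                                           # table name on the destination
    new: Dict[str, Any] = field(default_factory=dict)    # new column values
    key: Dict[str, Any] = field(default_factory=dict)    # primary-key values of the row


def quote_ident(name: str) -> str:
    """Quote an identifier the way SQL expects, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def lcr_to_sql(lcr: LCR) -> Tuple[str, List[Any]]:
    """Translate one change record into (sql_text, parameters).

    Only names and values are used, so the destination catalog does not
    have to match the origin's physical tuple layout.
    """
    table = quote_ident(lcr.table)

    if lcr.op == "INSERT":
        cols = list(lcr.new)
        col_list = ", ".join(quote_ident(c) for c in cols)
        placeholders = ", ".join(["%s"] * len(cols))
        sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
        return sql, [lcr.new[c] for c in cols]

    if lcr.op in ("UPDATE", "DELETE"):
        if not lcr.key:
            raise ValueError("UPDATE/DELETE needs primary-key values")
        key_cols = list(lcr.key)
        where = " AND ".join(f"{quote_ident(c)} = %s" for c in key_cols)
        key_vals = [lcr.key[c] for c in key_cols]

        if lcr.op == "DELETE":
            return f"DELETE FROM {table} WHERE {where}", key_vals

        set_cols = list(lcr.new)
        assignments = ", ".join(f"{quote_ident(c)} = %s" for c in set_cols)
        sql = f"UPDATE {table} SET {assignments} WHERE {where}"
        return sql, [lcr.new[c] for c in set_cols] + key_vals

    raise ValueError(f"unknown operation: {lcr.op}")
```

A consumer would pass the returned text and parameters to whatever client library it uses on the destination. It is slower than the low-level tuple path, but a destination with extra indexes or a different column order would still apply the change correctly.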

guarantee.  And, without such a guarantee, I don't believe that we can
create a high-performance, robust, in-core replication solution.

Part of what people expect from a robust in-core solution is that it should work with the other in-core features. If we have to list a bunch of in-core types as being incompatible with logical replication then people will look at logical replication with the same 'there be dragons here' attitude that scares many people away from the existing third party replication solutions. Non-core or third party user defined types are a slightly different matter because we can't control what they do.



Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
