On Mar 22, 2016, at 10:10 AM, Craig Ringer wrote:

> On 22 March 2016 at 14:32, konstantin knizhnik <k.knizh...@postgrespro.ru> 
> wrote:
>> Ah you mean because with wal_log=true the origin advance is in different WAL 
>> record than commit? OK yeah you might be one transaction behind then, true.
> It actually means that we can not enforce database consistency. If we do 
> replorigin_advance  before commit and then crash happen, then we will loose 
> some changes.
> If we call replorigin_advance after commit but crash happen before, then some 
> changes can be applied multiple times. For example we can insert some record 
> twice (if there are no unique constraints).
> Look likes the only working scenario is to setup replication session for each 
> commit and use locking to prevent concurrent session setup for the same slot 
> by multiple process,  doesn't it?
> Yes.
> How would you expect it to work if you attempted to replorigin_advance 
> without a session? From multiple concurrent backends?

I would not work. But I wonder why I would need to call replorigin_advance 
without a session.
Please excuse me, I am not thinking now about the general case of using logical 
replication, I just focused on multimaster.
What I need is some efficient, durable and atomic mechanism for applying 
I see only two ways to provide atomicity:
1. Tracking of origins should be done inside xact as part of normal commit.
2. Use custom WAL records.

1) is easier now and it really works if I correctly synchronize access to 
slots. And surprisingly it even doesn't add substantial overhead.

> Parallel apply is complicated business. You have to make sure you apply xacts 
> in an order that's free from deadlocks and from insert/delete anomalies - 
> though you can at least detect those, ERROR that xact and all subsequent 
> ones, and retry.

Well, this is exactly what our multimaster does. We do not try to enforce order 
of applying xacts. But we detect global deadlocks and use 2PC to provide data 
So it is not task of logical replication, it is done by DTM overriding  
visibility checks and transaction commit protocol using XTM.

> For progress tracking to be consistent and correct you'd have to make sure 
> you committed strictly in the same order as upstream. Just before each commit 
> you can set the origin LSN and advance the replication origin, which will 
> commit atomically along with the commit it confirms. I don't really see the 
> problem.

Sorry, I do not completely understand you. What you mean by "will commit 
atomically along with the commit it confirms"? How this atomicity will be 

> I have tried it, fortunately it doesn't cause any noticeable performance 
> degradation. But unfortunately  can't consider such approach as elegant.
> Why it is actually necessary to bind replication slot to process? Why it is 
> not possible to have multiple concurrent sessions for the same slot?
> Especially since most slot changes LWLock- and/or spinlock-protected already.
> The client would have to manage replay confirmations appropriately so that it 
> doesn't confirm past the point where some other connection still needs it.
> We'd have to expose a "slot" column in pg_stat_replication and remove the 
> "pid" column from pg_replication_slots to handle the 1:n relationship between 
> slot clients and slots, and it'd be a pain to show which normal user backends 
> were using a slot. Not really sure how to handle that.
> To actually make this useful would require a lot more though. A way to 
> request that replay start from a new LSN without a full disconnect/reconnect 
> each time. Client-side parallel consume/apply. Inter-transaction ordering 
> information so the client can work out a viable xact apply order (possibly 
> using SSI information per the discussion with Kevin?). Etc.
> I haven't really looked into this and I suspect there are some hairy areas 
> involved in replaying a slot from more than one client. The reason I'm 
> interested in it personally is for initial replica state setup as Oleksandr 
> prototyped and described earlier. We could attach to the slot's initial 
> snapshot then issue a new replication command that, given a table name or 
> oid, scans the table from the snapshot and passes each tuple to a new 
> callback (like, but not the same as, the insert callback) on the output 
> plugin.
> That way clients could parallel-copy the initial state of the DB across the 
> same replication protocol they then consume new changes from, with no need to 
> make normal libpq connections and COPY initial state.
> I'm interested in being able to do parallel receive of new changes from the 
> slot too, but suspect that'd be a bunch harder.
> Also I concern about using sequential search for slot location in 
> replorigin_session_setup and many other functions - there is loop through all 
>       max_replication_slots.
> It seems to be not a problem when number of slots is less than 10. For 
> multimaster this assumption is true - even Oracle RAC rarely has two-digit 
> number of nodes.
> But if we want to perform sharding and use logical replication for providing 
> redundancy, then number of nodes and slots can be essentially larger.
> Sounds like premature optimisation. Deal with it if it comes up in profiles 
> in scale testing with 100 clients. I'll be surprised if it does.
> -- 
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

Reply via email to