On 22 March 2016 at 14:32, konstantin knizhnik <k.knizh...@postgrespro.ru>
wrote:


> Ah, you mean because with wal_log=true the origin advance is in a different
> WAL record from the commit? OK, yeah, you might be one transaction behind then,
> true.
>
>
> It actually means that we cannot enforce database consistency. If we call
> replorigin_advance before commit and then a crash happens, we will lose
> some changes. If we call replorigin_advance after commit but a crash happens
> before it, then some changes can be applied multiple times. For example, we
> can insert some record twice (if there are no unique constraints).
> It looks like the only working scenario is to set up a replication session
> for each commit and use locking to prevent concurrent session setup for the
> same slot by multiple processes, doesn't it?
>

Yes.

How would you expect it to work if you attempted to replorigin_advance
without a session? From multiple concurrent backends?

Parallel apply is complicated business. You have to make sure you apply
xacts in an order that's free from deadlocks and from insert/delete
anomalies - though you can at least detect those, ERROR that xact and all
subsequent ones, and retry. For progress tracking to be consistent and
correct you'd have to make sure you committed strictly in the same order as
upstream. Just before each commit you can set the origin LSN and advance
the replication origin, which will commit atomically along with the commit
it confirms. I don't really see the problem.
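To make the ordering argument concrete, here is a minimal in-memory Python model, not PostgreSQL code. It assumes workers finish applying transactions in any order, and a hypothetical `OrderedCommitter` commits them strictly in upstream commit-LSN order. `Downstream.commit()` stands in for a real COMMIT that carries the replication origin LSN in the same commit, so data and progress can't diverge. All class and method names here are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Xact:
    upstream_lsn: int  # commit LSN of the transaction on the upstream
    rows: list         # the changes it made (illustrative payload)


@dataclass
class Downstream:
    rows: list = field(default_factory=list)
    origin_lsn: int = 0  # replication origin progress for this upstream

    def commit(self, xact: Xact) -> None:
        # In this model both assignments stand for ONE atomic commit: the data
        # and the origin progress become durable together, so a crash can
        # leave us neither ahead of nor behind what was actually applied.
        self.rows.extend(xact.rows)
        self.origin_lsn = xact.upstream_lsn


class OrderedCommitter:
    """Accepts finished transactions in any order, commits in upstream order."""

    def __init__(self, db: Downstream, upstream_lsns: list):
        self.db = db
        self.order = sorted(upstream_lsns)  # upstream commit order
        self.next_idx = 0
        self.pending = {}  # applied but not yet committed, keyed by LSN

    def finished(self, xact: Xact) -> None:
        self.pending[xact.upstream_lsn] = xact
        # Commit every transaction whose turn has come, and no others.
        while (self.next_idx < len(self.order)
               and self.order[self.next_idx] in self.pending):
            lsn = self.order[self.next_idx]
            self.db.commit(self.pending.pop(lsn))
            self.next_idx += 1


db = Downstream()
committer = OrderedCommitter(db, [100, 200, 300])
committer.finished(Xact(300, ["c"]))  # must wait for 100 and 200
committer.finished(Xact(100, ["a"]))  # commits 100
committer.finished(Xact(200, ["b"]))  # commits 200, then the waiting 300
print(db.rows, db.origin_lsn)  # ['a', 'b', 'c'] 300
```

The point of the buffer is that apply can be parallel while commit stays serial, which is what keeps a single origin LSN meaningful.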


> I have tried it; fortunately it doesn't cause any noticeable performance
> degradation. But unfortunately I can't consider such an approach elegant.
> Why is it actually necessary to bind a replication slot to a process? Why
> is it not possible to have multiple concurrent sessions for the same slot?
>

Especially since most slot changes are LWLock- and/or spinlock-protected
already.

The client would have to manage replay confirmations appropriately so that
it doesn't confirm past the point where some other connection still needs
it.
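A rough sketch of that client-side bookkeeping, assuming (hypothetically) that several connections could consume from one slot and that each reports the LSN up to which it has durably applied everything it was given. The client may then only confirm the minimum across connections. `SharedSlotProgress` is an invented name; nothing like it exists in PostgreSQL today.

```python
class SharedSlotProgress:
    """Hypothetical client-side tracker for several consumers of one slot."""

    def __init__(self, connections):
        # LSN up to which each connection has durably applied its changes.
        self.applied = {conn: 0 for conn in connections}

    def report(self, conn, lsn):
        # Progress only moves forward; ignore stale or out-of-order reports.
        self.applied[conn] = max(self.applied[conn], lsn)

    def safe_confirm_lsn(self):
        # Confirming past the slowest connection would let the server recycle
        # WAL that connection still needs, so confirm only the minimum.
        return min(self.applied.values())


progress = SharedSlotProgress(["conn1", "conn2"])
progress.report("conn1", 0x3000)
progress.report("conn2", 0x2000)
print(hex(progress.safe_confirm_lsn()))  # 0x2000
```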

We'd have to expose a "slot" column in pg_stat_replication and remove the
"pid" column from pg_replication_slots to handle the 1:n relationship
between slot clients and slots, and it'd be a pain to show which normal
user backends were using a slot. Not really sure how to handle that.

To actually make this useful would require a lot more though. A way to
request that replay start from a new LSN without a full
disconnect/reconnect each time. Client-side parallel consume/apply.
Inter-transaction ordering information so the client can work out a viable
xact apply order (possibly using SSI information per the discussion with
Kevin?). Etc.
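If such ordering information existed, the client could turn it into batches of transactions that are safe to apply concurrently. The sketch below assumes a hypothetical input mapping each xid to the xids it must be applied after, and uses Python's `graphlib.TopologicalSorter` (3.9+) to peel off ready batches. It only addresses apply concurrency; commits would still have to go out in upstream order for progress tracking to stay correct.

```python
from graphlib import TopologicalSorter


def apply_batches(deps):
    """Group xids into batches that could be applied concurrently.

    deps maps each xid to the set of xids it must be applied after
    (hypothetical ordering information sent by the server).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if no valid order exists
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # no unapplied predecessors left
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch was applied
    return batches


deps = {101: set(), 102: set(), 103: {101}, 104: {102, 103}}
print(apply_batches(deps))  # [[101, 102], [103], [104]]
```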

I haven't really looked into this and I suspect there are some hairy areas
involved in replaying a slot from more than one client. The reason I'm
interested in it personally is for initial replica state setup as Oleksandr
prototyped and described earlier. We could attach to the slot's initial
snapshot and then issue a new replication command that, given a table name or
oid, scans the table from the snapshot and passes each tuple to a new
callback (like, but not the same as, the insert callback) on the output
plugin.
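A toy model of that proposed command, with invented names throughout: `copy_table` plays the role of the new replication command, `initial_tuple_cb` the new output plugin callback, and `insert_cb` stands in for the insert case of a real plugin's change callback. Visibility is reduced to "was the inserting xid visible in the slot's snapshot", which is a big simplification of real MVCC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    visible_xids: frozenset  # xids whose inserts this snapshot can see


class CollectingPlugin:
    """Toy output plugin that records every callback invocation."""

    def __init__(self):
        self.received = []

    def insert_cb(self, relation, new_tuple):
        # Stand-in for an INSERT decoded from WAL after the snapshot.
        self.received.append(("insert", relation, new_tuple))

    def initial_tuple_cb(self, relation, tup):
        # Hypothetical new callback: one call per tuple read by the table copy.
        self.received.append(("initial", relation, tup))


def copy_table(snapshot, relation, heap, plugin):
    """Hypothetical replication command: scan `relation` as of `snapshot`."""
    for xmin, tup in heap.get(relation, []):
        if xmin in snapshot.visible_xids:
            plugin.initial_tuple_cb(relation, tup)


heap = {"accounts": [(10, (1, "alice")), (11, (2, "bob")), (42, (3, "carol"))]}
snap = Snapshot(frozenset({10, 11}))  # xid 42 committed after the snapshot
plugin = CollectingPlugin()
copy_table(snap, "accounts", heap, plugin)
plugin.insert_cb("accounts", (3, "carol"))  # later arrives via normal decoding
for event in plugin.received:
    print(event)
# ('initial', 'accounts', (1, 'alice'))
# ('initial', 'accounts', (2, 'bob'))
# ('insert', 'accounts', (3, 'carol'))
```

In this model the row from xid 42 reaches the client only once, as an ordinary decoded change, and different clients could run `copy_table` for different tables against the same snapshot in parallel.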

That way clients could parallel-copy the initial state of the DB across the
same replication protocol they then consume new changes from, with no need
to make normal libpq connections and COPY initial state.

I'm interested in being able to do parallel receive of new changes from the
slot too, but suspect that'd be a bunch harder.



> I am also concerned about the sequential search for the slot location
> in replorigin_session_setup and many other functions: there is a loop
> through all max_replication_slots.
> It does not seem to be a problem when the number of slots is less than 10.
> For multimaster this assumption holds: even Oracle RAC rarely has a
> two-digit number of nodes.
> But if we want to perform sharding and use logical replication for
> providing redundancy, then the number of nodes and slots can be
> substantially larger.
>

Sounds like premature optimisation. Deal with it if it comes up in profiles
in scale testing with 100 clients. I'll be surprised if it does.


-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
