Hi,

We are using logical replication in multimaster and have run into an interesting problem with a "frozen" procArray->replication_slot_xmin. This variable is adjusted by ProcArraySetReplicationSlotXmin, which is invoked by ReplicationSlotsComputeRequiredXmin, which in turn is called by LogicalConfirmReceivedLocation. If transactions are executed on all nodes of the multimaster, everything works fine: replication_slot_xmin is advanced. But if we send transactions to only one multimaster node and broadcast these changes to the other nodes, then no data is sent through the replication slots on those other nodes. No data is sent, so there are no confirmations, LogicalConfirmReceivedLocation is not called, and procArray->replication_slot_xmin keeps its original value, 599.

As a result, the GetOldestXmin function always returns 599, so autovacuum is effectively blocked and our multimaster is not able to clean up the XID->CSN map, which causes a shared memory overflow. This situation happens only when write transactions are sent to just one node, or when there are no write transactions at all.
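
To make the mechanism concrete, here is a toy model of the behaviour described above. It is not PostgreSQL code: the class ToyNode and its methods are hypothetical names invented for illustration, and the real GetOldestXmin takes more into account than this sketch does. It only assumes that the oldest xmin can never be newer than the slot xmin, and that the slot xmin moves only when a confirmation arrives.

```python
class ToyNode:
    """Toy model of one node. Hypothetical, NOT PostgreSQL internals."""

    def __init__(self, initial_xid):
        self.next_xid = initial_xid               # next XID to be assigned
        self.replication_slot_xmin = initial_xid  # stands in for procArray->replication_slot_xmin

    def apply_transaction(self):
        # A transaction is executed or applied on this node: the XID counter moves on.
        self.next_xid += 1

    def confirm_received(self):
        # Stands in for LogicalConfirmReceivedLocation, which ends up
        # recomputing the slot xmin. Assumption: the slot fully catches up.
        self.replication_slot_xmin = self.next_xid

    def oldest_xmin(self):
        # Simplified: the result is held back by the slot xmin.
        return min(self.next_xid, self.replication_slot_xmin)


node = ToyNode(599)
for _ in range(100):
    node.apply_transaction()           # changes arrive, but nothing is sent out

stuck = node.oldest_xmin()             # still held back by the slot xmin
node.confirm_received()                # a confirmation finally arrives
advanced = node.oldest_xmin()          # the horizon can now move forward
print(stuck, advanced)
```

In this model the horizon stays at its initial value for as long as confirm_received is never called, regardless of how many transactions the node applies, which is the same pattern we observe on the nodes that only receive changes.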

Before implementing some workaround (for example, forcing a call to ReplicationSlotsComputeRequiredXmin), I want to understand whether this is a real problem of logical replication or whether we are doing something wrong. BDR should face the same problem if all updates are performed from one node...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


