sequences vs. synchronous replication

Tomas Vondra Fri, 17 Dec 2021 17:54:17 -0800

Hi,

while working on logical decoding of sequences, I ran into an issue with nextval() in a transaction that rolls back, described in [1]. But after thinking about it a bit more (and chatting with Petr Jelinek), I think this issue affects physical sync replication too.


Imagine you have a primary <-> sync_replica cluster, and you do this:

  CREATE SEQUENCE s;

  -- shutdown the sync replica

  BEGIN;
  SELECT nextval('s') FROM generate_series(1,50);
  ROLLBACK;

  BEGIN;
  SELECT nextval('s');
  COMMIT;

The natural expectation would be that the COMMIT gets stuck, waiting for the sync replica (which is not running), right? But it does not.

The problem is exactly the same as in [1] - the aborted transaction generated WAL, but RecordTransactionAbort() ignores that and does not update LogwrtResult.Write, with the reasoning that aborted transactions do not matter. But sequences violate that, because we only write WAL once every 32 increments, so the following nextval() gets "committed" without waiting for the replica (because it did not produce WAL).
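To make the mechanism concrete, here is a toy Python model of the scenario above. It is only a sketch: ToySequence, ToyXact and run_scenario are hypothetical names, and the rule "a commit that wrote no WAL does not wait" is a simplification of the behaviour described here, not the actual PostgreSQL code.

```python
SEQ_LOG_VALS = 32  # the text says WAL is written once every 32 increments


class ToyXact:
    """Hypothetical transaction: only remembers whether it wrote WAL."""

    def __init__(self):
        self.wrote_wal = False

    def commit(self, replica_up):
        # Toy rule: a commit that produced no WAL has nothing to wait for.
        if not self.wrote_wal:
            return "committed without waiting"
        return "committed" if replica_up else "blocked on sync replica"


class ToySequence:
    """Hypothetical sequence that pre-logs SEQ_LOG_VALS values at a time."""

    def __init__(self):
        self.last_value = 0
        self.log_cnt = 0  # values still covered by the last WAL record

    def nextval(self, xact):
        if self.log_cnt == 0:
            # Only this call writes WAL; it covers the next 32 values.
            xact.wrote_wal = True
            self.log_cnt = SEQ_LOG_VALS
        self.log_cnt -= 1
        self.last_value += 1
        return self.last_value


def run_scenario():
    s = ToySequence()

    # BEGIN; SELECT nextval('s') FROM generate_series(1,50); ROLLBACK;
    aborted = ToyXact()
    for _ in range(50):
        s.nextval(aborted)
    # The abort does not wait for anything, so nothing happens here.

    # BEGIN; SELECT nextval('s'); COMMIT;  (sync replica is down)
    committed = ToyXact()
    value = s.nextval(committed)
    return value, committed.wrote_wal, committed.commit(replica_up=False)
```

In this model the second transaction gets a value that is covered only by WAL written in the aborted transaction, so it writes no WAL itself and its commit returns without waiting.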

I'm not sure this is a clear data corruption bug, but it surely walks and quacks like one. My proposal is to fix this by tracking the LSN of the last sequence increment, and then checking that LSN in RecordTransactionCommit() before calling XLogFlush().
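A rough sketch of how that idea could look, again as a toy Python model and not as a patch. All names here (ToyWal, last_seq_lsn, commit_must_wait, run_fixed_scenario) are made up for illustration, and the LSN is just a counter; the real check would have to live in RecordTransactionCommit().

```python
SEQ_LOG_VALS = 32  # WAL is written once every 32 increments


class ToyWal:
    """Hypothetical WAL: LSNs are plain integers."""

    def __init__(self):
        self.insert_lsn = 0         # end of WAL generated on the primary
        self.replica_flush_lsn = 0  # how far the sync replica has confirmed

    def append(self):
        self.insert_lsn += 1
        return self.insert_lsn


class ToyXact:
    def __init__(self):
        self.last_seq_lsn = 0  # LSN covering the sequence values we used

    def commit_must_wait(self, wal):
        # Assumed check at commit: wait if the replica has not yet
        # confirmed the WAL that covers our sequence values.
        return self.last_seq_lsn > wal.replica_flush_lsn


class ToySequence:
    def __init__(self, wal):
        self.wal = wal
        self.last_value = 0
        self.log_cnt = 0
        self.last_wal_lsn = 0  # LSN of the last WAL record for this sequence

    def nextval(self, xact):
        if self.log_cnt == 0:
            self.last_wal_lsn = self.wal.append()
            self.log_cnt = SEQ_LOG_VALS
        self.log_cnt -= 1
        self.last_value += 1
        # Remember the LSN even when this call wrote no WAL itself.
        xact.last_seq_lsn = max(xact.last_seq_lsn, self.last_wal_lsn)
        return self.last_value


def run_fixed_scenario():
    wal = ToyWal()
    s = ToySequence(wal)

    aborted = ToyXact()
    for _ in range(50):
        s.nextval(aborted)  # transaction later rolls back

    committed = ToyXact()
    s.nextval(committed)
    wait_while_replica_down = committed.commit_must_wait(wal)

    wal.replica_flush_lsn = wal.insert_lsn  # replica comes back, catches up
    wait_after_catchup = committed.commit_must_wait(wal)
    return wait_while_replica_down, wait_after_catchup
```

With this tracking the second transaction would wait while the replica is down, even though it wrote no WAL itself, and would stop waiting once the replica confirms the sequence record.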



regards

[1] https://www.postgresql.org/message-id/ae3cab67-c31e-b527-dd73-08f196999ad4%40enterprisedb.com


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
