On Wed, Dec 10, 2025 at 3:32 PM shveta malik <[email protected]> wrote:
>
> +1. This can be reproduced as well. When the logical-decoding state is
> cached, we may fail to log logical-info (unassigned XID case), causing
> certain rows not to be replicated to subscribers. The steps below
> demonstrate this.
>
> Backend1 of pub:
> -------------
> create table tab1(i int);
> create publication pub1 for table tab1;
>
> BEGIN;
> SELECT txid_current_if_assigned(); --xid not assigned yet.
> SHOW wal_level; SHOW effective_wal_level; --replica
>
> --pause here and do 'Step1' mentioned below on backend2.
> --logical decoding is now enabled except this backend.
> --now continue with backend1:
>
> insert into tab1 values(20);
> insert into tab1 values(30);
>
> --pause here and do 'Step2' mentioned below on backend2.
> --now continue with backend1:
>
> SELECT txid_current_if_assigned(); --xid gets assigned before above insert.
> SHOW wal_level; SHOW effective_wal_level; --it is still 'replica' in this txn.
> COMMIT;
>
> Step1 (it will enable logical decoding):
> ----------------------------
> Backend2 of pub:
> SELECT pg_create_logical_replication_slot('slot', 'pgoutput', false,
> false, false);
> show wal_level; show effective_wal_level; --logical now.
>
> Subscriber:
> create table tab1(i int);
> create subscription sub1 connection '...' publication pub1;
>
> Backend2 of pub: insert into tab1 values(10);
> ----------------------------
>
>
> Step2:
> --------------------------------
> Backend2 of pub: insert into tab1 values(40);
> --------------------------------
>
> At the end after backend1 commits:
> On pub, we have 4 rows in tab1:
> {10}, {20}, {30}, {40}
>
> On sub, we have 2 rows in tab1:
> {10}, {40}
>
> ~~
>
> If we stop caching the logical-decoding state within a transaction, we
> may still encounter issues, because the backend could observe logical
> decoding as disabled at one point and enabled at another.
>
I think such a problem won't happen at the transaction level if we
ensure that the transaction-level cache is initialized at the time of
transaction-id assignment. However, if we want to wait for all
backends that have an open transaction during the first logical slot
creation, then this should be addressed automatically. And we don't
need to worry about the theoretical scenario where half of the WAL
info is constructed before transaction-id assignment and the other
half after assignment. I feel the idea of waiting for all open
transactions goes too far without a real need.
Having said that, if we still want to go with the idea of waiting for
all open transactions, then let's document it in the logical slot
creation documentation. I checked and found that, as per the current
code, we don't need to worry about wal_sender_timeout during slot
creation for that idea; see the following part of the code:
CreateReplicationSlot()
{
    ...
    /*
     * Signal that we don't need the timeout mechanism. We're just
     * creating the replication slot and don't yet accept feedback
     * messages or send keepalives. As we possibly need to wait for
     * further WAL the walsender would otherwise possibly be killed too
     * soon.
     */
    last_reply_timestamp = 0;
    ...
}
--
With Regards,
Amit Kapila.