Hello,

I started taking a brief look at the v2 patch, and it does appear to work for 
the basic case: the logical slot is synchronized to the standby, and I can 
connect to the promoted standby and stream changes afterwards.

It's not clear to me what the correct behavior is when a logical slot that has 
been synced to the replica is then deleted on the writer. Would we expect the 
drop to be propagated, or do we leave it up to the end-user to manage?

> +       rawname = pstrdup(standby_slot_names);
> +       SplitIdentifierString(rawname, ',', &namelist);
> +
> +       while (true)
> +       {
> +               int                     wait_slots_remaining;
> +               XLogRecPtr      oldest_flush_pos = InvalidXLogRecPtr;
> +               int                     rc;
> +
> +               wait_slots_remaining = list_length(namelist);
> +
> +               LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +               for (int i = 0; i < max_replication_slots; i++)
> +               {

Even though standby_slot_names is PGC_SIGHUP, we never reload/re-process the 
value inside this loop. If it contains a wrong entry, the backend stays stuck 
until we re-establish the logical connection. Including 
"postmaster/interrupt.h" and using ConfigReloadPending / ProcessConfigFile 
does seem to work.
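
To illustrate the idea, here is a small standalone C model of the wait loop; 
it is not the patch code. ModelSlot, slots_remaining(), wait_for_standbys() 
and process_config_file() are made-up stand-ins. In the backend the 
equivalents would be the ConfigReloadPending flag and 
ProcessConfigFile(PGC_SIGHUP), and I am assuming the name list would also need 
to be re-split after a reload, since the quoted code splits it once before the 
loop.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a replication slot as seen by the wait loop. */
typedef struct
{
    const char *name;
    int         caught_up;      /* 1 once the slot has flushed far enough */
} ModelSlot;

/* Stand-ins for the GUC, the config file contents, and the reload flag. */
static char standby_slot_names[128] = "";
static char config_file_value[128] = "";
static volatile sig_atomic_t config_reload_pending = 0;

/* Stand-in for ProcessConfigFile(PGC_SIGHUP): pick up the new GUC value. */
static void
process_config_file(void)
{
    snprintf(standby_slot_names, sizeof(standby_slot_names), "%s",
             config_file_value);
}

/* Count the listed names that are not matched by a caught-up slot. */
static int
slots_remaining(const char *names, const ModelSlot *slots, int nslots)
{
    char        buf[128];
    int         remaining = 0;

    /* Work on a copy, because strtok() modifies its input. */
    snprintf(buf, sizeof(buf), "%s", names);
    for (char *tok = strtok(buf, ", "); tok != NULL; tok = strtok(NULL, ", "))
    {
        int         satisfied = 0;

        for (int i = 0; i < nslots; i++)
            if (strcmp(slots[i].name, tok) == 0 && slots[i].caught_up)
                satisfied = 1;
        if (!satisfied)
            remaining++;
    }
    return remaining;
}

/*
 * Returns the round in which the wait finished, or -1 if it was still
 * waiting after max_rounds. With honor_reload = 0 the loop behaves like the
 * quoted code: the list is never refreshed.
 */
static int
wait_for_standbys(const ModelSlot *slots, int nslots, int max_rounds,
                  int honor_reload)
{
    for (int round = 0; round < max_rounds; round++)
    {
        if (honor_reload && config_reload_pending)
        {
            config_reload_pending = 0;
            process_config_file();
        }

        /* Re-split the (possibly changed) list on every round. */
        if (slots_remaining(standby_slot_names, slots, nslots) == 0)
            return round;

        /* The real loop would sleep in WaitLatch() here. */
    }
    return -1;
}
```

With standby_slot_names set to "standby1,typo" and only standby1 present, the 
model keeps waiting when honor_reload is 0, and finishes in the first round 
when honor_reload is 1 and the corrected value is pending.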

Another thing I noticed is that once it starts waiting in this block, Ctrl+C 
doesn't seem to terminate the backend?

pg_recvlogical -d postgres -p 5432 --slot regression_slot --start -f -
..
^Cpg_recvlogical: error: unexpected termination of replication stream: 

The logical backend connection is still present:

ps aux | grep 51263
   hsuchen 51263 80.7  0.0 320180 14304 ?        Rs   01:11   3:04 postgres: walsender hsuchen [local] START_REPLICATION

pstack 51263
#0  0x00007ffee99e79a5 in clock_gettime ()
#1  0x00007f8705e88246 in clock_gettime () from /lib64/libc.so.6
#2  0x000000000075f141 in WaitEventSetWait ()
#3  0x000000000075f565 in WaitLatch ()
#4  0x0000000000720aea in ReorderBufferProcessTXN ()
#5  0x00000000007142a6 in DecodeXactOp ()
#6  0x000000000071460f in LogicalDecodingProcessRecord ()

It can be terminated with pg_terminate_backend() though.

If there is a physical slot named foo on the standby and a logical slot with 
the same name is then created on the writer, synchronization errors out on the 
replica. This also prevents other slots from being synchronized, which is 
probably fine.

2021-12-16 02:10:29.709 UTC [73788] LOG:  replication slot synchronization worker for database "postgres" has started
2021-12-16 02:10:29.713 UTC [73788] ERROR:  cannot use physical replication slot for logical decoding
2021-12-16 02:10:29.714 UTC [73037] DEBUG:  unregistering background worker "replication slot synchronization worker"

On 12/14/21, 2:26 PM, "Peter Eisentraut" <peter.eisentr...@enterprisedb.com> 
wrote:

    On 28.11.21 07:52, Bharath Rupireddy wrote:
    > 1) Instead of a new LIST_SLOT command, can't we use
    > READ_REPLICATION_SLOT (slight modifications needs to be done to make
    > it support logical replication slots and to get more information from
    > the subscriber).

    I looked at that but didn't see an obvious way to consolidate them.
    This is something we could look at again later.

    > 2) How frequently the new bg worker is going to sync the slot info?
    > How can it ensure that the latest information exists say when the
    > subscriber is down/crashed before it picks up the latest slot
    > information?

    The interval is currently hardcoded, but could be a configuration
    setting.  In the v2 patch, there is a new setting that orders physical
    replication before logical so that the logical subscribers cannot get
    ahead of the physical standby.

    > 3) Instead of the subscriber pulling the slot info, why can't the
    > publisher (via the walsender or a new bg worker maybe?) push the
    > latest slot info? I'm not sure we want to add more functionality to
    > the walsender, if yes, isn't it going to be much simpler?

    This sounds like the failover slot feature, which was rejected.


