On Mon, Sep 25, 2023 at 7:46 PM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> Dear Ajin, Shveta,
>
> Thank you for rebasing the patch set! Here are new comments for v19_2-0001.
>

Thank You Kuroda-san for the feedback. Most of these are addressed in
v20. Please find my response inline.

> 01. WalSndWaitForStandbyNeeded()
>
> ```
>         if (SlotIsPhysical(MyReplicationSlot))
>                 return false;
> ```
>
> Is there a possibility that physical walsenders call this function?
> IIUC following is a stacktrace for the function, so the only logical 
> walsenders use it.
> If so, it should be Assert() instead of an if statement.
>
> logical_read_xlog_page()
> WalSndWaitForWal()
> WalSndWaitForStandbyNeeded()

It will only be called from logical-walsenders. Modified as you suggested.

>
> 02. WalSndWaitForStandbyNeeded()
>
> Can we set shouldwait in SlotSyncInitConfig()? synchronize_slot_names_list is
> searched whenever the function is called, but it is not changed automatically.
> If the slotname is compared with the list in the SlotSyncInitConfig(), the
> liner search can be reduced.

standby_slot_names and synchronize_slot_names will be removed in the
next version as per discussion in [1] and thus SlotSyncInitConfig()
will not be needed. It will be replaced by new functionality. So I am
currently leaving it as is.

>
> 03. WalSndWaitForStandbyConfirmation()
>
> We should add ProcessRepliesIfAny() during the loop, otherwise the walsender
> overlooks the death of an apply worker.
>

Done.

> 04. WalSndWaitForStandbyConfirmation()
>
> Not sure, but do we have to return early if walsenders got 
> PROCSIG_WALSND_INIT_STOPPING
> signal? I thought that if physical walsenders get stuck, logical walsenders 
> wait
> forever. At that time we cannot stop the primary server even if "pg_ctl stop"
> is executed.
>

yes, right.  I have added CHECK_FOR_INTERRUPTS() and 'got_STOPPING'
handling now which I think should suffice to process
PROCSIG_WALSND_INIT_STOPPING.

> 05. SlotSyncInitConfig()
>
> Why don't we free the memory for rawname, old standby_slot_names_list, and 
> synchronize_slot_names_list?
> They seem to be overwritten.
>

Skipped for the time being due to reasons stated in pt 2.

> 06. SlotSyncInitConfig()
>
> Both physical and logical walsenders call the func, but physical one do not 
> use
> lists, right? If so, can we add a quick exit for physical walsenders?
> Or, we should carefully remove where physical calls it.
>
> 07. StartReplication()
>
> I think we do not have to call SlotSyncInitConfig().
> Alternative approach is written in above.
>

I have removed it  from StartReplication()

> 08. the other
>
> Also, I found the unexpected behavior after both 0001 and 0002 were applied.
> Was it normal or not?
>
> 1. constructed below setup
> (ensured that logical slot existed on secondary)
> 2. stopped the primary
> 3. promoted the secondary server
> 4. disabled a subscription once
> 5. changed the connection string for subscriber
> 6. Inserted data to new primary
> 7. enabled the subscription again
> 8. got an ERROR: replication slot "sub" does not exist
>
> I expected that the logical replication would be restarted, but it could not.
> Was it real issue or my fault? The error would appear in secondary.log.
>
> ```
> Setup:
> primary--->secondary
>    |
>    |
> subscriber
> ```

I have attached the new test-script (v2), can you please try that on
the v20 set of patches. We should let the slot creation complete first
on standby and then try promotion. I have added a few extra lines in
v2 of your script for the same. In the test-case, primary's
restart-lsn was lagging behind
new-slot's restart_lsn on standby and thus standby was waiting for
primary to catch-up. Meanwhile standby got promoted and thus slot
creation got aborted. That is the reason you were not able to get the
logical replication working on the new primary. v20 has improved
handling and better logging for such a case. Please try the attached
test-script on v20.


[1]: 
https://www.postgresql.org/message-id/CAJpy0uA%2Bt3XP2M0qtEmrOG1gSwHghjHPno5AtwTXM-94-%2Bc6JQ%40mail.gmail.com

thanks
Shveta
#!/bin/bash

port_primary=5431
port_secondary=5432
port_subscriber=5433

echo '=========='
echo '=Clean up='
echo '=========='

pg_ctl stop -D data_primary
pg_ctl stop -D data_secondary
pg_ctl stop -D data_subscriber

rm -rf data_* *log

echo '======================='
echo '=Set up primary server='
echo '======================='

initdb -D data_primary -U postgres

cat << EOF >> data_primary/postgresql.conf
wal_level = logical
port = $port_primary
standby_slot_names = 'physical'
synchronize_slot_names = 'sub'
log_replication_commands = 'on'
EOF

cat << EOF >> data_primary/pg_hba.conf
host all,replication all 0.0.0.0/0 trust
EOF

pg_ctl -D data_primary start -w -l primary.log
psql -U postgres -p $port_primary -c "CREATE TABLE tbl (id int primary key);"
psql -U postgres -p $port_primary -c "INSERT INTO tbl VALUES (1)"
psql -U postgres -p $port_primary -c "CREATE PUBLICATION pub FOR ALL TABLES"
psql -U postgres -p $port_primary -c "SELECT * FROM pg_create_physical_replication_slot('physical');"

echo '========================='
echo '=Set up secondary server='
echo '========================='

pg_basebackup -D data_secondary -U postgres -p $port_primary

cat << EOF >> data_secondary/postgresql.conf
port = $port_secondary
primary_conninfo = 'user=postgres port=$port_primary application_name=secondary dbname=postgres'
primary_slot_name = 'physical'
hot_standby_feedback=on
EOF

cat << EOF >> data_secondary/standby.signal
EOF

pg_ctl -D data_secondary start   -w -l secondary.log

psql -U postgres -p $port_primary -c "SELECT pg_log_standby_snapshot();"
psql -U postgres -p $port_secondary -c "SELECT * FROM pg_create_physical_replication_slot('physical');"

echo '==================='
echo '=Set up subscirber='
echo '==================='

initdb -U postgres -D data_subscriber

cat << EOF >> data_subscriber/postgresql.conf
port = $port_subscriber
wal_level = logical
EOF

pg_ctl start -D data_subscriber -l subscriber.log

psql -U postgres -p $port_subscriber -c "CREATE TABLE tbl (id int primary key);"
psql -U postgres -p $port_subscriber -c "CREATE SUBSCRIPTION sub CONNECTION 'user=postgres dbname=postgres port=$port_primary' PUBLICATION pub WITH (copy_data = on)"

sleep 1s

pg_ctl -D data_secondary reload
sleep 1s
psql -U postgres -p $port_secondary -c "SELECT * FROM pg_replication_slots;"

sleep 2s
echo 'Moving lsns on primary so that slot-creation on secondary is successful'
psql -U postgres -p $port_primary -c "INSERT INTO tbl VALUES (2)"
psql -U postgres -p $port_primary -c "INSERT INTO tbl VALUES (3)"
psql -U postgres -p $port_primary -c "INSERT INTO tbl VALUES (4)"

#Allow sometime for successful slot creation on standby
sleep 10s

# Now promote...

pg_ctl -D data_primary stop -w
pg_ctl -D data_secondary promote -w
psql -U postgres -p $port_subscriber -c "ALTER SUBSCRIPTION sub DISABLE"
pg_ctl -D data_subscriber restart -l subscriber.log
psql -U postgres -p $port_secondary -c "INSERT INTO tbl VALUES (generate_series(21, 30))"
psql -U postgres -p $port_subscriber -c "ALTER SUBSCRIPTION sub CONNECTION 'user=postgres dbname=postgres port=$port_secondary'"
psql -U postgres -p $port_subscriber -c "ALTER SUBSCRIPTION sub ENABLE"

Reply via email to