While testing the case below with a hot standby setup (on the
latest code), I noticed that the checkpointer process crashed
with the $subject error. As far as I can see, we register a
SYNC_REQUEST when inserting some tuples into the table, and later,
on ALTER TABLE ... SET TABLESPACE, we register a SYNC_UNLINK_REQUEST,
which looks fine so far. But I noticed that, only when the standby is
connected, the underlying table file in the old tablespace is
already deleted by the time the checkpointer gets to it. Now, in
AbsorbSyncRequests() we do not cancel the pending SYNC_REQUEST even
though there is a SYNC_UNLINK_REQUEST for the same file, so since the
underlying file is already gone the checkpointer crashed while
processing the SYNC_REQUEST.
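
To show what I mean, here is a toy model I put together while
debugging (this is not the actual sync.c code, the names
absorb_request/pending_fsyncs/pending_unlinks are made up; the real
logic is in RememberSyncRequest() and is more involved): absorbing an
unlink request only remembers the unlink for after the checkpoint,
nothing cancels an already-pending fsync request for the same file.

/* toy_queue.c -- simplified model of my understanding of the
 * pending request bookkeeping, not PostgreSQL code */
#include <stdio.h>
#include <string.h>

#define MAX_REQ 10

typedef enum { SYNC_REQ, UNLINK_REQ } ReqType;

static char pending_fsyncs[MAX_REQ][64];   /* files still to be fsync'd */
static char pending_unlinks[MAX_REQ][64];  /* files to unlink after checkpoint */
static int  nfsyncs, nunlinks;

/* an unlink request is only queued for later; it does not cancel the
 * earlier fsync request for the same file */
static void absorb_request(ReqType type, const char *path)
{
    if (type == SYNC_REQ)
        strcpy(pending_fsyncs[nfsyncs++], path);
    else
        strcpy(pending_unlinks[nunlinks++], path);
}

int main(void)
{
    const char *oldfile = "pg_tblspc/16384/PG_15_202112131/16386/16387";

    absorb_request(SYNC_REQ, oldfile);     /* from the INSERT */
    absorb_request(UNLINK_REQ, oldfile);   /* from ALTER ... SET TABLESPACE */

    /* checkpoint time: the fsync list still contains the old file, so
     * if that file has already disappeared we are in trouble */
    for (int i = 0; i < nfsyncs; i++)
        printf("would fsync: %s\n", pending_fsyncs[i]);
    for (int i = 0; i < nunlinks; i++)
        printf("would unlink afterwards: %s\n", pending_unlinks[i]);
    return 0;
}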

I have spent some time on this but could not figure out how the
relfilenode file in the old tablespace is getting deleted; if I
disconnect the standby then it is not deleted. I am not sure what
role the walsender plays in deleting the file even before the
checkpointer processes the unlink request.

postgres[8905]=# create tablespace tab location
'/home/dilipkumar/work/PG/install/bin/test';
CREATE TABLESPACE
postgres[8905]=# create tablespace tab1 location
'/home/dilipkumar/work/PG/install/bin/test1';
CREATE TABLESPACE
postgres[8905]=# create database test tablespace tab;
CREATE DATABASE
postgres[8905]=# \c test
You are now connected to database "test" as user "dilipkumar".
test[8912]=# create table t( a int PRIMARY KEY,b text);
CREATE TABLE
test[8912]=# insert into t values (generate_series(1,10), 'aaa');
INSERT 0 10
test[8912]=# alter table t set tablespace tab1 ;
ALTER TABLE
test[8912]=# CHECKPOINT ;
WARNING:  57P02: terminating connection because of crash of another
server process

log shows:
PANIC:  could not fsync file
"pg_tblspc/16384/PG_15_202112131/16386/16387": No such file or
directory
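
The ENOENT itself is easy to demonstrate in isolation: once the
underlying file has been removed, reopening it by path (which the
checkpointer has to do before it can fsync) fails with exactly this
error. A standalone sketch, nothing PostgreSQL-specific, just the
syscalls involved (the file name is a stand-in):

/* enoent_demo.c -- why an fsync-by-path after the file is gone
 * reports "No such file or directory" */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char *path = "relfile_segment";  /* stand-in for the old-tablespace file */
    int fd = open(path, O_CREAT | O_RDWR, 0600);

    if (fd < 0) { perror("create"); return 1; }
    close(fd);

    unlink(path);               /* file removed before the checkpoint gets to it */

    fd = open(path, O_RDWR);    /* checkpointer must reopen by path to fsync */
    if (fd < 0)
    {
        perror("could not fsync file");  /* prints: No such file or directory */
        return 1;
    }
    fsync(fd);
    close(fd);
    return 0;
}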

backtrace:
#0  0x00007f2f865ff387 in raise () from /lib64/libc.so.6
#1  0x00007f2f86600a78 in abort () from /lib64/libc.so.6
#2  0x0000000000b13da3 in errfinish (filename=0xcf283f "sync.c", ..
#3  0x0000000000978dc7 in ProcessSyncRequests () at sync.c:439
#4  0x00000000005949d2 in CheckPointGuts (checkPointRedo=67653624,
flags=108) at xlog.c:9590
#5  0x00000000005942fe in CreateCheckPoint (flags=108) at xlog.c:9318
#6  0x00000000008a80b7 in CheckpointerMain () at checkpointer.c:444

Note: This smaller test case is derived from one of the bigger
scenarios raised by Neha Sharma [1]

[1] https://www.postgresql.org/message-id/CANiYTQs0E8TcB11eU0C4eNN0tUd%3DSQqsqEtL1AVZP1%3DEnD-49A%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

