Hello
I encountered a very lucky logical decoding error on the publisher:

2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] LOG:  starting logical decoding for slot "pubsub"
2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] DETAIL:  Streaming transactions committing after 
0/16AD5F8, reading WAL from 0/16AD5F8.
2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] STATEMENT:  START_REPLICATION SLOT "pubsub" LOGICAL 
0/16AD5F8 (proto_version '4', origin 'any', publication_names '"testpub"')
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] LOG:  logical decoding found consistent point at 
0/16AD5F8
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] DETAIL:  There are no running transactions.
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] STATEMENT:  START_REPLICATION SLOT "pubsub" LOGICAL 
0/16AD5F8 (proto_version '4', origin 'any', publication_names '"testpub"')
2023-09-05 09:58:39.187 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] ERROR:  could not create file 
"pg_replslot/pubsub/state.tmp": File exists

As I found out, the disk holding the database had run out of space, but 
luckily PostgreSQL did not go into crash recovery. Doubly lucky, the logical 
walsender was able to create state.tmp but could not write its contents, and 
failed with "ERROR: could not write to file "pg_replslot/pubsub/state.tmp": No 
space left on device". The empty state.tmp remained on disk. Once the free disk 
space problem was solved, the publication stayed inoperative. To fix it, one 
needs to either restart the database (RestoreSlotFromDisk always deletes 
state.tmp) or delete state.tmp manually.

Maybe SaveSlotToPath (src/backend/replication/slot.c) should also delete 
state.tmp if it already exists? All operations there are performed under an 
LWLock, so there should be no concurrent access.
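
To illustrate the idea outside of PostgreSQL, here is a minimal standalone 
sketch of the same write-to-temp-then-rename pattern. It is not the actual 
slot.c code: save_state() and its remove_stale_tmp flag are hypothetical names 
made up for this example, and it uses plain POSIX calls rather than the 
backend's file APIs. It only assumes, as the "File exists" error above 
suggests, that the temp file is created exclusively (O_EXCL).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the save path: write "data" to <dir>/state.tmp,
 * then rename it to <dir>/state. Returns 0 on success, -1 with errno set.
 *
 * If remove_stale_tmp is nonzero, a leftover state.tmp from an earlier
 * failed attempt is unlinked first (the change proposed above).
 */
static int
save_state(const char *dir, const char *data, size_t len, int remove_stale_tmp)
{
    char        tmppath[1024];
    char        path[1024];
    int         fd;

    snprintf(tmppath, sizeof(tmppath), "%s/state.tmp", dir);
    snprintf(path, sizeof(path), "%s/state", dir);

    /* Proposed step: drop any stale temp file; a missing file is fine. */
    if (remove_stale_tmp && unlink(tmppath) < 0 && errno != ENOENT)
        return -1;

    /* Exclusive create: fails with EEXIST if state.tmp is still around. */
    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    errno = 0;
    if (write(fd, data, len) != (ssize_t) len)
    {
        /* A short write without errno is treated as out of disk space. */
        int         save_errno = errno ? errno : ENOSPC;

        close(fd);
        errno = save_errno;
        return -1;              /* state.tmp is left behind, as reported */
    }

    if (fsync(fd) != 0)
    {
        int         save_errno = errno;

        close(fd);
        errno = save_errno;
        return -1;
    }

    if (close(fd) != 0)
        return -1;

    /* Atomically replace the real state file with the new contents. */
    return rename(tmppath, path);
}
```

With an empty state.tmp already present in the directory, 
save_state(dir, buf, len, 0) fails with EEXIST on every attempt, while 
save_state(dir, buf, len, 1) removes the leftover file and succeeds. The 
unconditional unlink is only safe if nothing else can be writing the same temp 
file at that moment, which is the point of the LWLock remark above.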

PS: I reproduced the error on HEAD by adding a pg_usleep call to SaveSlotToPath 
just before the write to the file. During that sleep, I filled up the virtual 
disk.

regards, Sergei

