Hello,

The attached patch speeds up the removal of WAL files in the old timelines. I'll add this to the next CF.

BACKGROUND
==================================================

We need to meet a severe availability requirement of a potential customer. They will use synchronous streaming replication. The allowed failover duration, from the failure through failure detection to failover completion, is 10 seconds. Even one second is precious.

During testing on a fast machine with SSD, we observed about 2 seconds between the following two messages. There were no other messages between them.

LOG:  archive recovery complete
LOG:  MultiXact member wraparound protections are now enabled


CAUSE
==================================================

Examining the source code, RemoveNonParentXlogFiles() seems to account for the time. It syncs the pg_wal directory every time it deletes a WAL file. max_wal_size was set to 48GB, so about 1,000 WAL files were probably deleted, and hence the pg_wal directory was synced that many times.


FIX
==================================================

unlink() the WAL files, then sync the pg_wal directory once at the end. Unfortunately, the original machine is no longer available, so I confirmed the speedup on a VM with HDD.

[time to remove 1,000 WAL files including the directory sync]
nonpatched: 2.45 seconds
patched:    0.81 seconds

Regards
Takayuki Tsunakawa

speedup_wal_removal.patch