On Wednesday, November 12, 2025 7:27 PM Pradeep Kumar <[email protected]> wrote:
> I've been investigating the assert failure in
> ProcArraySetReplicationSlotXmin() and would like to share my approach and get
> feedback. Instead of inverting the locks and what robert shared before [1].
> Instead of unconditionally updating procArray->replication_slot_xmin in
> ProcArraySetReplicationSlotXmin() in procarray.c, I made the updates
> conditional:
> 1) Only update if the incoming xmin is valid
> 2) Only update if it's older than the currently stored xmin
> 3) Do the same for procArray->replication_slot_catalog_xmin
...
> In above block of code ensures we always track the minimum xmin across all
> active replication slots without losing data. And also no need to worry about
> locks. And also while reproducing this issue [2] In SnapBuildInitialSnapshot()
> while we computing safexid by calling
> GetOldestSafeDecodingTransactionId(false)
> will enters into first case and update the oldestSafeXid =
> procArray->replication_slot_xmin. So it won't return nextXid. And also it
> solves this issue [2].
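For reference, the conditional-update rule quoted above can be modeled with a small standalone sketch. This is not the proposed patch and not PostgreSQL source: the function names set_xmin_unconditional() and set_xmin_conditional() are hypothetical, the stored value is a plain variable standing in for procArray->replication_slot_xmin, and no locking is modeled. It only shows what each rule does to the stored value when the recomputed xmin is invalid, as it would be after the last slot is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins for the PostgreSQL types and macros used in the thread. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Modulo-2^32 comparison of two normal xids: true if a is older than b. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical model of the existing behavior: always overwrite. */
void
set_xmin_unconditional(TransactionId *stored, TransactionId xmin)
{
    *stored = xmin;
}

/*
 * Hypothetical model of the quoted proposal: ignore an invalid xmin and
 * only ever move the stored value backwards.
 */
void
set_xmin_conditional(TransactionId *stored, TransactionId xmin)
{
    if (!TransactionIdIsValid(xmin))
        return;                 /* rule 1: incoming xmin must be valid */

    if (!TransactionIdIsValid(*stored) || xid_precedes(xmin, *stored))
        *stored = xmin;         /* rule 2: only accept an older xmin */
}
```

In this model, once a value such as 100 has been stored, calling set_xmin_conditional() with InvalidTransactionId leaves 100 in place, whereas set_xmin_unconditional() resets the stored value to InvalidTransactionId.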
Thanks for evaluating a new approach, but I don't think it can work. We expect replication_slot_xmin to be reset to an invalid value when the last slot is dropped, and this approach would prevent that, causing WAL to be retained. For a detailed explanation, please refer to [1].

While testing the patches across all branches, I noticed that an additional lock needs to be taken in launcher.c, where a call to ReplicationSlotsComputeRequiredXmin(true) was recently added for the conflict detection slot. I have modified the original patch accordingly.

BTW, I am not adding a test using an injection point, because it does not seem practical to insert an injection point inside ReplicationSlotsComputeRequiredXmin(). The injection point function internally calls CHECK_FOR_INTERRUPTS(), but the key functions in the patch hold an LWLock, and holding an LWLock holds off interrupts.

I am sharing the patches for all branches for reference.

[1] https://www.postgresql.org/message-id/TY4PR01MB169070EE618FA2908B3D2F2AE94C3A%40TY4PR01MB16907.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

v3HEAD-0001-Fix-a-race-condition-of-updating-procArray-replic.patch
v3PG18-17-0001-Fix-a-race-condition-of-updating-procArray-re.patch
v3PG16-13-0001-Fix-a-race-condition-of-updating-procArray-re.patch
