Hi Kuroda-san, On Wed, Apr 02, 2025 at 07:16:25AM +0000, Hayato Kuroda (Fujitsu) wrote: > Dear Amit, Bertrand, > > > You have not added any injection point for the above case. Isn't it > > possible that if running_xact record is logged concurrently to the > > pruning record, it should move the active slot on standby, and the > > same failure should occur in this case as well? > > I considered that the timing failure can happen. Reproducer: > > ``` > $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'D';]); > +$node_primary->safe_psql('testdb', 'CHECKPOINT'); > +sleep(20); > $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'E';]); > ```
Yeah, I was going to provide the exact same reproducer and then saw your email. > Based on the fact, I've updated to use injection_points for scenario 5. Of > course, > PG16/17 patches won't use the active slot for that scenario. Thanks for the updated patch! As far v4-0001: === 1 +# would advance an active replication slot's catalog_xmin s/would/could/? I mean the system also needs to be "slow" enough (so the sleep() in the reproducer) === 2 +# Injection_point is used to avoid seeing an xl_running_xacts even here. In +# scenario 5, we verify the case that the backend process detects the page has +# enough tuples; thus, page pruning happens. If the record is generated just +# before doing on-pruning, the catalog_xmin of the active slot would be +# updated; hence, the conflict would not occur. Not sure we need to explain what scenario 5 does here, but that does not hurt if you feel the need. Also maybe mention the last update in the comment and add some nuance (like proposed in === 1), something like? " # Injection_point is used to avoid seeing a xl_running_xacts here. Indeed, # if it is generated between the last 2 updates then the catalog_xmin of the active # slot could be updated; hence, the conflict could not occur. " Apart from that the tests looks good to me and all the problematic scenarios covered. As far PG17-v4-0001: === 3 -# seeing a xl_running_xacts that would advance an active replication slot's +# seeing the xl_running_xacts that would advance an active replication slot's why? === 4 It looks like that check_slots_conflict_reason() is not called with checks_active_slot as argument. === 5 I think that we could remove the need for the drop_active_slot parameter in drop_logical_slots() and just check if an active slot exists (and if so drop it). That said I'm not sure it's worth to go that far for backpatching. As far PG16-v4: === 6 Same as === 3 and === 5 (=== 4 does not apply as check_slots_conflict_reason() does not exist). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com