On Wed, Feb 8, 2023 at 1:35 AM Andres Freund <and...@anarazel.de> wrote:
>
> On 2023-02-07 11:49:03 -0800, Andres Freund wrote:
> > On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada <sawada.m...@gmail.com>
> > > wrote:
> > > >
> > > > Attached updated patches.
> > > >
> > >
> > > Thanks, Andres, others, do you see a better way to fix this problem? I
> > > have reproduced it manually and the steps are shared at [1] and
> > > Sawada-San also reproduced it, see [2].
> > >
> > > [1] - https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > > [2] - https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
> >
> > Hm. It's worrisome to now hold ProcArrayLock exclusively while iterating
> > over the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> > non-negligible frequency. Callers like CreateInitDecodingContext(), that
> > pass already_locked=true worry me a lot less, because obviously that's not
> > a very frequent operation.
>
> Separately from this change:
>
> I wonder if we ought to change the setup in CreateInitDecodingContext() to
> be a bit less intricate. One idea:
>
> Instead of having GetOldestSafeDecodingTransactionId() compute a value, that
> we then enter into a slot, that then computes the global horizon via
> ReplicationSlotsComputeRequiredXmin(), we could have a successor to
> GetOldestSafeDecodingTransactionId() change procArray->replication_slot_xmin
> (if needed).
>
> As long as CreateInitDecodingContext() prevents a concurrent
> ReplicationSlotsComputeRequiredXmin(), by holding ReplicationSlotControlLock
> exclusively, that should suffice to ensure that no "wrong" horizon was
> determined / no needed rows have been removed. And we'd not need a lock
> nested inside ProcArrayLock anymore.
>
> Not sure if it's sufficiently better to be worth bothering with though :(
>
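
For readers following along, the alternative quoted above can be sketched
as a toy model. This is only an illustration under stated assumptions, not
PostgreSQL source: the two pthread mutexes stand in for
ReplicationSlotControlLock and ProcArrayLock, the functions
reserve_initial_xmin() and compute_required_xmin() are hypothetical names,
the "oldest safe" xid is passed in rather than derived from the proc array,
and xids are compared with plain "<" instead of wraparound-aware comparison.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Toy stand-ins only; nothing here is taken from the PostgreSQL tree. */
typedef uint32_t Xid;
#define INVALID_XID 0
#define MAX_SLOTS 4

/* Models ReplicationSlotControlLock (always taken first in this toy). */
pthread_mutex_t slot_control_lock = PTHREAD_MUTEX_INITIALIZER;
/* Models ProcArrayLock (only ever taken second, never the outer lock). */
pthread_mutex_t proc_array_lock = PTHREAD_MUTEX_INITIALIZER;

Xid slot_xmin[MAX_SLOTS];   /* per-slot xmin; INVALID_XID means unused */
Xid replication_slot_xmin;  /* models procArray->replication_slot_xmin */

/*
 * Hypothetical successor to GetOldestSafeDecodingTransactionId(): while
 * holding the slot control lock (so no recomputation can run concurrently),
 * lower the global horizon directly if needed, then record the slot's xmin.
 */
Xid reserve_initial_xmin(int slot, Xid oldest_safe)
{
    pthread_mutex_lock(&slot_control_lock);

    pthread_mutex_lock(&proc_array_lock);
    if (replication_slot_xmin == INVALID_XID ||
        oldest_safe < replication_slot_xmin)
        replication_slot_xmin = oldest_safe;
    pthread_mutex_unlock(&proc_array_lock);

    slot_xmin[slot] = oldest_safe;
    pthread_mutex_unlock(&slot_control_lock);
    return oldest_safe;
}

/*
 * Models the role of ReplicationSlotsComputeRequiredXmin(): check all
 * slots, then publish the minimum. In this toy the slot control lock is
 * held until the value is published, which is an assumption of the sketch.
 */
Xid compute_required_xmin(void)
{
    Xid agg = INVALID_XID;

    pthread_mutex_lock(&slot_control_lock);
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (slot_xmin[i] != INVALID_XID &&
            (agg == INVALID_XID || slot_xmin[i] < agg))
            agg = slot_xmin[i];
    }

    pthread_mutex_lock(&proc_array_lock);
    replication_slot_xmin = agg;
    pthread_mutex_unlock(&proc_array_lock);

    pthread_mutex_unlock(&slot_control_lock);
    return agg;
}
```

In this toy, neither path ever acquires the slot lock while holding the
proc-array lock, which is the property the quoted idea is after. Whether the
real code could hold the locks this way is exactly what is under discussion.
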
I am also not sure, because it would improve concurrency only for
CreateInitDecodingContext(), which shouldn't be called at a high frequency.
Also, to some extent, the current coding, or the approach we are discussing,
is easier to follow, as we would always update
procArray->replication_slot_xmin after checking all the slots.

-- 
With Regards,
Amit Kapila.