On Thu, 9 Feb 2023 at 12:02, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, Feb 8, 2023 at 1:13 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Wed, Feb 8, 2023 at 1:19 AM Andres Freund <and...@anarazel.de> wrote:
> > >
> > > On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > > > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > > >
> > > > > Attached updated patches.
> > > > >
> > > >
> > > > Thanks, Andres, others, do you see a better way to fix this problem? I
> > > > have reproduced it manually and the steps are shared at [1] and
> > > > Sawada-San also reproduced it, see [2].
> > > >
> > > > [1] - https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > > > [2] - https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
> > >
> > > Hm. It's worrisome to now hold ProcArrayLock exclusively while iterating over
> > > the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> > > non-negligible frequency. Callers like CreateInitDecodingContext(), that pass
> > > already_locked=true worry me a lot less, because obviously that's not a very
> > > frequent operation.
> > >
> > > This is particularly not great because we need to acquire
> > > ReplicationSlotControlLock while already holding ProcArrayLock.
> > >
> > >
> > > But clearly there's a pretty large hole in the lock protection right now. I'm
> > > a bit confused about why we (Robert and I, or just I) thought it's ok to do it
> > > this way.
> > >
> > >
> > > I wonder if we could instead invert the locks, and hold
> > > ReplicationSlotControlLock until after ProcArraySetReplicationSlotXmin(), and
> > > acquire ProcArrayLock just for ProcArraySetReplicationSlotXmin().
> > >
> >
> > Along with inverting, doesn't this mean that we need to acquire
> > ReplicationSlotControlLock in Exclusive mode instead of acquiring it
> > in shared mode? My understanding of the above locking scheme is that
> > in CreateInitDecodingContext(), we acquire ReplicationSlotControlLock
> > in Exclusive mode before acquiring ProcArrayLock in Exclusive mode and
> > release it after releasing ProcArrayLock. Then,
> > ReplicationSlotsComputeRequiredXmin() acquires
> > ReplicationSlotControlLock in Exclusive mode only when already_locked
> > is false and releases it after a call to
> > ProcArraySetReplicationSlotXmin(). ProcArraySetReplicationSlotXmin()
> > won't change.
>
> I've attached a patch implementing this idea for discussion. In
> GetOldestSafeDecodingTransactionId() called by
> CreateInitDecodingContext(), we hold ReplicationSlotControlLock,
> ProcArrayLock, and XidGenLock at the same time. So we would need to be
> careful about the ordering.

I have changed the status of the patch to "Waiting on Author" because the
issues Robert raised have not been addressed yet. Feel free to change the
status accordingly after addressing them.

Regards,
Vignesh