Re: Issue with the PRNG used by Postgres

2024-04-16 Thread Andres Freund
Hi, On 2024-04-15 10:54:16 -0400, Robert Haas wrote: > On Fri, Apr 12, 2024 at 3:33 PM Andres Freund wrote: > > Here's a patch implementing this approach. I confirmed that before, we > > trigger > > the stuck spinlock logic very quickly, and after, we don't. However, if most > > sleeps are

Re: Issue with the PRNG used by Postgres

2024-04-15 Thread Robert Haas
On Fri, Apr 12, 2024 at 3:33 PM Andres Freund wrote: > Here's a patch implementing this approach. I confirmed that before, we trigger > the stuck spinlock logic very quickly, and after, we don't. However, if most > sleeps are interrupted, it can delay the stuck spinlock detection a good > bit. But

Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Andres Freund
Hi, On 2024-04-12 11:33:17 -0700, Andres Freund wrote: > I wonder if, for easy backpatching, the easiest solution is to just reset > errno before calling pg_usleep(), and only increment status->delays if > errno != EINTR. Given our pg_usleep() implementations, that'd prevent the stuck > spinlock
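
A minimal, self-contained sketch of the idea described above: clear errno before the sleep and only count delays whose sleep was not cut short by a signal. The struct and the usleep() call are illustrative stand-ins for s_lock.c's SpinDelayStatus and pg_usleep(); this is not the actual patch from this thread.

#include <errno.h>
#include <unistd.h>

/* Illustrative stand-in for the relevant parts of s_lock.c's SpinDelayStatus. */
typedef struct
{
    int         delays;        /* completed sleeps, compared against NUM_DELAYS */
    int         cur_delay;     /* current sleep length in microseconds */
} SpinDelaySketch;

static void
sleep_and_maybe_count(SpinDelaySketch *status)
{
    /* reset errno so a leftover EINTR from an earlier call can't confuse us */
    errno = 0;
    (void) usleep(status->cur_delay);

    /* only a sleep that was not interrupted by a signal counts toward the limit */
    if (errno != EINTR)
        status->delays++;
}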

Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Andres Freund
Hi, Given that I found several ways to spuriously trigger the stuck spinlock logic, I think we need to backpatch a fix. On 2024-04-11 21:41:39 -0700, Andres Freund wrote: > Tracking the total amount of time spent sleeping doesn't require any > additional syscalls, due to the nanosleep()... I
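
The nanosleep() point presumably refers to its second argument: when the sleep is interrupted, the kernel reports the unslept remainder there, so the time actually slept can be accumulated without any extra clock syscall. A hedged, self-contained sketch of that idea (names are illustrative, not taken from the actual patch):

#include <errno.h>
#include <time.h>

/* Sleep for 'usec' and add the time actually slept to *total_slept_usec. */
static long
sleep_usec_tracked(long usec, long *total_slept_usec)
{
    struct timespec req;
    struct timespec rem = {0, 0};
    long        slept;

    req.tv_sec = usec / 1000000L;
    req.tv_nsec = (usec % 1000000L) * 1000L;

    if (nanosleep(&req, &rem) == 0)
        slept = usec;                       /* slept the full requested time */
    else if (errno == EINTR)
        slept = usec - (rem.tv_sec * 1000000L + rem.tv_nsec / 1000L);
    else
        slept = 0;                          /* some other failure: count nothing */

    *total_slept_usec += slept;
    return slept;
}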

Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Alexander Lakhin
12.04.2024 08:05, Alexander Lakhin wrote: 2024-04-12 05:00:17.981 UTC [762336] PANIC:  stuck spinlock detected at WaitBufHdrUnlocked, bufmgr.c:5726 It looks like that spinlock issue caused by a race condition/deadlock. What I see when the test fails is: A client backend executing "DROP

Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Andres Freund
Hi, On 2024-04-12 09:43:46 -0400, Tom Lane wrote: > Andres Freund writes: > > Oh my. There's a workload that completely trivially hits this, without even > > trying hard. LISTEN/NOTIFY. > > Hm. Bug in the NOTIFY logic perhaps? I don't think it's a bug; async.c:SignalBackends() will signal

Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Tom Lane
Andres Freund writes: > Oh my. There's a workload that completely trivially hits this, without even > trying hard. LISTEN/NOTIFY. Hm. Bug in the NOTIFY logic perhaps? Sending that many signals can't be good from a performance POV, whether or not it triggers spinlock issues.

Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Andres Freund
Hi, On 2024-04-11 21:41:39 -0700, Andres Freund wrote: > FWIW, I just reproduced the scenario with signals. I added tracking of the > total time actually slept and lost to SpinDelayStatus, and added a function to > trigger a wait on a spinlock. > > To wait less, I set

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Alexander Lakhin
Hi Andres, 12.04.2024 07:41, Andres Freund wrote: FWIW, I just reproduced the scenario with signals. I added tracking of the total time actually slept and lost to SpinDelayStatus, and added a function to trigger a wait on a spinlock. To wait less, I set max_standby_streaming_delay=0.1, but

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Andres Freund
Hi, On 2024-04-11 20:47:37 -0700, Andres Freund wrote: > > So there's that. But that's not an argument that we need to be in a > > hurry to timeout; if the built-in reaction time is less than perhaps > > 10 minutes you're still miles ahead of the manual solution. > > The current timeout is of a

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Tom Lane
Andres Freund writes: > On 2024-04-11 23:15:38 -0400, Tom Lane wrote: >> On the third hand, it's still true that we have no comparable >> behavior for any other source of system lockups, and it's difficult >> to make a case that stuck spinlocks really need more concern than >> other kinds of

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Andres Freund
Hi, On 2024-04-11 23:15:38 -0400, Tom Lane wrote: > I wrote: > > ... But Robert's question remains: how does > > PANIC'ing after awhile make anything better? I flat out don't > > believe the idea that having a backend stuck on a spinlock > > would otherwise go undetected. > > Oh, wait. After

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Tom Lane
I wrote: > ... But Robert's question remains: how does > PANIC'ing after awhile make anything better? I flat out don't > believe the idea that having a backend stuck on a spinlock > would otherwise go undetected. Oh, wait. After thinking a bit longer I believe I recall the argument for this

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Tom Lane
Robert Haas writes: > On Thu, Apr 11, 2024 at 5:30 PM Andres Freund wrote: >> By far most of the stuck spinlocks I've seen were due to bugs in >> out-of-core extensions. Absurdly enough, the next most common thing probably is >> due to people using gdb to make an uninterruptible process break

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Robert Haas
On Thu, Apr 11, 2024 at 5:30 PM Andres Freund wrote: > I don't think that's a particularly apt comparison. If you have spinlocks that > cannot be acquired within tens of seconds, you're in a really bad situation, > regardless of whether you crash-restart or not. I agree with that. I just don't

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Andres Freund
Hi, On 2024-04-11 16:46:23 -0400, Robert Haas wrote: > On Thu, Apr 11, 2024 at 3:52 PM Andres Freund wrote: > > My suspicion is that most of the false positives are caused by lots of > > signals > > interrupting the pg_usleep()s. Because we measure the number of delays, not > > the actual time

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Andres Freund
Hi, On 2024-04-11 16:46:10 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2024-04-11 16:11:40 -0400, Tom Lane wrote: > >> We wouldn't need to fix it, if we simply removed the NUM_DELAYS > >> limit. Whatever kicked us off the sleep doesn't matter, we might > >> as well go check the

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Robert Haas
On Thu, Apr 11, 2024 at 3:52 PM Andres Freund wrote: > My suspicion is that most of the false positives are caused by lots of signals > interrupting the pg_usleep()s. Because we measure the number of delays, not > the actual time since we've been waiting for the spinlock, signals > interrupting
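
A self-contained demonstration of the effect being described: when a steady stream of signals interrupts every sleep, a delay counter reaches its limit far sooner than the intended cumulative sleep time. The 100 ms per-iteration sleep and the 1 ms timer are arbitrary example values (POSIX only; not PostgreSQL code).

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static void
on_alarm(int sig)
{
    (void) sig;                 /* no-op: the point is just to interrupt sleeps */
}

int
main(void)
{
    struct sigaction sa = {0};
    struct itimerval it;
    struct timespec t0, t1;
    long        intended_usec = 0;

    /* deliver SIGALRM every millisecond, without SA_RESTART */
    sa.sa_handler = on_alarm;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = 1000;
    it.it_value = it.it_interval;
    setitimer(ITIMER_REAL, &it, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int delays = 0; delays < 1000; delays++)
    {
        intended_usec += 100000;
        (void) usleep(100000);  /* returns early on every SIGALRM */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("counted 1000 delays; intended sleep %ld s, elapsed ~%ld s\n",
           intended_usec / 1000000, (long) (t1.tv_sec - t0.tv_sec));
    return 0;
}

On an otherwise idle machine this finishes in roughly a second or two, versus the 100 seconds of sleeping the loop asked for, which is the kind of compression of the stuck-spinlock timeout being discussed here.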

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Tom Lane
Andres Freund writes: > On 2024-04-11 16:11:40 -0400, Tom Lane wrote: >> We wouldn't need to fix it, if we simply removed the NUM_DELAYS >> limit. Whatever kicked us off the sleep doesn't matter, we might >> as well go check the spinlock. > I suspect we should fix it regardless of whether we

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Andres Freund
Hi, On 2024-04-11 16:11:40 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2024-04-11 15:24:28 -0400, Robert Haas wrote: > >> Or, rip out the whole, whole mechanism and just don't PANIC. > > > I continue to believe that that'd be quite a bad idea. > > I'm warming to it myself. > > > My

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Tom Lane
Andres Freund writes: > On 2024-04-11 15:24:28 -0400, Robert Haas wrote: >> Or, rip out the whole, whole mechanism and just don't PANIC. > I continue to believe that that'd be quite a bad idea. I'm warming to it myself. > My suspicion is that most of the false positives are caused by lots of

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Andres Freund
Hi, On 2024-04-10 21:52:59 -0400, Tom Lane wrote: > Less drastically, I wonder if we should call finish_spin_delay > at all in these off-label uses of perform_spin_delay. What > we're trying to measure there is the behavior of TAS() spin loops, > and I'm not sure that what LWLockWaitListLock and

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Tom Lane
Robert Haas writes: > On Wed, Apr 10, 2024 at 9:53 PM Tom Lane wrote: >> Maybe we should rip out the whole mechanism and hard-wire >> spins_per_delay at 1000 or so. > Or, rip out the whole, whole mechanism and just don't PANIC. By that you mean "remove the NUM_DELAYS limit", right? We still

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Andres Freund
Hi, On 2024-04-11 15:24:28 -0400, Robert Haas wrote: > On Wed, Apr 10, 2024 at 9:53 PM Tom Lane wrote: > > Maybe we should rip out the whole mechanism and hard-wire > > spins_per_delay at 1000 or so. > > Or, rip out the whole, whole mechanism and just don't PANIC. I continue to believe that

Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Robert Haas
On Wed, Apr 10, 2024 at 9:53 PM Tom Lane wrote: > Maybe we should rip out the whole mechanism and hard-wire > spins_per_delay at 1000 or so. Or, rip out the whole, whole mechanism and just don't PANIC. I'm not 100% sure that's better, but I think it's worth considering. The thing is, if you're
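
The "whole mechanism" here is s_lock.c's randomized-backoff loop plus the NUM_DELAYS cutoff that ends in the "stuck spinlock" PANIC. A self-contained simulation of its nominal behavior, using the constants as I understand them (NUM_DELAYS = 1000, sleeps growing from 1 ms toward 1 s and then wrapping); it estimates how much cumulative sleeping the mechanism intends before declaring a spinlock stuck, assuming no sleep is ever interrupted. drand48() stands in for pg_prng_double().

#include <stdio.h>
#include <stdlib.h>

/* Constants mirroring s_lock.c as I understand them; check the source. */
#define MIN_DELAY_USEC  1000.0      /* 1 ms */
#define MAX_DELAY_USEC  1000000.0   /* 1 s */
#define NUM_DELAYS      1000        /* sleeps allowed before "stuck spinlock" */

int
main(void)
{
    double      cur_delay = MIN_DELAY_USEC;
    double      total = 0.0;

    srand48(42);
    for (int i = 0; i < NUM_DELAYS; i++)
    {
        total += cur_delay;
        /* increase delay by a random fraction between 1X and 2X */
        cur_delay += (int) (cur_delay * drand48() + 0.5);
        if (cur_delay > MAX_DELAY_USEC)
            cur_delay = MIN_DELAY_USEC;     /* wrap back to the minimum */
    }
    printf("nominal sleep before PANIC: ~%.0f seconds\n", total / 1e6);
    return 0;
}

With these constants the intended total comes out on the order of minutes; the point made elsewhere in the thread is that only the counter, not this nominal time, is enforced, so interrupted sleeps can reach the PANIC far sooner.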

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Andrey M. Borodin
> On 10 Apr 2024, at 21:48, Parag Paul wrote: > > Yes, the probability of this happening is astronomical, but in production > with 128 core servers with 7000 max_connections, with petabyte scale data, > this did repro 2 times in the last month. We had to move to a local approach > to

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
I wrote: > I'm not worried about it being slower, but about whether it could > report "stuck spinlock" in cases where the existing code succeeds. On fourth thought ... the number of tries to acquire the lock, or in this case number of tries to observe the lock free, is not NUM_DELAYS but

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Robert Haas writes: > I just want to mention that I have heard of "stuck spinlock" happening > in production just because the server was busy. And I think that's not > intended. The timeout is supposed to be high enough that you only hit > it if there's a bug in the code. At least AIUI. But it

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Robert Haas
On Wed, Apr 10, 2024 at 4:40 PM Tom Lane wrote: > I'm not worried about it being slower, but about whether it could > report "stuck spinlock" in cases where the existing code succeeds. > While that seems at least theoretically possible, it seems like > if you hit it you have got problems that

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Andres Freund writes: > On 2024-04-10 16:05:21 -0400, Tom Lane wrote: >> Yeah. So what's the conclusion? Leave it alone? Commit to >> HEAD only? > I think we should certainly fix it. I don't really have an opinion about > backpatching, it's just on the line between the two for me. > Hm. The

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Andres Freund
Hi, On 2024-04-10 16:05:21 -0400, Tom Lane wrote: > Andres Freund writes: > > I think it could exacerbate the issue. Parag reported ~7k connections on a > > 128 core machine. The buffer replacement logic in < 16 tries to lock the old > > and new lock partitions at once. That can lead to quite

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Andres Freund writes: > I think it could exacerbate the issue. Parag reported ~7k connections on a > 128 core machine. The buffer replacement logic in < 16 tries to lock the old > and new lock partitions at once. That can lead to quite bad "chains" of > dependent lwlocks, occasionally putting

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Andres Freund
Hi, On 2024-04-10 14:02:20 -0400, Tom Lane wrote: > On third thought ... while I still think this is a misuse of > perform_spin_delay and we should change it, I'm not sure it'll do > anything to address Parag's problem, because IIUC he's seeing actual > "stuck spinlock" reports. That implies

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Robert Haas writes: > The blog post to which Parag linked includes this histogram as an > example of a low-Hamming-weight situation: That's an interesting post indeed, but I'm not sure how relevant it is to us, because it is about Xoshiro not Xoroshiro, and the latter is what we use. The last

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Andres Freund writes: > On 2024-04-10 13:03:05 -0400, Tom Lane wrote: >> So I think we need something like the attached. > LGTM. On third thought ... while I still think this is a misuse of perform_spin_delay and we should change it, I'm not sure it'll do anything to address Parag's problem,

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Andres Freund
Hi, On 2024-04-10 13:03:05 -0400, Tom Lane wrote: > After thinking about this some more, it is fairly clear that that *is* > a mistake that can cause a thundering-herd problem. > Assume we have two or more backends waiting in perform_spin_delay, and for > whatever reason the scheduler wakes them

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Andres Freund writes: > Hi, > On 2024-04-10 12:28:10 -0400, Tom Lane wrote: >> I don't think it's correct to re-initialize the SpinDelayStatus each >> time around the outer loop. That state should persist through the >> entire acquire operation, as it does in a regular spinlock acquire. >> As

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Parag Paul writes: > Yes, the probability of this happening is astronomical, but in production > with 128 core servers with 7000 max_connections, with petabyte scale data, > this did repro 2 times in the last month. We had to move to a local > approach to manage our rate-limiting counters. > This

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Andres Freund
Hi, On 2024-04-10 12:28:10 -0400, Tom Lane wrote: > Actually ... Parag mentioned that this was specifically about > lwlock.c's usage of spinlocks. It doesn't really use a spinlock, > but it does use s_lock.c's delay logic, and I think it's got the > usage pattern wrong: > > while (true) >

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Andres Freund
Hi, On 2024-04-10 09:48:42 -0700, Parag Paul wrote: > Yes, the probability of this happening is astronomical, but in production > with 128 core servers with 7000 max_connections, with petabyte scale data, > this did repro 2 times in the last month. We had to move to a local > approach to manage

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
I wrote: > I don't think it's correct to re-initialize the SpinDelayStatus each > time around the outer loop. That state should persist through the > entire acquire operation, as it does in a regular spinlock acquire. > As this stands, it resets the delay to minimum each time around the > outer
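
A hedged sketch of the restructuring being argued for: one SpinDelayStatus initialized before the retry loop and finished after it, so cur_delay keeps growing across outer iterations instead of resetting to the minimum each time. It is modeled loosely on lwlock.c's LWLockWaitListLock() (LW_FLAG_LOCKED and the atomic state field are private to that file), and it is not the actual patch posted in this thread.

static void
wait_list_lock_sketch(LWLock *lock)
{
    SpinDelayStatus delayStatus;
    uint32      old_state;

    /* initialize once per acquire operation, not once per retry */
    init_local_spin_delay(&delayStatus);

    while (true)
    {
        /* always try once to acquire the lock directly */
        old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
        if (!(old_state & LW_FLAG_LOCKED))
            break;              /* got the lock */

        /* wait until the holder releases it, then retry the atomic fetch-or */
        while (old_state & LW_FLAG_LOCKED)
        {
            perform_spin_delay(&delayStatus);
            old_state = pg_atomic_read_u32(&lock->state);
        }
    }

    finish_spin_delay(&delayStatus);
}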

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Parag Paul
Yes, the probability of this happening is astronomical, but in production with 128 core servers with 7000 max_connections, with petabyte scale data, this did repro 2 times in the last month. We had to move to a local approach to manage our rate-limiting counters. This is not reproducible very

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Robert Haas
On Wed, Apr 10, 2024 at 12:40 PM Parag Paul wrote: > The reason why this could be a problem is a flaw in the RNG with the enlarged > Hamming belt. > I attached an image here, with the RNG outputs from 2 backends. I ran our > code for weeks, and collected the > values generated by the RNG over

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Parag Paul
hi Robert, We are using xoroshiro128 and have not moved to the next state of the art. We did see a lot of low values, as noted in my last message. -Parag On Wed, Apr 10, 2024 at 9:37 AM Robert Haas wrote: > On Wed, Apr 10, 2024 at 12:02 PM Tom Lane wrote: > > As I said to Parag, I see exactly no reason to
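
For reference, xoroshiro128** is the published Blackman/Vigna generator that pg_prng.c is based on; a self-contained sketch of the reference algorithm follows. The example seed and the 53-bit double conversion are illustrative and may differ in detail from pg_prng.c.

#include <stdint.h>
#include <stdio.h>

static uint64_t s[2] = {0x123456789abcdef0ULL, 0x0fedcba987654321ULL};     /* example seed */

static inline uint64_t
rotl(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* One xoroshiro128** 1.0 step: 128 bits of state, 64 bits of output. */
static uint64_t
xoroshiro128ss_next(void)
{
    uint64_t    s0 = s[0];
    uint64_t    s1 = s[1];
    uint64_t    result = rotl(s0 * 5, 7) * 9;

    s1 ^= s0;
    s[0] = rotl(s0, 24) ^ s1 ^ (s1 << 16);
    s[1] = rotl(s1, 37);
    return result;
}

int
main(void)
{
    /* Map a few outputs to [0, 1), the shape pg_prng_double() returns. */
    for (int i = 0; i < 5; i++)
    {
        uint64_t    v = xoroshiro128ss_next();

        printf("%.17f\n", (double) (v >> 11) * (1.0 / 9007199254740992.0));
    }
    return 0;
}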

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Robert Haas
On Wed, Apr 10, 2024 at 12:02 PM Tom Lane wrote: > As I said to Parag, I see exactly no reason to believe that that's a > problem, unless it happens *a lot*, like hundreds of times in a row. > If it does, that's an RNG problem not s_lock's fault. Now, I'm not > going to say that xoroshiro can't

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Actually ... Parag mentioned that this was specifically about lwlock.c's usage of spinlocks. It doesn't really use a spinlock, but it does use s_lock.c's delay logic, and I think it's got the usage pattern wrong: while (true) { /* always try once to acquire lock directly */

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Andres Freund
Hi, On 2024-04-10 07:55:16 -0700, Parag Paul wrote: > This is a little bit more complex than that. The spinlocks are taken in the > LWLock(Mutex) code, when the lock is not available right away. > The spinlock is taken to attach the current backend to the wait list of the > LWLock. This means,

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Robert Haas writes: > Oh, yeah ... right. But then why does the comment say that it's > increasing the delay by a random fraction between 1X and 2X? I think the comment is meant to say that the new delay value will be 1X to 2X the old value. If you want to suggest different phrasing, feel

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Parag Paul writes: > So, if the RNG generated 0.0001 and cur_delay = 1000, the result will be > 1000 + (int)(1000*0.0001 + 0.5) = 1000 + (int)(0.1 + 0.5) = 1000 + 0 = 1000 > <-- back to the same value Yes, with a sufficiently small RNG result, the sleep delay will not increase that time through the

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Robert Haas
On Wed, Apr 10, 2024 at 11:09 AM Tom Lane wrote: > No, I think you are misreading it, because the assignment is += not =. > The present coding is > > /* increase delay by a random fraction between 1X and 2X */ > status->cur_delay += (int) (status->cur_delay * >

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Parag Paul
hi Tom, First of all, thanks for your response. I did not misread it. The 0.5 is added to the result of the multiplication, which is then cast to a C integer; the cast does not round off, but just drops the decimal portion. status->cur_delay += (int) (status->cur_delay *

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Tom Lane
Robert Haas writes: > I'm not convinced that we should try to improve the RNG, but surely we > need to put parentheses around pg_prng_double(_global_prng_state) + > 0.5. IIUC, the current logic is making us multiply the spin delay by a > value between 0 and 1 when what was intended was that it
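
The two readings at issue, side by side, in a self-contained illustration (d stands for cur_delay, r for a pg_prng_double() result in [0, 1); the concrete numbers are just examples):

#include <stdio.h>

int
main(void)
{
    int         d = 1000;       /* example cur_delay, in microseconds */
    double      r = 0.75;       /* example RNG result */

    /*
     * Current coding: the +0.5 only rounds the increment, so the new delay
     * lands between 1X and 2X the old one (matching the code comment).
     */
    printf("d + (int)(d * r + 0.5)   = %d\n", d + (int) (d * r + 0.5));

    /*
     * The quoted suggestion's parenthesization: the increment itself is
     * scaled by 0.5..1.5, so the new delay would land between 1.5X and
     * 2.5X the old one.
     */
    printf("d + (int)(d * (r + 0.5)) = %d\n", d + (int) (d * (r + 0.5)));
    return 0;
}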

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Parag Paul
Thank you Robert. I am in the process of patching this. -Parag On Wed, Apr 10, 2024 at 7:43 AM Robert Haas wrote: > On Tue, Apr 9, 2024 at 5:05 PM Andres Freund wrote: > > ISTM that the fix here is to not use a spinlock for whatever the > contention is > > on, rather than improve the RNG. > >

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Parag Paul
Hi Andres, This is a little bit more complex than that. The spinlocks are taken in the LWLock(Mutex) code when the lock is not available right away. The spinlock is taken to attach the current backend to the wait list of the LWLock. This means that this cannot be controlled. The repro when it

Re: Issue with the PRNG used by Postgres

2024-04-10 Thread Robert Haas
On Tue, Apr 9, 2024 at 5:05 PM Andres Freund wrote: > ISTM that the fix here is to not use a spinlock for whatever the contention is > on, rather than improve the RNG. I'm not convinced that we should try to improve the RNG, but surely we need to put parentheses around

Re: Issue with the PRNG used by Postgres

2024-04-09 Thread Andres Freund
Hi, On 2024-04-08 22:52:09 -0700, Parag Paul wrote: > We have an interesting problem, where PG went to PANIC due to a stuck > spinlock case. > On careful analysis and hours of trying to reproduce this (something that > showed up in production after almost 2 weeks of a stress run), I did some >

Issue with the PRNG used by Postgres

2024-04-08 Thread Parag Paul
hi all, We have an interesting problem, where PG went to PANIC due to a stuck spinlock case. On careful analysis and hours of trying to reproduce this (something that showed up in production after almost 2 weeks of a stress run), I did some statistical analysis on the RNG that PG uses to