Re: [HACKERS] LWLock deadlock and gdb advice
On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
>> I've attached a version of the patch that should address Heikki's
>> concern. It imo also improves the API and increases debuggability by not
>> having stale variable values in the variables anymore. (also attached is
>> a minor optimization that Heikki has noticed)

I've run these (from your git commit) for over 60 hours with no problem, so
I think it's solved. If there are still problems, hopefully they will come
out in other tests.

I don't know why the gin test was so efficient at provoking the problem
while none of the other ones I've played around with (btree-upsert, gist,
brin, btree-foreign key, btree-prepare transaction) ever did. Perhaps it is
just due to the amount of WAL which gin generates.

Cheers,

Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-08-05 11:22:55 -0700, Jeff Janes wrote:
> On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund <and...@anarazel.de> wrote:
>> On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
>>> I've attached a version of the patch that should address Heikki's
>>> concern. It imo also improves the API and increases debuggability by
>>> not having stale variable values in the variables anymore. (also
>>> attached is a minor optimization that Heikki has noticed)
>
> I've run these (from your git commit) for over 60 hours with no problem,
> so I think it's solved.

Cool! Thanks for the testing.

> If there are still problems, hopefully they will come out in other tests.
> I don't know why the gin test was so efficient at provoking the problem
> while none of the other ones I've played around with (btree-upsert, gist,
> brin, btree-foreign key, btree-prepare transaction) ever did. Perhaps it
> is just due to the amount of WAL which gin generates.

I'm not sure either... I've been able to reproduce a version of the issue
by judiciously sprinkling pg_usleep()'s over the code.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-08-02 12:33:06 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
>> I plan to commit the patch tomorrow, so it's included in alpha2.
>
> Please try to commit anything you want in alpha2 *today*. I'd prefer to
> see some successful buildfarm cycles on such patches before we wrap.

Ok, will do that.
Re: [HACKERS] LWLock deadlock and gdb advice
Andres Freund <and...@anarazel.de> writes:
> I plan to commit the patch tomorrow, so it's included in alpha2.

Please try to commit anything you want in alpha2 *today*. I'd prefer to
see some successful buildfarm cycles on such patches before we wrap.

regards, tom lane
Re: [HACKERS] LWLock deadlock and gdb advice
Hi Jeff, Heikki,

On 2015-07-31 09:48:28 -0700, Jeff Janes wrote:
> I had run it for 24 hours, while it usually took less than 8 hours to
> lock up before. I did see it within a few minutes one time when I checked
> out a new HEAD and then forgot to re-apply your or Heikki's patch. But
> now I've got the same situation again, after 15 hours, with your patch.
> This is probably all down to luck. The only differences that I can think
> of are that I advanced the base to e8e86fbc8b3619da54c, and turned on
> JJ_vac and set log_autovacuum_min_duration=0.

It's quite possible that you hit the remaining race condition that Heikki
was talking about. So that'd make it actually likely to be hit slightly
later - but as you say this is just a game of chance.

I've attached a version of the patch that should address Heikki's concern.
It imo also improves the API and increases debuggability by not having
stale variable values in the variables anymore. (also attached is a minor
optimization that Heikki has noticed)

I plan to commit the patch tomorrow, so it's included in alpha2.

Regards,

Andres
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
> I've attached a version of the patch that should address Heikki's
> concern. It imo also improves the API and increases debuggability by not
> having stale variable values in the variables anymore. (also attached is
> a minor optimization that Heikki has noticed)

...

From 17b4e0692ebd91a92448b0abb8cf4438a93a29d6 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 31 Jul 2015 20:20:43 +0200
Subject: [PATCH 1/2] Fix issues around the variable support in the lwlock
 infrastructure.

The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment, because checking the lock's state, the variable's
state and the queuing are not protected by a single spinlock
acquisition anymore.

To fix, first move resetting the variable's value from
LWLockAcquireWithVar to WALInsertLockRelease, via a new function
LWLockReleaseClearVar. That prevents issues around waiting for a
variable's value to change when a new locker has acquired the lock, but
not yet set the value. Secondly, re-check that the variable hasn't
changed after enqueuing; that prevents the issue that the lock has been
released and already re-acquired by the time the woken-up backend checks
the lock's state.

Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592db35.2060...@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
---
 src/backend/access/transam/xlog.c |  34 ---
 src/backend/storage/lmgr/lwlock.c | 193 +-
 src/include/storage/lwlock.h      |   2 +-
 3 files changed, 129 insertions(+), 100 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd31b3..939813e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1408,9 +1408,7 @@ WALInsertLockAcquire(void)
 	 * The insertingAt value is initially set to 0, as we don't know our
 	 * insert location yet.
 	 */
-	immed = LWLockAcquireWithVar(WALInsertLocks[MyLockNo].l.lock,
-								 WALInsertLocks[MyLockNo].l.insertingAt,
-								 0);
+	immed = LWLockAcquire(WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 	if (!immed)
 	{
 		/*
@@ -1435,26 +1433,28 @@ WALInsertLockAcquireExclusive(void)
 	int			i;
 
 	/*
-	 * When holding all the locks, we only update the last lock's insertingAt
-	 * indicator. The others are set to 0xFFFFFFFFFFFFFFFF, which is higher
-	 * than any real XLogRecPtr value, to make sure that no-one blocks waiting
-	 * on those.
+	 * When holding all the locks, all but the last lock's insertingAt
+	 * indicator is set to 0xFFFFFFFFFFFFFFFF, which is higher than any real
+	 * XLogRecPtr value, to make sure that no-one blocks waiting on those.
 	 */
 	for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
 	{
-		LWLockAcquireWithVar(WALInsertLocks[i].l.lock,
-							 WALInsertLocks[i].l.insertingAt,
-							 PG_UINT64_MAX);
+		LWLockAcquire(WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
+		LWLockUpdateVar(WALInsertLocks[i].l.lock,
+						WALInsertLocks[i].l.insertingAt,
+						PG_UINT64_MAX);
 	}
-	LWLockAcquireWithVar(WALInsertLocks[i].l.lock,
-						 WALInsertLocks[i].l.insertingAt,
-						 0);
+	/* Variable value reset to 0 at release */
+	LWLockAcquire(WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
 
 	holdingAllLocks = true;
 }
 
 /*
  * Release our insertion lock (or locks, if we're holding them all).
+ *
+ * NB: Reset all variables to 0, so they cause LWLockWaitForVar to block the
+ * next time the lock is acquired.
  */
 static void
 WALInsertLockRelease(void)
@@ -1464,13 +1464,17 @@ WALInsertLockRelease(void)
 		int			i;
 
 		for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
-			LWLockRelease(WALInsertLocks[i].l.lock);
+			LWLockReleaseClearVar(WALInsertLocks[i].l.lock,
+								  WALInsertLocks[i].l.insertingAt,
+								  0);
 
 		holdingAllLocks = false;
 	}
 	else
 	{
-		LWLockRelease(WALInsertLocks[MyLockNo].l.lock);
+		LWLockReleaseClearVar(WALInsertLocks[MyLockNo].l.lock,
+							  WALInsertLocks[MyLockNo].l.insertingAt,
+							  0);
 	}
 }
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index e5566d1..ae03eb1 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -10,13 +10,15 @@
  * locking should be done with the full lock manager --- which depends on
  * LWLocks to protect its shared state.
  *
- * In addition to exclusive and shared modes, lightweight locks can be used
- * to wait until a variable changes value. The variable is initially set
- * when the lock is acquired with LWLockAcquireWithVar, and can be updated
+ * In addition to exclusive and shared modes, lightweight locks can
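The commit message's second fix - re-check that the variable hasn't changed after enqueuing - is the heart of the patch. The following is a hedged, stand-alone illustration of that wait/update pattern, not PostgreSQL code: a pthread mutex and condition variable stand in for the lwlock's spinlock, wait queue and per-process semaphore, and `ToyVarLock`, `toy_wait_for_var` and `toy_update_var` are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	pthread_mutex_t mutex;	/* stands in for the lwlock's spinlock/queue lock */
	pthread_cond_t	cond;	/* stands in for the per-process semaphore wakeup */
	bool			held;	/* stands in for LW_VAL_EXCLUSIVE in lock->state */
	uint64_t		var;	/* stands in for the insertingAt variable */
} ToyVarLock;

/*
 * Block until the lock is free or the variable no longer equals oldval;
 * return the variable's current value.  The key structure mirrors the fix:
 * after "queueing" (here: while holding the mutex), both the lock state and
 * the variable are re-checked before actually sleeping, so a
 * release-and-reacquire with a new value cannot be slept through.
 */
uint64_t
toy_wait_for_var(ToyVarLock *lk, uint64_t oldval)
{
	uint64_t	newval;

	pthread_mutex_lock(&lk->mutex);
	while (lk->held && lk->var == oldval)
		pthread_cond_wait(&lk->cond, &lk->mutex);
	newval = lk->var;
	pthread_mutex_unlock(&lk->mutex);
	return newval;
}

/* Analogous to LWLockUpdateVar: set the variable, wake all waiters. */
void
toy_update_var(ToyVarLock *lk, uint64_t val)
{
	pthread_mutex_lock(&lk->mutex);
	lk->var = val;
	pthread_cond_broadcast(&lk->cond);
	pthread_mutex_unlock(&lk->mutex);
}
```

A waiter that calls `toy_wait_for_var(lk, 123)` after another thread has re-set the variable to 124 returns immediately instead of sleeping, which is exactly the scenario the missed-wakeup bug was about.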
Re: [HACKERS] LWLock deadlock and gdb advice
On Thu, Jul 30, 2015 at 8:22 PM, Andres Freund <and...@anarazel.de> wrote:
> On 2015-07-30 09:03:01 -0700, Jeff Janes wrote:
>> On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund <and...@anarazel.de> wrote:
>>> What do you think about something roughly like the attached?
>>
>> I've not evaluated the code, but applying it does solve the problem I
>> was seeing.
>
> Cool, thanks for testing! How long did you run the test and how long did
> it, on average, previously take for the issue to occur?

I had run it for 24 hours, while it usually took less than 8 hours to lock
up before. I did see it within a few minutes one time when I checked out a
new HEAD and then forgot to re-apply your or Heikki's patch. But now I've
got the same situation again, after 15 hours, with your patch. This is
probably all down to luck. The only differences that I can think of are
that I advanced the base to e8e86fbc8b3619da54c, and turned on JJ_vac and
set log_autovacuum_min_duration=0.

checkpointer, 31516

(gdb) bt
#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x006481df in PGSemaphoreLock (sema=0x7f84ccd3c5f8) at pg_sema.c:387
#2  0x006a5754 in LWLockAcquireCommon (l=0x7f84c42edf80, valptr=0x7f84c42edfa8,
    val=18446744073709551615) at lwlock.c:1041
#3  LWLockAcquireWithVar (l=0x7f84c42edf80, valptr=0x7f84c42edfa8,
    val=18446744073709551615) at lwlock.c:915
#4  0x004cbd86 in WALInsertLockAcquireExclusive () at xlog.c:1445
#5  0x004d4195 in CreateCheckPoint (flags=64) at xlog.c:8181
#6  0x0064ea53 in CheckpointerMain () at checkpointer.c:493
#7  0x004e328a in AuxiliaryProcessMain (argc=2, argv=0x7fff6eed1f50) at bootstrap.c:428
#8  0x006563f7 in StartChildProcess (type=CheckpointerProcess) at postmaster.c:5167
#9  0x0065884a in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2670

(gdb) f 2
(gdb) p *(lock->owner)
$1 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x7f84ccd27c88, sem = {semId = 1480359975, semNum = 12}, waitStatus = 0, procLatch = {is_set = 1, is_shared = 1 '\001', owner_pid = 31540}, lxid
= 3209, pid = 31540, pgprocno = 92, backendId = 12, databaseId = 16384, roleId = 10, recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 2 '\002', lwWaitLink = {prev = 0x7f84c42edf10, next = 0x7f84ccd38e60}, waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0}, myProcLocks = {{prev = 0x7f84ccd38bb8, next = 0x7f84ccd38bb8}, {prev = 0x7f84ccd38bc8, next = 0x7f84ccd38bc8}, {prev = 0x7f84ccd38bd8, next = 0x7f84ccd38bd8}, {prev = 0x7f84ccd38be8, next = 0x7f84ccd38be8}, {prev = 0x7f84ccd38bf8, next = 0x7f84ccd38bf8}, {prev = 0x7f84ccd38c08, next = 0x7f84ccd38c08}, {prev = 0x7f84ccd38c18, next = 0x7f84ccd38c18}, {prev = 0x7f84ccd38c28, next = 0x7f84ccd38c28}, {prev = 0x7f84ccd38c38, next = 0x7f84ccd38c38}, {prev = 0x7f84ccd38c48, next = 0x7f84ccd38c48}, {prev = 0x7f84ccd38c58, next = 0x7f84ccd38c58}, {prev = 0x7f84ccd38c68, next = 0x7f84ccd38c68}, {prev = 0x7f84ccd38c78, next = 0x7f84ccd38c78}, {prev = 0x7f84ccd38c88, next = 0x7f84ccd38c88}, {prev = 0x7f84ccb0ceb0, next = 0x7f84ccb0ceb0}, {prev = 0x7f84ccd38ca8, next = 0x7f84ccd38ca8}}, subxids = {xids = {0 <repeats 64 times>}}, backendLock = 0x7f84c42e9800, fpLockBits = 160528697655296, fpRelId = {0, 0, 0, 0, 0, 0, 0, 0, 0, 2679, 2610, 16413, 16411, 16415, 16414, 16408}, fpVXIDLock = 1 '\001', fpLocalTransactionId = 3209}

backend 31540:

#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x006481df in PGSemaphoreLock (sema=0x7f84ccd38b38) at pg_sema.c:387
#2  0x006a53d4 in LWLockWaitForVar (lock=0x7f84c42edf00, valptr=0x7f84c42edf28, oldval=0,
    newval=0x7fff6eeceaf8) at lwlock.c:1438
#3  0x004d3276 in WaitXLogInsertionsToFinish (upto=183341449216) at xlog.c:1576
#4  0x004d3b33 in AdvanceXLInsertBuffer (upto=183345636352, opportunistic=<value optimized out>) at xlog.c:1888
#5  0x004d8df2 in GetXLogBuffer (ptr=183345636352) at xlog.c:1669
#6  0x004d8fea in CopyXLogRecordToWAL (rdata=0xbd2410, fpw_lsn=<value optimized out>) at
xlog.c:1276
#7  XLogInsertRecord (rdata=0xbd2410, fpw_lsn=<value optimized out>) at xlog.c:1008
#8  0x004dc72c in XLogInsert (rmid=13 '\r', info=32 ' ') at xloginsert.c:453
#9  0x00478f41 in ginPlaceToPage (btree=0x7fff6eecee20, stack=0x1be11e0,
    insertdata=<value optimized out>, updateblkno=<value optimized out>, childbuf=0,
    buildStats=0x0) at ginbtree.c:418
#10 0x004798a3 in ginInsertValue (btree=0x7fff6eecee20, stack=0x1be11e0,
    insertdata=0x7fff6eeceea0, buildStats=0x0) at ginbtree.c:748
#11 0x004734aa in ginEntryInsert (ginstate=0x7fff6eecf200, attnum=39408, key=1,
    category=<value optimized out>, items=0x1c81c20, nitem=54, buildStats=0x0)
    at gininsert.c:234
#12 0x0047edbb in ginInsertCleanup (ginstate=0x7fff6eecf200, vac_delay=<value
Re: [HACKERS] LWLock deadlock and gdb advice
On 07/30/2015 09:14 PM, Andres Freund wrote:
> On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:
>> In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
>> until it has updated the variable. And LWLockWaitForVar() holds the
>> spinlock when it checks that the lock is held and that the variable's
>> value matches. So it cannot happen on 9.4.
>
> The first paragraph talked about the same value, but that was just
> referring to it not yet having been cleared, I think...
>
>> To reiterate, with 9.5, it's possible that a backend is sleeping in
>> LWLockWaitForVar(oldvar=123), even though the lock is currently held by
>> another backend with value 124. That seems wrong, or surprising at the
>> very least.
>
> With my patch that can't really happen that way though? The value is
> re-checked after queuing. If it has changed by then we're done. And if it
> hasn't yet changed we're guaranteed to be woken up once it's being
> changed?

Ok, let me try explaining it again. With your patch, LWLockAcquireWithVar
looks like this (simplified, assuming the lock is free):

1. Set LW_VAL_EXCLUSIVE
2. (Acquire spinlock) Set *valptr = val (Release spinlock)

LWLockWaitForVar looks like this:

1. Check if lock is free
2. (Acquire spinlock) Read *valptr, compare with oldval (Release spinlock)
3. LWLockQueueSelf
4. Re-check that *valptr is still equal to oldval
5. Sleep

This is the race condition:

Backend A                      Backend B
---------                      ---------
LWLockAcquireWithVar(123)
  Set LW_VAL_EXCLUSIVE
  Set *valptr = 123
                               LWLockWaitForVar(123)
                                 See that the lock is held
                                 Read *valptr, it matches
                                 LWLockQueueSelf
LWLockRelease()
LWLockAcquireWithVar(124)
  Set LW_VAL_EXCLUSIVE
                                 wakes up
                                 See that the lock is still/again held
                                 Read *valptr, it's still 123
                                 LWLockQueueSelf
                                 Re-check that *valptr is still 123
                                 go to sleep
Set *valptr = 124

Now, Backend B's LWLockWaitForVar(123) call is sleeping, even though the
lock was free'd and reacquired with a different value, 124. It won't wake
up until A updates the value or releases the lock again.

This was not possible in 9.4. It was possible in 9.4 too when the value
was the same in both LWLockAcquireWithVar() calls, and I think that's
acceptable, but not when the values differ.

> I generally don't mind adding some sort of flag clearing or such, but I'd
> rather not have it in the retry loop in the general LWLockAttemptLock
> path - I found that very small amounts of instructions in there have a
> measurable impact.

I doubt clearing a bit would matter there, although you might have a
better instinct on that...

- Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
On 07/29/2015 09:35 PM, Andres Freund wrote:
> On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote:
>> Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting
>> on it. The lock holder releases the lock, and wakes up A. But before A
>> wakes up and sees that the lock is free, another backend acquires the
>> lock again. It runs LWLockAcquireWithVar to the point just before
>> setting the variable's value. Now A wakes up, sees that the lock is
>> still (or again) held, and that the variable's value still matches the
>> old one, and goes back to sleep. The new lock holder won't wake it up
>> until it updates the value again, or releases the lock.
>
> I'm not sure whether this observation is about my patch or the general
> lwlock variable mechanism. In my opinion that behaviour exists today both
> in 9.4 and 9.5.

In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
until it has updated the variable. And LWLockWaitForVar() holds the
spinlock when it checks that the lock is held and that the variable's
value matches. So it cannot happen on 9.4.

To reiterate, with 9.5, it's possible that a backend is sleeping in
LWLockWaitForVar(oldvar=123), even though the lock is currently held by
another backend with value 124. That seems wrong, or surprising at the
very least.

> But I think that's fine because that race seems pretty fundamental. After
> all, you could have called LWLockWaitForVar() just after the second
> locker had set the variable to the same value.

I'm not talking about setting it to the same value. I'm talking about
setting it to a different value. (I talked about setting it to the same
value later in the email, and I agree that's a pretty fundamental problem
and exists with 9.4 as well.)

- Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-30 09:03:01 -0700, Jeff Janes wrote:
> On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund <and...@anarazel.de> wrote:
>> What do you think about something roughly like the attached?
>
> I've not evaluated the code, but applying it does solve the problem I was
> seeing.

Cool, thanks for testing! How long did you run the test and how long did
it, on average, previously take for the issue to occur?

Regards,

Andres
Re: [HACKERS] LWLock deadlock and gdb advice
On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
>> On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
>>> Ah, ok, that should work, as long as you also re-check the variable's
>>> value after queueing. Want to write the patch, or should I?
>>
>> I'll try. Shouldn't be too hard.
>
> What do you think about something roughly like the attached?

I've not evaluated the code, but applying it does solve the problem I was
seeing.

Cheers,

Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:
> In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
> until it has updated the variable. And LWLockWaitForVar() holds the
> spinlock when it checks that the lock is held and that the variable's
> value matches. So it cannot happen on 9.4.

The first paragraph talked about the same value, but that was just
referring to it not yet having been cleared, I think...

> To reiterate, with 9.5, it's possible that a backend is sleeping in
> LWLockWaitForVar(oldvar=123), even though the lock is currently held by
> another backend with value 124. That seems wrong, or surprising at the
> very least.

With my patch that can't really happen that way though? The value is
re-checked after queuing. If it has changed by then we're done. And if it
hasn't yet changed we're guaranteed to be woken up once it's being
changed?

I generally don't mind adding some sort of flag clearing or such, but I'd
rather not have it in the retry loop in the general LWLockAttemptLock path
- I found that very small amounts of instructions in there have a
measurable impact.

Greetings,

Andres Freund
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote:
> Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on
> it. The lock holder releases the lock, and wakes up A. But before A wakes
> up and sees that the lock is free, another backend acquires the lock
> again. It runs LWLockAcquireWithVar to the point just before setting the
> variable's value. Now A wakes up, sees that the lock is still (or again)
> held, and that the variable's value still matches the old one, and goes
> back to sleep. The new lock holder won't wake it up until it updates the
> value again, or releases the lock.

I'm not sure whether this observation is about my patch or the general
lwlock variable mechanism. In my opinion that behaviour exists today both
in 9.4 and 9.5. But I think that's fine because that race seems pretty
fundamental. After all, you could have called LWLockWaitForVar() just
after the second locker had set the variable to the same value.

> You didn't like the new LW_FLAG_VAR_SET flag and the API changes I
> proposed?

I do slightly dislike the additional bit of math in AttemptLock. But what
I really disliked about the patch is the reliance on holding the spinlock
over longer times and conditionally acquiring spinlocks. I didn't see a
need for the flags change to fix the issue at hand, that's why I didn't
incorporate it. I'm not fundamentally against it.

> Either way, there is a race condition that if the new lock holder sets
> the variable to the *same* value as before, the old waiter won't
> necessarily wake up even though the lock was free or had a different
> value in between. But that's easier to explain and understand than the
> fact that the value set by LWLockAcquireWithVar() might not be seen by a
> waiter, until you release the lock or update it again.

I can't really see a use for the API that'd allow that and care about the
waits. Because quite fundamentally you could have just started waiting
after the variable was set to the value at the same time.

> Another idea is to have LWLockAcquire check the wait queue after setting
> the variable, and wake up anyone who's already queued up.

Ugh, not a fan of that.

> The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK in
> principle. You'll want to change LWLockConflictsWithVar() so that the
> spinlock is held only over the value = *valptr line, though; the other
> stuff just modifies local variables and doesn't need to be protected by
> the spinlock.

Good point.

> Also, when you enter LWLockWaitForVar(), you're checking if the lock is
> held twice in a row; first at the top of the function, and again inside
> LWLockConflictsWithVar. You could just remove the quick test at the top.

Yea, I was thinking about removing it. The first check was there
previously, so I left it in place. We do execute a bit more code once
we've disabled interrupts (extraWaits, re-enabling interrupts). I don't
think it'll matter much, but it seemed like an independent thing.

Greetings,

Andres Freund
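Heikki's "spinlock held only over the value = *valptr line" suggestion is easy to picture in a stand-alone toy model. The sketch below is hedged: it is not the real LWLockConflictsWithVar, a pthread mutex stands in for the spinlock, and `ToyVarLock`/`toy_conflicts_with_var` are invented names; in the real code the lock-state check is an atomic read of lock->state rather than a plain field access.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	pthread_mutex_t mutex;	/* stands in for the lwlock's spinlock */
	bool			held;	/* stands in for LW_VAL_EXCLUSIVE in lock->state */
	uint64_t		var;	/* stands in for the insertingAt variable */
} ToyVarLock;

/*
 * Return true if the caller must keep waiting: the lock is still held and
 * the variable still has the old value.  The "spinlock" is held only across
 * the single shared read of the variable; the lock-state check and all
 * local-variable work happen outside it, per the review comment.
 */
bool
toy_conflicts_with_var(ToyVarLock *lk, uint64_t oldval, uint64_t *newval)
{
	uint64_t	value;

	if (!lk->held)				/* lock free: done waiting */
		return false;

	pthread_mutex_lock(&lk->mutex);		/* spinlock held only across... */
	value = lk->var;					/* ...this one shared read */
	pthread_mutex_unlock(&lk->mutex);

	if (value != oldval)
	{
		*newval = value;		/* variable changed: done waiting */
		return false;
	}
	return true;				/* still held, still unchanged: must wait */
}
```

In the patched lwlock.c, a helper shaped like this is what the wait loop calls both before and after queueing itself, which is how the re-check after enqueuing closes the race.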
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-29 09:23:32 -0700, Jeff Janes wrote:
> On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> I've reproduced it again against commit b2ed8edeecd715c8a23ae462. It
>> took 5 hours on an 8 core Intel(R) Xeon(R) CPU E5-2650.
>
> I also reproduced it in 3 hours on the same machine with both
> JJ_torn_page and JJ_xid set to zero (i.e. turned off, no induced
> crashes), so the fault-injection patch should not be necessary to get
> the issue.

Hm, is that a single-socket or dual-socket E5-2650 system? That CPU itself
has 8 cores, and can work in a dual-socket system, that's why I ask.
Re: [HACKERS] LWLock deadlock and gdb advice
On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund <and...@anarazel.de> wrote:
>> Hi,
>>
>> On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:
>>> After applying this patch to commit fdf28853ae6a397497b79f, it has
>>> survived testing long enough to convince me that this fixes the
>>> problem.
>>
>> What was the actual workload breaking with the bug? I ran a small
>> variety and I couldn't reproduce it yet. I'm not saying there's no bug,
>> I just would like to be able to test my version of the fixes...
>
> It was the torn-page fault-injection code here:
>
> https://drive.google.com/open?id=0Bzqrh1SO9FcEfkxFb05uQnJ2cWg0MEpmOXlhbFdyNEItNmpuek1zU2gySGF3Vk1oYXNNLUE
>
> It is not a minimal set; I don't know if all parts of this are necessary
> to reproduce it. The whole crash-recovery cycling might not even be
> important.

I've reproduced it again against commit b2ed8edeecd715c8a23ae462. It took
5 hours on an 8 core Intel(R) Xeon(R) CPU E5-2650.

I also reproduced it in 3 hours on the same machine with both JJ_torn_page
and JJ_xid set to zero (i.e. turned off, no induced crashes), so the
fault-injection patch should not be necessary to get the issue.

Cheers,

Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On Wed, Jul 29, 2015 at 9:26 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2015-07-29 09:23:32 -0700, Jeff Janes wrote:
>> On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>> I've reproduced it again against commit b2ed8edeecd715c8a23ae462. It
>>> took 5 hours on an 8 core Intel(R) Xeon(R) CPU E5-2650.
>>
>> I also reproduced it in 3 hours on the same machine with both
>> JJ_torn_page and JJ_xid set to zero (i.e. turned off, no induced
>> crashes), so the fault-injection patch should not be necessary to get
>> the issue.
>
> Hm, is that a single-socket or dual-socket E5-2650 system? That CPU
> itself has 8 cores, and can work in a dual-socket system, that's why I
> ask.

It is a virtual machine, and I think a property of the VM software is that
any given virtual machine can only run on one socket of the underlying
hardware. The real machine is dual socket, though.

Cheers,

Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On 07/29/2015 04:10 PM, Andres Freund wrote:
> On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
>> On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
>>> Ah, ok, that should work, as long as you also re-check the variable's
>>> value after queueing. Want to write the patch, or should I?
>>
>> I'll try. Shouldn't be too hard.
>
> What do you think about something roughly like the attached?

Hmm. Imagine this:

Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on
it. The lock holder releases the lock, and wakes up A. But before A wakes
up and sees that the lock is free, another backend acquires the lock
again. It runs LWLockAcquireWithVar to the point just before setting the
variable's value. Now A wakes up, sees that the lock is still (or again)
held, and that the variable's value still matches the old one, and goes
back to sleep. The new lock holder won't wake it up until it updates the
value again, or releases the lock.

I think that's OK given the current usage of this in WALInsertLocks, but
it's scary. At the very least you need comments to explain that race
condition.

You didn't like the new LW_FLAG_VAR_SET flag and the API changes I
proposed? That eliminates the race condition. Either way, there is a race
condition that if the new lock holder sets the variable to the *same*
value as before, the old waiter won't necessarily wake up even though the
lock was free or had a different value in between. But that's easier to
explain and understand than the fact that the value set by
LWLockAcquireWithVar() might not be seen by a waiter, until you release
the lock or update it again.

Another idea is to have LWLockAcquire check the wait queue after setting
the variable, and wake up anyone who's already queued up. You could just
call LWLockUpdateVar() from LWLockAcquireCommon to set the variable. I
would prefer the approach from my previous patch more, though. That would
avoid having to acquire the spinlock in LWLockAcquire altogether, and I
actually like the API change from that independently from any correctness
issues.

The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK in
principle. You'll want to change LWLockConflictsWithVar() so that the
spinlock is held only over the value = *valptr line, though; the other
stuff just modifies local variables and doesn't need to be protected by
the spinlock.

Also, when you enter LWLockWaitForVar(), you're checking if the lock is
held twice in a row; first at the top of the function, and again inside
LWLockConflictsWithVar. You could just remove the quick test at the top.

- Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
On 07/29/2015 02:39 PM, Andres Freund wrote:
> On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote:
>> Previously, LWLockAcquireWithVar set the variable associated with the
>> lock atomically with acquiring it. Before the lwlock-scalability
>> changes, that was straightforward because you held the spinlock anyway,
>> but it's a lot harder/expensive now. So I changed the way acquiring a
>> lock with a variable works. There is now a separate flag,
>> LW_FLAG_VAR_SET, which indicates that the current lock holder has
>> updated the variable. The LWLockAcquireWithVar function is gone - you
>> now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET
>> flag, and you can call LWLockUpdateVar() after that if you want to set
>> the variable immediately. This passes make check, but I haven't done
>> any testing beyond that. Does this look sane to you?
>
> The prime thing I dislike about the patch is how long it now holds the
> spinlock in WaitForVar. I don't understand why that's necessary? There's
> no need to hold a spinlock until after the
>
>     mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
>
> unless I miss something?
>
> In an earlier email you say:
>> After the spinlock is released above, but before the LWLockQueueSelf()
>> call, it's possible that another backend comes in, acquires the lock,
>> changes the variable's value, and releases the lock again. In 9.4, the
>> spinlock was not released until the process was queued.
>
> But that's not a problem. The updater in that case will have queued a
> wakeup for all waiters, including WaitForVar()?

If you release the spinlock before LWLockQueueSelf(), then no. It's
possible that the backend you wanted to wait for updates the variable's
value before you've queued up. Or even releases the lock, and it gets
re-acquired by another backend, before you've queued up.

Or are you thinking of just moving the SpinLockRelease to after
LWLockQueueSelf(), i.e. to the other side of the mustwait = ... line?
That'd probably be ok.

- Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
Hi, Finally getting to this. On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote: Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder/expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. This passes make check, but I haven't done any testing beyond that. Does this look sane to you? The prime thing I dislike about the patch is how long it now holds the spinlock in WaitForVar. I don't understand why that's necessary? There's no need to hold a spinlock until after the mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0; unless I miss something? In an earlier email you say: After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued. But that's not a problem. The updater in that case will have queued a wakeup for all waiters, including WaitForVar()? I'll try to reproduce the problem now. But I do wonder if it's possibly just the missing spinlock during the update. Andres
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote: Ah, ok, that should work, as long as you also re-check the variable's value after queueing. Want to write the patch, or should I? I'll try. Shouldn't be too hard. Andres
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-29 14:22:23 +0200, Andres Freund wrote: On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote: Ah, ok, that should work, as long as you also re-check the variable's value after queueing. Want to write the patch, or should I? I'll try. Shouldn't be too hard. What do you think about something roughly like the attached? - Andres From 2879408d8da7714ab6af6238dfe1c8a535800af3 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Wed, 29 Jul 2015 15:06:54 +0200 Subject: [PATCH] Other way to fix the issues around the broken LWLock variable support. --- src/backend/storage/lmgr/lwlock.c | 105 +- 1 file changed, 69 insertions(+), 36 deletions(-) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index e5566d1..c64f20d 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1066,7 +1066,15 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val) /* If there's a variable associated with this lock, initialize it */ if (valptr) + { +#ifdef LWLOCK_STATS + lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex); +#else + SpinLockAcquire(&lock->mutex); +#endif *valptr = val; + SpinLockRelease(&lock->mutex); + } TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), T_ID(lock), mode); @@ -1259,6 +1267,60 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) } /* + * Does the lwlock in its current state need to wait for the variable value to + * change? + * + * If we don't need to wait, and it's because the value of the variable has + * changed, store the current value in newval. + * + * *result is set to true if the lock was free, and false otherwise. 
+ */ +static bool +LWLockConflictsWithVar(LWLock *lock, + uint64 *valptr, uint64 oldval, uint64 *newval, + bool *result) +{ + bool mustwait; + uint64 value; + + mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0; + + if (!mustwait) + { + *result = true; + return false; + } + + /* + * Perform comparison using spinlock as we can't rely on atomic 64 bit + * reads/stores. On platforms with a way to do atomic 64 bit reads/writes + * the spinlock can be optimized away. + */ +#ifdef LWLOCK_STATS + lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex); +#else + SpinLockAcquire(&lock->mutex); +#endif + + *result = false; + + value = *valptr; + if (value != oldval) + { + mustwait = false; + *newval = value; + } + else + { + mustwait = true; + } + + SpinLockRelease(&lock->mutex); + + return mustwait; +} + +/* * LWLockWaitForVar - Wait until lock is free, or a variable is updated. * * If the lock is held and *valptr equals oldval, waits until the lock is @@ -1313,39 +1375,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) for (;;) { bool mustwait; - uint64 value; - - mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0; - - if (mustwait) - { - /* - * Perform comparison using spinlock as we can't rely on atomic 64 - * bit reads/stores. - */ -#ifdef LWLOCK_STATS - lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex); -#else - SpinLockAcquire(&lock->mutex); -#endif - /* - * XXX: We can significantly optimize this on platforms with 64bit - * atomics. 
- */ - value = *valptr; - if (value != oldval) - { - result = false; - mustwait = false; - *newval = value; - } - else - mustwait = true; - SpinLockRelease(&lock->mutex); - } - else - mustwait = false; + mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval, + &result); if (!mustwait) break; /* the lock was free or value didn't match */ @@ -1365,12 +1397,13 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK); /* - * We're now guaranteed to be woken up if necessary. Recheck the - * lock's state. + * We're now guaranteed to be woken up if necessary. Recheck the lock + * and variables state. */ - mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0; + mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval, + &result); - /* Ok, lock is free after we queued ourselves. Undo queueing. */ + /* Ok, no conflict after we queued ourselves. Undo queueing. */ if (!mustwait) { LOG_LWDEBUG("LWLockWaitForVar", lock, "free, undoing queue"); -- 2.4.0.rc2.1.g3d6bc9a
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote: On 07/29/2015 02:39 PM, Andres Freund wrote: In an earlier email you say: After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued. But that's not a problem. The updater in that case will have queued a wakeup for all waiters, including WaitForVar()? If you release the spinlock before LWLockQueueSelf(), then no. It's possible that the backend you wanted to wait for updates the variable's value before you've queued up. Or even releases the lock, and it gets re-acquired by another backend, before you've queued up. For normal locks the equivalent problem is solved by re-checking whether a conflicting lock is still held after enqueuing. Why don't we do the same here? Doing it that way has the big advantage that we can just remove the spinlocks entirely on platforms with atomic 64bit reads/writes. Greetings, Andres Freund
Re: [HACKERS] LWLock deadlock and gdb advice
On 07/29/2015 03:08 PM, Andres Freund wrote: On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote: On 07/29/2015 02:39 PM, Andres Freund wrote: In an earlier email you say: After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued. But that's not a problem. The updater in that case will have queued a wakeup for all waiters, including WaitForVar()? If you release the spinlock before LWLockQueueSelf(), then no. It's possible that the backend you wanted to wait for updates the variable's value before you've queued up. Or even releases the lock, and it gets re-acquired by another backend, before you've queued up. For normal locks the equivalent problem is solved by re-checking whether a conflicting lock is still held after enqueuing. Why don't we do the same here? Doing it that way has the big advantage that we can just remove the spinlocks entirely on platforms with atomic 64bit reads/writes. Ah, ok, that should work, as long as you also re-check the variable's value after queueing. Want to write the patch, or should I? - Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
Hi, On 2015-07-19 11:49:14 -0700, Jeff Janes wrote: After applying this patch to commit fdf28853ae6a397497b79f, it has survived testing long enough to convince that this fixes the problem. What was the actual workload breaking with the bug? I ran a small variety and I couldn't reproduce it yet. I'm not saying there's no bug, I just would like to be able to test my version of the fixes... Andres
Re: [HACKERS] LWLock deadlock and gdb advice
On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund and...@anarazel.de wrote: Hi, On 2015-07-19 11:49:14 -0700, Jeff Janes wrote: After applying this patch to commit fdf28853ae6a397497b79f, it has survived testing long enough to convince that this fixes the problem. What was the actual workload breaking with the bug? I ran a small variety and I couldn't reproduce it yet. I'm not saying there's no bug, I just would like to be able to test my version of the fixes... It was the torn-page fault-injection code here: https://drive.google.com/open?id=0Bzqrh1SO9FcEfkxFb05uQnJ2cWg0MEpmOXlhbFdyNEItNmpuek1zU2gySGF3Vk1oYXNNLUE It is not a minimal set, I don't know if all parts of this are necessary to reproduce it. The whole crash-recovery cycling might not even be important. Compiled with: ./configure --enable-debug --with-libxml --with-perl --with-python --with-ldap --with-openssl --with-gssapi --prefix=/home/jjanes/pgsql/torn_bisect/ (Also with or without --enable-cassert). I just ran sh do.sh > do.out and eventually it stopped producing output, and I found everything hung up. Cheers, Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-16 PM 04:03, Jeff Janes wrote: On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder/expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return regardless of the variable's value, if the current lock-holder has not updated it yet. I ran this for a while without casserts and it seems to work. But with casserts, I get failures in the autovac process on the GIN index. I don't see how this is related to the LWLock issue, but I didn't see it without your patch. Perhaps the system just didn't survive long enough to uncover it without the patch (although it shows up pretty quickly). It could just be an overzealous Assert, since the casserts off didn't show problems. bt and bt full are shown below. I got a similar assert failure but with a btree index (pg_attribute_relid_attnum_index). 
The backtrace looks like Jeff's: (gdb) bt #0 0x003969632625 in raise () from /lib64/libc.so.6 #1 0x003969633e05 in abort () from /lib64/libc.so.6 #2 0x0092eb9e in ExceptionalCondition (conditionName=0x9c2220 !(((PageHeader) (page))-pd_special = (__builtin_offsetof (PageHeaderData, pd_linp))), errorType=0x9c0c41 FailedAssertion, fileName=0x9c0c10 nbtree.c, lineNumber=903) at assert.c:54 #3 0x004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9, orig_blkno=9) at nbtree.c:903 #4 0x004e0067 in btvacuumscan (info=0x7fff2c765cd0, stats=0x279f7d0, callback=0x668f6d lazy_tid_reaped, callback_state=0x279e338, cycleid=49190) at nbtree.c:821 #5 0x004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676 #6 0x00939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0, collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456) at fmgr.c:1375 #7 0x004d2a01 in index_bulk_delete (info=0x7fff2c765cd0, stats=0x0, callback=0x668f6d lazy_tid_reaped, callback_state=0x279e338) at indexam.c:690 #8 0x0066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0, stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367 #9 0x006678a8 in lazy_scan_heap (onerel=0x274ec90, vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000') at vacuumlazy.c:1098 #10 0x006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99, params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244 #11 0x00665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0, options=99, params=0x27bdc88) at vacuum.c:1388 #12 0x006643ce in vacuum (options=99, relation=0x7fff2c7662a0, relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18, isTopLevel=1 '\001') at vacuum.c:293 #13 0x0075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80, bstrategy=0x27bdd18) at autovacuum.c:2807 #14 0x0075c632 in do_autovacuum () at autovacuum.c:2328 #15 0x0075b457 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1647 #16 0x0075b0a5 in StartAutoVacWorker () at autovacuum.c:1454 #17 0x0076f9cc in 
StartAutovacuumWorker () at postmaster.c:5261 #18 0x0076f28a in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:4918 #19 signal handler called #20 0x0039696e1353 in __select_nocancel () from /lib64/libc.so.6 #21 0x0076ace7 in ServerLoop () at postmaster.c:1582 #22 0x0076a538 in PostmasterMain (argc=3, argv=0x26f9330) at postmaster.c:1263 #23 0x006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223 Thanks, Amit
Re: [HACKERS] LWLock deadlock and gdb advice
On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder/expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return regardless of the variable's value, if the current lock-holder has not updated it yet. I ran this for a while without casserts and it seems to work. But with casserts, I get failures in the autovac process on the GIN index. I don't see how this is related to the LWLock issue, but I didn't see it without your patch. Perhaps the system just didn't survive long enough to uncover it without the patch (although it shows up pretty quickly). It could just be an overzealous Assert, since the casserts off didn't show problems. bt and bt full are shown below. Cheers, Jeff #0 0x003dcb632625 in raise () from /lib64/libc.so.6 #1 0x003dcb633e05 in abort () from /lib64/libc.so.6 #2 0x00930b7a in ExceptionalCondition ( conditionName=0x9a1440 !(((PageHeader) (page))-pd_special = (__builtin_offsetof (PageHeaderData, pd_linp))), errorType=0x9a12bc FailedAssertion, fileName=0x9a12b0 ginvacuum.c, lineNumber=713) at assert.c:54 #3 0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at ginvacuum.c:713 It now looks like this *is* unrelated to the LWLock issue. 
The assert that it is tripping over was added just recently (302ac7f27197855afa8c) and so I had not been testing under its presence until now. It looks like it is finding all-zero pages (index extended but then a crash before initializing the page?) and it doesn't like them. (gdb) f 3 (gdb) p *(char[8192]*)(page) $11 = '\000' repeats 8191 times Presumably before this assert, such pages would just be permanently orphaned. Cheers, Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/30/2015 11:24 PM, Andres Freund wrote: On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote: Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add. Yeah. It's probably more efficient to not release the spinlock between checking the value and LWLockQueueSelf() though. Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that way in a bunch of callsites... So that'd be harder. Additionally I'm planning to get rid of the spinlocks around queuing (they show up as bottlenecks in contended workloads), so it seems more future proof not to assume that either way. On top of that I think we should, when available (or using the same type of fallback as for 32bit?), use 64 bit atomics for the var anyway. I'll take a stab at fixing this tomorrow. Thanks! Do you mean both or just the second issue? Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder/expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return regardless of the variable's value, if the current lock-holder has not updated it yet. This passes make check, but I haven't done any testing beyond that. Does this look sane to you? 
After applying this patch to commit fdf28853ae6a397497b79f, it has survived testing long enough to convince that this fixes the problem. Cheers, Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder/expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return regardless of the variable's value, if the current lock-holder has not updated it yet. I ran this for a while without casserts and it seems to work. But with casserts, I get failures in the autovac process on the GIN index. I don't see how this is related to the LWLock issue, but I didn't see it without your patch. Perhaps the system just didn't survive long enough to uncover it without the patch (although it shows up pretty quickly). It could just be an overzealous Assert, since the casserts off didn't show problems. bt and bt full are shown below. 
Cheers, Jeff #0 0x003dcb632625 in raise () from /lib64/libc.so.6 #1 0x003dcb633e05 in abort () from /lib64/libc.so.6 #2 0x00930b7a in ExceptionalCondition ( conditionName=0x9a1440 !(((PageHeader) (page))-pd_special = (__builtin_offsetof (PageHeaderData, pd_linp))), errorType=0x9a12bc FailedAssertion, fileName=0x9a12b0 ginvacuum.c, lineNumber=713) at assert.c:54 #3 0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at ginvacuum.c:713 #4 0x0093b53a in FunctionCall2Coll (flinfo=0x7fffee073e60, collation=0, arg1=140737186840448, arg2=0) at fmgr.c:1323 #5 0x004d5c7a in index_vacuum_cleanup (info=0x7fffee073f80, stats=0x0) at indexam.c:717 #6 0x00664f69 in lazy_cleanup_index (indrel=0x7faafbcace20, stats=0x0, vacrelstats=0x28809c8) at vacuumlazy.c:1400 #7 0x00664142 in lazy_scan_heap (onerel=0x7faafbcab6d0, vacrelstats=0x28809c8, Irel=0x2881090, nindexes=2, scan_all=1 '\001') at vacuumlazy.c: #8 0x00662905 in lazy_vacuum_rel (onerel=0x7faafbcab6d0, options=65, params=0x286ea00, bstrategy=0x286ea90) at vacuumlazy.c:247 #9 0x006623e4 in vacuum_rel (relid=16409, relation=0x7fffee074550, options=65, params=0x286ea00) at vacuum.c:1377 #10 0x00660bea in vacuum (options=65, relation=0x7fffee074550, relid=16409, params=0x286ea00, va_cols=0x0, bstrategy=0x286ea90, isTopLevel=1 '\001') at vacuum.c:295 #11 0x0075caa9 in autovacuum_do_vac_analyze (tab=0x286e9f8, bstrategy=0x286ea90) at autovacuum.c:2811 #12 0x0075be67 in do_autovacuum () at autovacuum.c:2331 #13 0x0075ac1c in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1648 #14 0x0075a84d in StartAutoVacWorker () at autovacuum.c:1455 #15 0x0076f745 in StartAutovacuumWorker () at postmaster.c:5261 #16 0x0076effd in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:4918 #17 signal handler called #18 0x003dcb6e1353 in __select_nocancel () from /lib64/libc.so.6 #19 0x0076a8f0 in ServerLoop () at postmaster.c:1582 #20 0x0076a141 in PostmasterMain (argc=4, argv=0x27b2cf0) at postmaster.c:1263 #21 0x006c1e6e in main 
(argc=4, argv=0x27b2cf0) at main.c:223 #0 0x003dcb632625 in raise () from /lib64/libc.so.6 No symbol table info available. #1 0x003dcb633e05 in abort () from /lib64/libc.so.6 No symbol table info available. #2 0x00930b7a in ExceptionalCondition ( conditionName=0x9a1440 !(((PageHeader) (page))-pd_special = (__builtin_offsetof (PageHeaderData, pd_linp))), errorType=0x9a12bc FailedAssertion, fileName=0x9a12b0 ginvacuum.c, lineNumber=713) at assert.c:54 No locals. #3 0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at ginvacuum.c:713 buffer = 186 page = 0x7faafc565b20 info = 0x7fffee073f80 stats = 0x2880858 index = 0x7faafbcace20 needLock = 1 '\001' npages = 22569 blkno = 12025 totFreePages = 11502 ginstate = {index = 0x7faafbcace20, oneCol = 1 '\001', origTupdesc = 0x7faafbcad150, tupdesc = {0x7faafbcad150, 0x0 repeats 31 times}, compareFn = {{fn_addr = 0x90224d bttextcmp, fn_oid = 360, fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000', fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0} repeats 31 times}, extractValueFn = {{ fn_addr = 0x494adc
Re: [HACKERS] LWLock deadlock and gdb advice
On 06/30/2015 11:24 PM, Andres Freund wrote: On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote: Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add. Yeah. It's probably more efficient to not release the spinlock between checking the value and LWLockQueueSelf() though. Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that way in a bunch of callsites... So that'd be harder. Additionally I'm planning to get rid of the spinlocks around queuing (they show up as bottlenecks in contended workloads), so it seems more future proof not to assume that either way. On top of that I think we should, when available (or using the same type of fallback as for 32bit?), use 64 bit atomics for the var anyway. I'll take a stab at fixing this tomorrow. Thanks! Do you mean both or just the second issue? Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder/expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return regardless of the variable's value, if the current lock-holder has not updated it yet. This passes make check, but I haven't done any testing beyond that. Does this look sane to you? 
- Heikki From ce90bd9a1e2c8b5783ebeea83594da7d3a1c63de Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas heikki.linnakangas@iki.fi Date: Wed, 15 Jul 2015 18:33:28 +0300 Subject: [PATCH 1/1] Fix LWLock variable support, broken by the lwlock scalability patch --- src/backend/access/transam/xlog.c | 15 ++--- src/backend/storage/lmgr/lwlock.c | 135 ++ src/include/storage/lwlock.h | 1 - 3 files changed, 83 insertions(+), 68 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 1dd31b3..2f34346 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1408,9 +1408,7 @@ WALInsertLockAcquire(void) * The insertingAt value is initially set to 0, as we don't know our * insert location yet. */ - immed = LWLockAcquireWithVar(&WALInsertLocks[MyLockNo].l.lock, - &WALInsertLocks[MyLockNo].l.insertingAt, - 0); + immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE); if (!immed) { /* @@ -1442,13 +1440,12 @@ WALInsertLockAcquireExclusive(void) */ for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++) { - LWLockAcquireWithVar(&WALInsertLocks[i].l.lock, - &WALInsertLocks[i].l.insertingAt, - PG_UINT64_MAX); + LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE); + LWLockUpdateVar(&WALInsertLocks[i].l.lock, + &WALInsertLocks[i].l.insertingAt, + PG_UINT64_MAX); } - LWLockAcquireWithVar(&WALInsertLocks[i].l.lock, - &WALInsertLocks[i].l.insertingAt, - 0); + LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE); holdingAllLocks = true; } diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 46cab49..92a1aef 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -11,12 +11,12 @@ * LWLocks to protect its shared state. * * In addition to exclusive and shared modes, lightweight locks can be used - * to wait until a variable changes value. 
The variable is initially set - * when the lock is acquired with LWLockAcquireWithVar, and can be updated - * without releasing the lock by calling LWLockUpdateVar. LWLockWaitForVar - * waits for the variable to be updated, or until the lock is free. The - * meaning of the variable is up to the caller, the lightweight lock code - * just assigns and compares it. + * to wait until a variable changes value. The variable is initially put + * in a not-set state when the lock is acquired with LWLockAcquire, and + * can be updated without releasing the lock by calling LWLockUpdateVar. + * LWLockWaitForVar waits for the variable to be updated, or until the lock + * is free. The meaning of the variable is up to the caller, the lightweight + * lock code just assigns and compares it. * * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California @@ -99,6 +99,7 @@ extern slock_t *ShmemLock; #define LW_FLAG_HAS_WAITERS ((uint32) 1 << 30) #define
Re: [HACKERS] LWLock deadlock and gdb advice
On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for? Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it. I hadn't planned on running into this problem, so had not compiled accordingly. I thought LOCK_DEBUG would produce too much output, but now I see it doesn't print anything unless Trace_lwlocks is on (but it still makes more info available to gdb, as Amit mentioned), so I think that should be OK. Since I messed up the gdb session causing the postmaster to SIGKILL all the children and destroy the evidence, I'll repeat the run compiled with LOCK_DEBUG and see what it looks like in the morning. I've gotten the LWLock deadlock again. User backend 24841 holds the WALInsertLocks 7 and is blocked attempting to acquire 6 . So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c) The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6 and is blocked on 7. I'm not sure where to go from here. 
24841:

(gdb) f 2
(gdb) p held_lwlocks
$1 = {{lock = 0x7f70e11fae40, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13df080, mode = LW_EXCLUSIVE}, {lock = 0x7f70e11d4280, mode = LW_EXCLUSIVE}, {lock = 0x7f70e11d4280, mode = LW_EXCLUSIVE}, {lock = 0x0, mode = LW_EXCLUSIVE} <repeats 196 times>}
(gdb) p T_NAME((LWLock *) held_lwlocks[1])
$2 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) held_lwlocks[1])
(gdb) p lock
$3 = (LWLock *) 0x7f70e13df000
(gdb) p T_NAME((LWLock *) lock)
$7 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) lock)
$8 = 6
(gdb) p *(lock)
$1 = {mutex = 0 '\000', tranche = 1, state = {value = 1627389952}, nwaiters = {value = 2}, waiters = {head = {prev = 0x7f70e9e29e58, next = 0x7f70e9e2a140}}, owner = 0x7f70e9e2d260}
(gdb) p *(lock->owner)
$5 = {links = {prev = 0x0, next = 0x0}, sem = {semId = 539754537, semNum = 0}, waitStatus = 0, procLatch = {is_set = 0, is_shared = 1 '\001', owner_pid = 24820}, lxid = 0, pid = 24820, pgprocno = 112, backendId = -1, databaseId = 0, roleId = 0, recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 0 '\000', lwWaitLink = {prev = 0x7f70e13df090, next = 0x7f70e13df090}, waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0}, myProcLocks = {{prev = 0x7f70e9e2d2f0, next = 0x7f70e9e2d2f0}, {prev = 0x7f70e9e2d300, next = 0x7f70e9e2d300}, {prev = 0x7f70e9e2d310, next = 0x7f70e9e2d310}, {prev = 0x7f70e9e2d320, next = 0x7f70e9e2d320}, {prev = 0x7f70e9e2d330, next = 0x7f70e9e2d330}, {prev = 0x7f70e9e2d340, next = 0x7f70e9e2d340}, {prev = 0x7f70e9e2d350, next = 0x7f70e9e2d350}, {prev = 0x7f70e9e2d360, next = 0x7f70e9e2d360}, {prev = 0x7f70e9e2d370, next = 0x7f70e9e2d370}, {prev = 0x7f70e9e2d380, next = 0x7f70e9e2d380}, {prev = 0x7f70e9e2d390, next = 0x7f70e9e2d390}, {prev = 0x7f70e9e2d3a0, next = 0x7f70e9e2d3a0}, {prev = 0x7f70e9e2d3b0, next = 0x7f70e9e2d3b0}, {prev = 0x7f70e9e2d3c0, next = 0x7f70e9e2d3c0}, {prev = 0x7f70e9e2d3d0, next = 0x7f70e9e2d3d0}, {prev = 0x7f70e9e2d3e0, next = 0x7f70e9e2d3e0}}, subxids = {xids = {0 <repeats 64 times>}}, backendLock = 0x7f70e13dad00, fpLockBits = 0, fpRelId = {0 <repeats 16 times>}, fpVXIDLock = 0 '\000', fpLocalTransactionId = 0}

#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x00671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at pg_sema.c:387
#2  0x006d79ac in LWLockWaitForVar (lock=0x7f70e13df000, valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
#3  0x004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at xlog.c:1576
#4  0x004ee94b in AdvanceXLInsertBuffer (upto=41594920960, opportunistic=<value optimized out>) at xlog.c:1888
#5  0x004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
#6  0x004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920, fpw_lsn=<value optimized out>) at xlog.c:1313
#7  XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at xlog.c:1008
#8  0x004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at xloginsert.c:453
#9  0x0047e210 in ginPlaceToPage (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value optimized out>, childbuf=0, buildStats=0x0) at ginbtree.c:418
#10 0x0047f50d in ginInsertValue (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at ginbtree.c:748
#11 0x00475deb in ginEntryInsert (ginstate=0x7fffb33cc470, attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140, nitem=51, buildStats=0x0) at gininsert.c:234
#12 0x0048602c in ginInsertCleanup
Re: [HACKERS] LWLock deadlock and gdb advice
On 06/30/2015 07:37 PM, Alvaro Herrera wrote:

Jeff Janes wrote: I've gotten the LWLock deadlock again. User backend 24841 holds WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c)

Hmm, interesting -- pg_stat_statement is trying to re-do an operation that involves updating a GIN index, while WAL-logging for the original update is still ongoing, it seems.

I don't think pg_stat_statement has anything to do with this. You can see from the backtrace that pg_stat_statement is enabled, as the call went through the pgss_ExecutorRun executor hook, but that's all.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LWLock deadlock and gdb advice
Jeff Janes wrote: I've gotten the LWLock deadlock again. User backend 24841 holds WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c)

Hmm, interesting -- pg_stat_statement is trying to re-do an operation that involves updating a GIN index, while WAL-logging for the original update is still ongoing, it seems.

#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x00671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at pg_sema.c:387
#2  0x006d79ac in LWLockWaitForVar (lock=0x7f70e13df000, valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
#3  0x004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at xlog.c:1576
#4  0x004ee94b in AdvanceXLInsertBuffer (upto=41594920960, opportunistic=<value optimized out>) at xlog.c:1888
#5  0x004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
#6  0x004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920, fpw_lsn=<value optimized out>) at xlog.c:1313
#7  XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at xlog.c:1008
#8  0x004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at xloginsert.c:453
#9  0x0047e210 in ginPlaceToPage (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value optimized out>, childbuf=0, buildStats=0x0) at ginbtree.c:418
#10 0x0047f50d in ginInsertValue (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at ginbtree.c:748
#11 0x00475deb in ginEntryInsert (ginstate=0x7fffb33cc470, attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140, nitem=51, buildStats=0x0) at gininsert.c:234
#12 0x0048602c in ginInsertCleanup (ginstate=0x7fffb33cc470, vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#13 0x004871b9 in ginHeapTupleFastInsert (ginstate=0x7fffb33cc470, collector=<value optimized out>) at ginfast.c:436
#14 0x0047625a in gininsert (fcinfo=<value optimized out>) at gininsert.c:531
#15 0x007dc483 in FunctionCall6Coll (flinfo=<value optimized out>, collation=<value optimized out>, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>, arg4=<value optimized out>, arg5=140122789534360, arg6=0) at fmgr.c:1436
#16 0x004b2247 in index_insert (indexRelation=0x7f70e1190e60, values=0x7fffb33cef50, isnull=0x7fffb33cf050 "", heap_t_ctid=0x1a4a794, heapRelation=0x7f70e1185a98, checkUnique=UNIQUE_CHECK_NO) at indexam.c:226
#17 0x005d2e67 in ExecInsertIndexTuples (slot=0x1a497f0, tupleid=0x1a4a794, estate=0x1a40768, noDupErr=0 '\000', specConflict=0x0, arbiterIndexes=0x0) at execIndexing.c:384
#18 0x005f0161 in ExecUpdate (tupleid=0x7fffb33cf250, oldtuple=0x0, slot=0x1a497f0, planSlot=0x1a42498, epqstate=0x1a40a70, estate=0x1a40768, canSetTag=1 '\001') at nodeModifyTable.c:978
#19 0x005f0b2a in ExecModifyTable (node=0x1a409d0) at nodeModifyTable.c:1436
#20 0x005d6cc8 in ExecProcNode (node=0x1a409d0) at execProcnode.c:389
#21 0x005d5402 in ExecutePlan (queryDesc=0x1a2b220, direction=<value optimized out>, count=0) at execMain.c:1549
#22 standard_ExecutorRun (queryDesc=0x1a2b220, direction=<value optimized out>, count=0) at execMain.c:337
#23 0x7f70ea39af3b in pgss_ExecutorRun (queryDesc=0x1a2b220, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:881
#24 0x006ea38f in ProcessQuery (plan=0x1a2b3d8, sourceText=0x1a2af68 "update foo set count=count+1 where text_array @> $1", params=0x1a2afe8, dest=<value optimized out>, completionTag=0x7fffb33cf660 "") at pquery.c:185
#25 0x006ea5ec in PortalRunMulti (portal=0x19a6128, isTopLevel=1 '\001', dest=0xc55020, altdest=0xc55020, completionTag=0x7fffb33cf660 "") at pquery.c:1279
#26 0x006eadb3 in PortalRun (portal=0x19a6128, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1990630, altdest=0x1990630, completionTag=0x7fffb33cf660 "") at pquery.c:816
#27 0x006e6ffb in exec_execute_message (portal_name=0x1990218 "", max_rows=9223372036854775807) at postgres.c:1988
#28 0x006e8c15 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x19a3210 "jjanes", username=<value optimized out>) at postgres.c:4088
#29 0x006855dd in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4159
#30 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3835
#31 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1609
#32 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1254
#33 0x00610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development,
Re: [HACKERS] LWLock deadlock and gdb advice
On 06/30/2015 07:05 PM, Jeff Janes wrote:

On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote:

On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it.

I hadn't planned on running into this problem, so I had not compiled accordingly. I thought LOCK_DEBUG would produce too much output, but now I see it doesn't print anything unless Trace_lwlocks is on (but it still makes more info available to gdb, as Amit mentioned), so I think that should be OK. Since I messed up the gdb session, causing the postmaster to SIGKILL all the children and destroy the evidence, I'll repeat the run compiled with LOCK_DEBUG and see what it looks like in the morning.

I've gotten the LWLock deadlock again. User backend 24841 holds WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c). The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6 and is blocked on 7. I'm not sure where to go from here.

The user backend 24841 is waiting in an LWLockWaitForVar() call, on WALInsertLock 6, with oldval==0. The checkpointer is holding WALInsertLock 6, but it has set the insertingAt value to PG_UINT64_MAX. The LWLockWaitForVar() call should not be blocked on that, because 0 != PG_UINT64_MAX.

Looks like the LWLock scalability patch that made LWLock acquisition use atomic ops instead of a spinlock broke this. LWLockWaitForVar() is supposed to guarantee that it always wakes up from sleep when the variable's value changes.
But there is now a race condition in LWLockWaitForVar() that wasn't there in 9.4:

	if (mustwait)
	{
		/*
		 * Perform comparison using spinlock as we can't rely on atomic 64
		 * bit reads/stores.
		 */
#ifdef LWLOCK_STATS
		lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
		SpinLockAcquire(&lock->mutex);
#endif

		/*
		 * XXX: We can significantly optimize this on platforms with 64bit
		 * atomics.
		 */
		value = *valptr;
		if (value != oldval)
		{
			result = false;
			mustwait = false;
			*newval = value;
		}
		else
			mustwait = true;
		SpinLockRelease(&lock->mutex);
	}
	else
		mustwait = false;

	if (!mustwait)
		break;				/* the lock was free or value didn't match */

	/*
	 * Add myself to wait queue. Note that this is racy, somebody else
	 * could wakeup before we're finished queuing. NB: We're using nearly
	 * the same twice-in-a-row lock acquisition protocol as
	 * LWLockAcquire(). Check its comments for details.
	 */
	LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued.

Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock while it updates the Var. That's not cool either :-(.

- Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:

		/*
		 * XXX: We can significantly optimize this on platforms with 64bit
		 * atomics.
		 */
		value = *valptr;
		if (value != oldval)
		{
			result = false;
			mustwait = false;
			*newval = value;
		}
		else
			mustwait = true;
		SpinLockRelease(&lock->mutex);
	}
	else
		mustwait = false;

	if (!mustwait)
		break;				/* the lock was free or value didn't match */

	/*
	 * Add myself to wait queue. Note that this is racy, somebody else
	 * could wakeup before we're finished queuing. NB: We're using nearly
	 * the same twice-in-a-row lock acquisition protocol as
	 * LWLockAcquire(). Check its comments for details.
	 */
	LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued.

Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add.

Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock while it updates the Var. That's not cool either :-(.

Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but it really isn't. This var stuff isn't fitting all that well into the lwlock code.

Greetings,

Andres Freund
Re: [HACKERS] LWLock deadlock and gdb advice
On 06/30/2015 10:09 PM, Andres Freund wrote:

On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:

		/*
		 * XXX: We can significantly optimize this on platforms with 64bit
		 * atomics.
		 */
		value = *valptr;
		if (value != oldval)
		{
			result = false;
			mustwait = false;
			*newval = value;
		}
		else
			mustwait = true;
		SpinLockRelease(&lock->mutex);
	}
	else
		mustwait = false;

	if (!mustwait)
		break;				/* the lock was free or value didn't match */

	/*
	 * Add myself to wait queue. Note that this is racy, somebody else
	 * could wakeup before we're finished queuing. NB: We're using nearly
	 * the same twice-in-a-row lock acquisition protocol as
	 * LWLockAcquire(). Check its comments for details.
	 */
	LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued.

Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add.

Yeah. It's probably more efficient to not release the spinlock between checking the value and LWLockQueueSelf() though.

Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock while it updates the Var. That's not cool either :-(.

Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but it really isn't. This var stuff isn't fitting all that well into the lwlock code.

I'll take a stab at fixing this tomorrow. I take the blame for not documenting the semantics of LWLockAcquireWithVar() and friends well enough...

- Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add.

Yeah. It's probably more efficient to not release the spinlock between checking the value and LWLockQueueSelf() though.

Right now LWLockQueueSelf() takes spinlocks etc. itself, and is used that way in a bunch of callsites... So that'd be harder. Additionally I'm planning to get rid of the spinlocks around queuing (they show up as bottlenecks in contended workloads), so it seems more future-proof not to assume that either way. On top of that I think we should, when available (or using the same type of fallback as for 32bit?), use 64 bit atomics for the var anyway.

I'll take a stab at fixing this tomorrow.

Thanks! Do you mean both or just the second issue?

Greetings,

Andres Freund
Re: [HACKERS] LWLock deadlock and gdb advice
On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it.

I hadn't planned on running into this problem, so I had not compiled accordingly. I thought LOCK_DEBUG would produce too much output, but now I see it doesn't print anything unless Trace_lwlocks is on (but it still makes more info available to gdb, as Amit mentioned), so I think that should be OK. Since I messed up the gdb session, causing the postmaster to SIGKILL all the children and destroy the evidence, I'll repeat the run compiled with LOCK_DEBUG and see what it looks like in the morning.

Thanks,

Jeff
[HACKERS] LWLock deadlock and gdb advice
I have a 9.5alpha1 cluster which is locked up. All the user backends seem to be waiting on semop, eventually in WALInsertLockAcquire. Is there a way to use gdb to figure out who holds the lock they are waiting for? It is compiled with both debug and cassert. I am hoping someone can give me a recipe similar to the one for "Examining backend memory use" in https://wiki.postgresql.org/wiki/Developer_FAQ

Example backtrace:

#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x0067190f in PGSemaphoreLock (sema=0x7f28a98b9468) at pg_sema.c:387
#2  0x006d4b0c in LWLockAcquireCommon (l=0x7f28a0e6d600, valptr=0x7f28a0e6d618, val=0) at lwlock.c:1042
#3  LWLockAcquireWithVar (l=0x7f28a0e6d600, valptr=0x7f28a0e6d618, val=0) at lwlock.c:916
#4  0x004f3c4f in WALInsertLockAcquire (rdata=0xc5c130, fpw_lsn=0) at xlog.c:1411
#5  XLogInsertRecord (rdata=0xc5c130, fpw_lsn=0) at xlog.c:948
#6  0x004f7aac in XLogInsert (rmid=13 '\r', info=32 ' ') at xloginsert.c:453
#7  0x0047e0b0 in ginPlaceToPage (btree=0x7fffca9263e0, stack=0x2c94ff8, insertdata=<value optimized out>, updateblkno=<value optimized out>, childbuf=0, buildStats=0x0) at ginbtree.c:418
#8  0x0047f3ad in ginInsertValue (btree=0x7fffca9263e0, stack=0x2c94ff8, insertdata=0x7fffca926460, buildStats=0x0) at ginbtree.c:748
#9  0x00475c8b in ginEntryInsert (ginstate=0x7fffca9267e0, attnum=29784, key=1, category=<value optimized out>, items=0x7f28a0c7b458, nitem=47, buildStats=0x0) at gininsert.c:234
#10 0x00485ecc in ginInsertCleanup (ginstate=0x7fffca9267e0, vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#11 0x00487059 in ginHeapTupleFastInsert (ginstate=0x7fffca9267e0, collector=<value optimized out>) at ginfast.c:436
#12 0x004760fa in gininsert (fcinfo=<value optimized out>) at gininsert.c:531

Cheers,

Jeff
Re: [HACKERS] LWLock deadlock and gdb advice
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it.

Otherwise, I suggest dereferencing the l argument to LWLockAcquireWithVar() or something -- set the frame to 3 in your example backtrace, using "f 3". Then, "p *l". You'll get the tranche id there, and so on.

--
Peter Geoghegan
Re: [HACKERS] LWLock deadlock and gdb advice
On Tue, Jun 30, 2015 at 6:25 AM, Peter Geoghegan p...@heroku.com wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it.

If you define LOCK_DEBUG, then you can check the owner of the lock [1], which will tell you the exclusive owner of that lock and can help you in debugging the problem.

[1]
#ifdef LOCK_DEBUG
	struct PGPROC *owner;	/* last exlusive owner of the lock */
#endif
} LWLock;

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com