Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-05 Thread Jeff Janes
On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund and...@anarazel.de wrote:

 On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
  I've attached a version of the patch that should address Heikki's
  concern. It imo also improves the API and increases debuggability by not
  having stale variable values in the variables anymore. (also attached is
  a minor optimization that Heikki has noticed)


I've run these (from your git commit) for over 60 hours with no problem, so
I think it's solved.

If there are still problems, hopefully they will come out in other tests.
I don't know why the gin test was efficient at provoking the problem while
none of the other ones (btree-upsert, gist, brin, btree-foreign key,
btree-prepare transaction) I've played around with were.  Perhaps it is just due
to the amount of WAL which gin generates.

Cheers,

Jeff


Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-05 Thread Andres Freund
On 2015-08-05 11:22:55 -0700, Jeff Janes wrote:
 On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund and...@anarazel.de wrote:
 
  On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
   I've attached a version of the patch that should address Heikki's
   concern. It imo also improves the API and increases debuggability by not
   having stale variable values in the variables anymore. (also attached is
   a minor optimization that Heikki has noticed)
 
 
 I've run these (from your git commit) for over 60 hours with no problem, so
 I think it's solved.

Cool! Thanks for the testing.

 If there are still problems, hopefully they will come out in other tests.
 I don't know why the gin test was efficient at provoking the problem while
 none of the other ones (btree-upsert, gist, brin, btree-foreign key,
 btree-prepare transaction) I've played around with were.  Perhaps it is just
 to the amount of WAL which gin generates.

I'm not sure either... I've been able to reproduce a version of the
issue by judiciously sprinkling pg_usleep()'s over the code.

Regards,

Andres
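
The pg_usleep() approach is worth spelling out for anyone chasing similar
bugs: an artificial delay inside the suspected window turns a once-in-hours
race into one that fires on demand. A minimal standalone sketch of the idea,
with toy state and invented sleep sites (only pg_usleep() itself is a real
server call; here plain usleep() stands in):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int  lock_held;		/* toy stand-ins, not PostgreSQL code */
static atomic_long shared_var;

static void *
holder(void *arg)
{
	atomic_store(&lock_held, 1);	/* "acquire" the lock */
	usleep(100 * 1000);				/* widen the window between acquiring
									 * and publishing the variable, as
									 * sprinkled pg_usleep() calls do in
									 * the instrumented server */
	atomic_store(&shared_var, 124);	/* publish the variable, late */
	atomic_store(&lock_held, 0);	/* "release" */
	return NULL;
}

int
main(void)
{
	pthread_t	t;

	pthread_create(&t, NULL, holder, NULL);
	usleep(10 * 1000);
	/* A waiter sampling here reliably sees the lock held while the
	 * variable still has its old value: the window the race needs. */
	printf("held=%d var=%ld\n",
		   atomic_load(&lock_held), atomic_load(&shared_var));
	pthread_join(t, NULL);
	return 0;
}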




Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Andres Freund
On 2015-08-02 12:33:06 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  I plan to commit the patch tomorrow, so it's included in alpha2.
 
 Please try to commit anything you want in alpha2 *today*.  I'd
 prefer to see some successful buildfarm cycles on such patches
 before we wrap.

Ok, will do that.




Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 I plan to commit the patch tomorrow, so it's included in alpha2.

Please try to commit anything you want in alpha2 *today*.  I'd
prefer to see some successful buildfarm cycles on such patches
before we wrap.

regards, tom lane




Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Andres Freund
Hi Jeff, Heikki,

On 2015-07-31 09:48:28 -0700, Jeff Janes wrote:
 I had run it for 24 hours, while it usually took less than 8 hours to lock
 up before.  I did see it within a few minutes one time when I checked out a
 new HEAD and then forgot to re-apply your or Heikki's patch.
 
 But now I've got the same situation again, after 15 hours, with your
 patch.  This is probably all down to luck.  The only differences that I can
 think of are that I advanced the base to e8e86fbc8b3619da54c, and turned on
 JJ_vac and set log_autovacuum_min_duration=0.

It's quite possible that you hit the remaining race-condition that
Heikki was talking about. So that'd make it actually likely to be hit
slightly later - but as you say this is just a game of chance.

I've attached a version of the patch that should address Heikki's
concern. It imo also improves the API and increases debuggability by not
having stale variable values in the variables anymore. (also attached is
a minor optimization that Heikki has noticed)

I plan to commit the patch tomorrow, so it's included in alpha2.

Regards,

Andres




Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Andres Freund
On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
 I've attached a version of the patch that should address Heikki's
 concern. It imo also improves the API and increases debuggability by not
 having stale variable values in the variables anymore. (also attached is
 a minor optimization that Heikki has noticed)

...
From 17b4e0692ebd91a92448b0abb8cf4438a93a29d6 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 31 Jul 2015 20:20:43 +0200
Subject: [PATCH 1/2] Fix issues around the variable support in the lwlock
 infrastructure.

The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment because checking the lock's state, the variable's state
and the queuing are not protected by a single spinlock acquisition
anymore.

To fix, first move resetting the variable's value from LWLockAcquireWithVar
to WALInsertLockRelease, via a new function LWLockReleaseClearVar. That
prevents issues around waiting for a variable's value to change when a
new locker has acquired the lock, but not yet set the value. Secondly,
re-check that the variable hasn't changed after enqueueing; that prevents
the issue that the lock has been released and already re-acquired by the
time the woken-up backend checks the lock's state.

Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592db35.2060...@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
---
 src/backend/access/transam/xlog.c |  34 ---
 src/backend/storage/lmgr/lwlock.c | 193 +-
 src/include/storage/lwlock.h  |   2 +-
 3 files changed, 129 insertions(+), 100 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd31b3..939813e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1408,9 +1408,7 @@ WALInsertLockAcquire(void)
 	 * The insertingAt value is initially set to 0, as we don't know our
 	 * insert location yet.
 	 */
-	immed = LWLockAcquireWithVar(WALInsertLocks[MyLockNo].l.lock,
-								 WALInsertLocks[MyLockNo].l.insertingAt,
-								 0);
+	immed = LWLockAcquire(WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 	if (!immed)
 	{
 		/*
@@ -1435,26 +1433,28 @@ WALInsertLockAcquireExclusive(void)
 	int			i;
 
 	/*
-	 * When holding all the locks, we only update the last lock's insertingAt
-	 * indicator.  The others are set to 0xFFFFFFFFFFFFFFFF, which is higher
-	 * than any real XLogRecPtr value, to make sure that no-one blocks waiting
-	 * on those.
+	 * When holding all the locks, all but the last lock's insertingAt
+	 * indicator is set to 0xFFFFFFFFFFFFFFFF, which is higher than any real
+	 * XLogRecPtr value, to make sure that no-one blocks waiting on those.
 	 */
 	for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
 	{
-		LWLockAcquireWithVar(WALInsertLocks[i].l.lock,
-			 WALInsertLocks[i].l.insertingAt,
-			 PG_UINT64_MAX);
+		LWLockAcquire(WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
+		LWLockUpdateVar(WALInsertLocks[i].l.lock,
+		WALInsertLocks[i].l.insertingAt,
+		PG_UINT64_MAX);
 	}
-	LWLockAcquireWithVar(WALInsertLocks[i].l.lock,
-		 WALInsertLocks[i].l.insertingAt,
-		 0);
+	/* Variable value reset to 0 at release */
+	LWLockAcquire(WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
 
 	holdingAllLocks = true;
 }
 
 /*
  * Release our insertion lock (or locks, if we're holding them all).
+ *
+ * NB: Reset all variables to 0, so they cause LWLockWaitForVar to block the
+ * next time the lock is acquired.
  */
 static void
 WALInsertLockRelease(void)
@@ -1464,13 +1464,17 @@ WALInsertLockRelease(void)
 		int			i;
 
 		for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
-			LWLockRelease(WALInsertLocks[i].l.lock);
+			LWLockReleaseClearVar(WALInsertLocks[i].l.lock,
+								  WALInsertLocks[i].l.insertingAt,
+								  0);
 
 		holdingAllLocks = false;
 	}
 	else
 	{
-		LWLockRelease(WALInsertLocks[MyLockNo].l.lock);
+		LWLockReleaseClearVar(WALInsertLocks[MyLockNo].l.lock,
+			  WALInsertLocks[MyLockNo].l.insertingAt,
+			  0);
 	}
 }
 
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index e5566d1..ae03eb1 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -10,13 +10,15 @@
  * locking should be done with the full lock manager --- which depends on
  * LWLocks to protect its shared state.
  *
- * In addition to exclusive and shared modes, lightweight locks can be used
- * to wait until a variable changes value.  The variable is initially set
- * when the lock is acquired with LWLockAcquireWithVar, and can be updated
+ * In addition to exclusive and shared modes, lightweight locks can 

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-31 Thread Jeff Janes
On Thu, Jul 30, 2015 at 8:22 PM, Andres Freund and...@anarazel.de wrote:

 On 2015-07-30 09:03:01 -0700, Jeff Janes wrote:
  On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund and...@anarazel.de
 wrote:
   What do you think about something roughly like the attached?
  
 
  I've not evaluated the code, but applying it does solve the problem I was
  seeing.

 Cool, thanks for testing! How long did you run the test and how long did
 it, on average, previously take for the issue to occur?


I had run it for 24 hours, while it usually took less than 8 hours to lock
up before.  I did see it within a few minutes one time when I checked out a
new HEAD and then forgot to re-apply your or Heikki's patch.

But now I've got the same situation again, after 15 hours, with your
patch.  This is probably all down to luck.  The only differences that I can
think of are that I advanced the base to e8e86fbc8b3619da54c, and turned on
JJ_vac and set log_autovacuum_min_duration=0.


checkpointer, 31516
(gdb) bt
#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x006481df in PGSemaphoreLock (sema=0x7f84ccd3c5f8) at
pg_sema.c:387
#2  0x006a5754 in LWLockAcquireCommon (l=0x7f84c42edf80,
valptr=0x7f84c42edfa8, val=18446744073709551615) at lwlock.c:1041
#3  LWLockAcquireWithVar (l=0x7f84c42edf80, valptr=0x7f84c42edfa8,
val=18446744073709551615) at lwlock.c:915
#4  0x004cbd86 in WALInsertLockAcquireExclusive () at xlog.c:1445
#5  0x004d4195 in CreateCheckPoint (flags=64) at xlog.c:8181
#6  0x0064ea53 in CheckpointerMain () at checkpointer.c:493
#7  0x004e328a in AuxiliaryProcessMain (argc=2,
argv=0x7fff6eed1f50) at bootstrap.c:428
#8  0x006563f7 in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:5167
#9  0x0065884a in reaper (postgres_signal_arg=<value optimized
out>) at postmaster.c:2670

(gdb) f 2
(gdb) p *(lock->owner)
$1 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x7f84ccd27c88,
sem = {semId = 1480359975, semNum = 12}, waitStatus = 0, procLatch =
{is_set = 1,
is_shared = 1 '\001', owner_pid = 31540}, lxid = 3209, pid = 31540,
pgprocno = 92, backendId = 12, databaseId = 16384, roleId = 10,
  recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 2
'\002', lwWaitLink = {prev = 0x7f84c42edf10, next = 0x7f84ccd38e60},
  waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0,
waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0},
  myProcLocks = {{prev = 0x7f84ccd38bb8, next = 0x7f84ccd38bb8}, {prev =
0x7f84ccd38bc8, next = 0x7f84ccd38bc8}, {prev = 0x7f84ccd38bd8,
  next = 0x7f84ccd38bd8}, {prev = 0x7f84ccd38be8, next =
0x7f84ccd38be8}, {prev = 0x7f84ccd38bf8, next = 0x7f84ccd38bf8}, {prev =
0x7f84ccd38c08,
  next = 0x7f84ccd38c08}, {prev = 0x7f84ccd38c18, next =
0x7f84ccd38c18}, {prev = 0x7f84ccd38c28, next = 0x7f84ccd38c28}, {prev =
0x7f84ccd38c38,
  next = 0x7f84ccd38c38}, {prev = 0x7f84ccd38c48, next =
0x7f84ccd38c48}, {prev = 0x7f84ccd38c58, next = 0x7f84ccd38c58}, {prev =
0x7f84ccd38c68,
  next = 0x7f84ccd38c68}, {prev = 0x7f84ccd38c78, next =
0x7f84ccd38c78}, {prev = 0x7f84ccd38c88, next = 0x7f84ccd38c88}, {prev =
0x7f84ccb0ceb0,
  next = 0x7f84ccb0ceb0}, {prev = 0x7f84ccd38ca8, next =
0x7f84ccd38ca8}}, subxids = {xids = {0 <repeats 64 times>}}, backendLock =
0x7f84c42e9800,
  fpLockBits = 160528697655296, fpRelId = {0, 0, 0, 0, 0, 0, 0, 0, 0, 2679,
2610, 16413, 16411, 16415, 16414, 16408}, fpVXIDLock = 1 '\001',
  fpLocalTransactionId = 3209}


backend 31540:

#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x006481df in PGSemaphoreLock (sema=0x7f84ccd38b38) at
pg_sema.c:387
#2  0x006a53d4 in LWLockWaitForVar (lock=0x7f84c42edf00,
valptr=0x7f84c42edf28, oldval=0, newval=0x7fff6eeceaf8) at lwlock.c:1438
#3  0x004d3276 in WaitXLogInsertionsToFinish (upto=183341449216) at
xlog.c:1576
#4  0x004d3b33 in AdvanceXLInsertBuffer (upto=183345636352,
opportunistic=<value optimized out>) at xlog.c:1888
#5  0x004d8df2 in GetXLogBuffer (ptr=183345636352) at xlog.c:1669
#6  0x004d8fea in CopyXLogRecordToWAL (rdata=0xbd2410,
fpw_lsn=<value optimized out>) at xlog.c:1276
#7  XLogInsertRecord (rdata=0xbd2410, fpw_lsn=<value optimized out>) at
xlog.c:1008
#8  0x004dc72c in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#9  0x00478f41 in ginPlaceToPage (btree=0x7fff6eecee20,
stack=0x1be11e0, insertdata=<value optimized out>, updateblkno=<value
optimized out>,
childbuf=0, buildStats=0x0) at ginbtree.c:418
#10 0x004798a3 in ginInsertValue (btree=0x7fff6eecee20,
stack=0x1be11e0, insertdata=0x7fff6eeceea0, buildStats=0x0) at
ginbtree.c:748
#11 0x004734aa in ginEntryInsert (ginstate=0x7fff6eecf200,
attnum=39408, key=1, category=<value optimized out>, items=0x1c81c20,
nitem=54,
nitem=54,
buildStats=0x0) at gininsert.c:234
#12 0x0047edbb in ginInsertCleanup (ginstate=0x7fff6eecf200,
vac_delay=<value optimized out>

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-31 Thread Heikki Linnakangas

On 07/30/2015 09:14 PM, Andres Freund wrote:

On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:

In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
until it has updated the variable. And LWLockWaitForVar() holds the spinlock
when it checks that the lock is held and that the variable's value matches.
So it cannot happen on 9.4.


The first paragraph talked about the same value, but that was just
referring to it not yet having been cleared, I think...


To reiterate, with 9.5, it's possible that a backend is sleeping in
LWLockWaitForVar(oldvar=123), even though the lock is currently held by
another backend with value 124. That seems wrong, or surprising at the very
least.


With my patch that can't really happen that way though? The value is
re-checked after queuing. If it has changed by then we're done. And if
it hasn't yet changed we're guaranteed to be woken up once it's being
changed?


Ok, let me try explaining it again. With your patch, 
LWLockAcquireWithVar looks like this (simplified, assuming the lock is 
free):


1. Set LW_VAL_EXCLUSIVE
2. (Acquire spinlock) Set *valptr = val (Release spinlock)

LWLockWaitForVar looks like this:

1. Check if lock is free
2. (Acquire Spinlock) Read *valptr, compare with oldval (Release spinlock)
3. LWLockQueueSelf
4. Re-check that *valptr is still equal to oldval
5. sleep.

This is the race condition:

Backend A                      Backend B
---------                      ---------
LWLockAcquireWithVar(123)
  Set LW_VAL_EXCLUSIVE
  Set *valptr = 123
                               LWLockWaitForVar(123)
                                 See that the lock is held
                                 Read *valptr, it matches
                                 LWLockQueueSelf
LWLockRelease()
LWLockAcquireWithVar(124)
  Set LW_VAL_EXCLUSIVE
                                 wakes up
                                 See that the lock is still/again held
                                 Read *valptr, it's still 123
                                 LWLockQueueSelf
                                 Re-check that *valptr is still 123.
                                 go to sleep.
  Set *valptr = 124


Now, Backend B's LWLockWaitForVar(123) call is sleeping, even though the
lock was freed and reacquired with a different value, 124. It won't wake
up until A updates the value or releases the lock again.


This was not possible in 9.4. It was possible in 9.4 too when the value
was the same in both LWLockAcquireWithVar() calls, and I think that's
acceptable, but not when the values differ.
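
For reference, the lost-wakeup-free shape of such a wait, sketched in
standalone C with a condition variable (the names are invented stand-ins;
PostgreSQL uses its own wait queue rather than pthreads, which is exactly
why the ordering above gets subtle): the re-check of the variable and the
act of going to sleep must not be separable by an update.

#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  var_cond = PTHREAD_COND_INITIALIZER;
static uint64_t shared_var;		/* plays the role of insertingAt */

/* Sleep until shared_var differs from oldval; return the new value.
 * pthread_cond_wait enqueues us and releases the mutex atomically, so
 * the re-check (the while condition) and the sleep cannot be separated
 * by a missed update. */
static uint64_t
wait_for_var_change(uint64_t oldval)
{
	uint64_t	newval;

	pthread_mutex_lock(&var_lock);
	while (shared_var == oldval)
		pthread_cond_wait(&var_cond, &var_lock);
	newval = shared_var;
	pthread_mutex_unlock(&var_lock);
	return newval;
}

/* Publish a new value and wake all waiters, as LWLockUpdateVar does. */
static void
update_var(uint64_t val)
{
	pthread_mutex_lock(&var_lock);
	shared_var = val;
	pthread_mutex_unlock(&var_lock);
	pthread_cond_broadcast(&var_cond);
}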



I generally don't mind adding some sort of flag clearing or such, but I'd
rather not have it in the retry loop in the general LWLockAttemptLock
path - I found that very small amounts of instructions in there have a
measurable impact.


I doubt clearing a bit would matter there, although you might have a 
better instinct on that...


- Heikki





Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Heikki Linnakangas

On 07/29/2015 09:35 PM, Andres Freund wrote:

On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote:

Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on
it. The lock holder releases the lock, and wakes up A. But before A wakes up
and sees that the lock is free, another backend acquires the lock again. It
runs LWLockAcquireWithVar to the point just before setting the variable's
value. Now A wakes up, sees that the lock is still (or again) held, and that
the variable's value still matches the old one, and goes back to sleep. The
new lock holder won't wake it up until it updates the value again, or
releases the lock.


I'm not sure whether this observation is about my patch or the general
lwlock variable mechanism. In my opinion that behaviour exists today
both in 9.4 and 9.5.


In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held, 
until it has updated the variable. And LWLockWaitForVar() holds the 
spinlock when it checks that the lock is held and that the variable's 
value matches. So it cannot happen on 9.4.


To reiterate, with 9.5, it's possible that a backend is sleeping in 
LWLockWaitForVar(oldvar=123), even though the lock is currently held by 
another backend with value 124. That seems wrong, or surprising at the 
very least.



But I think that's fine because that race seems pretty
fundamental. After all, you could have called LWLockWaitForVar() just
after the second locker had set the variable to the same value.


I'm not talking about setting it to the same value. I'm talking about 
setting it to a different value. (I talked about setting it to the same 
value later in the email, and I agree that's a pretty fundamental 
problem and exists with 9.4 as well).


- Heikki




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Andres Freund
On 2015-07-30 09:03:01 -0700, Jeff Janes wrote:
 On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund and...@anarazel.de wrote:
  What do you think about something roughly like the attached?
 
 
 I've not evaluated the code, but applying it does solve the problem I was
 seeing.

Cool, thanks for testing! How long did you run the test and how long did
it, on average, previously take for the issue to occur?

Regards,

Andres




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Jeff Janes
On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund and...@anarazel.de wrote:

 On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
  On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
   Ah, ok, that should work, as long as you also re-check the variable's
 value
   after queueing. Want to write the patch, or should I?
 
  I'll try. Shouldn't be too hard.

 What do you think about something roughly like the attached?


I've not evaluated the code, but applying it does solve the problem I was
seeing.

Cheers,

Jeff


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Andres Freund
On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:
 In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
 until it has updated the variable. And LWLockWaitForVar() holds the spinlock
 when it checks that the lock is held and that the variable's value matches.
 So it cannot happen on 9.4.

The first paragraph talked about the same value, but that was just
referring to it not yet having been cleared, I think...

 To reiterate, with 9.5, it's possible that a backend is sleeping in
 LWLockWaitForVar(oldvar=123), even though the lock is currently held by
 another backend with value 124. That seems wrong, or surprising at the very
 least.

With my patch that can't really happen that way though? The value is
re-checked after queuing. If it has changed by then we're done. And if
it hasn't yet changed we're guaranteed to be woken up once it's being
changed?


I generally don't mind adding some sort of flag clearing or such, but I'd
rather not have it in the retry loop in the general LWLockAttemptLock
path - I found that very small amounts of instructions in there have a
measurable impact.

Greetings,

Andres Freund




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Andres Freund
On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote:
 Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on
 it. The lock holder releases the lock, and wakes up A. But before A wakes up
 and sees that the lock is free, another backend acquires the lock again. It
 runs LWLockAcquireWithVar to the point just before setting the variable's
 value. Now A wakes up, sees that the lock is still (or again) held, and that
 the variable's value still matches the old one, and goes back to sleep. The
 new lock holder won't wake it up until it updates the value again, or
 releases the lock.

I'm not sure whether this observation is about my patch or the general
lwlock variable mechanism. In my opinion that behaviour exists today
both in 9.4 and 9.5.

But I think that's fine because that race seems pretty
fundamental. After all, you could have called LWLockWaitForVar() just
after the second locker had set the variable to the same value.


 You didn't like the new LW_FLAG_VAR_SET flag and the API changes I
 proposed?

I do slightly dislike the additional bit of math in AttemptLock. But
what I really disliked about the patch is the reliance on holding the
spinlock over longer times and conditionally acquiring spinlocks.  I
didn't see a need for the flags change to fix the issue at hand, that's
why I didn't incorporate it. I'm not fundamentally against it.

 Either way, there is a race condition that if the new lock holder sets the
 variable to the *same* value as before, the old waiter won't necessarily
 wake up even though the lock was free or had a different value in between.
 But that's easier to explain and understand than the fact that the value set
 by LWLockAcquireWithVar() might not be seen by a waiter, until you release
 the lock or update it again.

I can't really see a use for the API that'd allow that and care about the
waits, because quite fundamentally you could just have started waiting after
the variable was set to the value at the same time.

 Another idea is to have LWLockAcquire check the wait queue after setting the
 variable, and wake up anyone who's already queued up.

Ugh, not a fan of that.

 The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK in
 principle. You'll want to change LWLockConflictsWithVar() so that the
 spinlock is held only over the value = *valptr line, though; the other
 stuff just modifies local variables and doesn't need to be protected by the
 spinlock.

Good point.

 Also, when you enter LWLockWaitForVar(), you're checking if the
 lock is held twice in a row; first at the top of the function, and again
 inside LWLockConflictsWithVar. You could just remove the quick test at the
 top.

Yea, I was thinking about removing it. The first check was there
previously, so I left it in place. We do execute a bit more code once
we've disabled interrupts (extraWaits, re-enabling interrupts). I don't
think it'll matter much, but it seemed like an independent thing.

Greetings,

Andres Freund




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 09:23:32 -0700, Jeff Janes wrote:
 On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 I've reproduced it again against commit b2ed8edeecd715c8a23ae462.
 
 It took 5 hours on an 8-core Intel(R) Xeon(R) CPU E5-2650.
 
 I also reproduced it in 3 hours on the same machine with both JJ_torn_page
 and JJ_xid set to zero (i.e. turned off, no induced crashes), so the
 fault-injection patch should not be necessary to get the issue.

Hm, is that a single-socket or dual-socket E5-2650 system? That CPU
itself has 8 cores and can work in a dual-socket system; that's why I
ask.




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Jeff Janes
On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund and...@anarazel.de wrote:

 Hi,

 On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:
  After applying this patch to commit fdf28853ae6a397497b79f, it has
 survived
  testing long enough to convince me that this fixes the problem.

 What was the actual workload breaking with the bug? I ran a small
 variety and I couldn't reproduce it yet. I'm not saying there's no bug,
 I just would like to be able to test my version of the fixes...


 It was the torn-page fault-injection code here:


 https://drive.google.com/open?id=0Bzqrh1SO9FcEfkxFb05uQnJ2cWg0MEpmOXlhbFdyNEItNmpuek1zU2gySGF3Vk1oYXNNLUE

 It is not a minimal set, I don't know if all parts of this are necessary
 to reproduce it.  The whole crash-recovery cycling might not even be
 important.



I've reproduced it again against commit b2ed8edeecd715c8a23ae462.

It took 5 hours on an 8-core Intel(R) Xeon(R) CPU E5-2650.

I also reproduced it in 3 hours on the same machine with both JJ_torn_page
and JJ_xid set to zero (i.e. turned off, no induced crashes), so the
fault-injection patch should not be necessary to get the issue.


Cheers,

Jeff


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Jeff Janes
On Wed, Jul 29, 2015 at 9:26 AM, Andres Freund and...@anarazel.de wrote:

 On 2015-07-29 09:23:32 -0700, Jeff Janes wrote:
  On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes jeff.ja...@gmail.com
 wrote:
  I've reproduced it again against commit b2ed8edeecd715c8a23ae462.
 
  It took 5 hours on an 8-core Intel(R) Xeon(R) CPU E5-2650.
 
  I also reproduced it in 3 hours on the same machine with both
 JJ_torn_page
  and JJ_xid set to zero (i.e. turned off, no induced crashes), so the
  fault-injection patch should not be necessary to get the issue.

 Hm, is that a single-socket or dual-socket E5-2650 system? That CPU
 itself has 8 cores and can work in a dual-socket system; that's why I
 ask.


It is a virtual machine, and I think a property of the VM software is that
any given virtual machine can only run on one socket of the underlying
hardware.  The real machine is dual socket, though.

Cheers,

Jeff


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Heikki Linnakangas

On 07/29/2015 04:10 PM, Andres Freund wrote:

On 2015-07-29 14:22:23 +0200, Andres Freund wrote:

On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:

Ah, ok, that should work, as long as you also re-check the variable's value
after queueing. Want to write the patch, or should I?


I'll try. Shouldn't be too hard.


What do you think about something roughly like the attached?


Hmm. Imagine this:

Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting 
on it. The lock holder releases the lock, and wakes up A. But before A 
wakes up and sees that the lock is free, another backend acquires the 
lock again. It runs LWLockAcquireWithVar to the point just before 
setting the variable's value. Now A wakes up, sees that the lock is 
still (or again) held, and that the variable's value still matches the 
old one, and goes back to sleep. The new lock holder won't wake it up 
until it updates the value again, or releases the lock.


I think that's OK given the current usage of this in WALInsertLocks, but 
it's scary. At the very least you need comments to explain that race 
condition. You didn't like the new LW_FLAG_VAR_SET flag and the API 
changes I proposed? That eliminates the race condition.


Either way, there is a race condition that if the new lock holder sets 
the variable to the *same* value as before, the old waiter won't 
necessarily wake up even though the lock was free or had a different 
value in between. But that's easier to explain and understand than the 
fact that the value set by LWLockAcquireWithVar() might not be seen by a 
waiter, until you release the lock or update it again.


Another idea is to have LWLockAcquire check the wait queue after setting 
the variable, and wake up anyone who's already queued up. You could just 
call LWLockUpdateVar() from LWLockAcquireCommon to set the variable. I 
would prefer the approach from my previous patch more, though. That 
would avoid having to acquire the spinlock in LWLockAcquire altogether, 
and I actually like the API change from that independently from any 
correctness issues.


The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK 
in principle. You'll want to change LWLockConflictsWithVar() so that the 
spinlock is held only over the value = *valptr line, though; the other 
stuff just modifies local variables and doesn't need to be protected by 
the spinlock. Also, when you enter LWLockWaitForVar(), you're checking 
if the lock is held twice in a row; first at the top of the function, 
and again inside LWLockConflictsWithVar. You could just remove the 
quick test at the top.


- Heikki
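
A compressed standalone sketch of the LW_FLAG_VAR_SET scheme proposed above
(the bit position and helper names are assumptions for illustration; the
real patch folds the flag into lwlock.c's existing state word):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LW_FLAG_VAR_SET ((uint32_t) 1 << 23)	/* assumed bit position */

static _Atomic uint32_t lock_state;		/* stand-in for lock->state */

/* Acquiring would clear the flag: waiters must not trust the variable
 * until the new holder has published a value. */
static void
acquire_clears_flag(void)
{
	atomic_fetch_and(&lock_state, ~LW_FLAG_VAR_SET);
}

/* LWLockUpdateVar would set the flag after storing the new value. */
static void
update_var_sets_flag(void)
{
	atomic_fetch_or(&lock_state, LW_FLAG_VAR_SET);
}

/* LWLockWaitForVar would keep sleeping while the flag is clear,
 * regardless of the variable's (possibly stale) contents. */
static bool
var_is_published(void)
{
	return (atomic_load(&lock_state) & LW_FLAG_VAR_SET) != 0;
}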





Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Heikki Linnakangas

On 07/29/2015 02:39 PM, Andres Freund wrote:

On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote:

Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears the
LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you
want to set the variable immediately.

This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?


The prime thing I dislike about the patch is how long it now holds the
spinlock in WaitForVar. I don't understand why that's necessary? There's
no need to hold a spinlock until after the
mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
unless I miss something?

In an earlier email you say:

After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.


But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?


If you release the spinlock before LWLockQueueSelf(), then no. It's 
possible that the backend you wanted to wait for updates the variable's 
value before you've queued up. Or even releases the lock, and it gets 
re-acquired by another backend, before you've queued up.


Or are you thinking of just moving the SpinLockRelease to after 
LWLockQueueSelf(), i.e. to the other side of the mustwait = ... line? 
That'd probably be ok.


- Heikki





Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
Hi,

Finally getting to this.

On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote:
 Previously, LWLockAcquireWithVar set the variable associated with the lock
 atomically with acquiring it. Before the lwlock-scalability changes, that
 was straightforward because you held the spinlock anyway, but it's a lot
 harder/expensive now. So I changed the way acquiring a lock with a variable
 works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
 the current lock holder has updated the variable. The LWLockAcquireWithVar
 function is gone - you now just use LWLockAcquire(), which always clears the
 LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you
 want to set the variable immediately.

 This passes make check, but I haven't done any testing beyond that. Does
 this look sane to you?

The prime thing I dislike about the patch is how long it now holds the
spinlock in WaitForVar. I don't understand why that's necessary? There's
no need to hold a spinlock until after the
   mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
unless I miss something?

In an earlier email you say:
 After the spinlock is released above, but before the LWLockQueueSelf() call,
 it's possible that another backend comes in, acquires the lock, changes the
 variable's value, and releases the lock again. In 9.4, the spinlock was not
 released until the process was queued.

But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?

I'll try to reproduce the problem now. But I do wonder if it's possibly
just the missing spinlock during the update.

Andres
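
For orientation, the line being discussed is a plain atomic read of the
lock word; only the 64-bit variable comparison needs the spinlock. A
standalone illustration of that cheap pre-check (LW_VAL_EXCLUSIVE's value
matches the lwlock.c definition; the surrounding state is a stand-in):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LW_VAL_EXCLUSIVE ((uint32_t) 1 << 24)	/* as in lwlock.c */

static _Atomic uint32_t lock_state;		/* stand-in for lock->state */

/* The cheap pre-check: a single atomic load, no spinlock needed.  Only
 * when the lock turns out to be held does the caller go on to compare
 * the 64-bit variable under the spinlock. */
static bool
must_wait(void)
{
	return (atomic_load(&lock_state) & LW_VAL_EXCLUSIVE) != 0;
}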




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
 Ah, ok, that should work, as long as you also re-check the variable's value
 after queueing. Want to write the patch, or should I?

I'll try. Shouldn't be too hard.

Andres




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
 On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
  Ah, ok, that should work, as long as you also re-check the variable's value
  after queueing. Want to write the patch, or should I?
 
 I'll try. Shouldn't be too hard.

What do you think about something roughly like the attached?

- Andres
From 2879408d8da7714ab6af6238dfe1c8a535800af3 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 29 Jul 2015 15:06:54 +0200
Subject: [PATCH] Other way to fix the issues around the broken LWLock variable
 support.

---
 src/backend/storage/lmgr/lwlock.c | 105 +-
 1 file changed, 69 insertions(+), 36 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index e5566d1..c64f20d 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1066,7 +1066,15 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
 
 	/* If there's a variable associated with this lock, initialize it */
 	if (valptr)
+	{
+#ifdef LWLOCK_STATS
+		lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+#else
+		SpinLockAcquire(&lock->mutex);
+#endif
 		*valptr = val;
+		SpinLockRelease(&lock->mutex);
+	}
 
 	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), T_ID(lock), mode);
 
@@ -1259,6 +1267,60 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 }
 
 /*
+ * Does the lwlock in its current state need to wait for the variable value to
+ * change?
+ *
+ * If we don't need to wait, and it's because the value of the variable has
+ * changed, store the current value in *newval.
+ *
+ * *result is set to true if the lock was free, and false otherwise.
+ */
+static bool
+LWLockConflictsWithVar(LWLock *lock,
+	   uint64 *valptr, uint64 oldval, uint64 *newval,
+	   bool *result)
+{
+	bool		mustwait;
+	uint64		value;
+
+	mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
+
+	if (!mustwait)
+	{
+		*result = true;
+		return false;
+	}
+
+	/*
+	 * Perform comparison using spinlock as we can't rely on atomic 64 bit
+	 * reads/stores.  On platforms with a way to do atomic 64 bit reads/writes
+	 * the spinlock can be optimized away.
+	 */
+#ifdef LWLOCK_STATS
+	lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+#else
+	SpinLockAcquire(&lock->mutex);
+#endif
+
+	*result = false;
+
+	value = *valptr;
+	if (value != oldval)
+	{
+		mustwait = false;
+		*newval = value;
+	}
+	else
+	{
+		mustwait = true;
+	}
+
+	SpinLockRelease(&lock->mutex);
+
+	return mustwait;
+}
+
+/*
  * LWLockWaitForVar - Wait until lock is free, or a variable is updated.
  *
  * If the lock is held and *valptr equals oldval, waits until the lock is
@@ -1313,39 +1375,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 	for (;;)
 	{
 		bool		mustwait;
-		uint64		value;
-
-		mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
-
-		if (mustwait)
-		{
-			/*
-			 * Perform comparison using spinlock as we can't rely on atomic 64
-			 * bit reads/stores.
-			 */
-#ifdef LWLOCK_STATS
-			lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
-#else
-			SpinLockAcquire(&lock->mutex);
-#endif
 
-			/*
-			 * XXX: We can significantly optimize this on platforms with 64bit
-			 * atomics.
-			 */
-			value = *valptr;
-			if (value != oldval)
-			{
-				result = false;
-				mustwait = false;
-				*newval = value;
-			}
-			else
-				mustwait = true;
-			SpinLockRelease(&lock->mutex);
-		}
-		else
-			mustwait = false;
+		mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
+										  &result);
 
 		if (!mustwait)
 			break;				/* the lock was free or value didn't match */
@@ -1365,12 +1397,13 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 		pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);
 
 		/*
-		 * We're now guaranteed to be woken up if necessary. Recheck the
-		 * lock's state.
+		 * We're now guaranteed to be woken up if necessary. Recheck the lock
+		 * and variables state.
 		 */
-		mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
+		mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
+										  &result);
 
-		/* Ok, lock is free after we queued ourselves. Undo queueing. */
+		/* Ok, no conflict after we queued ourselves. Undo queueing. */
 		if (!mustwait)
 		{
 			LOG_LWDEBUG(LWLockWaitForVar, lock, free, undoing queue);
-- 
2.4.0.rc2.1.g3d6bc9a




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote:
 On 07/29/2015 02:39 PM, Andres Freund wrote:
 In an earlier email you say:
 After the spinlock is released above, but before the LWLockQueueSelf() call,
 it's possible that another backend comes in, acquires the lock, changes the
 variable's value, and releases the lock again. In 9.4, the spinlock was not
 released until the process was queued.
 
 But that's not a problem. The updater in that case will have queued a
 wakeup for all waiters, including WaitForVar()?
 
 If you release the spinlock before LWLockQueueSelf(), then no. It's possible
 that the backend you wanted to wait for updates the variable's value before
 you've queued up. Or even releases the lock, and it gets re-acquired by
 another backend, before you've queued up.

For normal locks the equivalent problem is solved by re-checking whether
a conflicting lock is still held after enqueuing. Why don't we do the
same here? Doing it that way has the big advantage that we can just
remove the spinlocks entirely on platforms with atomic 64bit
reads/writes.

Greetings,

Andres Freund
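
A sketch of what removing the spinlock could look like on such platforms,
using C11 atomics as a stand-in (hypothetical, not the committed change):
once the 64-bit variable is atomically readable, the conflict check needs
no mutex at all.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t inserting_at;	/* stand-in for the lock's var */

/* Reading and writing the 64-bit value atomically removes the need for
 * the lock->mutex spinlock around the comparison. */
static bool
var_conflicts(uint64_t oldval, uint64_t *newval)
{
	uint64_t	value = atomic_load(&inserting_at);

	if (value != oldval)
	{
		*newval = value;	/* changed: caller need not wait */
		return false;
	}
	return true;			/* unchanged: caller must wait */
}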




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Heikki Linnakangas

On 07/29/2015 03:08 PM, Andres Freund wrote:

On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote:

On 07/29/2015 02:39 PM, Andres Freund wrote:

In an earlier email you say:

After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.


But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?


If you release the spinlock before LWLockQueueSelf(), then no. It's possible
that the backend you wanted to wait for updates the variable's value before
you've queued up. Or even releases the lock, and it gets re-acquired by
another backend, before you've queued up.


For normal locks the equivalent problem is solved by re-checking whether
a conflicting lock is still held after enqueuing. Why don't we do the
same here? Doing it that way has the big advantage that we can just
remove the spinlocks entirely on platforms with atomic 64bit
reads/writes.


Ah, ok, that should work, as long as you also re-check the variable's 
value after queueing. Want to write the patch, or should I?


- Heikki





Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-28 Thread Andres Freund
Hi,

On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:
 After applying this patch to commit fdf28853ae6a397497b79f, it has survived
 testing long enough to convince me that this fixes the problem.

What was the actual workload breaking with the bug? I ran a small
variety and I couldn't reproduce it yet. I'm not saying there's no bug,
I just would like to be able to test my version of the fixes...

Andres




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-28 Thread Jeff Janes
On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund and...@anarazel.de wrote:

 Hi,

 On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:
  After applying this patch to commit fdf28853ae6a397497b79f, it has
 survived
  testing long enough to convince me that this fixes the problem.

 What was the actual workload breaking with the bug? I ran a small
 variety and I couldn't reproduce it yet. I'm not saying there's no bug,
 I just would like to be able to test my version of the fixes...


It was the torn-page fault-injection code here:

https://drive.google.com/open?id=0Bzqrh1SO9FcEfkxFb05uQnJ2cWg0MEpmOXlhbFdyNEItNmpuek1zU2gySGF3Vk1oYXNNLUE

It is not a minimal set, I don't know if all parts of this are necessary to
reproduce it.  The whole crash-recovery cycling might not even be
important.

Compiled with:

./configure --enable-debug --with-libxml --with-perl --with-python
--with-ldap --with-openssl --with-gssapi
--prefix=/home/jjanes/pgsql/torn_bisect/

(Also with or without --enable-cassert).

I just ran sh do.sh > do.out and eventually it stopped producing output,
and I found everything hung up.

Cheers,

Jeff


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-26 Thread Amit Langote
On 2015-07-16 PM 04:03, Jeff Janes wrote:
 On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 

 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the lock
 atomically with acquiring it. Before the lwlock-scalability changes, that
 was straightforward because you held the spinlock anyway, but it's a lot
 harder/expensive now. So I changed the way acquiring a lock with a variable
 works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
 the current lock holder has updated the variable. The LWLockAcquireWithVar
 function is gone - you now just use LWLockAcquire(), which always clears
 the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
 you want to set the variable immediately. LWLockWaitForVar() always waits
 if the flag is not set, i.e. it will not return regardless of the
 variable's value, if the current lock-holder has not updated it yet.


 I ran this for a while without casserts and it seems to work.  But with
 casserts, I get failures in the autovac process on the GIN index.
 
 I don't see how this is related to the LWLock issue, but I didn't see it
 without your patch.  Perhaps the system just didn't survive long enough to
 uncover it without the patch (although it shows up pretty quickly).  It
  could just be an overzealous Assert, since the casserts-off run didn't show
 problems.
 
 bt and bt full are shown below.
 

I got a similar assert failure but with a btree index
(pg_attribute_relid_attnum_index). The backtrace looks like Jeff's:

(gdb) bt
#0  0x003969632625 in raise () from /lib64/libc.so.6
#1  0x003969633e05 in abort () from /lib64/libc.so.6
#2  0x0092eb9e in ExceptionalCondition (conditionName=0x9c2220
"!(((PageHeader) (page))->pd_special >= (__builtin_offsetof
(PageHeaderData, pd_linp)))",
errorType=0x9c0c41 "FailedAssertion", fileName=0x9c0c10 "nbtree.c",
lineNumber=903) at assert.c:54
#3  0x004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9,
orig_blkno=9) at nbtree.c:903
#4  0x004e0067 in btvacuumscan (info=0x7fff2c765cd0,
stats=0x279f7d0, callback=0x668f6d <lazy_tid_reaped>,
callback_state=0x279e338, cycleid=49190)
at nbtree.c:821
#5  0x004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676
#6  0x00939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0,
collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456) at
fmgr.c:1375
#7  0x004d2a01 in index_bulk_delete (info=0x7fff2c765cd0,
stats=0x0, callback=0x668f6d <lazy_tid_reaped>, callback_state=0x279e338)
at indexam.c:690
#8  0x0066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0,
stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367
#9  0x006678a8 in lazy_scan_heap (onerel=0x274ec90,
vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000') at
vacuumlazy.c:1098
#10 0x006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99,
params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244
#11 0x00665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0,
options=99, params=0x27bdc88) at vacuum.c:1388
#12 0x006643ce in vacuum (options=99, relation=0x7fff2c7662a0,
relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18,
isTopLevel=1 '\001')
at vacuum.c:293
#13 0x0075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80,
bstrategy=0x27bdd18) at autovacuum.c:2807
#14 0x0075c632 in do_autovacuum () at autovacuum.c:2328
#15 0x0075b457 in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1647
#16 0x0075b0a5 in StartAutoVacWorker () at autovacuum.c:1454
#17 0x0076f9cc in StartAutovacuumWorker () at postmaster.c:5261
#18 0x0076f28a in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#19 signal handler called
#20 0x0039696e1353 in __select_nocancel () from /lib64/libc.so.6
#21 0x0076ace7 in ServerLoop () at postmaster.c:1582
#22 0x0076a538 in PostmasterMain (argc=3, argv=0x26f9330) at
postmaster.c:1263
#23 0x006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223

Thanks,
Amit





Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-19 Thread Jeff Janes
On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi
 wrote:


 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the
 lock atomically with acquiring it. Before the lwlock-scalability changes,
 that was straightforward because you held the spinlock anyway, but it's a
 lot harder/expensive now. So I changed the way acquiring a lock with a
 variable works. There is now a separate flag, LW_FLAG_VAR_SET, which
 indicates that the current lock holder has updated the variable. The
 LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(),
 which always clears the LW_FLAG_VAR_SET flag, and you can call
 LWLockUpdateVar() after that if you want to set the variable immediately.
 LWLockWaitForVar() always waits if the flag is not set, i.e. it will not
 return regardless of the variable's value, if the current lock-holder has
 not updated it yet.


 I ran this for a while without casserts and it seems to work.  But with
 casserts, I get failures in the autovac process on the GIN index.

 I don't see how this is related to the LWLock issue, but I didn't see it
 without your patch.  Perhaps the system just didn't survive long enough to
 uncover it without the patch (although it shows up pretty quickly).  It
 could just be an overzealous Assert, since the casserts-off run didn't show
 problems.


 bt and bt full are shown below.

 Cheers,

 Jeff

 #0  0x003dcb632625 in raise () from /lib64/libc.so.6
 #1  0x003dcb633e05 in abort () from /lib64/libc.so.6
 #2  0x00930b7a in ExceptionalCondition (
 conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
 (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
 "FailedAssertion",
 fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
 #3  0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
 ginvacuum.c:713


It now looks like this *is* unrelated to the LWLock issue.  The assert that
it is tripping over was added just recently (302ac7f27197855afa8c) and so I
had not been testing under its presence until now.  It looks like it is
finding all-zero pages (index extended but then a crash before initializing
the page?) and it doesn't like them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably before this assert, such pages would just be permanently
orphaned.

Cheers,

Jeff
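
For context, an all-zeros page like the one printed above is exactly what
a crash between extending the relation and initializing the new page leaves
behind. A standalone stand-in for the detection involved (PostgreSQL's real
test is the PageIsNew() macro on pd_upper; BLCKSZ of 8192 matches the
default build):

#include <stdbool.h>
#include <string.h>

#define BLCKSZ 8192				/* default PostgreSQL block size */

/* An uninitialized page is all zeros, so every header field (including
 * pd_upper, which PageIsNew() inspects) reads as 0. */
static bool
page_is_all_zeros(const char *page)
{
	static const char zeroes[BLCKSZ];	/* zero-initialized */

	return memcmp(page, zeroes, BLCKSZ) == 0;
}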


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-19 Thread Jeff Janes
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 06/30/2015 11:24 PM, Andres Freund wrote:

 On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

 Hm. Right. A recheck of the value after the queuing should be sufficient
 to fix? That's how we deal with the exact same scenarios for the normal
 lock state, so that shouldn't be very hard to add.


 Yeah. It's probably more efficient to not release the spinlock between
 checking the value and LWLockQueueSelf() though.


 Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
 way in a bunch of callsites... So that'd be harder.  Additionally I'm
 planning to get rid of the spinlocks around queuing (they show up as
 bottlenecks in contended workloads), so it seems more future proof not
 to assume that either way.  On top of that I think we should, when
 available (or using the same type of fallback as for 32bit?), use 64 bit
 atomics for the var anyway.

  I'll take a stab at fixing this tomorrow.


 Thanks! Do you mean both or just the second issue?


 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the lock
 atomically with acquiring it. Before the lwlock-scalability changes, that
 was straightforward because you held the spinlock anyway, but it's a lot
 harder/expensive now. So I changed the way acquiring a lock with a variable
 works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
 the current lock holder has updated the variable. The LWLockAcquireWithVar
 function is gone - you now just use LWLockAcquire(), which always clears
 the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
 you want to set the variable immediately. LWLockWaitForVar() always waits
 if the flag is not set, i.e. it will not return regardless of the
 variable's value, if the current lock-holder has not updated it yet.

 This passes make check, but I haven't done any testing beyond that. Does
 this look sane to you?



After applying this patch to commit fdf28853ae6a397497b79f, it has survived
testing long enough to convince me that this fixes the problem.

Cheers,

Jeff


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-16 Thread Jeff Janes
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote:


 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the lock
 atomically with acquiring it. Before the lwlock-scalability changes, that
 was straightforward because you held the spinlock anyway, but it's a lot
 harder/expensive now. So I changed the way acquiring a lock with a variable
 works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
 the current lock holder has updated the variable. The LWLockAcquireWithVar
 function is gone - you now just use LWLockAcquire(), which always clears
 the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
 you want to set the variable immediately. LWLockWaitForVar() always waits
 if the flag is not set, i.e. it will not return regardless of the
 variable's value, if the current lock-holder has not updated it yet.


I ran this for a while without casserts and it seems to work.  But with
casserts, I get failures in the autovac process on the GIN index.

I don't see how this is related to the LWLock issue, but I didn't see it
without your patch.  Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly).  It
could just be an overzealous Assert, since the casserts-off run didn't show
problems.

bt and bt full are shown below.

Cheers,

Jeff

#0  0x003dcb632625 in raise () from /lib64/libc.so.6
#1  0x003dcb633e05 in abort () from /lib64/libc.so.6
#2  0x00930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3  0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
#4  0x0093b53a in FunctionCall2Coll (flinfo=0x7fffee073e60,
collation=0, arg1=140737186840448, arg2=0) at fmgr.c:1323
#5  0x004d5c7a in index_vacuum_cleanup (info=0x7fffee073f80,
stats=0x0) at indexam.c:717
#6  0x00664f69 in lazy_cleanup_index (indrel=0x7faafbcace20,
stats=0x0, vacrelstats=0x28809c8) at vacuumlazy.c:1400
#7  0x00664142 in lazy_scan_heap (onerel=0x7faafbcab6d0,
vacrelstats=0x28809c8, Irel=0x2881090, nindexes=2, scan_all=1 '\001') at
vacuumlazy.c:
#8  0x00662905 in lazy_vacuum_rel (onerel=0x7faafbcab6d0,
options=65, params=0x286ea00, bstrategy=0x286ea90) at vacuumlazy.c:247
#9  0x006623e4 in vacuum_rel (relid=16409, relation=0x7fffee074550,
options=65, params=0x286ea00) at vacuum.c:1377
#10 0x00660bea in vacuum (options=65, relation=0x7fffee074550,
relid=16409, params=0x286ea00, va_cols=0x0, bstrategy=0x286ea90,
isTopLevel=1 '\001')
at vacuum.c:295
#11 0x0075caa9 in autovacuum_do_vac_analyze (tab=0x286e9f8,
bstrategy=0x286ea90) at autovacuum.c:2811
#12 0x0075be67 in do_autovacuum () at autovacuum.c:2331
#13 0x0075ac1c in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1648
#14 0x0075a84d in StartAutoVacWorker () at autovacuum.c:1455
#15 0x0076f745 in StartAutovacuumWorker () at postmaster.c:5261
#16 0x0076effd in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#17 signal handler called
#18 0x003dcb6e1353 in __select_nocancel () from /lib64/libc.so.6
#19 0x0076a8f0 in ServerLoop () at postmaster.c:1582
#20 0x0076a141 in PostmasterMain (argc=4, argv=0x27b2cf0) at
postmaster.c:1263
#21 0x006c1e6e in main (argc=4, argv=0x27b2cf0) at main.c:223


#0  0x003dcb632625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x003dcb633e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
No locals.
#3  0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
buffer = 186
page = 0x7faafc565b20 
info = 0x7fffee073f80
stats = 0x2880858
index = 0x7faafbcace20
needLock = 1 '\001'
npages = 22569
blkno = 12025
totFreePages = 11502
ginstate = {index = 0x7faafbcace20, oneCol = 1 '\001', origTupdesc
= 0x7faafbcad150, tupdesc = {0x7faafbcad150, 0x0 repeats 31 times},
  compareFn = {{fn_addr = 0x90224d <bttextcmp>, fn_oid = 360,
fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
  fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
  fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} repeats 31 times}, extractValueFn = {{
  fn_addr = 0x494adc 

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-15 Thread Heikki Linnakangas

On 06/30/2015 11:24 PM, Andres Freund wrote:

On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.


Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.


Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder.  Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way.  On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.


I'll take a stab at fixing this tomorrow.


Thanks! Do you mean both or just the second issue?


Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the 
lock atomically with acquiring it. Before the lwlock-scalability 
changes, that was straightforward because you held the spinlock anyway, 
but it's a lot harder/expensive now. So I changed the way acquiring a 
lock with a variable works. There is now a separate flag, 
LW_FLAG_VAR_SET, which indicates that the current lock holder has 
updated the variable. The LWLockAcquireWithVar function is gone - you 
now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET 
flag, and you can call LWLockUpdateVar() after that if you want to set 
the variable immediately. LWLockWaitForVar() always waits if the flag is 
not set, i.e. it will not return regardless of the variable's value, if 
the current lock-holder has not updated it yet.
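
In other words, the calling pattern becomes something like this (a sketch 
only; the lock and variable are the WAL-insertion ones from the patch 
below, and startPos is a made-up value):

	LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
	/* LW_FLAG_VAR_SET is now clear: LWLockWaitForVar() callers will block */

	LWLockUpdateVar(&WALInsertLocks[i].l.lock,
					&WALInsertLocks[i].l.insertingAt,
					startPos);	/* sets the flag and wakes any waiters */

	/* ... do the work the lock protects, updating the var as we go ... */

	LWLockRelease(&WALInsertLocks[i].l.lock);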


This passes make check, but I haven't done any testing beyond that. Does 
this look sane to you?


- Heikki

From ce90bd9a1e2c8b5783ebeea83594da7d3a1c63de Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 15 Jul 2015 18:33:28 +0300
Subject: [PATCH 1/1] Fix LWLock variable support, broken by the lwlock
 scalability patch

---
 src/backend/access/transam/xlog.c |  15 ++---
 src/backend/storage/lmgr/lwlock.c | 135 ++
 src/include/storage/lwlock.h  |   1 -
 3 files changed, 83 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd31b3..2f34346 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1408,9 +1408,7 @@ WALInsertLockAcquire(void)
 	 * The insertingAt value is initially set to 0, as we don't know our
 	 * insert location yet.
 	 */
-	immed = LWLockAcquireWithVar(&WALInsertLocks[MyLockNo].l.lock,
-								 &WALInsertLocks[MyLockNo].l.insertingAt,
-								 0);
+	immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 	if (!immed)
 	{
 		/*
@@ -1442,13 +1440,12 @@ WALInsertLockAcquireExclusive(void)
 	 */
 	for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
 	{
-		LWLockAcquireWithVar(&WALInsertLocks[i].l.lock,
-							 &WALInsertLocks[i].l.insertingAt,
-							 PG_UINT64_MAX);
+		LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
+		LWLockUpdateVar(&WALInsertLocks[i].l.lock,
+						&WALInsertLocks[i].l.insertingAt,
+						PG_UINT64_MAX);
 	}
-	LWLockAcquireWithVar(&WALInsertLocks[i].l.lock,
-						 &WALInsertLocks[i].l.insertingAt,
-						 0);
+	LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
 
 	holdingAllLocks = true;
 }
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 46cab49..92a1aef 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -11,12 +11,12 @@
  * LWLocks to protect its shared state.
  *
  * In addition to exclusive and shared modes, lightweight locks can be used
- * to wait until a variable changes value.  The variable is initially set
- * when the lock is acquired with LWLockAcquireWithVar, and can be updated
- * without releasing the lock by calling LWLockUpdateVar.  LWLockWaitForVar
- * waits for the variable to be updated, or until the lock is free.  The
- * meaning of the variable is up to the caller, the lightweight lock code
- * just assigns and compares it.
+ * to wait until a variable changes value.  The variable is initially put
+ * in a not-set state when the lock is acquired with LWLockAcquire, and
+ * can be updated without releasing the lock by calling LWLockUpdateVar.
+ * LWLockWaitForVar waits for the variable to be updated, or until the lock
+ * is free.  The meaning of the variable is up to the caller, the lightweight
+ * lock code just assigns and compares it.
  *
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -99,6 +99,7 @@ extern slock_t *ShmemLock;
 
 #define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)
 #define 

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Jeff Janes
On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  Is there a way to use gdb to figure out who holds the lock they are
 waiting
  for?

 Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
 defined? That might do it.


 I hadn't planned on running into this problem, so had not compiled
 accordingly.

 I thought LOCK_DEBUG would produce too much output, but now I see it
 doesn't print anything unless Trace_lwlocks is on (but it still makes more
 info available to gdb, as Amit mentioned), so I think that should be OK.

 Since I messed up the gdb session causing the postmaster to SIGKILL all
 the children and destroy the evidence, I'll repeat the run compiled with
 LOCK_DEBUG and see what it looks like in the morning.


I've gotten the LWLock deadlock again.  User backend 24841 holds
WALInsertLock 7 and is blocked attempting to acquire 6.  So it seems to be
violating the lock ordering rules (although I don't see that rule spelled
out in xlog.c).

The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6
and is blocked on 7.

I'm not sure where to go from here.


24841:

(gdb) f 2

(gdb) p held_lwlocks
$1 = {{lock = 0x7f70e11fae40, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13df080,
mode = LW_EXCLUSIVE}, {lock = 0x7f70e11d4280, mode = LW_EXCLUSIVE}, {lock =
0x7f70e11d4280,
mode = LW_EXCLUSIVE}, {lock = 0x0, mode = LW_EXCLUSIVE} <repeats 196
times>}
(gdb) p T_NAME((LWLock *) held_lwlocks[1])
$2 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) held_lwlocks[1])

(gdb) p lock
$3 = (LWLock *) 0x7f70e13df000
(gdb) p T_NAME((LWLock *) lock)
$7 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) lock)
$8 = 6

(gdb) p *(lock)
$1 = {mutex = 0 '\000', tranche = 1, state = {value = 1627389952}, nwaiters
= {value = 2}, waiters = {head = {prev = 0x7f70e9e29e58, next =
0x7f70e9e2a140}},
  owner = 0x7f70e9e2d260}

(gdb) p *(lock->owner)
$5 = {links = {prev = 0x0, next = 0x0}, sem = {semId = 539754537, semNum =
0}, waitStatus = 0, procLatch = {is_set = 0, is_shared = 1 '\001',
owner_pid = 24820}, lxid = 0,
  pid = 24820, pgprocno = 112, backendId = -1, databaseId = 0, roleId = 0,
recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 0
'\000', lwWaitLink = {
prev = 0x7f70e13df090, next = 0x7f70e13df090}, waitLock = 0x0,
waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0,
syncRepState = 0, syncRepLinks = {prev = 0x0,
next = 0x0}, myProcLocks = {{prev = 0x7f70e9e2d2f0, next =
0x7f70e9e2d2f0}, {prev = 0x7f70e9e2d300, next = 0x7f70e9e2d300}, {prev =
0x7f70e9e2d310, next = 0x7f70e9e2d310}, {
  prev = 0x7f70e9e2d320, next = 0x7f70e9e2d320}, {prev =
0x7f70e9e2d330, next = 0x7f70e9e2d330}, {prev = 0x7f70e9e2d340, next =
0x7f70e9e2d340}, {prev = 0x7f70e9e2d350,
  next = 0x7f70e9e2d350}, {prev = 0x7f70e9e2d360, next =
0x7f70e9e2d360}, {prev = 0x7f70e9e2d370, next = 0x7f70e9e2d370}, {prev =
0x7f70e9e2d380, next = 0x7f70e9e2d380}, {
  prev = 0x7f70e9e2d390, next = 0x7f70e9e2d390}, {prev =
0x7f70e9e2d3a0, next = 0x7f70e9e2d3a0}, {prev = 0x7f70e9e2d3b0, next =
0x7f70e9e2d3b0}, {prev = 0x7f70e9e2d3c0,
  next = 0x7f70e9e2d3c0}, {prev = 0x7f70e9e2d3d0, next =
0x7f70e9e2d3d0}, {prev = 0x7f70e9e2d3e0, next = 0x7f70e9e2d3e0}}, subxids =
{xids = {0 <repeats 64 times>}},
  backendLock = 0x7f70e13dad00, fpLockBits = 0, fpRelId = {0 <repeats 16
times>}, fpVXIDLock = 0 '\000', fpLocalTransactionId = 0}


#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x00671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at
pg_sema.c:387
#2  0x006d79ac in LWLockWaitForVar (lock=0x7f70e13df000,
valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
#3  0x004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at
xlog.c:1576
#4  0x004ee94b in AdvanceXLInsertBuffer (upto=41594920960,
opportunistic=<value optimized out>) at xlog.c:1888
#5  0x004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
#6  0x004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920,
fpw_lsn=<value optimized out>) at xlog.c:1313
#7  XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at
xlog.c:1008
#8  0x004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#9  0x0047e210 in ginPlaceToPage (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#10 0x0047f50d in ginInsertValue (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at
ginbtree.c:748
#11 0x00475deb in ginEntryInsert (ginstate=0x7fffb33cc470,
attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140,
nitem=51, buildStats=0x0)
at gininsert.c:234
#12 0x0048602c in ginInsertCleanup 

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Heikki Linnakangas

On 06/30/2015 07:37 PM, Alvaro Herrera wrote:

Jeff Janes wrote:


I've gotten the LWLock deadlock again.  User backend 24841 holds
WALInsertLock 7 and is blocked attempting to acquire 6.  So it seems to be
violating the lock ordering rules (although I don't see that rule spelled
out in xlog.c).


Hmm, interesting -- pg_stat_statement is trying to re-do an operation
that involves updating a GIN index, while WAL-logging for the original
update is still ongoing, it seems.


I don't think pg_stat_statement has anything to do with this. You can 
see from the backtrace that pg_stat_statement is enabled, as the call 
went through the pgss_ExecutorRun executor hook, but that's all.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Alvaro Herrera
Jeff Janes wrote:

 I've gotten the LWLock deadlock again.  User backend 24841 holds
 WALInsertLock 7 and is blocked attempting to acquire 6.  So it seems to be
 violating the lock ordering rules (although I don't see that rule spelled
 out in xlog.c).

Hmm, interesting -- pg_stat_statement is trying to re-do an operation
that involves updating a GIN index, while WAL-logging for the original
update is still ongoing, it seems.


 #0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
 #1  0x00671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at
 pg_sema.c:387
 #2  0x006d79ac in LWLockWaitForVar (lock=0x7f70e13df000,
 valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
 #3  0x004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at
 xlog.c:1576
 #4  0x004ee94b in AdvanceXLInsertBuffer (upto=41594920960,
 opportunistic=<value optimized out>) at xlog.c:1888
 #5  0x004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
 #6  0x004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920,
 fpw_lsn=<value optimized out>) at xlog.c:1313
 #7  XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at
 xlog.c:1008
 #8  0x004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at
 xloginsert.c:453
 #9  0x0047e210 in ginPlaceToPage (btree=0x7fffb33cc070,
 stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value
 optimized out>, childbuf=0, buildStats=0x0)
 at ginbtree.c:418
 #10 0x0047f50d in ginInsertValue (btree=0x7fffb33cc070,
 stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at
 ginbtree.c:748
 #11 0x00475deb in ginEntryInsert (ginstate=0x7fffb33cc470,
 attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140,
 nitem=51, buildStats=0x0)
 at gininsert.c:234
 #12 0x0048602c in ginInsertCleanup (ginstate=0x7fffb33cc470,
 vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
 #13 0x004871b9 in ginHeapTupleFastInsert (ginstate=0x7fffb33cc470,
 collector=<value optimized out>) at ginfast.c:436
 #14 0x0047625a in gininsert (fcinfo=<value optimized out>) at
 gininsert.c:531
 #15 0x007dc483 in FunctionCall6Coll (flinfo=<value optimized out>,
 collation=<value optimized out>, arg1=<value optimized out>, arg2=<value
 optimized out>,
 arg3=<value optimized out>, arg4=<value optimized out>,
 arg5=140122789534360, arg6=0) at fmgr.c:1436
 #16 0x004b2247 in index_insert (indexRelation=0x7f70e1190e60,
 values=0x7fffb33cef50, isnull=0x7fffb33cf050 "", heap_t_ctid=0x1a4a794,
 heapRelation=0x7f70e1185a98,
 checkUnique=UNIQUE_CHECK_NO) at indexam.c:226
 #17 0x005d2e67 in ExecInsertIndexTuples (slot=0x1a497f0,
 tupleid=0x1a4a794, estate=0x1a40768, noDupErr=0 '\000', specConflict=0x0,
 arbiterIndexes=0x0)
 at execIndexing.c:384
 #18 0x005f0161 in ExecUpdate (tupleid=0x7fffb33cf250, oldtuple=0x0,
 slot=0x1a497f0, planSlot=0x1a42498, epqstate=0x1a40a70, estate=0x1a40768,
 canSetTag=1 '\001')
 at nodeModifyTable.c:978
 #19 0x005f0b2a in ExecModifyTable (node=0x1a409d0) at
 nodeModifyTable.c:1436
 #20 0x005d6cc8 in ExecProcNode (node=0x1a409d0) at
 execProcnode.c:389
 #21 0x005d5402 in ExecutePlan (queryDesc=0x1a2b220,
 direction=<value optimized out>, count=0) at execMain.c:1549
 #22 standard_ExecutorRun (queryDesc=0x1a2b220, direction=<value optimized
 out>, count=0) at execMain.c:337
 #23 0x7f70ea39af3b in pgss_ExecutorRun (queryDesc=0x1a2b220,
 direction=ForwardScanDirection, count=0) at pg_stat_statements.c:881
 #24 0x006ea38f in ProcessQuery (plan=0x1a2b3d8,
 sourceText=0x1a2af68 "update foo set count=count+1 where text_array @> $1",
 params=0x1a2afe8, dest=<value optimized out>,
 completionTag=0x7fffb33cf660 "") at pquery.c:185
 #25 0x006ea5ec in PortalRunMulti (portal=0x19a6128, isTopLevel=1
 '\001', dest=0xc55020, altdest=0xc55020, completionTag=0x7fffb33cf660 "")
 at pquery.c:1279
 #26 0x006eadb3 in PortalRun (portal=0x19a6128,
 count=9223372036854775807, isTopLevel=1 '\001', dest=0x1990630,
 altdest=0x1990630, completionTag=0x7fffb33cf660 "")
 at pquery.c:816
 #27 0x006e6ffb in exec_execute_message (portal_name=0x1990218 "",
 max_rows=9223372036854775807) at postgres.c:1988
 #28 0x006e8c15 in PostgresMain (argc=<value optimized out>,
 argv=<value optimized out>, dbname=0x19a3210 "jjanes", username=<value
 optimized out>) at postgres.c:4088
 #29 0x006855dd in BackendRun (argc=<value optimized out>,
 argv=<value optimized out>) at postmaster.c:4159
 #30 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
 at postmaster.c:3835
 #31 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
 postmaster.c:1609
 #32 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
 at postmaster.c:1254
 #33 0x00610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Heikki Linnakangas

On 06/30/2015 07:05 PM, Jeff Janes wrote:

On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote:


On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote:


On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:

Is there a way to use gdb to figure out who holds the lock they are

waiting

for?


Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.



I hadn't planned on running into this problem, so had not compiled
accordingly.

I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.

Since I messed up the gdb session causing the postmaster to SIGKILL all
the children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.



I've gotten the LWLock deadlock again.  User backend 24841 holds
WALInsertLock 7 and is blocked attempting to acquire 6.  So it seems to be
violating the lock ordering rules (although I don't see that rule spelled
out in xlog.c).

The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6
and is blocked on 7.

I'm not sure where to go from here.


The user backend 24841 is waiting in a LWLockWaitForVar() call, on 
WALInsertLock 6, and oldval==0. The checkpointer is holding 
WALInsertLock 6, but it has set the insertingAt value to PG_UINT64_MAX. 
The LWLockWaitForVar() call should not be blocked on that, because 0 != 
PG_UINT64_MAX.
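
To recap the contract (paraphrasing the comments in lwlock.c):

/*
 * LWLockWaitForVar(lock, valptr, oldval, &newval) blocks until either the
 * lock becomes free (returns true), or *valptr no longer equals oldval
 * (returns false, with the current value stored in *newval).  It must
 * never sleep through an update of the variable.
 */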


Looks like the LWLock-scalability patch that made LWLock acquisition 
use atomic ops instead of a spinlock broke this. LWLockWaitForVar() is 
supposed to guarantee that it always wakes up from sleep when the 
variable's value changes. But there is now a race condition in 
LWLockWaitForVar() that wasn't there in 9.4:



		if (mustwait)
		{
			/*
			 * Perform comparison using spinlock as we can't rely on atomic 64
			 * bit reads/stores.
			 */
#ifdef LWLOCK_STATS
			lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
			SpinLockAcquire(&lock->mutex);
#endif

			/*
			 * XXX: We can significantly optimize this on platforms with 64bit
			 * atomics.
			 */
			value = *valptr;
			if (value != oldval)
			{
				result = false;
				mustwait = false;
				*newval = value;
			}
			else
				mustwait = true;
			SpinLockRelease(&lock->mutex);
		}
		else
			mustwait = false;

		if (!mustwait)
			break;				/* the lock was free or value didn't match */

		/*
		 * Add myself to wait queue. Note that this is racy, somebody else
		 * could wakeup before we're finished queuing. NB: We're using nearly
		 * the same twice-in-a-row lock acquisition protocol as
		 * LWLockAcquire(). Check its comments for details.
		 */
		LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);


After the spinlock is released above, but before the LWLockQueueSelf() 
call, it's possible that another backend comes in, acquires the lock, 
changes the variable's value, and releases the lock again. In 9.4, the 
spinlock was not released until the process was queued.
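
So the lost-wakeup interleaving is roughly this (a sketch, with backend A 
in LWLockWaitForVar() and backend B the lock holder):

/*
 *   backend A                              backend B
 *   ---------                              ---------
 *   SpinLockAcquire(&lock->mutex);
 *   value = *valptr;    -- equals oldval
 *   SpinLockRelease(&lock->mutex);
 *                                          LWLockUpdateVar(...)
 *                                          LWLockRelease(...)
 *                                          -- A is not queued yet, so
 *                                          -- there is nobody to wake
 *   LWLockQueueSelf(...)
 *   sleep               -- misses the update it was waiting for
 */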


Looking at LWLockAcquireWithVar(), it's also no longer holding the 
spinlock while it updates the Var. That's not cool either :-(.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Andres Freund
On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:
  /*
   * XXX: We can significantly optimize this on platforms with 64bit
   * atomics.
   */
  value = *valptr;
  if (value != oldval)
  {
  	result = false;
  	mustwait = false;
  	*newval = value;
  }
  else
  	mustwait = true;
  SpinLockRelease(&lock->mutex);
  }
  else
  	mustwait = false;

  if (!mustwait)
  	break;			/* the lock was free or value didn't match */

  /*
   * Add myself to wait queue. Note that this is racy, somebody else
   * could wakeup before we're finished queuing. NB: We're using nearly
   * the same twice-in-a-row lock acquisition protocol as
   * LWLockAcquire(). Check its comments for details.
   */
  LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);
 
 After the spinlock is released above, but before the LWLockQueueSelf() call,
 it's possible that another backend comes in, acquires the lock, changes the
 variable's value, and releases the lock again. In 9.4, the spinlock was not
 released until the process was queued.

Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.
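
Roughly something like this (untested sketch, reusing the variables from
the excerpt above):

	LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

	/* re-read the variable now that we're on the wait queue */
	SpinLockAcquire(&lock->mutex);
	value = *valptr;
	SpinLockRelease(&lock->mutex);

	if (value != oldval)
	{
		/* changed while we were queueing; don't go to sleep */
		LWLockDequeueSelf(lock);
		*newval = value;
		result = false;
		mustwait = false;
	}

LWLockDequeueSelf() would need the same care as in LWLockAcquire(), since
somebody might already have granted us a wakeup by the time we dequeue.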

 Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock
 while it updates the Var. That's not cool either :-(.

Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but
it really isn't. This var stuff isn't fitting all that well into the
lwlock code.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Heikki Linnakangas

On 06/30/2015 10:09 PM, Andres Freund wrote:

On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:

/*
 * XXX: We can significantly optimize this on platforms with 64bit
 * atomics.
 */
value = *valptr;
if (value != oldval)
{
	result = false;
	mustwait = false;
	*newval = value;
}
else
	mustwait = true;
SpinLockRelease(&lock->mutex);
}
else
	mustwait = false;

if (!mustwait)
	break;				/* the lock was free or value didn't match */

/*
 * Add myself to wait queue. Note that this is racy, somebody else
 * could wakeup before we're finished queuing. NB: We're using nearly
 * the same twice-in-a-row lock acquisition protocol as
 * LWLockAcquire(). Check its comments for details.
 */
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);


After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.


Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.


Yeah. It's probably more efficient to not release the spinlock between 
checking the value and LWLockQueueSelf() though.



Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock
while it updates the Var. That's not cool either :-(.


Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but
it really isn't. This var stuff isn't fitting all that well into the
lwlock code.


I'll take a stab at fixing this tomorrow. I take the blame for not 
documenting the semantics of LWLockAcquireWithVar() and friends well 
enough...


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Andres Freund
On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:
 Hm. Right. A recheck of the value after the queuing should be sufficient
 to fix? That's how we deal with the exact same scenarios for the normal
 lock state, so that shouldn't be very hard to add.
 
 Yeah. It's probably more efficient to not release the spinlock between
 checking the value and LWLockQueueSelf() though.

Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder.  Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way.  On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.
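
E.g. something like this, sketched against the existing port/atomics.h API
(glossing over memory-ordering details, which would need more thought):

	pg_atomic_uint64 insertingAt;	/* instead of a plain uint64 */

	pg_atomic_init_u64(&insertingAt, 0);
	pg_atomic_write_u64(&insertingAt, val);		/* LWLockUpdateVar side */
	value = pg_atomic_read_u64(&insertingAt);	/* LWLockWaitForVar side */

No spinlock needed around the read/write anymore; on platforms without
native 64 bit atomics that'd rely on the fallback implementation.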

 I'll take a stab at fixing this tomorrow.

Thanks! Do you mean both or just the second issue?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Jeff Janes
On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  Is there a way to use gdb to figure out who holds the lock they are
 waiting
  for?

 Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
 defined? That might do it.


I hadn't planned on running into this problem, so had not compiled
accordingly.

I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.

Since I messed up the gdb session causing the postmaster to SIGKILL all the
children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.

Thanks,

Jeff


[HACKERS] LWLock deadlock and gdb advice

2015-06-29 Thread Jeff Janes
I have a 9.5alpha1 cluster which is locked up.  All the user backends seem
to be waiting on semop, ultimately in WALInsertLockAcquire.

Is there a way to use gdb to figure out who holds the lock they are waiting
for?

It is compiled with both debug and cassert.

I am hoping someone can give me a recipe similar to the one for "Examining
backend memory use" in https://wiki.postgresql.org/wiki/Developer_FAQ

example backtrace:

#0  0x003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x0067190f in PGSemaphoreLock (sema=0x7f28a98b9468) at
pg_sema.c:387
#2  0x006d4b0c in LWLockAcquireCommon (l=0x7f28a0e6d600,
valptr=0x7f28a0e6d618, val=0) at lwlock.c:1042
#3  LWLockAcquireWithVar (l=0x7f28a0e6d600, valptr=0x7f28a0e6d618, val=0)
at lwlock.c:916
#4  0x004f3c4f in WALInsertLockAcquire (rdata=0xc5c130, fpw_lsn=0)
at xlog.c:1411
#5  XLogInsertRecord (rdata=0xc5c130, fpw_lsn=0) at xlog.c:948
#6  0x004f7aac in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#7  0x0047e0b0 in ginPlaceToPage (btree=0x7fffca9263e0,
stack=0x2c94ff8, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#8  0x0047f3ad in ginInsertValue (btree=0x7fffca9263e0,
stack=0x2c94ff8, insertdata=0x7fffca926460, buildStats=0x0) at
ginbtree.c:748
#9  0x00475c8b in ginEntryInsert (ginstate=0x7fffca9267e0,
attnum=29784, key=1, category=<value optimized out>, items=0x7f28a0c7b458,
nitem=47, buildStats=0x0)
at gininsert.c:234
#10 0x00485ecc in ginInsertCleanup (ginstate=0x7fffca9267e0,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#11 0x00487059 in ginHeapTupleFastInsert (ginstate=0x7fffca9267e0,
collector=<value optimized out>) at ginfast.c:436
#12 0x004760fa in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531

Cheers,

Jeff


Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-29 Thread Peter Geoghegan
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Is there a way to use gdb to figure out who holds the lock they are waiting
 for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.
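
For example, one way of getting those defines into the build (adjust to
your setup; --enable-debug and --enable-cassert you likely have already):

    ./configure --enable-debug --enable-cassert \
        CPPFLAGS='-DLOCK_DEBUG -DLWLOCK_STATS'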

Otherwise, I suggest dereferencing the l argument to
LWLockAcquireWithVar() or something -- set the frame to 3 in your
example backtrace, using "f 3". Then, "p *l". You'll get the tranche
id there, and so on.
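
That is, roughly:

    (gdb) f 3
    (gdb) p *l
    (gdb) p l->tranche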

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-29 Thread Amit Kapila
On Tue, Jun 30, 2015 at 6:25 AM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  Is there a way to use gdb to figure out who holds the lock they are
waiting
  for?

 Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
 defined? That might do it.


If you define LOCK_DEBUG, then you can check the owner of the
lock [1], which will tell you the last exclusive owner of that lock
and can help you in debugging the problem.


[1]
#ifdef LOCK_DEBUG
struct PGPROC *owner; /* last exlusive owner of the lock */
#endif
} LWLock;


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com