Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-05 Thread Jeff Janes
On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund and...@anarazel.de wrote: On 2015-08-02 17:04:07 +0200, Andres Freund wrote: I've attached a version of the patch that should address Heikki's concern. It imo also improves the API and increases debuggability by not having stale variable values

Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-05 Thread Andres Freund
On 2015-08-05 11:22:55 -0700, Jeff Janes wrote: On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund and...@anarazel.de wrote: On 2015-08-02 17:04:07 +0200, Andres Freund wrote: I've attached a version of the patch that should address Heikki's concern. It imo also improves the API and

Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Andres Freund
On 2015-08-02 12:33:06 -0400, Tom Lane wrote: Andres Freund and...@anarazel.de writes: I plan to commit the patch tomorrow, so it's included in alpha2. Please try to commit anything you want in alpha2 *today*. I'd prefer to see some successful buildfarm cycles on such patches before we

Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Tom Lane
Andres Freund and...@anarazel.de writes: I plan to commit the patch tomorrow, so it's included in alpha2. Please try to commit anything you want in alpha2 *today*. I'd prefer to see some successful buildfarm cycles on such patches before we wrap. regards, tom lane

Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Andres Freund
Hi Jeff, Heikki, On 2015-07-31 09:48:28 -0700, Jeff Janes wrote: I had run it for 24 hours, while it usually took less than 8 hours to lock up before. I did see it within a few minutes one time when I checked out a new HEAD and then forgot to re-apply your or Heikki's patch. But now I've

Re: [HACKERS] LWLock deadlock and gdb advice

2015-08-02 Thread Andres Freund
On 2015-08-02 17:04:07 +0200, Andres Freund wrote: I've attached a version of the patch that should address Heikki's concern. It imo also improves the API and increases debuggability by not having stale variable values in the variables anymore. (also attached is a minor optimization that

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-31 Thread Jeff Janes
On Thu, Jul 30, 2015 at 8:22 PM, Andres Freund and...@anarazel.de wrote: On 2015-07-30 09:03:01 -0700, Jeff Janes wrote: On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund and...@anarazel.de wrote: What do you think about something roughly like the attached? I've not evaluated the code,

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-31 Thread Heikki Linnakangas
On 07/30/2015 09:14 PM, Andres Freund wrote: On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote: In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held, until it has updated the variable. And LWLockWaitForVar() holds the spinlock when it checks that the lock is held and

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Heikki Linnakangas
On 07/29/2015 09:35 PM, Andres Freund wrote: On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote: Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on it. The lock holder releases the lock, and wakes up A. But before A wakes up and sees that the lock is free, another

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Andres Freund
On 2015-07-30 09:03:01 -0700, Jeff Janes wrote: On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund and...@anarazel.de wrote: What do you think about something roughly like the attached? I've not evaluated the code, but applying it does solve the problem I was seeing. Cool, thanks for

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Jeff Janes
On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund and...@anarazel.de wrote: On 2015-07-29 14:22:23 +0200, Andres Freund wrote: On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote: Ah, ok, that should work, as long as you also re-check the variable's value after queueing. Want to write

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Andres Freund
On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote: In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held, until it has updated the variable. And LWLockWaitForVar() holds the spinlock when it checks that the lock is held and that the variable's value matches. So it cannot
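The 9.4 behaviour described in the quoted text (one spinlock covering both "mark the lock as held and set the variable" and "check that the lock is held and compare the variable") can be sketched roughly as follows. This is a toy model, not PostgreSQL source: `SpinLockedLock`, `sl_init`, `sl_acquire_with_var` and `sl_must_wait` are hypothetical names, and a C11 `atomic_flag` stands in for the real spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy lock; NOT PostgreSQL's LWLock struct. */
typedef struct SpinLockedLock
{
    atomic_flag mutex;  /* toy spinlock protecting the two fields below */
    bool        held;   /* is the lock currently held? */
    uint64_t    var;    /* variable associated with the lock */
} SpinLockedLock;

static void
sl_init(SpinLockedLock *l)
{
    atomic_flag_clear(&l->mutex);
    l->held = false;
    l->var = 0;
}

static void
sl_spin_acquire(atomic_flag *f)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;
}

static void
sl_spin_release(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

/*
 * Try to take the lock and set its variable. Both updates happen while
 * the spinlock is held, so no observer can see "held" without also
 * seeing the new value. Returns false if the lock was already held.
 */
static bool
sl_acquire_with_var(SpinLockedLock *l, uint64_t val)
{
    bool ok;

    sl_spin_acquire(&l->mutex);
    ok = !l->held;
    if (ok)
    {
        l->held = true;
        l->var = val;
    }
    sl_spin_release(&l->mutex);
    return ok;
}

/*
 * Returns true if the caller would have to wait: the lock is held and
 * the variable still equals oldval. If the variable differs, the
 * current value is stored in *newval. The check and the comparison
 * are made under the same spinlock as the updates above.
 */
static bool
sl_must_wait(SpinLockedLock *l, uint64_t oldval, uint64_t *newval)
{
    bool wait;

    sl_spin_acquire(&l->mutex);
    if (!l->held)
        wait = false;
    else if (l->var != oldval)
    {
        *newval = l->var;
        wait = false;
    }
    else
        wait = true;
    sl_spin_release(&l->mutex);
    return wait;
}
```

Because both functions serialize on the same spinlock, a waiter in this model can never observe the lock as held together with a stale value of the variable.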

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-30 Thread Andres Freund
On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote: Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on it. The lock holder releases the lock, and wakes up A. But before A wakes up and sees that the lock is free, another backend acquires the lock again. It runs

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 09:23:32 -0700, Jeff Janes wrote: On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes jeff.ja...@gmail.com wrote: I've reproduced it again against commit b2ed8edeecd715c8a23ae462. It took 5 hours on an 8-core Intel(R) Xeon(R) CPU E5-2650. I also reproduced it in 3 hours on the same

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Jeff Janes
On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund and...@anarazel.de wrote: Hi, On 2015-07-19 11:49:14 -0700, Jeff Janes wrote: After applying this patch to commit fdf28853ae6a397497b79f, it has survived testing long

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Jeff Janes
On Wed, Jul 29, 2015 at 9:26 AM, Andres Freund and...@anarazel.de wrote: On 2015-07-29 09:23:32 -0700, Jeff Janes wrote: On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes jeff.ja...@gmail.com wrote: I've reproduced it again against commit b2ed8edeecd715c8a23ae462. It took 5 hours on an 8-core

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Heikki Linnakangas
On 07/29/2015 04:10 PM, Andres Freund wrote: On 2015-07-29 14:22:23 +0200, Andres Freund wrote: On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote: Ah, ok, that should work, as long as you also re-check the variable's value after queueing. Want to write the patch, or should I? I'll try.

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Heikki Linnakangas
On 07/29/2015 02:39 PM, Andres Freund wrote: On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote: Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
Hi, Finally getting to this. On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote: Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway,

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote: Ah, ok, that should work, as long as you also re-check the variable's value after queueing. Want to write the patch, or should I? I'll try. Shouldn't be too hard. Andres

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 14:22:23 +0200, Andres Freund wrote: On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote: Ah, ok, that should work, as long as you also re-check the variable's value after queueing. Want to write the patch, or should I? I'll try. Shouldn't be too hard. What do you think

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Andres Freund
On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote: On 07/29/2015 02:39 PM, Andres Freund wrote: In an earlier email you say: After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-29 Thread Heikki Linnakangas
On 07/29/2015 03:08 PM, Andres Freund wrote: On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote: On 07/29/2015 02:39 PM, Andres Freund wrote: In an earlier email you say: After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-28 Thread Andres Freund
Hi, On 2015-07-19 11:49:14 -0700, Jeff Janes wrote: After applying this patch to commit fdf28853ae6a397497b79f, it has survived testing long enough to convince me that this fixes the problem. What was the actual workload breaking with the bug? I ran a small variety and I couldn't reproduce it

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-28 Thread Jeff Janes
On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund and...@anarazel.de wrote: Hi, On 2015-07-19 11:49:14 -0700, Jeff Janes wrote: After applying this patch to commit fdf28853ae6a397497b79f, it has survived testing long enough to convince me that this fixes the problem. What was the actual

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-26 Thread Amit Langote
On 2015-07-16 PM 04:03, Jeff Janes wrote: On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-19 Thread Jeff Janes
On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-19 Thread Jeff Janes
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/30/2015 11:24 PM, Andres Freund wrote: On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote: Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-16 Thread Jeff Janes
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the

Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-15 Thread Heikki Linnakangas
On 06/30/2015 11:24 PM, Andres Freund wrote: On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote: Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Jeff Janes
On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Heikki Linnakangas
On 06/30/2015 07:37 PM, Alvaro Herrera wrote: Jeff Janes wrote: I've gotten the LWLock deadlock again. User backend 24841 holds the WALInsertLocks 7 and is blocked attempting to acquire 6. So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Alvaro Herrera
Jeff Janes wrote: I've gotten the LWLock deadlock again. User backend 24841 holds the WALInsertLocks 7 and is blocked attempting to acquire 6. So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c) Hmm, interesting -- pg_stat_statement

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Heikki Linnakangas
On 06/30/2015 07:05 PM, Jeff Janes wrote: On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Andres Freund
On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote: /* * XXX: We can significantly optimize this on platforms with 64bit * atomics. */ value = *valptr; if

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Heikki Linnakangas
On 06/30/2015 10:09 PM, Andres Freund wrote: On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote: /* * XXX: We can significantly optimize this on platforms with 64bit * atomics. */

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Andres Freund
On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote: Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add. Yeah. It's probably more efficient to
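The "recheck of the value after the queuing" idea discussed here can be illustrated with a small single-process model. This is only a sketch under stated assumptions, not the patch from the thread: `ToyLock`, `toy_wait_for_var` and the `between` hook are hypothetical, the `nwaiters` counter stands in for a real wait queue, and the hook simulates another backend running in the window between the first check and queueing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy lock; NOT PostgreSQL's LWLock struct. */
typedef struct ToyLock
{
    atomic_bool          held;      /* is the lock held? */
    atomic_uint_fast64_t var;       /* variable associated with the lock */
    atomic_int           nwaiters;  /* stand-in for the wait queue */
} ToyLock;

typedef enum { TOY_DONE, TOY_MUST_SLEEP } ToyWaitResult;

/* Initialize the lock as held, with its variable set to val. */
static void
toy_init(ToyLock *lock, uint64_t val)
{
    atomic_init(&lock->held, true);
    atomic_init(&lock->var, val);
    atomic_init(&lock->nwaiters, 0);
}

/*
 * True if there is nothing to wait for: the lock is free, or the
 * variable no longer equals oldval (then *newval gets the new value).
 */
static bool
toy_conflict_gone(ToyLock *lock, uint64_t oldval, uint64_t *newval)
{
    uint64_t v;

    if (!atomic_load(&lock->held))
        return true;
    v = atomic_load(&lock->var);
    if (v != oldval)
    {
        *newval = v;
        return true;
    }
    return false;
}

/* Demo hook: the "other backend" changes the variable. */
static void
toy_bump_var(ToyLock *lock)
{
    atomic_store(&lock->var, atomic_load(&lock->var) + 1);
}

/*
 * One pass of the wait loop. 'between' (may be NULL) runs after the
 * first check and before queueing, to simulate a concurrent change.
 */
static ToyWaitResult
toy_wait_for_var(ToyLock *lock, uint64_t oldval, uint64_t *newval,
                 void (*between) (ToyLock *))
{
    /* 1. First check, without being queued. */
    if (toy_conflict_gone(lock, oldval, newval))
        return TOY_DONE;

    if (between != NULL)
        between(lock);

    /* 2. Queue ourselves so a later release or update will wake us. */
    atomic_fetch_add(&lock->nwaiters, 1);

    /* 3. Recheck: the state may have changed before we were queued. */
    if (toy_conflict_gone(lock, oldval, newval))
    {
        atomic_fetch_sub(&lock->nwaiters, 1);   /* undo the queueing */
        return TOY_DONE;
    }

    /* 4. Still conflicting; the caller would now sleep until woken. */
    return TOY_MUST_SLEEP;
}
```

In this model, calling `toy_wait_for_var(&lock, 5, &nv, toy_bump_var)` on a lock initialized with `toy_init(&lock, 5)` returns `TOY_DONE` with the waiter dequeued again. Without step 3 the same call would report that the caller must sleep, even though the change it was waiting for had already happened.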

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-30 Thread Jeff Janes
On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for? Have you considered building with LWLOCK_STATS defined, and

[HACKERS] LWLock deadlock and gdb advice

2015-06-29 Thread Jeff Janes
I have a 9.5alpha1 cluster which is locked up. All the user back ends seem to be waiting on semop, eventually on WALInsertLockAcquire. Is there a way to use gdb to figure out who holds the lock they are waiting for? It is compiled with both debug and cassert. I am hoping someone can give me

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-29 Thread Peter Geoghegan
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for? Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it. Otherwise, I suggest dereferencing the l

Re: [HACKERS] LWLock deadlock and gdb advice

2015-06-29 Thread Amit Kapila
On Tue, Jun 30, 2015 at 6:25 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for? Have you considered building with LWLOCK_STATS defined, and