Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-23 Thread Robert Haas
On Mon, Feb 22, 2016 at 7:59 PM, Tom Lane wrote: >> No, you don't. I've spent a good deal of time thinking about that problem. >> [ much snipped ] >> Unless I'm missing something, though, this is a fairly obscure >> problem. Early release of catalog locks is desirable, and

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-22 Thread Tom Lane
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> ... However, this is one of the big problems that >> we'd have to have a solution for before we ever consider allowing >> read-write parallelism. > Having such a blocker for read-write parallelism would be

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-22 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Robert Haas writes: > > On Wed, Feb 17, 2016 at 9:48 PM, Tom Lane wrote: > >> I just had a rather disturbing thought to the effect that this entire > >> design --- ie, parallel workers taking out locks for

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-22 Thread Tom Lane
Robert Haas writes: > On Mon, Feb 22, 2016 at 2:56 AM, Tom Lane wrote: > !held by the indicated process. False indicates that this process is > !currently waiting to acquire this lock, which implies that at > least one other > !process is

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-22 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 17, 2016 at 9:48 PM, Tom Lane wrote: >> I just had a rather disturbing thought to the effect that this entire >> design --- ie, parallel workers taking out locks for themselves --- is >> fundamentally flawed. As far as

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-22 Thread Robert Haas
On Mon, Feb 22, 2016 at 2:56 AM, Tom Lane wrote: > I wrote: >> Robert Haas writes: >>> As for the patch itself, I'm having trouble grokking what it's trying >>> to do. I think it might be worth having a comment defining precisely >>> what we mean by "A

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-22 Thread Robert Haas
On Wed, Feb 17, 2016 at 9:48 PM, Tom Lane wrote: > I just had a rather disturbing thought to the effect that this entire > design --- ie, parallel workers taking out locks for themselves --- is > fundamentally flawed. As far as I can tell from README.parallel, > parallel

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-21 Thread Tom Lane
I wrote: > Robert Haas writes: >> As for the patch itself, I'm having trouble grokking what it's trying >> to do. I think it might be worth having a comment defining precisely >> what we mean by "A blocks B". I would define "A blocks B" in general >> as either A holds a

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-17 Thread Tom Lane
I just had a rather disturbing thought to the effect that this entire design --- ie, parallel workers taking out locks for themselves --- is fundamentally flawed. As far as I can tell from README.parallel, parallel workers are supposed to exit (and, presumably, release their locks) before the

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-17 Thread Tom Lane
Robert Haas writes: > On Tue, Feb 16, 2016 at 2:59 AM, Tom Lane wrote: >> Not to be neglected also is that (I believe) this gives the right answer, >> whereas isolationtester's existing query is currently completely broken by >> parallel queries, and it

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-17 Thread Robert Haas
On Tue, Feb 16, 2016 at 2:59 AM, Tom Lane wrote: > Andres Freund writes: >> I wonder if we shouldn't just expose a 'which pid is process X waiting >> for' API, implemented serverside. That's generally really useful, and >> looks like it's actually going to

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-15 Thread Tom Lane
Andres Freund writes: > I wonder if we shouldn't just expose a 'which pid is process X waiting > for' API, implemented serverside. That's generally really useful, and > looks like it's actually going to be less complicated than that > query... And it's surely going to be
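
A rough illustration of the "which pid is process X waiting for" idea quoted above. This is a toy sketch, not PostgreSQL code: the LockEntry record, the two-mode conflict rule and the blocking_pids helper are all hypothetical names chosen for the example, and it deliberately ignores wait-queue ordering and parallel-worker lock groups, both of which come up elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockEntry:
    """One row of a hypothetical lock-table snapshot (not a real catalog)."""
    pid: int
    target: str      # name of the locked object, e.g. a table
    mode: str        # "share" or "exclusive" (simplified assumption)
    granted: bool    # False means the process is still waiting


def conflicts(held_mode: str, wanted_mode: str) -> bool:
    # Simplified two-mode rule: only share/share is compatible.
    return held_mode == "exclusive" or wanted_mode == "exclusive"


def blocking_pids(locks: list[LockEntry], waiter_pid: int) -> list[int]:
    """Return pids holding a granted lock that conflicts with a request
    the given process is still waiting for."""
    wanted = [l for l in locks if l.pid == waiter_pid and not l.granted]
    blockers = set()
    for want in wanted:
        for held in locks:
            if (held.granted
                    and held.pid != waiter_pid
                    and held.target == want.target
                    and conflicts(held.mode, want.mode)):
                blockers.add(held.pid)
    return sorted(blockers)
```

Answering the question on the server side, from one consistent snapshot of lock state, is what would let a client such as isolationtester avoid assembling the answer from a complicated query of its own.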

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-13 Thread Tom Lane
Greg Stark writes: > The tests worked fine on faster build animals, right? And the clobber > animals are much much slower. So it seems perfectly sensible that their > deadlock timeout would just have to be much much higher to have the same > behaviour. I see nothing wrong in

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-12 Thread Robert Haas
On Thu, Feb 11, 2016 at 11:34 PM, Tom Lane wrote: > We're not out of the woods on this :-( ... jaguarundi, which is the first > of the CLOBBER_CACHE_ALWAYS animals to run these tests, didn't like them > at all. I think I fixed the deadlock-soft-2 failure, but its take on >

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-12 Thread Tom Lane
Robert Haas writes: > On Thu, Feb 11, 2016 at 11:34 PM, Tom Lane wrote: >> The problem here is that when the deadlock detector kills s8's >> transaction, s7a8 is also left free to proceed, so there is a race >> condition as to which query completion

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-12 Thread Robert Haas
On Fri, Feb 12, 2016 at 4:59 PM, Tom Lane wrote: > I wrote: >> Instead, what I propose we do about this is to change isolationtester >> so that once it's decided that a given step is blocked, it no longer >> issues the is-it-waiting query for that step; it just assumes that

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-12 Thread Tom Lane
I wrote: > Instead, what I propose we do about this is to change isolationtester > so that once it's decided that a given step is blocked, it no longer > issues the is-it-waiting query for that step; it just assumes that the > step should be treated as blocked. So all we need do for "backlogged"
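
The proposal quoted above can be sketched as follows. This is only an illustration under assumptions: the real isolationtester is a C program, and the StepTracker class, its probe callback and the queries_issued counter are hypothetical names invented here. The point it shows is that a step's "blocked" verdict becomes sticky, so the is-it-waiting query is never reissued for that step.

```python
from typing import Callable


class StepTracker:
    """Remembers which steps were already judged blocked (hypothetical helper)."""

    def __init__(self, is_waiting_query: Callable[[str], bool]):
        # is_waiting_query stands in for the server-side "is it waiting?" probe.
        self._is_waiting_query = is_waiting_query
        self._blocked: set[str] = set()
        self.queries_issued = 0

    def is_blocked(self, step: str) -> bool:
        # Once a step has been declared blocked, trust that verdict
        # and skip the probe entirely.
        if step in self._blocked:
            return True
        self.queries_issued += 1
        if self._is_waiting_query(step):
            self._blocked.add(step)
            return True
        return False

    def completed(self, step: str) -> None:
        # Forget the step once its query has actually finished.
        self._blocked.discard(step)
```

With this shape, a later probe that would have answered "not waiting" (for example because another session's transaction was just killed by the deadlock detector) can no longer change how an already-blocked step is reported.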

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-11 Thread Tom Lane
Robert Haas writes: >> That would be great. Taking a look at what happened, I have a feeling >> this may be a race condition of some kind in the isolation tester. It >> seems to have failed to recognize that a1 started waiting, and that >> caused the "deadlock detected"

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-11 Thread Tom Lane
We're not out of the woods on this :-( ... jaguarundi, which is the first of the CLOBBER_CACHE_ALWAYS animals to run these tests, didn't like them at all. I think I fixed the deadlock-soft-2 failure, but its take on deadlock-hard is: *** 17,25 step s6a7: LOCK TABLE a7; step s7a8: LOCK

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-11 Thread Tom Lane
I wrote: > No, because the machines that are failing are showing a "<waiting ...>" > annotation that your reference output *doesn't* have. I think what is > actually happening is that these machines are seeing the process as > waiting and reporting it, whereas on your machine the backend detects > the

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-11 Thread Robert Haas
On Thu, Feb 11, 2016 at 12:04 PM, Tom Lane wrote: > I wrote: >> No, because the machines that are failing are showing a "<waiting ...>" >> annotation that your reference output *doesn't* have. I think what is >> actually happening is that these machines are seeing the process as >>