Re: TopoSort() fix

2019-07-31 Thread Robert Haas
On Tue, Jul 30, 2019 at 2:10 PM Tom Lane wrote: > Sure. But I think what we can foresee is that if there are any bugs > reachable this way, they'd be reachable and need fixing regardless. > We've already established that parallel workers can take and release locks > that their leader isn't

Re: TopoSort() fix

2019-07-30 Thread Tom Lane
Robert Haas writes: > On Tue, Jul 30, 2019 at 1:44 PM Tom Lane wrote: >> Well, there'd be an actual isolation test that they work ;-), if you >> override the marking. Admittedly, one test case does not prove that >> there's no way to crash the system, but that can be said of most >> parts of

Re: TopoSort() fix

2019-07-30 Thread Robert Haas
On Tue, Jul 30, 2019 at 1:44 PM Tom Lane wrote: > Well, there'd be an actual isolation test that they work ;-), if you > override the marking. Admittedly, one test case does not prove that > there's no way to crash the system, but that can be said of most > parts of Postgres. True. I'm just

Re: TopoSort() fix

2019-07-30 Thread Tom Lane
Robert Haas writes: > On Tue, Jul 30, 2019 at 1:36 PM Tom Lane wrote: >> In any case, my question at the moment is whether we need the belt-and- >> suspenders-too approach of having both non-parallel-safe marking and an >> explicit check inside these functions. We've largely moved away from >>

Re: TopoSort() fix

2019-07-30 Thread Robert Haas
On Tue, Jul 30, 2019 at 1:36 PM Tom Lane wrote: > No, there's a sufficient reason why we should force advisory locks > to be taken in the leader process, namely that the behavior is totally > different if we don't: they will disappear at the end of the parallel > worker run, not at end of

Re: TopoSort() fix

2019-07-30 Thread Tom Lane
Robert Haas writes: > On Tue, Jul 30, 2019 at 10:27 AM Tom Lane wrote: >> (BTW, why aren't these functions just "parallel restricted"?) > ... > But it is really pretty arguable whether we should feel responsible > for that problem. We could just decide that if you're doing that, and > you

Re: TopoSort() fix

2019-07-30 Thread Robert Haas
On Tue, Jul 30, 2019 at 10:27 AM Tom Lane wrote: > I also looked into whether one could use SELECT FOR UPDATE/SHARE to get > stronger locks at a tuple level, but that's been blocked off as well. > You guys really did a pretty good job of locking that down. Thanks. We learned from the master. >

Re: TopoSort() fix

2019-07-30 Thread Tom Lane
Robert Haas writes: > On Mon, Jul 29, 2019 at 9:48 PM Tom Lane wrote: >> I believe the only accessible route to taking any sort of new lock >> in a parallel worker is catalog lookups causing AccessShareLock on >> a catalog. > Can't the worker just query a previously-untouched table, maybe by >

Re: TopoSort() fix

2019-07-29 Thread Tom Lane
Alvaro Herrera writes: > On 2019-Jul-29, Tom Lane wrote: >> FYI, I just got done inventing a way to reach that code, and I have >> to suspect that it's impossible to do so in production, because under >> ordinary circumstances no parallel worker will take any exclusive lock >> that isn't already

Re: TopoSort() fix

2019-07-29 Thread Alvaro Herrera
On 2019-Jul-29, Tom Lane wrote: > FYI, I just got done inventing a way to reach that code, and I have > to suspect that it's impossible to do so in production, because under > ordinary circumstances no parallel worker will take any exclusive lock > that isn't already held by its leader. Hmm,

Re: TopoSort() fix

2019-07-29 Thread Michael Paquier
On Mon, Jul 29, 2019 at 10:56:05AM -0400, Tom Lane wrote: > Michael Paquier writes: >> +1. Any volunteers? > > If Robert doesn't weigh in pretty soon, I'll take responsibility for it. Thanks Tom for taking care of it! -- Michael signature.asc Description: PGP signature

Re: TopoSort() fix

2019-07-29 Thread Robert Haas
On Mon, Jul 29, 2019 at 9:48 PM Tom Lane wrote: > I tried that first. There are backstops preventing doing LOCK TABLE > in a worker, just like for advisory locks. > > I believe the only accessible route to taking any sort of new lock > in a parallel worker is catalog lookups causing

Re: TopoSort() fix

2019-07-29 Thread Tom Lane
Robert Haas writes: > On Mon, Jul 29, 2019 at 5:55 PM Tom Lane wrote: >> FYI, I just got done inventing a way to reach that code, and I have >> to suspect that it's impossible to do so in production, because under >> ordinary circumstances no parallel worker will take any exclusive lock >> that

Re: TopoSort() fix

2019-07-29 Thread Robert Haas
On Mon, Jul 29, 2019 at 5:55 PM Tom Lane wrote: > FYI, I just got done inventing a way to reach that code, and I have > to suspect that it's impossible to do so in production, because under > ordinary circumstances no parallel worker will take any exclusive lock > that isn't already held by its

Re: TopoSort() fix

2019-07-29 Thread Tom Lane
[ removing , as that mailing address seems to be MIA ] Robert Haas writes: > On Mon, Jul 29, 2019 at 10:56 AM Tom Lane wrote: >> If Robert doesn't weigh in pretty soon, I'll take responsibility for it. > That's fine, or if you prefer that I commit it, I will. FYI, I just got done inventing a

Re: TopoSort() fix

2019-07-29 Thread Robert Haas
On Mon, Jul 29, 2019 at 10:56 AM Tom Lane wrote: > Michael Paquier writes: > > On Fri, Jul 26, 2019 at 08:24:16PM -0400, Tom Lane wrote: > >> I think this is a sufficiently obvious bug, and a sufficiently > >> obvious fix, that we should just fix it and not insist on getting > >> a reproducible

Re: TopoSort() fix

2019-07-29 Thread Tom Lane
Michael Paquier writes: > On Fri, Jul 26, 2019 at 08:24:16PM -0400, Tom Lane wrote: >> I think this is a sufficiently obvious bug, and a sufficiently >> obvious fix, that we should just fix it and not insist on getting >> a reproducible test case first. I think a test case would soon >> bit-rot

Re: TopoSort() fix

2019-07-28 Thread Michael Paquier
On Fri, Jul 26, 2019 at 08:24:16PM -0400, Tom Lane wrote: > I think this is a sufficiently obvious bug, and a sufficiently > obvious fix, that we should just fix it and not insist on getting > a reproducible test case first. I think a test case would soon > bit-rot anyway, and no longer exercise

Re: TopoSort() fix

2019-07-26 Thread Tom Lane
Andres Freund writes: > On 2019-07-26 18:05:38 -0400, Alvaro Herrera wrote: >> Hello, is anybody looking into this issue? > I guess this is on Robert's docket otherwise. He's on vacation till > early next week... I think this is a sufficiently obvious bug, and a sufficiently obvious fix, that

Re: TopoSort() fix

2019-07-26 Thread Andres Freund
Hi, On 2019-07-26 18:05:38 -0400, Alvaro Herrera wrote: > On 2019-Jul-04, Rui Hai Jiang wrote: > > > I'll try to figure out some scenarios to do the test. A parallel process > > group is needed for the test. Rui, have you made any progress on this? > > Actually I was trying to do some

Re: TopoSort() fix

2019-07-26 Thread Alvaro Herrera
On 2019-Jul-04, Rui Hai Jiang wrote: > I'll try to figure out some scenarios to do the test. A parallel process > group is needed for the test. > > Actually I was trying to do some testing against the locking mechanism. I > happened to see this issue. Hello, is anybody looking into this issue?

Re: TopoSort() fix

2019-07-03 Thread Rui Hai Jiang
I'll try to figure out some scenarios to do the test. A parallel process group is needed for the test. Actually I was trying to do some testing against the locking mechanism. I happened to see this issue. On Wed, Jul 3, 2019 at 9:38 PM Robert Haas wrote: > On Tue, Jul 2, 2019 at 11:23 AM Tom

Re: TopoSort() fix

2019-07-03 Thread Robert Haas
On Tue, Jul 2, 2019 at 11:23 AM Tom Lane wrote: > Rui Hai Jiang writes: > > I think I found an issue in the TopoSort() function. > > This indeed seems like a live bug introduced by a1c1af2a. > Robert? This is pretty thoroughly swapped out of my head, but it looks like that analysis might be

Re: TopoSort() fix

2019-07-03 Thread Michael Paquier
On Wed, Jul 03, 2019 at 10:41:59AM +0800, Rui Hai Jiang wrote: > Could the attached patch fix this issue? Or does any one else plan to fix > it? > > If people are busy and have not time, I can go ahead to fix it. To fix > this issue, do we need a patch for each official branch? Only a

Re: TopoSort() fix

2019-07-02 Thread Rui Hai Jiang
Could the attached patch fix this issue? Or does any one else plan to fix it? If people are busy and have not time, I can go ahead to fix it. To fix this issue, do we need a patch for each official branch? Regards, Ruihai On Tue, Jul 2, 2019 at 11:23 PM Tom Lane wrote: > Rui Hai Jiang

Re: TopoSort() fix

2019-07-02 Thread Tom Lane
Rui Hai Jiang writes: > I think I found an issue in the TopoSort() function. This indeed seems like a live bug introduced by a1c1af2a. Robert? regards, tom lane

TopoSort() fix

2019-07-02 Thread Rui Hai Jiang
Hi Hackers, I think I found an issue in the TopoSort() function. As the comments say, /* . * .. If there are any other processes * in the same lock group on the queue, set their number of * beforeConstraints to -1 to