Re: SegFault on 9.6.14

2019-11-26 Thread Amit Kapila
On Wed, Nov 20, 2019 at 5:12 PM Amit Kapila wrote: > > On Mon, Nov 18, 2019 at 2:22 PM Amit Kapila wrote: > > > > I have modified the commit message as proposed above and additionally > > added comments in nodeLimit.c. I think we should move ahead with this > > bug-fix patch. If we don't like

Re: SegFault on 9.6.14

2019-11-20 Thread Amit Kapila
On Mon, Nov 18, 2019 at 2:22 PM Amit Kapila wrote: > > I have modified the commit message as proposed above and additionally > added comments in nodeLimit.c. I think we should move ahead with this > bug-fix patch. If we don't like the comment, it can anyway be > improved later. > > Any

Re: SegFault on 9.6.14

2019-11-18 Thread Amit Kapila
On Fri, Oct 18, 2019 at 10:08 AM Amit Kapila wrote: > > On Thu, Oct 17, 2019 at 10:51 AM Thomas Munro wrote: > > > > === > > Don't shut down Gather[Merge] early under Limit. > > > > Revert part of commit 19df1702f5. > > > > Early shutdown was added by that commit so that we could collect > >

Re: SegFault on 9.6.14

2019-10-17 Thread Amit Kapila
On Thu, Oct 17, 2019 at 10:51 AM Thomas Munro wrote: > > On Fri, Sep 13, 2019 at 1:35 AM Robert Haas wrote: > > On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila wrote: > > > Robert, Thomas, do you have any more suggestions related to this. I > > > am planning to commit the above-discussed patch

Re: SegFault on 9.6.14

2019-10-16 Thread Thomas Munro
On Fri, Sep 13, 2019 at 1:35 AM Robert Haas wrote: > On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila wrote: > > Robert, Thomas, do you have any more suggestions related to this. I > > am planning to commit the above-discussed patch (Forbid Limit node to > > shutdown resources.) coming Monday, so

Re: SegFault on 9.6.14

2019-09-12 Thread Robert Haas
On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila wrote: > Robert, Thomas, do you have any more suggestions related to this. I > am planning to commit the above-discussed patch (Forbid Limit node to > shutdown resources.) coming Monday, so that at least the reported > problem got fixed. I think that

Re: SegFault on 9.6.14

2019-09-12 Thread Amit Kapila
On Thu, Sep 5, 2019 at 7:53 PM Amit Kapila wrote: > > On Mon, Sep 2, 2019 at 4:51 PM Amit Kapila wrote: > > > > On Fri, Aug 9, 2019 at 6:29 PM Robert Haas wrote: > > > > > > > > > But beyond that, the issue here is that the Limit node is shutting > > > down the Gather node too early, and the

Re: SegFault on 9.6.14

2019-09-05 Thread Amit Kapila
On Mon, Sep 2, 2019 at 4:51 PM Amit Kapila wrote: > > On Fri, Aug 9, 2019 at 6:29 PM Robert Haas wrote: > > > > > > But beyond that, the issue here is that the Limit node is shutting > > down the Gather node too early, and the right fix must be to stop > > doing that, not to change the

Re: SegFault on 9.6.14

2019-09-02 Thread Amit Kapila
On Fri, Aug 9, 2019 at 6:29 PM Robert Haas wrote: > > On Wed, Aug 7, 2019 at 5:45 AM vignesh C wrote: > > I have made a patch based on the above lines. > > I have tested the scenarios which Thomas had shared in the earlier > > mail and few more tests based on Thomas's tests. > > I'm not sure if

Re: SegFault on 9.6.14

2019-08-13 Thread Tom Lane
Amit Kapila writes: > Another point which I am wondering is why can't we use the existing > REWIND flag to solve the current issue, basically if we have access to > that information in nodeLimit.c (ExecLimit), then can't we just pass > down that to ExecShutdownNode? The existing REWIND flag

Re: SegFault on 9.6.14

2019-08-13 Thread Amit Kapila
On Tue, Aug 13, 2019 at 9:28 PM Tom Lane wrote: > > Amit Kapila writes: > > On Tue, Aug 13, 2019 at 3:18 AM Tom Lane wrote: > >> To clarify my position --- I think it's definitely possible to improve > >> the situation a great deal. We "just" have to pass down more information > >> about

Re: SegFault on 9.6.14

2019-08-13 Thread Tom Lane
Amit Kapila writes: > On Tue, Aug 13, 2019 at 3:18 AM Tom Lane wrote: >> To clarify my position --- I think it's definitely possible to improve >> the situation a great deal. We "just" have to pass down more information >> about whether rescans are possible. > Right, you have speculated above

Re: SegFault on 9.6.14

2019-08-12 Thread Amit Kapila
On Tue, Aug 13, 2019 at 3:18 AM Tom Lane wrote: > > Robert Haas writes: > > Being able to do that sort of thing was one of my goals in designing > > the ExecShutdownNode stuff. Unfortunately, it's clear from this bug > > report that it's still a few bricks short of a load, and Tom doesn't > >

Re: SegFault on 9.6.14

2019-08-12 Thread Robert Haas
On Mon, Aug 12, 2019 at 5:48 PM Tom Lane wrote: > To clarify my position --- I think it's definitely possible to improve > the situation a great deal. We "just" have to pass down more information > about whether rescans are possible. What I don't believe is that that > leads to a bug fix that

Re: SegFault on 9.6.14

2019-08-12 Thread Tom Lane
Robert Haas writes: > Being able to do that sort of thing was one of my goals in designing > the ExecShutdownNode stuff. Unfortunately, it's clear from this bug > report that it's still a few bricks short of a load, and Tom doesn't > seem real optimistic about how easy it will be to buy those

Re: SegFault on 9.6.14

2019-08-12 Thread Thomas Munro
On Tue, Aug 13, 2019 at 7:07 AM Alvaro Herrera wrote: > On 2019-Aug-12, Thomas Munro wrote: > > That's possibly relevant because it means we'd have a ParallelContext > > or some new overarching object that has a lifetime that is longer than > > the individual Gather nodes' processes and

Re: SegFault on 9.6.14

2019-08-12 Thread Robert Haas
On Mon, Aug 12, 2019 at 3:07 PM Alvaro Herrera wrote: > How likely is it that we would ever be able to release memory from a > Sort (or, say, a hashjoin hash table) when it's done being read, but > before completing the whole plan? As I understand, right now we hold > onto a lot of memory after

Re: SegFault on 9.6.14

2019-08-12 Thread Alvaro Herrera
On 2019-Aug-12, Thomas Munro wrote: > That's possibly relevant because it means we'd have a ParallelContext > or some new overarching object that has a lifetime that is longer than > the individual Gather nodes' processes and instrumentation data. I'm > not saying we need to discuss any details

Re: SegFault on 9.6.14

2019-08-11 Thread Thomas Munro
On Sat, Aug 10, 2019 at 12:59 AM Robert Haas wrote: > But beyond that, the issue here is that the Limit node is shutting > down the Gather node too early, and the right fix must be to stop > doing that, not to change the definition of what it means to shut down > a node, as this patch does. So

Re: SegFault on 9.6.14

2019-08-09 Thread Robert Haas
On Wed, Aug 7, 2019 at 5:45 AM vignesh C wrote: > I have made a patch based on the above lines. > I have tested the scenarios which Thomas had shared in the earlier > mail and few more tests based on Thomas's tests. > I'm not sure if we will be going ahead with this solution or not. > Let me know

Re: SegFault on 9.6.14

2019-08-07 Thread Amit Kapila
On Wed, Aug 7, 2019 at 3:15 PM vignesh C wrote: > > On Wed, Jul 31, 2019 at 9:37 AM Amit Kapila wrote: > > > > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas wrote: > > > > The other idea we had discussed which comes closer to adopting Tom's > > position was that during ExecShutdownNode, we just

Re: SegFault on 9.6.14

2019-08-07 Thread vignesh C
On Wed, Jul 31, 2019 at 9:37 AM Amit Kapila wrote: > > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas wrote: > > > > On Thu, Jul 18, 2019 at 9:45 AM Tom Lane wrote: > > > I think this is going in the wrong direction. Nodes should *always* > > > assume that a rescan is possible until ExecEndNode

Re: SegFault on 9.6.14

2019-07-30 Thread Tom Lane
Amit Kapila writes: > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas wrote: >> The other option is to do >> what I understand Amit and Thomas to be proposing, which is to do a >> better job identifying the case where we're "done for good" and can >> trigger the shutdown fearlessly. > Yes, this

Re: SegFault on 9.6.14

2019-07-30 Thread Amit Kapila
On Wed, Jul 31, 2019 at 12:05 AM Robert Haas wrote: > > On Thu, Jul 18, 2019 at 9:45 AM Tom Lane wrote: > > I think this is going in the wrong direction. Nodes should *always* > > assume that a rescan is possible until ExecEndNode is called. > > If you want to do otherwise, you are going to be

Re: SegFault on 9.6.14

2019-07-30 Thread Robert Haas
On Thu, Jul 18, 2019 at 9:45 AM Tom Lane wrote: > I think this is going in the wrong direction. Nodes should *always* > assume that a rescan is possible until ExecEndNode is called. > If you want to do otherwise, you are going to be inventing a whole > bunch of complicated and

Re: SegFault on 9.6.14

2019-07-26 Thread Amit Kapila
On Sat, Jul 27, 2019 at 8:29 AM Thomas Munro wrote: > > On Fri, Jul 26, 2019 at 4:13 PM Amit Kapila wrote: > > On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila wrote: > > > Right, that will be lesser code churn and it can also work. However, > > > one thing that needs some thought is till now

Re: SegFault on 9.6.14

2019-07-26 Thread Thomas Munro
On Fri, Jul 26, 2019 at 4:13 PM Amit Kapila wrote: > On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila wrote: > > Right, that will be lesser code churn and it can also work. However, > > one thing that needs some thought is till now es_top_eflags is only > > set in ExecutorStart and same is mentioned

Re: SegFault on 9.6.14

2019-07-25 Thread Amit Kapila
On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila wrote: > > On Tue, Jul 23, 2019 at 9:11 AM Thomas Munro wrote: > > > > > Another idea from the band-aid-solutions-that-are-easy-to-back-patch > > department: in ExecutePlan() where we call ExecShutdownNode(), we > > could write EXEC_FLAG_DONE into

Re: SegFault on 9.6.14

2019-07-23 Thread Amit Kapila
On Tue, Jul 23, 2019 at 9:11 AM Thomas Munro wrote: > > On Fri, Jul 19, 2019 at 3:00 PM Amit Kapila wrote: > > I am thinking that why not we remove the part of destroying the > > parallel context (and shared memory) from ExecShutdownGather (and > > ExecShutdownGatherMerge) and then do it at the

Re: SegFault on 9.6.14

2019-07-22 Thread Thomas Munro
On Fri, Jul 19, 2019 at 3:00 PM Amit Kapila wrote: > On Thu, Jul 18, 2019 at 7:15 PM Tom Lane wrote: > > Thomas Munro writes: > > > Hmm, so something like a new argument "bool final" added to the > > > ExecXXXShutdown() functions, which receives false in this case to tell > > > it that there

Re: SegFault on 9.6.14

2019-07-18 Thread Amit Kapila
On Thu, Jul 18, 2019 at 7:15 PM Tom Lane wrote: > > Thomas Munro writes: > > Hmm, so something like a new argument "bool final" added to the > > ExecXXXShutdown() functions, which receives false in this case to tell > > it that there could be a rescan so keep the parallel context around. > > I

Re: SegFault on 9.6.14

2019-07-18 Thread Tom Lane
Thomas Munro writes: > Hmm, so something like a new argument "bool final" added to the > ExecXXXShutdown() functions, which receives false in this case to tell > it that there could be a rescan so keep the parallel context around. I think this is going in the wrong direction. Nodes should

Re: SegFault on 9.6.14

2019-07-18 Thread Thomas Munro
On Thu, Jul 18, 2019 at 6:40 PM Amit Kapila wrote: > On Wed, Jul 17, 2019 at 4:10 PM Amit Kapila wrote: > > Yeah, that is a problem. Actually, what we need here is to > > wait-for-workers-to-finish and collect all the instrumentation > > information. We don't need to destroy the shared memory

Re: SegFault on 9.6.14

2019-07-18 Thread Amit Kapila
On Wed, Jul 17, 2019 at 4:10 PM Amit Kapila wrote: > > On Wed, Jul 17, 2019 at 6:28 AM Thomas Munro wrote: > > > > On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro > > wrote: > > > > #11 0x55666e0359df in ExecShutdownNode > > > > (node=node@entry=0x55667033a6c8) > > > > at > > > >

Re: SegFault on 9.6.14

2019-07-17 Thread Amit Kapila
On Wed, Jul 17, 2019 at 6:28 AM Thomas Munro wrote: > > On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro wrote: > > > #11 0x55666e0359df in ExecShutdownNode > > > (node=node@entry=0x55667033a6c8) > > > at > > >

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Wed, Jul 17, 2019 at 12:57 PM Thomas Munro wrote: > On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro wrote: > > > #11 0x55666e0359df in ExecShutdownNode > > > (node=node@entry=0x55667033a6c8) > > > at > > >

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro wrote: > > #11 0x55666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8) > > at > > /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830 > > #12 0x55666e04d0ff in ExecLimit

Re: SegFault on 9.6.14

2019-07-16 Thread Jerry Sievers
Thomas Munro writes: > On Wed, Jul 17, 2019 at 12:26 PM Jerry Sievers wrote: > >> Is this the right sequencing? >> >> 1. Start client and get backend pid >> 2. GDB; handle SIGUSR1, break, cont >> 3. Run query >> 4. bt > > Perfect, thanks. I think I just spotted something: Dig that! Great

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Wed, Jul 17, 2019 at 12:26 PM Jerry Sievers wrote: > Is this the right sequencing? > > 1. Start client and get backend pid > 2. GDB; handle SIGUSR1, break, cont > 3. Run query > 4. bt Perfect, thanks. I think I just spotted something: > #11 0x55666e0359df in ExecShutdownNode

Re: SegFault on 9.6.14

2019-07-16 Thread Jerry Sievers
Thomas Munro writes: > On Wed, Jul 17, 2019 at 12:05 PM Jerry Sievers wrote: > >> Program received signal SIGUSR1, User defined signal 1. > > Oh, we need to ignore those pesky signals with "handle SIGUSR1 noprint > nostop". Is this the right sequencing? 1. Start client and get backend pid 2.

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Wed, Jul 17, 2019 at 12:05 PM Jerry Sievers wrote: > Program received signal SIGUSR1, User defined signal 1. Oh, we need to ignore those pesky signals with "handle SIGUSR1 noprint nostop". -- Thomas Munro https://enterprisedb.com
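The signal-handling advice above can be sketched as a single gdb invocation. This is a session fragment, not something from the thread: the pid 12345 is a placeholder for the backend pid (e.g. from pg_backend_pid()), and "pass" is added so the latch signal still reaches the backend.

```shell
# Attach to the suspect backend and silence PostgreSQL's SIGUSR1
# latch signals, as suggested above; 12345 stands in for the
# backend pid. "pass" keeps delivering the signal to the process.
gdb -p 12345 \
    -ex 'handle SIGUSR1 noprint nostop pass' \
    -ex 'continue'
# Re-run the failing query in the client session; when the crash
# (or a breakpoint) is hit, capture the backtrace:
#   (gdb) bt
```

This matches the sequencing Jerry confirms in the next message: attach, adjust signal handling, continue, run the query, then `bt`.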

Re: SegFault on 9.6.14

2019-07-16 Thread Jerry Sievers
Thomas Munro writes: > On Wed, Jul 17, 2019 at 11:33 AM Jerry Sievers wrote: > >> -> Nested Loop Left Join (cost=251621.81..12300177.37 rows=48 >> width=44) >>-> Gather (cost=1001.55..270403.27 rows=48 width=40) > >>-> Limit

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Wed, Jul 17, 2019 at 11:33 AM Jerry Sievers wrote: > -> Nested Loop Left Join (cost=251621.81..12300177.37 rows=48 > width=44) >-> Gather (cost=1001.55..270403.27 rows=48 width=40) >-> Limit (cost=250620.25..250620.27 rows=1 width=20) >

Re: SegFault on 9.6.14

2019-07-16 Thread Jerry Sievers
Thomas Munro writes: > On Wed, Jul 17, 2019 at 11:06 AM Jerry Sievers wrote: > >> (gdb) p *scan->rs_parallel >> Cannot access memory at address 0x7fa673a54108 > > So I guess one question is: was it a valid address that's been > unexpectedly unmapped, or is the pointer corrupted? Any chance you

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Wed, Jul 17, 2019 at 11:11 AM Thomas Munro wrote: > map, unmap mmap, munmap -- Thomas Munro https://enterprisedb.com

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Wed, Jul 17, 2019 at 11:06 AM Jerry Sievers wrote: > (gdb) p *scan->rs_parallel > Cannot access memory at address 0x7fa673a54108 So I guess one question is: was it a valid address that's been unexpectedly unmapped, or is the pointer corrupted? Any chance you can strace the backend and pull
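One way to act on the strace suggestion above; these are session fragments with placeholder values, not commands from the thread. The pid 12345 is hypothetical, and the grep pattern is taken from the faulting address 0x7fa673a54108 quoted above.

```shell
# Watch the backend's mapping syscalls to see whether the shared
# memory segment holding rs_parallel gets unmapped before the crash:
strace -p 12345 -e trace=mmap,munmap

# Separately, check whether the faulting address still falls inside
# a live mapping of the process:
grep -i 7fa673a5 /proc/12345/maps
```

If the address was once mapped and a matching munmap shows up before the fault, that points at the "valid address unexpectedly unmapped" case rather than a corrupted pointer.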

Re: SegFault on 9.6.14

2019-07-16 Thread Jerry Sievers
Tomas Vondra writes: > On Mon, Jul 15, 2019 at 08:20:00PM -0500, Jerry Sievers wrote: > >>Tomas Vondra writes: >> >>> On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: >>> Tomas Vondra writes: > On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: >

Re: SegFault on 9.6.14

2019-07-16 Thread Thomas Munro
On Tue, Jul 16, 2019 at 8:22 PM Tomas Vondra wrote: > On Mon, Jul 15, 2019 at 08:20:00PM -0500, Jerry Sievers wrote: > >We have a reproducible case of $subject that issues a backtrace such as > >seen below. > >#0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, >

Re: SegFault on 9.6.14

2019-07-16 Thread Tomas Vondra
On Mon, Jul 15, 2019 at 08:20:00PM -0500, Jerry Sievers wrote: Tomas Vondra writes: On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: Tomas Vondra writes: On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: Greetings Hackers. We have a reproducible case of

Re: SegFault on 9.6.14

2019-07-15 Thread Jerry Sievers
Tomas Vondra writes: > On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: > >>Tomas Vondra writes: >> >>> On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: >>> Greetings Hackers. We have a reproducible case of $subject that issues a backtrace such as

Re: SegFault on 9.6.14

2019-07-15 Thread Tomas Vondra
On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: Tomas Vondra writes: On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: Greetings Hackers. We have a reproducible case of $subject that issues a backtrace such as seen below. The query that I'd prefer to sanitize

Re: SegFault on 9.6.14

2019-07-15 Thread Jerry Sievers
Tomas Vondra writes: > On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: > >>Greetings Hackers. >> >>We have a reproducible case of $subject that issues a backtrace such as >>seen below. >> >>The query that I'd prefer to sanitize before sending is <30 lines of at >>a glance, not

Re: SegFault on 9.6.14

2019-07-15 Thread Tomas Vondra
On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: Greetings Hackers. We have a reproducible case of $subject that issues a backtrace such as seen below. The query that I'd prefer to sanitize before sending is <30 lines of at a glance, not terribly complex logic. It nonetheless