Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-16 Thread Robert Haas
On Tue, Aug 16, 2016 at 1:05 AM, Rushabh Lathia
 wrote:
> I agree, this make sense.
>
> Here is the patch to allocate worker instrumentation into same context
> as the regular instrumentation which is per-query context.

Looks good, committed.  I am not sure it was a very good idea for
af33039317ddc4a0e38a02e2255c2bf453115fd2 by Tom Lane to change the
current memory context for the entire execution of gather_readnext();
this might not be the only or the last bug that results from that
decision.  However, I don't really want to get an argument about that
right now, and this at least fixes the problem we know about.  Thanks
for the report and patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-15 Thread Rushabh Lathia
On Mon, Aug 15, 2016 at 6:02 PM, Robert Haas  wrote:

> On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila 
> wrote:
> > AFAICS, your patch seems to be the right fix for this issue, unless we
> > need the instrumentation information during execution (other than for
> > explain) for some purpose.
>
> Hmm, I disagree.  It should be the job of
> ExecParallelRetrieveInstrumentation to allocate its data in the
> correct context, not the responsibility of nodeGather.c to work around
> the fact that it doesn't.  The worker instrumentation should be
> allocated in the same context as the regular instrumentation
> information, which I assume is probably the per-query context.
>

I agree, this make sense.

Here is the patch to allocate worker instrumentation into same context
as the regular instrumentation which is per-query context.

PFA patch.


--
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 380d743..5aa6f02 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -500,6 +500,7 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	int			n;
 	int			ibytes;
 	int			plan_node_id = planstate->plan->plan_node_id;
+	MemoryContext oldcontext;
 
 	/* Find the instumentation for this node. */
 	for (i = 0; i < instrumentation->num_plan_nodes; ++i)
@@ -514,10 +515,19 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	for (n = 0; n < instrumentation->num_workers; ++n)
 		InstrAggNode(planstate->instrument, [n]);
 
-	/* Also store the per-worker detail. */
+	/*
+	 * Also store the per-worker detail.
+	 *
+	 * Worker instrumentation should be allocated in the same context as
+	 * the regular instrumentation information, which is the per-query
+	 * context. Switch into per-query memory context.
+	 */
+	oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
 	ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
 	planstate->worker_instrument =
 		palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
+	MemoryContextSwitchTo(oldcontext);
+
 	planstate->worker_instrument->num_workers = instrumentation->num_workers;
 	memcpy(>worker_instrument->instrument, instrument, ibytes);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-15 Thread Robert Haas
On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila  wrote:
> AFAICS, your patch seems to be the right fix for this issue, unless we
> need the instrumentation information during execution (other than for
> explain) for some purpose.

Hmm, I disagree.  It should be the job of
ExecParallelRetrieveInstrumentation to allocate its data in the
correct context, not the responsibility of nodeGather.c to work around
the fact that it doesn't.  The worker instrumentation should be
allocated in the same context as the regular instrumentation
information, which I assume is probably the per-query context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-13 Thread Amit Kapila
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia
 wrote:
> Hi All,
>
> Recently while running tpc-h queries on postgresql master branch, I am
> noticed
> random server crash. Most of the time server crash coming while turn tpch
> query
> number 3 - (but its very random).
>
>
> Here its clear that work_instrument is either corrupted or Un-inililized
> that is the
> reason its ending up with server crash.
>
> With bit more debugging and looked at git history I found that issue started
> coming
> with commit af33039317ddc4a0e38a02e2255c2bf453115fd2. gather_readnext()
> calls
> ExecShutdownGatherWorkers() when nreaders == 0. ExecShutdownGatherWorkers()
> calls ExecParallelFinish() which collects the instrumentation before marking
> ParallelExecutorInfo to finish. ExecParallelRetrieveInstrumentation() do the
> allocation
> of planstate->worker_instrument.
>
> With commit af33039317 now we calling the gather_readnext() with per-tuple
> context,
> but with nreader == 0 with ExecShutdownGatherWorkers() we end up with
> allocation
> of planstate->worker_instrument into per-tuple context - which is wrong.
>
> Now fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from the gather_readnext() and
> let
> ExecEndGather() do that things.
>

I don't think we can wait till ExecEndGather() to collect statistics,
as we need it before that for explain path.  However, we do call
ExecShutdownNode() from ExecutePlan() when there are no more tuples
which can take care of ensuring the shutdown of Gather node.   I think
the advantage of calling it in
gather_readnext() is that it will resources to be released early and
populating the instrumentation/statistics as early as possible.

> But with this change, gather_readread() and
> gather_getnext() depend on planstate->reader structure to continue reading
> tuple.
> Now either we can change those condition to be depend on planstate->nreaders
> or
> just pfree(planstate->reader) into gather_readnext() instead of calling
> ExecShutdownGatherWorkers().
>
>
> Attaching patch, which fix the issue with approach 1).
>

AFAICS, your patch seems to be the right fix for this issue, unless we
need the instrumentation information during execution (other than for
explain) for some purpose.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-12 Thread Rushabh Lathia
Hi All,

Recently while running tpc-h queries on postgresql master branch, I am
noticed
random server crash. Most of the time server crash coming while turn tpch
query
number 3 - (but its very random).

Call Stack of server crash:

(gdb) bt
#0  0x102aa9ac in ExplainNode (planstate=0x1001a7471d8,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1516
#1  0x102aaeb8 in ExplainNode (planstate=0x1001a746b50,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#2  0x102aaeb8 in ExplainNode (planstate=0x1001a7468b8,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#3  0x102aaeb8 in ExplainNode (planstate=0x1001a745e48,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#4  0x102aaeb8 in ExplainNode (planstate=0x1001a745bb0,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#5  0x102aaeb8 in ExplainNode (planstate=0x1001a745870,
ancestors=0x1001a745348, relationship=0x0, plan_name=0x0, es=0x1001a65ad48)
at explain.c:1599
#6  0x102a81d8 in ExplainPrintPlan (es=0x1001a65ad48,
queryDesc=0x1001a7442a0) at explain.c:599
#7  0x102a7e20 in ExplainOnePlan (plannedstmt=0x1001a744208,
into=0x0, es=0x1001a65ad48,
queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose)
select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as
revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment
= 'BUILD"..., params=0x0, planduration=0x3fffe17c6488) at explain.c:515
#8  0x102a7968 in ExplainOneQuery (query=0x1001a65ab98, into=0x0,
es=0x1001a65ad48,
queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose)
select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as
revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment
= 'BUILD"..., params=0x0) at explain.c:357
#9  0x102a74b8 in ExplainQuery (stmt=0x1001a6fe468,
queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose)
select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as
revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment
= 'BUILD"..., params=0x0, dest=0x1001a65acb0) at explain.c:245

(gdb) p *w
$2 = {num_workers = 2139062143, instrument = 0x1001af8a8d0}
(gdb) p planstate
$3 = (PlanState *) 0x1001a7471d8
(gdb) p *planstate
$4 = {type = T_HashJoinState, plan = 0x1001a740730, state = 0x1001a745758,
instrument = 0x1001a7514a8, worker_instrument = 0x1001af8a8c8,
  targetlist = 0x1001a747448, qual = 0x0, lefttree = 0x1001a74a0e8,
righttree = 0x1001a74f2f8, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0,
  ps_ResultTupleSlot = 0x1001a750c10, ps_ExprContext = 0x1001a7472f0,
ps_ProjInfo = 0x1001a751220, ps_TupFromTlist = 0 '\000'}
(gdb) p *planstate->worker_instrument
$5 = {num_workers = 2139062143, instrument = 0x1001af8a8d0}
(gdb) p n
$6 = 13055
(gdb) p n
$7 = 13055

Here its clear that work_instrument is either corrupted or Un-inililized
that is the
reason its ending up with server crash.

With bit more debugging and looked at git history I found that issue
started coming
with commit af33039317ddc4a0e38a02e2255c2bf453115fd2. gather_readnext()
calls
ExecShutdownGatherWorkers() when nreaders == 0. ExecShutdownGatherWorkers()
calls ExecParallelFinish() which collects the instrumentation before
marking
ParallelExecutorInfo to finish. ExecParallelRetrieveInstrumentation() do
the allocation
of planstate->worker_instrument.

With commit af33039317 now we calling the gather_readnext() with per-tuple
context,
but with nreader == 0 with ExecShutdownGatherWorkers() we end up with
allocation
of planstate->worker_instrument into per-tuple context - which is wrong.

Now fix can be:

1) Avoid calling ExecShutdownGatherWorkers() from the gather_readnext() and
let
ExecEndGather() do that things. But with this change, gather_readread() and
gather_getnext() depend on planstate->reader structure to continue reading
tuple.
Now either we can change those condition to be depend on
planstate->nreaders or
just pfree(planstate->reader) into gather_readnext() instead of calling
ExecShutdownGatherWorkers().

2) Teach ExecParallelRetrieveInstrumentation() to do allocation of
planstate->worker_instrument into proper memory context.

Attaching patch, which fix the issue with approach 1).


Regards.

Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 438d1b2..bc56f0e 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -348,7 +348,8 @@ gather_readnext(GatherState *gatherstate)
 			--gatherstate->nreaders;
 			if