Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-25 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Nov 25, 2016 at 10:33 AM, Tom Lane  wrote:
>> The impression I got in poking at this for a few minutes, before
>> going off to stuff myself with turkey, was that we were allowing
>> a SubqueryScanPath to mark itself as parallel-safe even though the
>> resulting plan node would have an InitPlan attached.  So my thought
>> about fixing it was along the lines of if-subroot-contains-initplans-
>> then-dont-let-SubqueryScanPath-be-parallel-safe.

> I think this will work for the reported case, see the patch attached.

After further thought, it seems to me that the fundamental error here
is that we're not accounting for initPlans while marking paths as
parallel safe or not.  They should be correctly marked when they come
out of subquery_planner, rather than expecting callers to know that
they can't trust that marking.  That way, we don't need to do anything
special in create_subqueryscan_path, and we can also get rid of that
kluge in the force_parallel_mode logic.

> However, don't we need to worry about if there is a subplan
> (non-initplan) instead of initplan.

I don't think so.  References to subplans are already known to be parallel
restricted.  The issue that we're fixing here is that if a plan node has
initPlans attached, ExecutorStart will try to access those subplans,
whether or not they actually get used while running.  That's why the plan
node itself has to be marked parallel-unsafe so it won't get shipped to
parallel workers.

>> There's another issue here, which is that the InitPlan is derived from an
>> outer join qual whose outer join seems to have been deleted entirely by
>> analyzejoins.c.  Up to now it was possible to avert our eyes from the fact
>> that join removal is lazy about getting rid of unused InitPlans, but if
>> they're going to disable parallelization of the surrounding query, maybe
>> we need to work harder on that.

> Yeah, that makes sense, but not sure whether we should try it along
> with this patch.

I'm not certain that sublinks in deletable outer join quals is a case
important enough to sweat over.  But this is the first time I've realized
that failure to remove those initPlans is capable of making a plan worse.
So it's something to keep an eye on.  (I still wish that we could make
join removal happen during join tree preprocessing; doing it where it is
is nothing but a kluge, with unpleasant side effects like this one.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-25 Thread Amit Kapila
On Fri, Nov 25, 2016 at 10:33 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich  
>> wrote:
>>> just caught another InitPlan below Gather with the recent patches in
>>> (master as of 4cc6a3f).  Recipe below.
>
>> I think this problem exists since commit
>> 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed
>> subqueries to be pushed to parallel workers.
>
> The impression I got in poking at this for a few minutes, before
> going off to stuff myself with turkey, was that we were allowing
> a SubqueryScanPath to mark itself as parallel-safe even though the
> resulting plan node would have an InitPlan attached.  So my thought
> about fixing it was along the lines of if-subroot-contains-initplans-
> then-dont-let-SubqueryScanPath-be-parallel-safe.
>

I think this will work for the reported case, see the patch attached.
However, don't we need to worry about if there is a subplan
(non-initplan) instead of initplan.  I have tried by tweaking the
above query such that it will generate a subplan and for such a case
it will make SubqueryScanPath as parallel-safe.

explain select 1 from public.quad_point_tbl as ref_0, lateral (select
ref_0.p as c3, sample_0.d as c5 from public.nv_child_2010 as sample_0
left join public.mvtest_tvv as ref_1 on ('x' = ANY (select contype
from pg_catalog.pg_constraint limit 1)) limit 82) as subq_0;

>  What you propose
> here seems like it would shut off parallel query in more cases than
> that.  But I'm still full of turkey and may be missing something.
>
> There's another issue here, which is that the InitPlan is derived from an
> outer join qual whose outer join seems to have been deleted entirely by
> analyzejoins.c.  Up to now it was possible to avert our eyes from the fact
> that join removal is lazy about getting rid of unused InitPlans, but if
> they're going to disable parallelization of the surrounding query, maybe
> we need to work harder on that.
>

Yeah, that makes sense, but not sure whether we should try it along
with this patch.

> Another question worth asking is whether it was okay for the subquery to
> decide to use a Gather.  Are we OK with multiple Gathers per plan tree,
> independently of the points above?
>

As of now, we don't support nested Gather case.  Example:

Not Okay Plan
-
-> Gather
 -> Nested Loop
 -> Parallel Seq Scan
 -> Gather
  -> Parallel Seq Scan

Okay Plan
-
-> Nested Loop
 -> Gather
  -> Parallel Seq Scan
 -> Gather
  -> Parallel Seq Scan



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


allow_safe_subquery_parallel_worker_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-24 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich  
> wrote:
>> just caught another InitPlan below Gather with the recent patches in
>> (master as of 4cc6a3f).  Recipe below.

> I think this problem exists since commit
> 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed
> subqueries to be pushed to parallel workers.

The impression I got in poking at this for a few minutes, before
going off to stuff myself with turkey, was that we were allowing
a SubqueryScanPath to mark itself as parallel-safe even though the
resulting plan node would have an InitPlan attached.  So my thought
about fixing it was along the lines of if-subroot-contains-initplans-
then-dont-let-SubqueryScanPath-be-parallel-safe.  What you propose
here seems like it would shut off parallel query in more cases than
that.  But I'm still full of turkey and may be missing something.

There's another issue here, which is that the InitPlan is derived from an
outer join qual whose outer join seems to have been deleted entirely by
analyzejoins.c.  Up to now it was possible to avert our eyes from the fact
that join removal is lazy about getting rid of unused InitPlans, but if
they're going to disable parallelization of the surrounding query, maybe
we need to work harder on that.

Another question worth asking is whether it was okay for the subquery to
decide to use a Gather.  Are we OK with multiple Gathers per plan tree,
independently of the points above?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-24 Thread Amit Kapila
On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich  wrote:
> Hi,
>
> just caught another InitPlan below Gather with the recent patches in
> (master as of 4cc6a3f).  Recipe below.
>

I think this problem exists since commit
110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed
subqueries to be pushed to parallel workers.  I think we should
consider rel (where rtekind is RTE_SUBQUERY) to be parallel safe if
the subquery is also parallel safe.   Attached patch does that and
fixes the reported problem for me.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


allow_safe_subquery_parallel_worker_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-24 Thread Andreas Seltenreich
Hi,

just caught another InitPlan below Gather with the recent patches in
(master as of 4cc6a3f).  Recipe below.

regards,
andreas

set max_parallel_workers_per_gather = 2;
set min_parallel_relation_size = 0;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;

explain select 1 from
   public.quad_point_tbl as ref_0,
   lateral (select
 ref_0.p as c3,
 sample_0.d as c5
   from
 public.nv_child_2010 as sample_0
   left join public.mvtest_tvv as ref_1
   on ('x'< (select contype from pg_catalog.pg_constraint limit 1))
   limit 82) as subq_0;

--QUERY PLAN
-- 

--  Gather  (cost=0.19..13727.52 rows=902246 width=4)
--Workers Planned: 2
--->  Nested Loop  (cost=0.19..13727.52 rows=902246 width=4)
--  ->  Parallel Seq Scan on quad_point_tbl ref_0  (cost=0.00..105.85 
rows=4585 width=16)
--  ->  Limit  (cost=0.19..1.33 rows=82 width=20)
--InitPlan 1 (returns $0)
--  ->  Limit  (cost=0.00..0.19 rows=1 width=1)
--->  Gather  (cost=0.00..10.22 rows=54 width=1)
--  Workers Planned: 2
--  ->  Parallel Seq Scan on pg_constraint  
(cost=0.00..10.22 rows=22 width=1)
--->  Seq Scan on nv_child_2010 sample_0  (cost=0.00..35.50 
rows=2550 width=20)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Amit Kapila
On Mon, Nov 21, 2016 at 9:30 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila  wrote:
>>> Here instead of checking whether top_plan has initPlan, it should
>>> check whether initPlan is present anywhere in plan tree.  I think one
>>> simple way could be to check *glob->subplans* instead of
>>> top_plan->initPlan,
>
>> Patch based on above suggestion is attached with this mail.
>
> I think this is the right fix for the moment, because the proximate cause
> of the crash is that ExecSerializePlan doesn't transmit any part of the
> PlannedStmt.subplans list, which glob->subplans is the preimage of.
>
> Now, maybe I'm missing something, but it seems to me that ordinary
> subplans could be transmitted to workers just fine.  The problem with
> transmitting initplans is that you'd lose the expectation of single
> evaluation.
>

Yes and I think we can handle it such that master backend evaluates
initplans and share the calculated value along with paramid with all
the workers. Workers will, in turn, restore it in
queryDesc->estate->es_param_exec_vals (or some other place where those
can be directly used, I have yet to evaluate on this matter).  I am
working on a patch to parallelize queries containing
initplans/subplans, so I will evaluate your suggestion of passing
subplans (maybe non-InitPlans) in ExecSerializePlan as part of that
patch.  I have yet to figure out what is the best way to share hashed
subplans, do we pass them as it is and let each worker evaluate and
store it's own copy of hash table or shall we try to form the hash
table once in master and then share the same with workers (which could
be tricky) or shall we restrict such queries which contain hashed
subplans based on assumption that it will be costly for each worker to
form the hash table.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila  wrote:
>> Here instead of checking whether top_plan has initPlan, it should
>> check whether initPlan is present anywhere in plan tree.  I think one
>> simple way could be to check *glob->subplans* instead of
>> top_plan->initPlan,

> Patch based on above suggestion is attached with this mail.

I think this is the right fix for the moment, because the proximate cause
of the crash is that ExecSerializePlan doesn't transmit any part of the
PlannedStmt.subplans list, which glob->subplans is the preimage of.

Now, maybe I'm missing something, but it seems to me that ordinary
subplans could be transmitted to workers just fine.  The problem with
transmitting initplans is that you'd lose the expectation of single
evaluation.  (Even there, it might be okay if they're far enough down
in the plan tree, but I haven't thought about it in detail.)  So I'd
rather see us fix ExecSerializePlan to transmit the subplans list
and have some more-restrictive test here.  This code would still be
wrong as it stands though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Amit Kapila
On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila  wrote:
> On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich
>  wrote:
>> Hi,
>>
>> the query below triggers a parallel worker assertion for me when run on
>> the regression database of master as of 0832f2d.  The plan sports a
>> couple of InitPlan nodes below Gather.
>>
>
> I think the reason of this issue is that in some cases where subplan
> is at some node other than top_plan node, we allow them to be executed
> in the worker in force_parallel_mode.  It seems to me that the
> problematic code is below check in standard_planner()
>
> if (force_parallel_mode != FORCE_PARALLEL_OFF &&
> best_path->parallel_safe &&
> top_plan->initPlan == NIL)
>
> Here instead of checking whether top_plan has initPlan, it should
> check whether initPlan is present anywhere in plan tree.  I think one
> simple way could be to check *glob->subplans* instead of
> top_plan->initPlan,
>

Patch based on above suggestion is attached with this mail.

Thoughts?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


restrict_subplans_parallel_mode_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Amit Kapila
On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich
 wrote:
> Hi,
>
> the query below triggers a parallel worker assertion for me when run on
> the regression database of master as of 0832f2d.  The plan sports a
> couple of InitPlan nodes below Gather.
>

I think the reason of this issue is that in some cases where subplan
is at some node other than top_plan node, we allow them to be executed
in the worker in force_parallel_mode.  It seems to me that the
problematic code is below check in standard_planner()

if (force_parallel_mode != FORCE_PARALLEL_OFF &&
best_path->parallel_safe &&
top_plan->initPlan == NIL)

Here instead of checking whether top_plan has initPlan, it should
check whether initPlan is present anywhere in plan tree.  I think one
simple way could be to check *glob->subplans* instead of
top_plan->initPlan, another possibility is to traverse the whole tree
to see if initPlan is present.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-20 Thread Andreas Seltenreich
Hi,

the query below triggers a parallel worker assertion for me when run on
the regression database of master as of 0832f2d.  The plan sports a
couple of InitPlan nodes below Gather.

regards,
Andreas

 Gather  (cost=1.64..84.29 rows=128 width=4)
   Workers Planned: 1
   Single Copy: true
   ->  Limit  (cost=1.64..84.29 rows=128 width=4)
 ->  Subquery Scan on subq_0  (cost=1.64..451.06 rows=696 width=4)
   Filter: (subq_0.c6 IS NOT NULL)
   ->  Nested Loop Left Join  (cost=1.64..444.07 rows=699 width=145)
 Join Filter: (sample_0.aa = sample_1.pageno)
 InitPlan 4 (returns $3)
   ->  Result  (cost=1.21..5.36 rows=15 width=0)
 One-Time Filter: ($0 AND ($1 = $2))
 InitPlan 1 (returns $0)
   ->  Result  (cost=0.00..0.00 rows=0 width=0)
 One-Time Filter: false
 InitPlan 2 (returns $1)
   ->  Limit  (cost=0.35..0.52 rows=1 width=4)
 ->  Gather  (cost=0.00..1.04 rows=6 
width=4)
   Workers Planned: 1
   ->  Parallel Seq Scan on reltime_tbl 
 (cost=0.00..1.04 rows=4 width=4)
 InitPlan 3 (returns $2)
   ->  Limit  (cost=0.52..0.69 rows=1 width=4)
 ->  Gather  (cost=0.00..1.04 rows=6 
width=4)
   Workers Planned: 1
   ->  Parallel Seq Scan on reltime_tbl 
reltime_tbl_1  (cost=0.00..1.04 rows=4 width=4)
 ->  Sample Scan on pg_foreign_data_wrapper 
sample_2  (cost=1.21..5.36 rows=15 width=0)
   Sampling: system ('3.1'::real)
 ->  Nested Loop  (cost=0.15..382.85 rows=699 width=4)
   ->  Sample Scan on pg_largeobject sample_1  
(cost=0.00..209.03 rows=699 width=8)
 Sampling: bernoulli ('2.9'::real)
 Filter: (pageno IS NOT NULL)
   ->  Index Only Scan using 
pg_foreign_table_relid_index on pg_foreign_table ref_0  (cost=0.15..0.24 rows=1 
width=4)
 Index Cond: (ftrelid = sample_1.loid)
 ->  Materialize  (cost=0.00..16.06 rows=4 width=4)
   ->  Append  (cost=0.00..16.04 rows=4 width=4)
 ->  Sample Scan on a sample_0  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)
 ->  Sample Scan on b sample_0_1  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)
 ->  Sample Scan on c sample_0_2  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)
 ->  Sample Scan on d sample_0_3  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)


--8<---cut here---start->8---
set force_parallel_mode to on;
set max_parallel_workers_per_gather to 2;

select
  91 as c0
from
  (select
(select pfname from public.pslot limit 1 offset 3)
   as c0,
ref_1.grandtot as c1,
(select pg_catalog.min(procost) from pg_catalog.pg_proc)
   as c2,
ref_0.ftoptions as c3,
ref_1.grandtot as c4,
sample_1.loid as c5,
pg_catalog.pg_rotate_logfile() as c6,
(select random from public.hash_i4_heap limit 1 offset 5)
   as c7,
sample_1.loid as c8
  from
public.a as sample_0 tablesample system (5)
right join pg_catalog.pg_largeobject as sample_1 tablesample 
bernoulli (2.9)
  inner join pg_catalog.pg_foreign_table as ref_0
  on (sample_1.loid = ref_0.ftrelid )
on (sample_0.aa = sample_1.pageno )
  left join public.mvtest_tvv as ref_1
  on (EXISTS (
  select
  sample_2.fdwoptions as c0,
  sample_2.fdwhandler as c1,
  (select during from public.test_range_excl limit 1 offset 89)
 as c2
from
  pg_catalog.pg_foreign_data_wrapper as sample_2 tablesample 
system (3.1)
where (EXISTS (
select
sample_3.b as c0,
(select grantee from information_schema.udt_privileges 
limit 1 offset 4)
   as c1,
sample_3.b as c2,
sample_3.rf_a as c3,
sample_3.b as c4,