Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Amit Kapilawrites: > On Fri, Nov 25, 2016 at 10:33 AM, Tom Lane wrote: >> The impression I got in poking at this for a few minutes, before >> going off to stuff myself with turkey, was that we were allowing >> a SubqueryScanPath to mark itself as parallel-safe even though the >> resulting plan node would have an InitPlan attached. So my thought >> about fixing it was along the lines of if-subroot-contains-initplans- >> then-dont-let-SubqueryScanPath-be-parallel-safe. > I think this will work for the reported case, see the patch attached. After further thought, it seems to me that the fundamental error here is that we're not accounting for initPlans while marking paths as parallel safe or not. They should be correctly marked when they come out of subquery_planner, rather than expecting callers to know that they can't trust that marking. That way, we don't need to do anything special in create_subqueryscan_path, and we can also get rid of that kluge in the force_parallel_mode logic. > However, don't we need to worry about if there is a subplan > (non-initplan) instead of initplan. I don't think so. References to subplans are already known to be parallel restricted. The issue that we're fixing here is that if a plan node has initPlans attached, ExecutorStart will try to access those subplans, whether or not they actually get used while running. That's why the plan node itself has to be marked parallel-unsafe so it won't get shipped to parallel workers. >> There's another issue here, which is that the InitPlan is derived from an >> outer join qual whose outer join seems to have been deleted entirely by >> analyzejoins.c. Up to now it was possible to avert our eyes from the fact >> that join removal is lazy about getting rid of unused InitPlans, but if >> they're going to disable parallelization of the surrounding query, maybe >> we need to work harder on that. > Yeah, that makes sense, but not sure whether we should try it along > with this patch. I'm not certain that sublinks in deletable outer join quals is a case important enough to sweat over. But this is the first time I've realized that failure to remove those initPlans is capable of making a plan worse. So it's something to keep an eye on. (I still wish that we could make join removal happen during join tree preprocessing; doing it where it is is nothing but a kluge, with unpleasant side effects like this one.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Fri, Nov 25, 2016 at 10:33 AM, Tom Lanewrote: > Amit Kapila writes: >> On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich >> wrote: >>> just caught another InitPlan below Gather with the recent patches in >>> (master as of 4cc6a3f). Recipe below. > >> I think this problem exists since commit >> 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed >> subqueries to be pushed to parallel workers. > > The impression I got in poking at this for a few minutes, before > going off to stuff myself with turkey, was that we were allowing > a SubqueryScanPath to mark itself as parallel-safe even though the > resulting plan node would have an InitPlan attached. So my thought > about fixing it was along the lines of if-subroot-contains-initplans- > then-dont-let-SubqueryScanPath-be-parallel-safe. > I think this will work for the reported case, see the patch attached. However, don't we need to worry about if there is a subplan (non-initplan) instead of initplan. I have tried by tweaking the above query such that it will generate a subplan and for such a case it will make SubqueryScanPath as parallel-safe. explain select 1 from public.quad_point_tbl as ref_0, lateral (select ref_0.p as c3, sample_0.d as c5 from public.nv_child_2010 as sample_0 left join public.mvtest_tvv as ref_1 on ('x' = ANY (select contype from pg_catalog.pg_constraint limit 1)) limit 82) as subq_0; > What you propose > here seems like it would shut off parallel query in more cases than > that. But I'm still full of turkey and may be missing something. > > There's another issue here, which is that the InitPlan is derived from an > outer join qual whose outer join seems to have been deleted entirely by > analyzejoins.c. Up to now it was possible to avert our eyes from the fact > that join removal is lazy about getting rid of unused InitPlans, but if > they're going to disable parallelization of the surrounding query, maybe > we need to work harder on that. > Yeah, that makes sense, but not sure whether we should try it along with this patch. > Another question worth asking is whether it was okay for the subquery to > decide to use a Gather. Are we OK with multiple Gathers per plan tree, > independently of the points above? > As of now, we don't support nested Gather case. Example: Not Okay Plan - -> Gather -> Nested Loop -> Parallel Seq Scan -> Gather -> Parallel Seq Scan Okay Plan - -> Nested Loop -> Gather -> Parallel Seq Scan -> Gather -> Parallel Seq Scan -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com allow_safe_subquery_parallel_worker_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Amit Kapilawrites: > On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich > wrote: >> just caught another InitPlan below Gather with the recent patches in >> (master as of 4cc6a3f). Recipe below. > I think this problem exists since commit > 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed > subqueries to be pushed to parallel workers. The impression I got in poking at this for a few minutes, before going off to stuff myself with turkey, was that we were allowing a SubqueryScanPath to mark itself as parallel-safe even though the resulting plan node would have an InitPlan attached. So my thought about fixing it was along the lines of if-subroot-contains-initplans- then-dont-let-SubqueryScanPath-be-parallel-safe. What you propose here seems like it would shut off parallel query in more cases than that. But I'm still full of turkey and may be missing something. There's another issue here, which is that the InitPlan is derived from an outer join qual whose outer join seems to have been deleted entirely by analyzejoins.c. Up to now it was possible to avert our eyes from the fact that join removal is lazy about getting rid of unused InitPlans, but if they're going to disable parallelization of the surrounding query, maybe we need to work harder on that. Another question worth asking is whether it was okay for the subquery to decide to use a Gather. Are we OK with multiple Gathers per plan tree, independently of the points above? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreichwrote: > Hi, > > just caught another InitPlan below Gather with the recent patches in > (master as of 4cc6a3f). Recipe below. > I think this problem exists since commit 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed subqueries to be pushed to parallel workers. I think we should consider rel (where rtekind is RTE_SUBQUERY) to be parallel safe if the subquery is also parallel safe. Attached patch does that and fixes the reported problem for me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com allow_safe_subquery_parallel_worker_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Hi, just caught another InitPlan below Gather with the recent patches in (master as of 4cc6a3f). Recipe below. regards, andreas set max_parallel_workers_per_gather = 2; set min_parallel_relation_size = 0; set parallel_setup_cost = 0; set parallel_tuple_cost = 0; explain select 1 from public.quad_point_tbl as ref_0, lateral (select ref_0.p as c3, sample_0.d as c5 from public.nv_child_2010 as sample_0 left join public.mvtest_tvv as ref_1 on ('x'< (select contype from pg_catalog.pg_constraint limit 1)) limit 82) as subq_0; --QUERY PLAN -- -- Gather (cost=0.19..13727.52 rows=902246 width=4) --Workers Planned: 2 ---> Nested Loop (cost=0.19..13727.52 rows=902246 width=4) -- -> Parallel Seq Scan on quad_point_tbl ref_0 (cost=0.00..105.85 rows=4585 width=16) -- -> Limit (cost=0.19..1.33 rows=82 width=20) --InitPlan 1 (returns $0) -- -> Limit (cost=0.00..0.19 rows=1 width=1) ---> Gather (cost=0.00..10.22 rows=54 width=1) -- Workers Planned: 2 -- -> Parallel Seq Scan on pg_constraint (cost=0.00..10.22 rows=22 width=1) ---> Seq Scan on nv_child_2010 sample_0 (cost=0.00..35.50 rows=2550 width=20) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Mon, Nov 21, 2016 at 9:30 PM, Tom Lanewrote: > Amit Kapila writes: >> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila wrote: >>> Here instead of checking whether top_plan has initPlan, it should >>> check whether initPlan is present anywhere in plan tree. I think one >>> simple way could be to check *glob->subplans* instead of >>> top_plan->initPlan, > >> Patch based on above suggestion is attached with this mail. > > I think this is the right fix for the moment, because the proximate cause > of the crash is that ExecSerializePlan doesn't transmit any part of the > PlannedStmt.subplans list, which glob->subplans is the preimage of. > > Now, maybe I'm missing something, but it seems to me that ordinary > subplans could be transmitted to workers just fine. The problem with > transmitting initplans is that you'd lose the expectation of single > evaluation. > Yes and I think we can handle it such that master backend evaluates initplans and share the calculated value along with paramid with all the workers. Workers will, in turn, restore it in queryDesc->estate->es_param_exec_vals (or some other place where those can be directly used, I have yet to evaluate on this matter). I am working on a patch to parallelize queries containing initplans/subplans, so I will evaluate your suggestion of passing subplans (maybe non-InitPlans) in ExecSerializePlan as part of that patch. I have yet to figure out what is the best way to share hashed subplans, do we pass them as it is and let each worker evaluate and store it's own copy of hash table or shall we try to form the hash table once in master and then share the same with workers (which could be tricky) or shall we restrict such queries which contain hashed subplans based on assumption that it will be costly for each worker to form the hash table. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Amit Kapilawrites: > On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila wrote: >> Here instead of checking whether top_plan has initPlan, it should >> check whether initPlan is present anywhere in plan tree. I think one >> simple way could be to check *glob->subplans* instead of >> top_plan->initPlan, > Patch based on above suggestion is attached with this mail. I think this is the right fix for the moment, because the proximate cause of the crash is that ExecSerializePlan doesn't transmit any part of the PlannedStmt.subplans list, which glob->subplans is the preimage of. Now, maybe I'm missing something, but it seems to me that ordinary subplans could be transmitted to workers just fine. The problem with transmitting initplans is that you'd lose the expectation of single evaluation. (Even there, it might be okay if they're far enough down in the plan tree, but I haven't thought about it in detail.) So I'd rather see us fix ExecSerializePlan to transmit the subplans list and have some more-restrictive test here. This code would still be wrong as it stands though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapilawrote: > On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich > wrote: >> Hi, >> >> the query below triggers a parallel worker assertion for me when run on >> the regression database of master as of 0832f2d. The plan sports a >> couple of InitPlan nodes below Gather. >> > > I think the reason of this issue is that in some cases where subplan > is at some node other than top_plan node, we allow them to be executed > in the worker in force_parallel_mode. It seems to me that the > problematic code is below check in standard_planner() > > if (force_parallel_mode != FORCE_PARALLEL_OFF && > best_path->parallel_safe && > top_plan->initPlan == NIL) > > Here instead of checking whether top_plan has initPlan, it should > check whether initPlan is present anywhere in plan tree. I think one > simple way could be to check *glob->subplans* instead of > top_plan->initPlan, > Patch based on above suggestion is attached with this mail. Thoughts? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com restrict_subplans_parallel_mode_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreichwrote: > Hi, > > the query below triggers a parallel worker assertion for me when run on > the regression database of master as of 0832f2d. The plan sports a > couple of InitPlan nodes below Gather. > I think the reason of this issue is that in some cases where subplan is at some node other than top_plan node, we allow them to be executed in the worker in force_parallel_mode. It seems to me that the problematic code is below check in standard_planner() if (force_parallel_mode != FORCE_PARALLEL_OFF && best_path->parallel_safe && top_plan->initPlan == NIL) Here instead of checking whether top_plan has initPlan, it should check whether initPlan is present anywhere in plan tree. I think one simple way could be to check *glob->subplans* instead of top_plan->initPlan, another possibility is to traverse the whole tree to see if initPlan is present. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Hi, the query below triggers a parallel worker assertion for me when run on the regression database of master as of 0832f2d. The plan sports a couple of InitPlan nodes below Gather. regards, Andreas Gather (cost=1.64..84.29 rows=128 width=4) Workers Planned: 1 Single Copy: true -> Limit (cost=1.64..84.29 rows=128 width=4) -> Subquery Scan on subq_0 (cost=1.64..451.06 rows=696 width=4) Filter: (subq_0.c6 IS NOT NULL) -> Nested Loop Left Join (cost=1.64..444.07 rows=699 width=145) Join Filter: (sample_0.aa = sample_1.pageno) InitPlan 4 (returns $3) -> Result (cost=1.21..5.36 rows=15 width=0) One-Time Filter: ($0 AND ($1 = $2)) InitPlan 1 (returns $0) -> Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false InitPlan 2 (returns $1) -> Limit (cost=0.35..0.52 rows=1 width=4) -> Gather (cost=0.00..1.04 rows=6 width=4) Workers Planned: 1 -> Parallel Seq Scan on reltime_tbl (cost=0.00..1.04 rows=4 width=4) InitPlan 3 (returns $2) -> Limit (cost=0.52..0.69 rows=1 width=4) -> Gather (cost=0.00..1.04 rows=6 width=4) Workers Planned: 1 -> Parallel Seq Scan on reltime_tbl reltime_tbl_1 (cost=0.00..1.04 rows=4 width=4) -> Sample Scan on pg_foreign_data_wrapper sample_2 (cost=1.21..5.36 rows=15 width=0) Sampling: system ('3.1'::real) -> Nested Loop (cost=0.15..382.85 rows=699 width=4) -> Sample Scan on pg_largeobject sample_1 (cost=0.00..209.03 rows=699 width=8) Sampling: bernoulli ('2.9'::real) Filter: (pageno IS NOT NULL) -> Index Only Scan using pg_foreign_table_relid_index on pg_foreign_table ref_0 (cost=0.15..0.24 rows=1 width=4) Index Cond: (ftrelid = sample_1.loid) -> Materialize (cost=0.00..16.06 rows=4 width=4) -> Append (cost=0.00..16.04 rows=4 width=4) -> Sample Scan on a sample_0 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) -> Sample Scan on b sample_0_1 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) -> Sample Scan on c sample_0_2 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) -> Sample Scan on d sample_0_3 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) --8<---cut here---start->8--- set force_parallel_mode to on; set max_parallel_workers_per_gather to 2; select 91 as c0 from (select (select pfname from public.pslot limit 1 offset 3) as c0, ref_1.grandtot as c1, (select pg_catalog.min(procost) from pg_catalog.pg_proc) as c2, ref_0.ftoptions as c3, ref_1.grandtot as c4, sample_1.loid as c5, pg_catalog.pg_rotate_logfile() as c6, (select random from public.hash_i4_heap limit 1 offset 5) as c7, sample_1.loid as c8 from public.a as sample_0 tablesample system (5) right join pg_catalog.pg_largeobject as sample_1 tablesample bernoulli (2.9) inner join pg_catalog.pg_foreign_table as ref_0 on (sample_1.loid = ref_0.ftrelid ) on (sample_0.aa = sample_1.pageno ) left join public.mvtest_tvv as ref_1 on (EXISTS ( select sample_2.fdwoptions as c0, sample_2.fdwhandler as c1, (select during from public.test_range_excl limit 1 offset 89) as c2 from pg_catalog.pg_foreign_data_wrapper as sample_2 tablesample system (3.1) where (EXISTS ( select sample_3.b as c0, (select grantee from information_schema.udt_privileges limit 1 offset 4) as c1, sample_3.b as c2, sample_3.rf_a as c3, sample_3.b as c4,