I've been investigating parallelizing certain correlated subqueries, and during that work stumbled across the fact that set_rel_consider_parallel disallows parallel query on what seems like a fairly simple case.
Consider this query: select t.unique1 from tenk1 t join lateral (select t.unique1 from tenk1 offset 0) l on true; Current set_rel_consider_parallel sets consider_parallel=false on the subquery rel because it has a limit/offset. That restriction makes a lot of sense when we have a subquery whose results conceptually need to be "shared" (or at least be the same) across multiple workers (indeed the relevant comment in that function notes that cases where we could prove a unique ordering would also qualify, but punts on implementing that due to complexity). But if the subquery is LATERAL, then no such conceptual restriction. If we change the code slightly to allow considering parallel query even in the face of LIMIT/OFFSET for LATERAL subqueries, then our query above changes from the following plan: Nested Loop Output: t.unique1 -> Gather Output: t.unique1 Workers Planned: 2 -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t Output: t.unique1 -> Gather Output: NULL::integer Workers Planned: 2 -> Parallel Index Only Scan using tenk1_hundred on public.tenk1 Output: NULL::integer to this plan: Gather Output: t.unique1 Workers Planned: 2 -> Nested Loop Output: t.unique1 -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t Output: t.unique1 -> Index Only Scan using tenk1_hundred on public.tenk1 Output: NULL::integer The code change itself is quite simple (1 line). As far as I can tell we don't need to expressly check parallel safety of the limit/offset expressions; that appears to happen elsewhere (and that makes sense since the RTE_RELATION case doesn't check those clauses either). If I'm missing something about the safety of this (or any other issue), I'd appreciate the feedback. James
From 0aff5f1b5e35e37a311c01e9f53caf6e088e8d43 Mon Sep 17 00:00:00 2001 From: jcoleman <jtc...@gmail.com> Date: Mon, 30 Nov 2020 11:36:35 -0500 Subject: [PATCH v1] Allow parallel LATERAL subqueries with LIMIT/OFFSET The code that determined whether or not a rel should be considered for parallel query excluded subqueries with LIMIT/OFFSET. That's correct in the general case: as the comment notes that'd mean we have to guarantee ordering (and claims it's not worth checking that) for results to be consistent across workers. However there's a simpler case that hasn't been considered: LATERAL subqueries with LIMIT/OFFSET don't fall under the same reasoning since they're executed (when not converted to a JOIN) per tuple anyway, so consistency of results across workers isn't a factor. --- src/backend/optimizer/path/allpaths.c | 4 +++- src/test/regress/expected/select_parallel.out | 15 +++++++++++++++ src/test/regress/sql/select_parallel.sql | 6 ++++++ 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 84a69b064a..3c9313b5a9 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -686,11 +686,13 @@ set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel, * inconsistent results at the top-level. (In some cases, where * the result is ordered, we could relax this restriction. But it * doesn't currently seem worth expending extra effort to do so.) + * LATERAL is an exception: LIMIT/OFFSET is safe to execute within + * workers since the sub-select is executed per tuple */ { Query *subquery = castNode(Query, rte->subquery); - if (limit_needed(subquery)) + if (!rte->lateral && limit_needed(subquery)) return; } break; diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out index 9b0c418db7..9ba40ca2c5 100644 --- a/src/test/regress/expected/select_parallel.out +++ b/src/test/regress/expected/select_parallel.out @@ -1042,6 +1042,21 @@ explain (costs off) Filter: (stringu1 ~~ '%AAAA'::text) (11 rows) +-- ...unless it's LATERAL +savepoint settings; +set parallel_tuple_cost=0; +explain (costs off) select t.unique1 from tenk1 t +join lateral (select t.unique1 from tenk1 offset 0) l on true; + QUERY PLAN +--------------------------------------------------------------------- + Gather + Workers Planned: 4 + -> Nested Loop + -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t + -> Index Only Scan using tenk1_hundred on tenk1 +(5 rows) + +rollback to savepoint settings; -- to increase the parallel query test coverage SAVEPOINT settings; SET LOCAL force_parallel_mode = 1; diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql index 5a01a98b26..5c14b78457 100644 --- a/src/test/regress/sql/select_parallel.sql +++ b/src/test/regress/sql/select_parallel.sql @@ -392,6 +392,12 @@ explain (costs off, verbose) explain (costs off) select * from tenk1 a where two in (select two from tenk1 b where stringu1 like '%AAAA' limit 3); +-- ...unless it's LATERAL +savepoint settings; +set parallel_tuple_cost=0; +explain (costs off) select t.unique1 from tenk1 t +join lateral (select t.unique1 from tenk1 offset 0) l on true; +rollback to savepoint settings; -- to increase the parallel query test coverage SAVEPOINT settings; -- 2.17.1