Re: A reloption for partitioned tables - parallel_workers

2021-09-04 Thread Daniel Gustafsson
> On 4 Sep 2021, at 01:17, Amit Langote  wrote:
> On Sat, Sep 4, 2021 at 3:10 Laurenz Albe  wrote:
> On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:

> > This thread has stalled and the patch no longer applies.  I propose that we
> > mark this Returned with Feedback, is that OK with you, Amit?
> 
> +1.  That requires more thought.
> 
> Yes, I think so too.

Done that way, it can always be resubmitted in a later CF.

--
Daniel Gustafsson   https://vmware.com/





Re: A reloption for partitioned tables - parallel_workers

2021-09-03 Thread Amit Langote
On Sat, Sep 4, 2021 at 3:10 Laurenz Albe  wrote:

> On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:
> > > On 6 Apr 2021, at 09:46, Amit Langote  wrote:
> > > On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe  wrote:
> >
> > > > I don't know if Seamus is still working on that; if not, we might
> > > > mark it as "returned with feedback".
> > >
> > > I have to agree given the time left.
> >
> > This thread has stalled and the patch no longer applies.  I propose that we
> > mark this Returned with Feedback, is that OK with you, Amit?
>
> +1.  That requires more thought.


Yes, I think so too.
-- 
Amit Langote
EDB: http://www.enterprisedb.com


Re: A reloption for partitioned tables - parallel_workers

2021-09-03 Thread Laurenz Albe
On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:
> > On 6 Apr 2021, at 09:46, Amit Langote  wrote:
> > On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe  wrote:
> 
> > > I don't know if Seamus is still working on that; if not, we might
> > > mark it as "returned with feedback".
> > 
> > I have to agree given the time left.
> 
> This thread has stalled and the patch no longer applies.  I propose that we
> mark this Returned with Feedback, is that OK with you, Amit?

+1.  That requires more thought.

Yours,
Laurenz Albe





Re: A reloption for partitioned tables - parallel_workers

2021-09-03 Thread Daniel Gustafsson
> On 6 Apr 2021, at 09:46, Amit Langote  wrote:
> On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe  wrote:

>> I don't know if Seamus is still working on that; if not, we might
>> mark it as "returned with feedback".
> 
> I have to agree given the time left.

This thread has stalled and the patch no longer applies.  I propose that we
mark this Returned with Feedback, is that OK with you, Amit?

--
Daniel Gustafsson   https://vmware.com/





Re: A reloption for partitioned tables - parallel_workers

2021-04-06 Thread Amit Langote
On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe  wrote:
> On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:
> > On Fri, 19 Mar 2021 at 02:07, Amit Langote  wrote:
> > > Attached a new version rebased over c8f78b616, with the grouping
> > > relation partitioning enhancements as a separate patch 0001.  Sorry
> > > about the delay.
> >
> > I had a quick look at this and wondered if the partitioned table's
> > parallel workers shouldn't be limited to the sum of the parallel
> > workers of the Append's subpaths?
> >
> > It seems a bit weird to me that the following case requests 4 workers:
> >
> > # create table lp (a int) partition by list(a);
> > # create table lp1 partition of lp for values in(1);
> > # insert into lp select 1 from generate_series(1,1000) x;
> > # alter table lp1 set (parallel_workers = 2);
> > # alter table lp set (parallel_workers = 4);
> > # set max_parallel_workers_per_gather = 8;
> > # explain select count(*) from lp;
> >                                         QUERY PLAN
> > ------------------------------------------------------------------------------------------
> >  Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
> >    ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
> >          Workers Planned: 4
> >          ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
> >                ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
> > (5 rows)
> >
> > I can see a good argument that there should only be 2 workers here.
>
> Good point, I agree.
>
> > If someone sets the partitioned table's parallel_workers high so that
> > they get a large number of workers when no partitions are pruned
> > during planning, do they really want the same number of workers in
> > queries where a large number of partitions are pruned?
> >
> > This problem gets a bit more complex in generic plans where the
> > planner can't prune anything but run-time pruning prunes many
> > partitions. I'm not so sure what to do about that, but the problem
> > does exist today to a lesser extent with the current method of
> > determining the append parallel workers.
>
> Also a good point.  That would require changing the actual number of
> parallel workers at execution time, but that is tricky.
> If we go with your suggestion above, we'd have to disambiguate if
> the number of workers is set because a partition is large enough
> to warrant a parallel scan (then it shouldn't be reduced if the executor
> prunes partitions) or if it is because of the number of partitions
> (then it should be reduced).

Maybe we really want a parallel_append_workers for partitioned tables,
instead of piggybacking on parallel_workers?

> I don't know if Seamus is still working on that; if not, we might
> mark it as "returned with feedback".

I have to agree given the time left.

> Perhaps Amit's patch 0001 should go in independently.

Perhaps, but maybe we should wait until something really needs that.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: A reloption for partitioned tables - parallel_workers

2021-04-02 Thread Laurenz Albe
On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:
> On Fri, 19 Mar 2021 at 02:07, Amit Langote  wrote:
> > Attached a new version rebased over c8f78b616, with the grouping
> > relation partitioning enhancements as a separate patch 0001.  Sorry
> > about the delay.
> 
> I had a quick look at this and wondered if the partitioned table's
> parallel workers shouldn't be limited to the sum of the parallel
> workers of the Append's subpaths?
> 
> It seems a bit weird to me that the following case requests 4 workers:
> 
> # create table lp (a int) partition by list(a);
> # create table lp1 partition of lp for values in(1);
> # insert into lp select 1 from generate_series(1,1000) x;
> # alter table lp1 set (parallel_workers = 2);
> # alter table lp set (parallel_workers = 4);
> # set max_parallel_workers_per_gather = 8;
> # explain select count(*) from lp;
>                                         QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
>    ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
>          Workers Planned: 4
>          ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
>                ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
> (5 rows)
> 
> I can see a good argument that there should only be 2 workers here.

Good point, I agree.

> If someone sets the partitioned table's parallel_workers high so that
> they get a large number of workers when no partitions are pruned
> during planning, do they really want the same number of workers in
> queries where a large number of partitions are pruned?
> 
> This problem gets a bit more complex in generic plans where the
> planner can't prune anything but run-time pruning prunes many
> partitions. I'm not so sure what to do about that, but the problem
> does exist today to a lesser extent with the current method of
> determining the append parallel workers.

Also a good point.  That would require changing the actual number of
parallel workers at execution time, but that is tricky.
If we go with your suggestion above, we'd have to disambiguate if
the number of workers is set because a partition is large enough
to warrant a parallel scan (then it shouldn't be reduced if the executor
prunes partitions) or if it is because of the number of partitions
(then it should be reduced).

Currently, we don't reduce parallelism if the executor prunes
partitions, so this could be seen as an independent problem.

I don't know if Seamus is still working on that; if not, we might
mark it as "returned with feedback".

Perhaps Amit's patch 0001 should go in independently.

I'll mark the patch as "waiting for author".

Yours,
Laurenz Albe





Re: A reloption for partitioned tables - parallel_workers

2021-03-23 Thread David Rowley
On Fri, 19 Mar 2021 at 02:07, Amit Langote  wrote:
> Attached a new version rebased over c8f78b616, with the grouping
> relation partitioning enhancements as a separate patch 0001.  Sorry
> about the delay.

I had a quick look at this and wondered if the partitioned table's
parallel workers shouldn't be limited to the sum of the parallel
workers of the Append's subpaths?

It seems a bit weird to me that the following case requests 4 workers:

# create table lp (a int) partition by list(a);
# create table lp1 partition of lp for values in(1);
# insert into lp select 1 from generate_series(1,1000) x;
# alter table lp1 set (parallel_workers = 2);
# alter table lp set (parallel_workers = 4);
# set max_parallel_workers_per_gather = 8;
# explain select count(*) from lp;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
   ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
               ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
(5 rows)

I can see a good argument that there should only be 2 workers here.

If someone sets the partitioned table's parallel_workers high so that
they get a large number of workers when no partitions are pruned
during planning, do they really want the same number of workers in
queries where a large number of partitions are pruned?

This problem gets a bit more complex in generic plans where the
planner can't prune anything but run-time pruning prunes many
partitions. I'm not so sure what to do about that, but the problem
does exist today to a lesser extent with the current method of
determining the append parallel workers.

David




Re: A reloption for partitioned tables - parallel_workers

2021-03-18 Thread Amit Langote
On Fri, Mar 5, 2021 at 11:06 PM Laurenz Albe  wrote:
> On Fri, 2021-03-05 at 22:55 +0900, Amit Langote wrote:
> > On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe  wrote:
> > > On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> > > > For example, with the attached PoC patch:
> > >
> > > I have incorporated your POC patch and added a regression test.
> > >
> > > I didn't test it thoroughly though.
> >
> > Thanks.  Although, I wonder if we should rather consider it a
> > standalone patch to fix a partition planning code deficiency.
>
> Oh - I didn't realize that your patch was independent.

Attached a new version rebased over c8f78b616, with the grouping
relation partitioning enhancements as a separate patch 0001.  Sorry
about the delay.

I'd also like to change compute_append_parallel_workers(), as mentioned
upthread, such that disabling Parallel Append by setting
parallel_workers=0 on a parent partitioned table does not also prevent
the partitions themselves from being scanned in parallel under a regular
Append.  I didn't get time today to work on that, though.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v5-0001-Mark-fully-grouped-relations-partitioned-if-input.patch
Description: Binary data


v5-0002-Allow-setting-parallel_workers-on-partitioned-tab.patch
Description: Binary data


Re: A reloption for partitioned tables - parallel_workers

2021-03-05 Thread Laurenz Albe
On Fri, 2021-03-05 at 22:55 +0900, Amit Langote wrote:
> On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe  wrote:
> > On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> > > For example, with the attached PoC patch:
> > 
> > I have incorporated your POC patch and added a regression test.
> > 
> > I didn't test it thoroughly though.
> 
> Thanks.  Although, I wonder if we should rather consider it a
> standalone patch to fix a partition planning code deficiency.

Oh - I didn't realize that your patch was independent.

Yours,
Laurenz Albe





Re: A reloption for partitioned tables - parallel_workers

2021-03-05 Thread Amit Langote
On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe  wrote:
> On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> > For example, with the attached PoC patch:
>
> I have incorporated your POC patch and added a regression test.
>
> I didn't test it thoroughly though.

Thanks.  Although, I wonder if we should rather consider it a
standalone patch to fix a partition planning code deficiency.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: A reloption for partitioned tables - parallel_workers

2021-03-05 Thread Laurenz Albe
On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> For example, with the attached PoC patch:

I have incorporated your POC patch and added a regression test.

I didn't test it thoroughly though.

Yours,
Laurenz Albe
From 34f7da98b34bc1dbf7daca9e2ca6055c80d77f43 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 5 Mar 2021 14:43:34 +0100
Subject: [PATCH] Allow setting parallel_workers on partitioned tables.

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner is planning too few workers to read from your
partitions. For example, if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting on it.
Shown to give a 100%+ performance increase when used with cstore_fdw
column store partitions.

This is also useful to lower the number of parallel workers, for
example when the executor is expected to prune most partitions.

The setting also affects parallel appends for partitionwise aggregates
on a partitioned table, but not partitionwise joins.

Authors: Seamus Abshere, Amit Langote, Laurenz Albe
Discussion: https://postgr.es/m/95b1dd96-8634-4545-b1de-e2ac779beb44%40www.fastmail.com
---
 doc/src/sgml/ref/create_table.sgml|  15 +-
 src/backend/access/common/reloptions.c|  14 +-
 src/backend/optimizer/path/allpaths.c | 155 ++
 src/backend/optimizer/plan/planner.c  |  19 +++
 src/include/utils/rel.h   |  15 +-
 .../regress/expected/partition_aggregate.out  |  73 -
 src/test/regress/sql/partition_aggregate.sql  |  19 ++-
 7 files changed, 231 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 3b2b227683..c8c14850c1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of parallel_workers,
+are not supported on partitioned tables, but you may specify them for
+individual leaf partitions.

 

@@ -1401,9 +1402,13 @@ WITH ( MODULUS numeric_literal, REM
  
   This sets the number of workers that should be used to assist a parallel
   scan of this table.  If not set, the system will determine a value based
-  on the relation size.  The actual number of workers chosen by the planner
-  or by utility statements that use parallel scans may be less, for example
-  due to the setting of .
+  on the relation size and the number of scanned partitions.
+  When set on a partitioned table, the specified number of workers will
+  work on distinct partitions, so the number of partitions affected by the
+  parallel operation should be taken into account.
+  The actual number of workers chosen by the planner or by utility
+  statements that use parallel scans may be less, for example due
+  to the setting of .
  
 

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 	  RELOPT_KIND_PARTITIONED,
-	  0, NULL, 0);
+	  sizeof(PartitionedTableRdOptions),
+	  tab, lengthof(tab));
 }
 
 /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..e22cb6f33e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,6 +97,9 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 Index rti, RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	Index rti, RangeTblEntry *rte);
+static int compute_append_parallel_workers(RelOptInfo *rel, List 

Re: A reloption for partitioned tables - parallel_workers

2021-03-03 Thread Laurenz Albe
On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> On Tue, Mar 2, 2021 at 5:47 PM Laurenz Albe  wrote:
> > On Tue, 2021-03-02 at 11:23 +0900, Amit Langote wrote:
> > > I got the same result with my implementation, but I am wondering if
> > > setting parallel_workers=0 on the parent table shouldn't really
> > > disable a regular (non-parallel-aware) Append running under Gather
> > > even if it does Parallel Append (parallel-aware)?  So in this test
> > > case, there should have been a Gather atop Append, with individual
> > > partitions scanned using Parallel Seq Scan where applicable.
> > 
> > I am not sure, but I tend to think that if you specify no
> > parallel workers, you want no parallel workers.
> 
> I am thinking that one would set parallel_workers on a parent
> partitioned table to control only how many workers a Parallel Append
> can spread across partitions or use parallel_workers=0 to disable this
> form of partition parallelism.  However, one may still want the
> individual partitions to be scanned in parallel, where workers only
> spread across the partition's blocks.  IMO, we should try to keep
> those two forms of parallelism separately configurable.

I see your point.

I thought that PostgreSQL might consider such a plan anyway, but
I am not deep enough into the partitioning code to know.

Thinking about this further, wouldn't that mean that we get into a
conflict if someone sets "parallel_workers" on both a partition and
the partitioned table?  Which setting should win?

> >   SET enable_partitionwise_aggregate = on;
> > 
> >   EXPLAIN (COSTS OFF)
> >   SELECT count(*) FROM pagg_tab_ml;
> >                              QUERY PLAN
> > ---------------------------------------------------------------------
> >  Finalize Aggregate
> >    ->  Gather
> >          Workers Planned: 4
> >          ->  Parallel Append
> >                ->  Partial Aggregate
> >                      ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
> >                ->  Partial Aggregate
> >                      ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
> >                ->  Partial Aggregate
> >                      ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
> >                ->  Partial Aggregate
> >                      ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
> >                ->  Partial Aggregate
> >                      ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
> > (14 rows)
> > 
> > The default number of parallel workers is taken, because the append is
> > on an upper relation, not the partitioned table itself.
> > 
> > One would wish that "parallel_workers" somehow percolated up,
> 
> It appears that we don't set the fields of an upper relation such that
> IS_PARTITIONED_REL() would return true for it, like we do for base and
> join relations.  In compute_append_parallel_workers(), we're requiring
> it to be true to even look at the relation's rel_parallel_workers.  We
> can set those properties in *some* grouping rels, for example, when
> the aggregation is grouped on the input relation's partition key.
> That would make it possible for the Append on such grouping relations
> to refer to their input partitioned relation's rel_parallel_workers.
> For example, with the attached PoC patch:
> 
> SET parallel_setup_cost TO 0;
> SET max_parallel_workers_per_gather TO 8;
> SET enable_partitionwise_aggregate = on;
> 
> alter table pagg_tab_ml set (parallel_workers=5);
> 
> EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
>                              QUERY PLAN
> ----------------------------------------------------------
>  Gather
>    Workers Planned: 5
>    ->  Parallel Append
>          ->  HashAggregate
>                Group Key: pagg_tab_ml_5.a
>                ->  Append
>                      ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
>                      ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
>          ->  HashAggregate
>                Group Key: pagg_tab_ml.a
>                ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
>          ->  HashAggregate
>                Group Key: pagg_tab_ml_2.a
>                ->  Append
>                      ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
>                      ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
> (16 rows)
> 
> alter table pagg_tab_ml set (parallel_workers=0);
> 
> EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
>                          QUERY PLAN
> --------------------------------------------------------
>  Append
>    ->  Finalize GroupAggregate
>          Group Key: pagg_tab_ml.a
>          ->  Gather Merge
>                Workers Planned: 1
>                ->  Sort
>                      Sort Key: pagg_tab_ml.a
>                      ->  Partial HashAggregate

Re: A reloption for partitioned tables - parallel_workers

2021-03-03 Thread Amit Langote
On Tue, Mar 2, 2021 at 5:47 PM Laurenz Albe  wrote:
> On Tue, 2021-03-02 at 11:23 +0900, Amit Langote wrote:
> > +ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
> > +EXPLAIN (COSTS OFF)
> > +SELECT a FROM pagg_tab_ml WHERE b = 42;
> > +                     QUERY PLAN
> > +------------------------------------------------
> > + Append
> > +   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
> > +         Filter: (b = 42)
> > +   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
> > +         Filter: (b = 42)
> > +   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
> > +         Filter: (b = 42)
> > +(7 rows)
> >
> > I got the same result with my implementation, but I am wondering if
> > setting parallel_workers=0 on the parent table shouldn't really
> > disable a regular (non-parallel-aware) Append running under Gather
> > even if it does Parallel Append (parallel-aware)?  So in this test
> > case, there should have been a Gather atop Append, with individual
> > partitions scanned using Parallel Seq Scan where applicable.
>
> I am not sure, but I tend to think that if you specify no
> parallel workers, you want no parallel workers.

I am thinking that one would set parallel_workers on a parent
partitioned table to control only how many workers a Parallel Append
can spread across partitions or use parallel_workers=0 to disable this
form of partition parallelism.  However, one may still want the
individual partitions to be scanned in parallel, where workers only
spread across the partition's blocks.  IMO, we should try to keep
those two forms of parallelism separately configurable.

> But I noticed the following:
>
>   SET enable_partitionwise_aggregate = on;
>
>   EXPLAIN (COSTS OFF)
>   SELECT count(*) FROM pagg_tab_ml;
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Finalize Aggregate
>    ->  Gather
>          Workers Planned: 4
>          ->  Parallel Append
>                ->  Partial Aggregate
>                      ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
>                ->  Partial Aggregate
>                      ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
>                ->  Partial Aggregate
>                      ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
>                ->  Partial Aggregate
>                      ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
>                ->  Partial Aggregate
>                      ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
> (14 rows)
>
> The default number of parallel workers is taken, because the append is
> on an upper relation, not the partitioned table itself.
>
> One would wish that "parallel_workers" somehow percolated up,

I would have liked that too.

> but I
> have no idea how that should work.

It appears that we don't set the fields of an upper relation such that
IS_PARTITIONED_REL() would return true for it, like we do for base and
join relations.  In compute_append_parallel_workers(), we're requiring
it to be true to even look at the relation's rel_parallel_workers.  We
can set those properties in *some* grouping rels, for example, when
the aggregation is grouped on the input relation's partition key.
That would make it possible for the Append on such grouping relations
to refer to their input partitioned relation's rel_parallel_workers.
For example, with the attached PoC patch:

SET parallel_setup_cost TO 0;
SET max_parallel_workers_per_gather TO 8;
SET enable_partitionwise_aggregate = on;

alter table pagg_tab_ml set (parallel_workers=5);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                             QUERY PLAN
----------------------------------------------------------
 Gather
   Workers Planned: 5
   ->  Parallel Append
         ->  HashAggregate
               Group Key: pagg_tab_ml_5.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
         ->  HashAggregate
               Group Key: pagg_tab_ml.a
               ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
         ->  HashAggregate
               Group Key: pagg_tab_ml_2.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
(16 rows)

alter table pagg_tab_ml set (parallel_workers=0);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                         QUERY PLAN
--------------------------------------------------------
 Append
   ->  Finalize GroupAggregate
         Group Key: pagg_tab_ml.a
         ->  Gather Merge
               Workers Planned: 1
               ->  Sort
                     Sort Key: pagg_tab_ml.a
                     ->  Partial

Re: A reloption for partitioned tables - parallel_workers

2021-03-02 Thread Laurenz Albe
On Tue, 2021-03-02 at 11:23 +0900, Amit Langote wrote:
> +ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
> +EXPLAIN (COSTS OFF)
> +SELECT a FROM pagg_tab_ml WHERE b = 42;
> +                     QUERY PLAN
> +------------------------------------------------
> + Append
> +   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
> +         Filter: (b = 42)
> +   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
> +         Filter: (b = 42)
> +   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
> +         Filter: (b = 42)
> +(7 rows)
> 
> I got the same result with my implementation, but I am wondering if
> setting parallel_workers=0 on the parent table shouldn't really
> disable a regular (non-parallel-aware) Append running under Gather
> even if it does Parallel Append (parallel-aware)?  So in this test
> case, there should have been a Gather atop Append, with individual
> partitions scanned using Parallel Seq Scan where applicable.

I am not sure, but I tend to think that if you specify no
parallel workers, you want no parallel workers.

But I noticed the following:

  SET enable_partitionwise_aggregate = on;

  EXPLAIN (COSTS OFF)
  SELECT count(*) FROM pagg_tab_ml;
                             QUERY PLAN
---------------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 4
         ->  Parallel Append
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
(14 rows)

The default number of parallel workers is taken, because the append is
on an upper relation, not the partitioned table itself.

One would wish that "parallel_workers" somehow percolated up, but I
have no idea how that should work.

Yours,
Laurenz Albe





Re: A reloption for partitioned tables - parallel_workers

2021-03-01 Thread Amit Langote
On Tue, Mar 2, 2021 at 12:10 AM Laurenz Albe  wrote:
> Here is an updated patch with this fix.

Thanks for updating the patch.  I was about to post an updated version
myself but you beat me to it.

> I added regression tests and adapted the documentation a bit.
>
> I also added support for lowering the number of parallel workers.

+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                     QUERY PLAN
+------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)

I got the same result with my implementation, but I am wondering if
setting parallel_workers=0 on the parent table shouldn't really
disable a regular (non-parallel-aware) Append running under Gather
even if it does Parallel Append (parallel-aware)?  So in this test
case, there should have been a Gather atop Append, with individual
partitions scanned using Parallel Seq Scan where applicable.


--
Amit Langote
EDB: http://www.enterprisedb.com




Re: A reloption for partitioned tables - parallel_workers

2021-03-01 Thread Laurenz Albe
On Mon, 2021-03-01 at 17:39 +0900, Amit Langote wrote:
> > I am not sure.
> > IMO, for normal table, we use the following macro to get the 
> > parallel_workers:
> > --
> > /*
> >   * RelationGetParallelWorkers
> >   *  Returns the relation's parallel_workers reloption setting.
> >   *  Note multiple eval of argument!
> >   */
> > #define RelationGetParallelWorkers(relation, defaultpw) \
> >  ((relation)->rd_options ? \
> >   ((StdRdOptions *) (relation)->rd_options)->parallel_workers : 
> > (defaultpw))
> > --
> > Since we add new struct " PartitionedTableRdOptions ", It seems we need to 
> > get parallel_workers in different way.
> > Do we need similar macro to get partitioned table's parallel_workers ?
> 
> Oh, you're right.  The parallel_workers setting of a relation is only
> accessed through this macro, even for partitioned tables, and I can
> see that it is actually wrong to access a partitioned table's
> parallel_workers through this macro as-is.   Although I hadn't tested
> that, so thanks for pointing that out.
> 
> > Like:
> > #define PartitionedTableGetParallelWorkers(relation, defaultpw) \
> > xxx
> > (PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : 
> > (defaultpw))
> 
> I'm thinking it would be better to just modify the existing macro to
> check relkind to decide which struct pointer type to cast the value in
> rd_options to.

Here is an updated patch with this fix.

I added regression tests and adapted the documentation a bit.

I also added support for lowering the number of parallel workers.

Yours,
Laurenz Albe
From d48874a61ac698dc284b83b7e3b147a71158d09d Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 1 Mar 2021 16:06:49 +0100
Subject: [PATCH] Allow setting parallel_workers on partitioned tables.

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner is planning too few workers to read from your
partitions. For example, if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting on it.
Shown to give a 100%+ performance increase (in my case, 700%) when used
with cstore_fdw column store partitions.

This is also useful to lower the number of parallel workers, for
example when the executor is expected to prune most partitions.

Authors: Seamus Abshere, Amit Langote, Laurenz Albe
Discussion: https://postgr.es/m/95b1dd96-8634-4545-b1de-e2ac779beb44%40www.fastmail.com
---
 doc/src/sgml/ref/create_table.sgml|  15 +-
 src/backend/access/common/reloptions.c|  14 +-
 src/backend/optimizer/path/allpaths.c | 155 ++
 src/include/utils/rel.h   |  15 +-
 .../regress/expected/partition_aggregate.out  |  51 +-
 src/test/regress/sql/partition_aggregate.sql  |  17 +-
 6 files changed, 188 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 3b2b227683..c8c14850c1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of parallel_workers,
+are not supported on partitioned tables, but you may specify them for
+individual leaf partitions.

 

@@ -1401,9 +1402,13 @@ WITH ( MODULUS numeric_literal, REM
  
   This sets the number of workers that should be used to assist a parallel
   scan of this table.  If not set, the system will determine a value based
-  on the relation size.  The actual number of workers chosen by the planner
-  or by utility statements that use parallel scans may be less, for example
-  due to the setting of .
+  on the relation size and the number of scanned partitions.
+  When set on a partitioned table, the specified number of workers will
+  work on distinct partitions, so the number of partitions affected by the
+  parallel operation should be taken into account.
+  The actual number of workers chosen by the planner or by utility
+  statements that use parallel scans may be less, for example due
+  to the setting of .
  
 

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,

Re: A reloption for partitioned tables - parallel_workers

2021-03-01 Thread Amit Langote
On Tue, Feb 23, 2021 at 7:24 PM houzj.f...@fujitsu.com
 wrote:
> > > It seems the patch does not include the code that get the
> > > parallel_workers from new struct " PartitionedTableRdOptions ", Did I miss
> > something ?
> >
> > Aren't the following hunks in the v2 patch what you meant?
> >
> > diff --git a/src/backend/access/common/reloptions.c
> > b/src/backend/access/common/reloptions.c
> > index c687d3ee9e..f8443d2361 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
> >   {
> >   "parallel_workers",
> >   "Number of parallel processes that can be used per executor node for this
> > relation.",
> > - RELOPT_KIND_HEAP,
> > + RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> >   ShareUpdateExclusiveLock
> >   },
> >   -1, 0, 1024
> > @@ -1962,12 +1962,18 @@ bytea *
> >  partitioned_table_reloptions(Datum reloptions, bool validate)  {
> >   /*
> > - * There are no options for partitioned tables yet, but this is able to do
> > - * some validation.
> > + * Currently the only setting known to be useful for partitioned tables
> > + * is parallel_workers.
> >   */
> > + static const relopt_parse_elt tab[] = { {"parallel_workers",
> > + RELOPT_TYPE_INT, offsetof(PartitionedTableRdOptions,
> > + parallel_workers)}, };
> > +
> >   return (bytea *) build_reloptions(reloptions, validate,
> > RELOPT_KIND_PARTITIONED,
> > -   0, NULL, 0);
> > +   sizeof(PartitionedTableRdOptions),
> > +   tab, lengthof(tab));
> >  }
> >
> >  /*
> >
> > diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index
> > 10b63982c0..fe114e0856 100644
> > --- a/src/include/utils/rel.h
> > +++ b/src/include/utils/rel.h
> > @@ -308,6 +308,16 @@ typedef struct StdRdOptions
> >   bool vacuum_truncate; /* enables vacuum to truncate a relation */  }
> > StdRdOptions;
> >
> > +/*
> > + * PartitionedTableRdOptions
> > + * Contents of rd_options for partitioned tables  */ typedef struct
> > +PartitionedTableRdOptions {
> > + int32 vl_len_; /* varlena header (do not touch directly!) */  int
> > +parallel_workers; /* max number of parallel workers */ }
> > +PartitionedTableRdOptions;
> > +
> >  #define HEAP_MIN_FILLFACTOR 10
> >  #define HEAP_DEFAULT_FILLFACTOR 100
> Hi,
>
> I am not sure.
> IMO, for normal table, we use the following macro to get the parallel_workers:
> --
> /*
>  * RelationGetParallelWorkers
>  *  Returns the relation's parallel_workers reloption setting.
>  *  Note multiple eval of argument!
>  */
> #define RelationGetParallelWorkers(relation, defaultpw) \
> ((relation)->rd_options ? \
>  ((StdRdOptions *) (relation)->rd_options)->parallel_workers : 
> (defaultpw))
> --
>
> Since we add new struct " PartitionedTableRdOptions ", It seems we need to 
> get parallel_workers in different way.
> Do we need similar macro to get partitioned table's parallel_workers ?

Oh, you're right.  The parallel_workers setting of a relation is only
accessed through this macro, even for partitioned tables, and I can
see that it is actually wrong to access a partitioned table's
parallel_workers through this macro as-is.  I hadn't tested that,
though, so thanks for pointing it out.

> Like:
> #define PartitionedTableGetParallelWorkers(relation, defaultpw) \
> xxx
> (PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : 
> (defaultpw))

I'm thinking it would be better to just modify the existing macro to
check relkind to decide which struct pointer type to cast the value in
rd_options to.

I will post an updated patch later.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: A reloption for partitioned tables - parallel_workers

2021-02-23 Thread houzj.f...@fujitsu.com
> > It seems the patch does not include the code that get the
> > parallel_workers from new struct " PartitionedTableRdOptions ", Did I miss
> something ?
> 
> Aren't the following hunks in the v2 patch what you meant?
> 
> diff --git a/src/backend/access/common/reloptions.c
> b/src/backend/access/common/reloptions.c
> index c687d3ee9e..f8443d2361 100644
> --- a/src/backend/access/common/reloptions.c
> +++ b/src/backend/access/common/reloptions.c
> @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
>   {
>   "parallel_workers",
>   "Number of parallel processes that can be used per executor node for this
> relation.",
> - RELOPT_KIND_HEAP,
> + RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
>   ShareUpdateExclusiveLock
>   },
>   -1, 0, 1024
> @@ -1962,12 +1962,18 @@ bytea *
>  partitioned_table_reloptions(Datum reloptions, bool validate)  {
>   /*
> - * There are no options for partitioned tables yet, but this is able to do
> - * some validation.
> + * Currently the only setting known to be useful for partitioned tables
> + * is parallel_workers.
>   */
> + static const relopt_parse_elt tab[] = { {"parallel_workers",
> + RELOPT_TYPE_INT, offsetof(PartitionedTableRdOptions,
> + parallel_workers)}, };
> +
>   return (bytea *) build_reloptions(reloptions, validate,
> RELOPT_KIND_PARTITIONED,
> -   0, NULL, 0);
> +   sizeof(PartitionedTableRdOptions),
> +   tab, lengthof(tab));
>  }
> 
>  /*
> 
> diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index
> 10b63982c0..fe114e0856 100644
> --- a/src/include/utils/rel.h
> +++ b/src/include/utils/rel.h
> @@ -308,6 +308,16 @@ typedef struct StdRdOptions
>   bool vacuum_truncate; /* enables vacuum to truncate a relation */  }
> StdRdOptions;
> 
> +/*
> + * PartitionedTableRdOptions
> + * Contents of rd_options for partitioned tables  */ typedef struct
> +PartitionedTableRdOptions {
> + int32 vl_len_; /* varlena header (do not touch directly!) */  int
> +parallel_workers; /* max number of parallel workers */ }
> +PartitionedTableRdOptions;
> +
>  #define HEAP_MIN_FILLFACTOR 10
>  #define HEAP_DEFAULT_FILLFACTOR 100
Hi,

I am not sure.
IMO, for normal table, we use the following macro to get the parallel_workers:
--
/*
 * RelationGetParallelWorkers
 *  Returns the relation's parallel_workers reloption setting.
 *  Note multiple eval of argument!
 */
#define RelationGetParallelWorkers(relation, defaultpw) \
((relation)->rd_options ? \
 ((StdRdOptions *) (relation)->rd_options)->parallel_workers : 
(defaultpw))
--

Since we add new struct " PartitionedTableRdOptions ", It seems we need to get 
parallel_workers in different way.
Do we need similar macro to get partitioned table's parallel_workers ?

Like:
#define PartitionedTableGetParallelWorkers(relation, defaultpw) \ 
xxx
(PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : 
(defaultpw))

Best regards,
houzj



Re: A reloption for partitioned tables - parallel_workers

2021-02-23 Thread Amit Langote
Hi,

On Tue, Feb 23, 2021 at 3:12 PM houzj.f...@fujitsu.com
 wrote:
> > Here is an updated version of the Seamus' patch that takes into account 
> > these
> > and other comments received on this thread so far.
> > Maybe warrants adding some tests too but I haven't.
> >
> > Seamus, please register this patch in the next commit-fest:
> > https://commitfest.postgresql.org/32/
> >
> > If you haven't already, you will need to create a community account to use 
> > that
> > site.
>
> It seems the patch does not include the code that get the parallel_workers 
> from new struct " PartitionedTableRdOptions ",
> Did I miss something ?

Aren't the following hunks in the v2 patch what you meant?

diff --git a/src/backend/access/common/reloptions.c
b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
  {
  "parallel_workers",
  "Number of parallel processes that can be used per executor node for
this relation.",
- RELOPT_KIND_HEAP,
+ RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
  ShareUpdateExclusiveLock
  },
  -1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
  /*
- * There are no options for partitioned tables yet, but this is able to do
- * some validation.
+ * Currently the only setting known to be useful for partitioned tables
+ * is parallel_workers.
  */
+ static const relopt_parse_elt tab[] = {
+ {"parallel_workers", RELOPT_TYPE_INT,
+ offsetof(PartitionedTableRdOptions, parallel_workers)},
+ };
+
  return (bytea *) build_reloptions(reloptions, validate,
RELOPT_KIND_PARTITIONED,
-   0, NULL, 0);
+   sizeof(PartitionedTableRdOptions),
+   tab, lengthof(tab));
 }

 /*

diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..fe114e0856 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
  bool vacuum_truncate; /* enables vacuum to truncate a relation */
 } StdRdOptions;

+/*
+ * PartitionedTableRdOptions
+ * Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+ int32 vl_len_; /* varlena header (do not touch directly!) */
+ int parallel_workers; /* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR 10
 #define HEAP_DEFAULT_FILLFACTOR 100

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: A reloption for partitioned tables - parallel_workers

2021-02-22 Thread houzj.f...@fujitsu.com
> > 4. Maybe it also doesn't make sense to consider the parent relation's
> > parallel_workers if Parallel Append is disabled
> > (enable_parallel_append = off).  That's because with a simple
> > (non-parallel) Append running under Gather, all launched parallel
> > workers process the same partition before moving to the next one.
> > OTOH, one's intention of setting parallel_workers on the parent
> > partitioned table would most likely be to distribute workers across
> > partitions, which is only possible with parallel Append
> > (enable_parallel_append = on).  So, the new code should look like
> > this:
> >
> > if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
> > parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
> 
> Here is an updated version of the Seamus' patch that takes into account these
> and other comments received on this thread so far.
> Maybe warrants adding some tests too but I haven't.
> 
> Seamus, please register this patch in the next commit-fest:
> https://commitfest.postgresql.org/32/
> 
> If you haven't already, you will need to create a community account to use 
> that
> site.

It seems the patch does not include the code that get the parallel_workers from 
new struct " PartitionedTableRdOptions ",
Did I miss something ?

Best regards,
houzj



Re: A reloption for partitioned tables - parallel_workers

2021-02-19 Thread Laurenz Albe
On Fri, 2021-02-19 at 16:30 +0900, Amit Langote wrote:
> On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere  wrote:
> > > Here we go, my first patch... solves 
> > > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com
> 
> Here is an updated version of the Seamus' patch that takes into
> account these and other comments received on this thread so far.
> Maybe warrants adding some tests too but I haven't.

Yes, there should be regression tests.

I gave the patch a spin, and it allows to raise the number of workers for
a parallel append as advertised.

--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+Specifying most of these parameters for partitioned tables is not
+supported, but you may specify them for individual leaf partitions;
+refer to the description of individual parameters for more details.


This doesn't make me happy.  Since the options themselves do not say if they
are supported on partitioned tables or not, the reader is left in the dark.

Perhaps:

  These options, with the exception of parallel_workers,
  are not supported on partitioned tables, but you may specify them for 
individual
  leaf partitions.

@@ -1401,9 +1402,12 @@ WITH ( MODULUS numeric_literal, REM
  
   This sets the number of workers that should be used to assist a parallel
   scan of this table.  If not set, the system will determine a value based
-  on the relation size.  The actual number of workers chosen by the planner
-  or by utility statements that use parallel scans may be less, for example
-  due to the setting of .
+  on the relation size.  When set on a partitioned table, the specified
+  number of workers will work on distinct partitions, so the number of
+  partitions affected by the parallel operation should be taken into
+  account.  The actual number of workers chosen by the planner or by
+  utility statements that use parallel scans may be less, for example due
+  to the setting of .
  
 


The reader is left to believe that the default number of workers depends on the
size of the partitioned table, which is not entirely true.

Perhaps:

  If not set, the system will determine a value based on the relation size and
  the number of scanned partitions.

--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1268,6 +1268,59 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo 
*rel,
add_paths_to_append_rel(root, rel, live_childrels);
 }
 
+/*
+ * compute_append_parallel_workers
+ * Computes the number of workers to assign to scan the subpaths 
appended
+ * by a given Append path
+ */
+static int
+compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+   int num_live_children,
+   bool parallel_append)

The new function should have a prototype.

+{
+   ListCell   *lc;
+   int parallel_workers = 0;
+
+   /*
+* For partitioned rels, first see if there is a root-level setting for
+* parallel_workers.  But only consider if a Parallel Append plan is
+* to be considered.
+*/
+   if (IS_PARTITIONED_REL(rel) && parallel_append)
+   parallel_workers =
+   compute_parallel_worker(rel, -1, -1,
+   max_parallel_workers_per_gather);
+
+   /* Find the highest number of workers requested for any subpath. */
+   foreach(lc, subpaths)
+   {
+   Path   *path = lfirst(lc);
+
+   parallel_workers = Max(parallel_workers, path->parallel_workers);
+   }
+   Assert(parallel_workers > 0 || subpaths == NIL);
+
+   /*
+* If the use of parallel append is permitted, always request at least
+* log2(# of children) workers.  We assume it can be useful to have
+* extra workers in this case because they will be spread out across
+* the children.  The precise formula is just a guess, but we don't
+* want to end up with a radically different answer for a table with N
+* partitions vs. an unpartitioned table with the same data, so the
+* use of some kind of log-scaling here seems to make some sense.
+*/
+   if (parallel_append)
+   {
+   parallel_workers = Max(parallel_workers,
+  fls(num_live_children));
+   parallel_workers = Min(parallel_workers,
+  max_parallel_workers_per_gather);
+   }
+   Assert(parallel_workers > 0);
+
+   return parallel_workers;
+}

That means that it is not possible to *lower* the number of parallel workers
with this patch.

Re: A reloption for partitioned tables - parallel_workers

2021-02-19 Thread Amit Langote
On Fri, Feb 19, 2021 at 11:54 PM Seamus Abshere  wrote:
> On Fri, Feb 19, 2021, at 2:30 AM, Amit Langote wrote:
> > Here is an updated version of the Seamus' patch that takes into
> > account these and other comments received on this thread so far.
> > Maybe warrants adding some tests too but I haven't.
> >
> > Seamus, please register this patch in the next commit-fest:
> > https://commitfest.postgresql.org/32/
> >
> > If you haven't already, you will need to create a community account to
> > use that site.
>
> hi Amit,
>
> Thanks so much for doing this. I had created
>
> https://commitfest.postgresql.org/32/2987/

Ah, sorry, I had not checked.  I updated the entry to add you as the author.

> and it looks like it now shows your patch as the one to use. Let me know if I 
> should do anything else.

You could take a look at the latest patch and see if you find any
problems with my or others' suggestions that I implemented in the v2
patch.  Also, please add regression tests for the new feature in
src/test/regress/sql/select_parallel.sql.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: A reloption for partitioned tables - parallel_workers

2021-02-19 Thread Seamus Abshere
hi Amit,

Thanks so much for doing this. I had created

https://commitfest.postgresql.org/32/2987/

and it looks like it now shows your patch as the one to use. Let me know if I 
should do anything else.

Best,
Seamus

On Fri, Feb 19, 2021, at 2:30 AM, Amit Langote wrote:
> On Tue, Feb 16, 2021 at 3:05 PM Amit Langote  wrote:
> > On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere  wrote:
> > > Here we go, my first patch... solves 
> > > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com
> >
> > Thanks for sending the patch here.
> >
> > It seems you haven't made enough changes for reloptions code to
> > recognize parallel_workers as valid for partitioned tables, because
> > even with the patch applied, I get this:
> >
> > create table rp (a int) partition by range (a);
> > create table rp1 partition of rp for values from (minvalue) to (0);
> > create table rp2 partition of rp for values from (0) to (maxvalue);
> > alter table rp set (parallel_workers = 1);
> > ERROR:  unrecognized parameter "parallel_workers"
> >
> > You need this:
> >
> > diff --git a/src/backend/access/common/reloptions.c
> > b/src/backend/access/common/reloptions.c
> > index 029a73325e..9eb8a0c10d 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
> > {
> > "parallel_workers",
> > "Number of parallel processes that can be used per
> > executor node for this relation.",
> > -   RELOPT_KIND_HEAP,
> > +   RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> > ShareUpdateExclusiveLock
> > },
> > -1, 0, 1024
> >
> > which tells reloptions parsing code that parallel_workers is
> > acceptable for both heap and partitioned relations.
> >
> > Other comments on the patch:
> >
> > * This function calling style, where the first argument is not placed
> > on the same line as the function itself, is not very common in
> > Postgres:
> >
> > +   /* First see if there is a root-level setting for parallel_workers 
> > */
> > +   parallel_workers = compute_parallel_worker(
> > +   rel,
> > +   -1,
> > +   -1,
> > +   max_parallel_workers_per_gather
> > +
> >
> > This makes the new code look very different from the rest of the
> > codebase.  Better to stick to existing styles.
> >
> > 2. It might be a good idea to use this opportunity to add a function,
> > say compute_append_parallel_workers(), for the code that does what the
> > function name says.  Then the patch will simply add the new
> > compute_parallel_worker() call at the top of that function.
> >
> > 3. I think we should consider the Append parent relation's
> > parallel_workers ONLY if it is a partitioned relation, because it
> > doesn't make a lot of sense for other types of parent relations.  So
> > the new code should look like this:
> >
> > if (IS_PARTITIONED_REL(rel))
> > parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
> >
> > 4. Maybe it also doesn't make sense to consider the parent relation's
> > parallel_workers if Parallel Append is disabled
> > (enable_parallel_append = off).  That's because with a simple
> > (non-parallel) Append running under Gather, all launched parallel
> > workers process the same partition before moving to the next one.
> > OTOH, one's intention of setting parallel_workers on the parent
> > partitioned table would most likely be to distribute workers across
> > partitions, which is only possible with parallel Append
> > (enable_parallel_append = on).  So, the new code should look like
> > this:
> >
> > if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
> > parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
> 
> Here is an updated version of the Seamus' patch that takes into
> account these and other comments received on this thread so far.
> Maybe warrants adding some tests too but I haven't.
> 
> Seamus, please register this patch in the next commit-fest:
> https://commitfest.postgresql.org/32/
> 
> If you haven't already, you will need to create a community account to
> use that site.
> 
> -- 
> Amit Langote
> EDB: http://www.enterprisedb.com
> 
> Attachments:
> * v2-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch




Re: A reloption for partitioned tables - parallel_workers

2021-02-18 Thread Amit Langote
On Tue, Feb 16, 2021 at 3:05 PM Amit Langote  wrote:
> On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere  wrote:
> > Here we go, my first patch... solves 
> > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com
>
> Thanks for sending the patch here.
>
> It seems you haven't made enough changes for reloptions code to
> recognize parallel_workers as valid for partitioned tables, because
> even with the patch applied, I get this:
>
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (minvalue) to (0);
> create table rp2 partition of rp for values from (0) to (maxvalue);
> alter table rp set (parallel_workers = 1);
> ERROR:  unrecognized parameter "parallel_workers"
>
> You need this:
>
> diff --git a/src/backend/access/common/reloptions.c
> b/src/backend/access/common/reloptions.c
> index 029a73325e..9eb8a0c10d 100644
> --- a/src/backend/access/common/reloptions.c
> +++ b/src/backend/access/common/reloptions.c
> @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
> {
> "parallel_workers",
> "Number of parallel processes that can be used per
> executor node for this relation.",
> -   RELOPT_KIND_HEAP,
> +   RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> ShareUpdateExclusiveLock
> },
> -1, 0, 1024
>
> which tells reloptions parsing code that parallel_workers is
> acceptable for both heap and partitioned relations.
>
> Other comments on the patch:
>
> * This function calling style, where the first argument is not placed
> on the same line as the function itself, is not very common in
> Postgres:
>
> +   /* First see if there is a root-level setting for parallel_workers */
> +   parallel_workers = compute_parallel_worker(
> +   rel,
> +   -1,
> +   -1,
> +   max_parallel_workers_per_gather
> +
>
> This makes the new code look very different from the rest of the
> codebase.  Better to stick to existing styles.
>
> 2. It might be a good idea to use this opportunity to add a function,
> say compute_append_parallel_workers(), for the code that does what the
> function name says.  Then the patch will simply add the new
> compute_parallel_worker() call at the top of that function.
>
> 3. I think we should consider the Append parent relation's
> parallel_workers ONLY if it is a partitioned relation, because it
> doesn't make a lot of sense for other types of parent relations.  So
> the new code should look like this:
>
> if (IS_PARTITIONED_REL(rel))
> parallel_workers = compute_parallel_worker(rel, -1, -1,
> max_parallel_workers_per_gather);
>
> 4. Maybe it also doesn't make sense to consider the parent relation's
> parallel_workers if Parallel Append is disabled
> (enable_parallel_append = off).  That's because with a simple
> (non-parallel) Append running under Gather, all launched parallel
> workers process the same partition before moving to the next one.
> OTOH, one's intention of setting parallel_workers on the parent
> partitioned table would most likely be to distribute workers across
> partitions, which is only possible with parallel Append
> (enable_parallel_append = on).  So, the new code should look like
> this:
>
> if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
> parallel_workers = compute_parallel_worker(rel, -1, -1,
> max_parallel_workers_per_gather);

Here is an updated version of the Seamus' patch that takes into
account these and other comments received on this thread so far.
Maybe warrants adding some tests too but I haven't.

Seamus, please register this patch in the next commit-fest:
https://commitfest.postgresql.org/32/

If you haven't already, you will need to create a community account to
use that site.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v2-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch
Description: Binary data


Re: A reloption for partitioned tables - parallel_workers

2021-02-18 Thread Amit Langote
On Thu, Feb 18, 2021 at 6:06 PM Hou, Zhijie  wrote:
> > Here we go, my first patch... solves
> > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520
> > e...@www.fastmail.com
> >
>
> Hi,
>
> partitioned_table_reloptions(Datum reloptions, bool validate)
>  {
> +   static const relopt_parse_elt tab[] = {
> +   {"parallel_workers", RELOPT_TYPE_INT,
> +   offsetof(StdRdOptions, parallel_workers)},
> +   };
> +
> return (bytea *) build_reloptions(reloptions, validate,
>   
> RELOPT_KIND_PARTITIONED,
> - 0, 
> NULL, 0);
> + 
> sizeof(StdRdOptions),
> + 
> tab, lengthof(tab));
>  }
>
> I noticed that you plan to store the parallel_workers in the same 
> struct(StdRdOptions) as normal heap relation.
> It seems better to store it in a separate struct.
>
> And as commit 1bbd608 said:
> 
> > Splitting things has the advantage to
> > make the information stored in rd_options include only the necessary
> > information, reducing the amount of memory used for a relcache entry
> > with partitioned tables if new reloptions are introduced at this level.
> 
>
> What do you think?

That may be a good idea.  So instead of referring to the
parallel_workers in StdRdOptions, define a new one, say,
PartitionedTableRdOptions as follows and refer to it in
partitioned_table_reloptions():

typedef struct PartitionedTableRdOptions
{
int32   vl_len_;/* varlena header (do not touch directly!) */
int parallel_workers;   /* max number of parallel workers */
} PartitionedTableRdOptions;

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: A reloption for partitioned tables - parallel_workers

2021-02-18 Thread Hou, Zhijie
> hi,
> 
> Here we go, my first patch... solves
> https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520
> e...@www.fastmail.com
> 

Hi,

partitioned_table_reloptions(Datum reloptions, bool validate)
 {
+   static const relopt_parse_elt tab[] = {
+   {"parallel_workers", RELOPT_TYPE_INT,
+   offsetof(StdRdOptions, parallel_workers)},
+   };
+
return (bytea *) build_reloptions(reloptions, validate,
  
RELOPT_KIND_PARTITIONED,
- 0, 
NULL, 0);
+ 
sizeof(StdRdOptions),
+ tab, 
lengthof(tab));
 }

I noticed that you plan to store the parallel_workers in the same 
struct(StdRdOptions) as normal heap relation.
It seems better to store it in a separate struct.

And as commit 1bbd608 said:

> Splitting things has the advantage to
> make the information stored in rd_options include only the necessary
> information, reducing the amount of memory used for a relcache entry
> with partitioned tables if new reloptions are introduced at this level.


What do you think?

Best regards,
Houzj










Re: A reloption for partitioned tables - parallel_workers

2021-02-17 Thread Amit Langote
On Tue, Feb 16, 2021 at 11:02 PM Laurenz Albe  wrote:
> On Tue, 2021-02-16 at 16:29 +0900, Amit Langote wrote:
> > > I am +1 on allowing to override the degree of parallelism on a parallel
> > > append.  If "parallel_workers" on the partitioned table is an option for
> > > that, it might be a simple solution.  On the other hand, perhaps it would
> > > be less confusing to have a different storage parameter name rather than
> > > having "parallel_workers" do double duty.
> > > Also, since there is a design rule that storage parameters can only be 
> > > used
> > > on partitions, we would have to change that - is that a problem for 
> > > anybody?
> >
> > I am not aware of a rule that suggests that parallel_workers is always
> > interpreted using storage-level considerations.  If that is indeed a
> > popular interpretation at this point, then yes, we should be open to
> > considering a new name for the parameter that this patch wants to add.
>
> Well, 
> https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
> says:
>
>  "Specifying these parameters for partitioned tables is not supported,
>   but you may specify them for individual leaf partitions."
>
> If we re-purpose "parallel_workers" like this, we'd have to change this.

Right, as I mentioned in my reply, the patch will need to update the
documentation.

> Then for a normal table, "parallel_workers" would mean how many workers
> work on a parallel table scan.  For a partitioned table, it determines
> how many workers work on a parallel append.
>
> Perhaps that is similar enough that it is not confusing.

I tend to agree with that.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: A reloption for partitioned tables - parallel_workers

2021-02-16 Thread Laurenz Albe
On Tue, 2021-02-16 at 16:29 +0900, Amit Langote wrote:
> > I am +1 on allowing to override the degree of parallelism on a parallel
> > append.  If "parallel_workers" on the partitioned table is an option for
> > that, it might be a simple solution.  On the other hand, perhaps it would
> > be less confusing to have a different storage parameter name rather than
> > having "parallel_workers" do double duty.
> > Also, since there is a design rule that storage parameters can only be used
> > on partitions, we would have to change that - is that a problem for anybody?
> 
> I am not aware of a rule that suggests that parallel_workers is always
> interpreted using storage-level considerations.  If that is indeed a
> popular interpretation at this point, then yes, we should be open to
> considering a new name for the parameter that this patch wants to add.

Well, 
https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
says:

 "Specifying these parameters for partitioned tables is not supported,
  but you may specify them for individual leaf partitions."

If we re-purpose "parallel_workers" like this, we'd have to change this.

Then for a normal table, "parallel_workers" would mean how many workers
work on a parallel table scan.  For a partitioned table, it determines
how many workers work on a parallel append.

Perhaps that is similar enough that it is not confusing.

Yours,
Laurenz Albe





Re: A reloption for partitioned tables - parallel_workers

2021-02-15 Thread Amit Langote
Hi,

On Tue, Feb 16, 2021 at 1:06 AM Laurenz Albe  wrote:
> On Mon, 2021-02-15 at 17:53 +0900, Amit Langote wrote:
> > On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere  wrote:
> > > It turns out parallel_workers may be a useful reloption for certain uses 
> > > of partitioned tables,
> > >  at least if they're made up of fancy column store partitions (see
> > >  
> > > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > > Would somebody tell me what I'm doing wrong? I would love to submit a 
> > > patch but I'm stuck:
> >
> > You may see by inspecting the callers of compute_parallel_worker()
> > that it never gets called on a partitioned table, only its leaf
> > partitions.  Maybe you could try calling compute_parallel_worker()
> > somewhere in add_paths_to_append_rel(), which has this code to figure
> > out parallel_workers to use for a parallel Append path for a given
> > partitioned table:
> >
> > /* Find the highest number of workers requested for any subpath. */
> > foreach(lc, partial_subpaths)
> > {
> > Path   *path = lfirst(lc);
> >
> > parallel_workers = Max(parallel_workers, path->parallel_workers);
> > }
> > Assert(parallel_workers > 0);
> >
> > /*
> >  * If the use of parallel append is permitted, always request at least
> >  * log2(# of children) workers.  We assume it can be useful to have
> >  * extra workers in this case because they will be spread out across
> >  * the children.  The precise formula is just a guess, but we don't
> >  * want to end up with a radically different answer for a table with N
> >  * partitions vs. an unpartitioned table with the same data, so the
> >  * use of some kind of log-scaling here seems to make some sense.
> >  */
> > if (enable_parallel_append)
> > {
> > parallel_workers = Max(parallel_workers,
> >fls(list_length(live_childrels)));
> > parallel_workers = Min(parallel_workers,
> >max_parallel_workers_per_gather);
> > }
> > Assert(parallel_workers > 0);
> >
> > Note that the 'rel' in this code refers to the partitioned table for
> > which an Append path is being considered, so compute_parallel_worker()
> > using that 'rel' would use the partitioned table's
> > rel_parallel_workers as you are trying to do.
>
> Note that there is a second chunk of code quite like that one a few
> lines down from there that would also have to be modified.
>
> I am +1 on allowing users to override the degree of parallelism on a parallel
> append.  If "parallel_workers" on the partitioned table is an option for
> that, it might be a simple solution.  On the other hand, perhaps it would
> be less confusing to have a different storage parameter name rather than
> having "parallel_workers" do double duty.
>
> Also, since there is a design rule that storage parameters can only be used
> on leaf partitions, we would have to change that - is that a problem for anybody?

I am not aware of a rule that suggests that parallel_workers is always
interpreted using storage-level considerations.  If that is indeed a
popular interpretation at this point, then yes, we should be open to
considering a new name for the parameter that this patch wants to add.

Maybe parallel_append_workers?  Perhaps not a bad idea in this patch's
case, but considering that we may want to expand the support of
cross-partition parallelism to operations other than querying, maybe
something else?

This reminds me of something I forgot to mention in my review of the
patch -- it should also update the documentation of parallel_workers
on the CREATE TABLE page to mention that it will be interpreted a bit
differently for partitioned tables than for regular storage-bearing
relations.  Workers specified for a partitioned table would be
distributed by the executor over its partitions, unlike with
storage-bearing relations, where the distribution of specified workers
is controlled by the AM using storage-level considerations.

> There is another consideration that doesn't need to be addressed
> by this patch, but that is somewhat related: if the executor prunes some
> partitions, the degree of parallelism is unaffected, right?

That's correct.  Fewer workers may be launched than planned, but that
would have nothing to do with executor pruning.

> So if the planner decides to use 24 workers for 25 partitions, and the
> executor discards all but one of these partition scans, we would end up
> with 24 workers scanning a single partition.

I do remember pondering this when testing my patches to improve the
performance of executing a generic plan to scan a partitioned table
where runtime pruning is possible.  Here is an example:

create table hp (a int) partition by hash (a);
select 'create table hp' || i 

Re: A reloption for partitioned tables - parallel_workers

2021-02-15 Thread Amit Langote
Hi Seamus,

Please see my reply below.

On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere  wrote:
> On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:
> > On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere  wrote:
> > > It turns out parallel_workers may be a useful reloption for certain uses 
> > > of partitioned tables, at least if they're made up of fancy column store 
> > > partitions (see 
> > > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > >
> > > Would somebody tell me what I'm doing wrong? I would love to submit a 
> > > patch but I'm stuck:
>
> > You may see by inspecting the callers of compute_parallel_worker()
> > that it never gets called on a partitioned table, only its leaf
> > partitions.  Maybe you could try calling compute_parallel_worker()
> > somewhere in add_paths_to_append_rel(), which has this code to figure
> > out parallel_workers to use for a parallel Append path for a given
> > partitioned table:
> >
> > /* Find the highest number of workers requested for any subpath. */
> > foreach(lc, partial_subpaths)
> > {
> > Path   *path = lfirst(lc);
> >
> > parallel_workers = Max(parallel_workers, path->parallel_workers);
> > }
> > Assert(parallel_workers > 0);
> >
> > /*
> >  * If the use of parallel append is permitted, always request at least
> >  * log2(# of children) workers.  We assume it can be useful to have
> >  * extra workers in this case because they will be spread out across
> >  * the children.  The precise formula is just a guess, but we don't
> >  * want to end up with a radically different answer for a table with N
> >  * partitions vs. an unpartitioned table with the same data, so the
> >  * use of some kind of log-scaling here seems to make some sense.
> >  */
> > if (enable_parallel_append)
> > {
> > parallel_workers = Max(parallel_workers,
> >fls(list_length(live_childrels)));
> > parallel_workers = Min(parallel_workers,
> >max_parallel_workers_per_gather);
> > }
> > Assert(parallel_workers > 0);
> >
> > /* Generate a partial append path. */
> > appendpath = create_append_path(root, rel, NIL, partial_subpaths,
> > NIL, NULL, parallel_workers,
> > enable_parallel_append,
> > -1);
> >
> > Note that the 'rel' in this code refers to the partitioned table for
> > which an Append path is being considered, so compute_parallel_worker()
> > using that 'rel' would use the partitioned table's
> > rel_parallel_workers as you are trying to do.
>
> Here we go, my first patch... solves 
> https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com

Thanks for sending the patch here.

It seems you haven't made enough changes for the reloptions code to
recognize parallel_workers as valid for partitioned tables, because
even with the patch applied, I get this:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
alter table rp set (parallel_workers = 1);
ERROR:  unrecognized parameter "parallel_workers"

You need this:

diff --git a/src/backend/access/common/reloptions.c
b/src/backend/access/common/reloptions.c
index 029a73325e..9eb8a0c10d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
{
"parallel_workers",
"Number of parallel processes that can be used per
executor node for this relation.",
-   RELOPT_KIND_HEAP,
+   RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
ShareUpdateExclusiveLock
},
-1, 0, 1024

which tells the reloptions parsing code that parallel_workers is
acceptable for both heap and partitioned relations.

Other comments on the patch:

1. This function calling style, where the first argument is not placed
on the same line as the function itself, is not very common in
Postgres:

+   /* First see if there is a root-level setting for parallel_workers */
+   parallel_workers = compute_parallel_worker(
+   rel,
+   -1,
+   -1,
+   max_parallel_workers_per_gather
+   );

This makes the new code look very different from the rest of the
codebase.  Better to stick to existing styles.

2. It might be a good idea to use this opportunity to add a function,
say compute_append_parallel_workers(), for the code that does what the
function name says.  Then the patch will simply add the new
compute_parallel_worker() call at the top of that function.

3. I think we should 

Re: A reloption for partitioned tables - parallel_workers

2021-02-15 Thread Seamus Abshere
hi,

Here we go, my first patch... solves 
https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com

Best,
Seamus

On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:
> Hi Seamus,
> 
> On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere  wrote:
> > It turns out parallel_workers may be a useful reloption for certain uses of 
> > partitioned tables, at least if they're made up of fancy column store 
> > partitions (see 
> > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> >
> > Would somebody tell me what I'm doing wrong? I would love to submit a patch 
> > but I'm stuck:
> >
> > diff --git a/src/backend/optimizer/path/allpaths.c 
> > b/src/backend/optimizer/path/allpaths.c
> > index cd3fdd259c..f1ade035ac 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
> >  * If the user has set the parallel_workers reloption, use that; otherwise
> >  * select a default number of workers.
> >  */
> > +   // I want to affect this
> > if (rel->rel_parallel_workers != -1)
> > parallel_workers = rel->rel_parallel_workers;
> > else
> >
> > so I do this
> >
> > diff --git a/src/backend/access/common/reloptions.c 
> > b/src/backend/access/common/reloptions.c
> > index c687d3ee9e..597b209bfb 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
> >  bytea *
> >  partitioned_table_reloptions(Datum reloptions, bool validate)
> >  {
> > -   /*
> > -    * There are no options for partitioned tables yet, but this is able to do
> > -    * some validation.
> > -    */
> > +   static const relopt_parse_elt tab[] = {
> > +   {"parallel_workers", RELOPT_TYPE_INT,
> > +   offsetof(StdRdOptions, parallel_workers)},
> > +   };
> > +
> > return (bytea *) build_reloptions(reloptions, validate,
> >                                   RELOPT_KIND_PARTITIONED,
> > -                                 0, NULL, 0);
> > +                                 sizeof(StdRdOptions),
> > +                                 tab, lengthof(tab));
> >  }
> >
> > That "works":
> >
> > postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 
> > 33);
> > ALTER TABLE
> > postgres=# select relname, relkind, reloptions from pg_class where relname 
> > = 'test_3pd_cstore_partitioned';
> >relname   | relkind |  reloptions
> > -+-+---
> >  test_3pd_cstore_partitioned | p   | {parallel_workers=33}
> > (1 row)
> >
> > But it seems to be ignored:
> >
> > diff --git a/src/backend/optimizer/path/allpaths.c 
> > b/src/backend/optimizer/path/allpaths.c
> > index cd3fdd259c..c68835ce38 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
> >  * If the user has set the parallel_workers reloption, use that; otherwise
> >  * select a default number of workers.
> >  */
> > +   // I want to affect this, but this assertion always passes
> > +   Assert(rel->rel_parallel_workers == -1)
> > if (rel->rel_parallel_workers != -1)
> > parallel_workers = rel->rel_parallel_workers;
> > else
> 
> You may see by inspecting the callers of compute_parallel_worker()
> that it never gets called on a partitioned table, only its leaf
> partitions.  Maybe you could try calling compute_parallel_worker()
> somewhere in add_paths_to_append_rel(), which has this code to figure
> out parallel_workers to use for a parallel Append path for a given
> partitioned table:
> 
> /* Find the highest number of workers requested for any subpath. */
> foreach(lc, partial_subpaths)
> {
> Path   *path = lfirst(lc);
> 
> parallel_workers = Max(parallel_workers, path->parallel_workers);
> }
> Assert(parallel_workers > 0);
> 
> /*
>  * If the use of parallel append is permitted, always request at least
>  * log2(# of children) workers.  We assume it can be useful to have
>  * extra workers in this case because they will be spread out across
>  * the children.  The precise formula is just a guess, but we don't
>  * want to end up with a radically different answer for a table with N
>  

Re: A reloption for partitioned tables - parallel_workers

2021-02-15 Thread Laurenz Albe
On Mon, 2021-02-15 at 17:53 +0900, Amit Langote wrote:
> On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere  wrote:
> > It turns out parallel_workers may be a useful reloption for certain uses of 
> > partitioned tables,
> >  at least if they're made up of fancy column store partitions (see
> >  
> > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > Would somebody tell me what I'm doing wrong? I would love to submit a patch 
> > but I'm stuck:
> 
> You may see by inspecting the callers of compute_parallel_worker()
> that it never gets called on a partitioned table, only its leaf
> partitions.  Maybe you could try calling compute_parallel_worker()
> somewhere in add_paths_to_append_rel(), which has this code to figure
> out parallel_workers to use for a parallel Append path for a given
> partitioned table:
> 
> /* Find the highest number of workers requested for any subpath. */
> foreach(lc, partial_subpaths)
> {
> Path   *path = lfirst(lc);
> 
> parallel_workers = Max(parallel_workers, path->parallel_workers);
> }
> Assert(parallel_workers > 0);
> 
> /*
>  * If the use of parallel append is permitted, always request at least
>  * log2(# of children) workers.  We assume it can be useful to have
>  * extra workers in this case because they will be spread out across
>  * the children.  The precise formula is just a guess, but we don't
>  * want to end up with a radically different answer for a table with N
>  * partitions vs. an unpartitioned table with the same data, so the
>  * use of some kind of log-scaling here seems to make some sense.
>  */
> if (enable_parallel_append)
> {
> parallel_workers = Max(parallel_workers,
>fls(list_length(live_childrels)));
> parallel_workers = Min(parallel_workers,
>max_parallel_workers_per_gather);
> }
> Assert(parallel_workers > 0);
> 
> Note that the 'rel' in this code refers to the partitioned table for
> which an Append path is being considered, so compute_parallel_worker()
> using that 'rel' would use the partitioned table's
> rel_parallel_workers as you are trying to do.

Note that there is a second chunk of code quite like that one a few
lines down from there that would also have to be modified.

I am +1 on allowing users to override the degree of parallelism on a parallel
append.  If "parallel_workers" on the partitioned table is an option for
that, it might be a simple solution.  On the other hand, perhaps it would
be less confusing to have a different storage parameter name rather than
having "parallel_workers" do double duty.

Also, since there is a design rule that storage parameters can only be used
on leaf partitions, we would have to change that - is that a problem for anybody?


There is another consideration that doesn't need to be addressed
by this patch, but that is somewhat related: if the executor prunes some
partitions, the degree of parallelism is unaffected, right?
So if the planner decides to use 24 workers for 25 partitions, and the
executor discards all but one of these partition scans, we would end up
with 24 workers scanning a single partition.

I am not sure how that could be improved.

Yours,
Laurenz Albe





Re: A reloption for partitioned tables - parallel_workers

2021-02-15 Thread Amit Langote
Hi Seamus,

On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere  wrote:
> It turns out parallel_workers may be a useful reloption for certain uses of 
> partitioned tables, at least if they're made up of fancy column store 
> partitions (see 
> https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
>
> Would somebody tell me what I'm doing wrong? I would love to submit a patch 
> but I'm stuck:
>
> diff --git a/src/backend/optimizer/path/allpaths.c 
> b/src/backend/optimizer/path/allpaths.c
> index cd3fdd259c..f1ade035ac 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
>  * If the user has set the parallel_workers reloption, use that; otherwise
>  * select a default number of workers.
>  */
> +   // I want to affect this
> if (rel->rel_parallel_workers != -1)
> parallel_workers = rel->rel_parallel_workers;
> else
>
> so I do this
>
> diff --git a/src/backend/access/common/reloptions.c 
> b/src/backend/access/common/reloptions.c
> index c687d3ee9e..597b209bfb 100644
> --- a/src/backend/access/common/reloptions.c
> +++ b/src/backend/access/common/reloptions.c
> @@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
>  bytea *
>  partitioned_table_reloptions(Datum reloptions, bool validate)
>  {
> -   /*
> -    * There are no options for partitioned tables yet, but this is able to do
> -    * some validation.
> -    */
> +   static const relopt_parse_elt tab[] = {
> +   {"parallel_workers", RELOPT_TYPE_INT,
> +   offsetof(StdRdOptions, parallel_workers)},
> +   };
> +
> return (bytea *) build_reloptions(reloptions, validate,
>                                   RELOPT_KIND_PARTITIONED,
> -                                 0, NULL, 0);
> +                                 sizeof(StdRdOptions),
> +                                 tab, lengthof(tab));
>  }
>
> That "works":
>
> postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 
> 33);
> ALTER TABLE
> postgres=# select relname, relkind, reloptions from pg_class where relname = 
> 'test_3pd_cstore_partitioned';
>relname   | relkind |  reloptions
> -+-+---
>  test_3pd_cstore_partitioned | p   | {parallel_workers=33}
> (1 row)
>
> But it seems to be ignored:
>
> diff --git a/src/backend/optimizer/path/allpaths.c 
> b/src/backend/optimizer/path/allpaths.c
> index cd3fdd259c..c68835ce38 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
>  * If the user has set the parallel_workers reloption, use that; otherwise
>  * select a default number of workers.
>  */
> +   // I want to affect this, but this assertion always passes
> +   Assert(rel->rel_parallel_workers == -1)
> if (rel->rel_parallel_workers != -1)
> parallel_workers = rel->rel_parallel_workers;
> else

You may see by inspecting the callers of compute_parallel_worker()
that it never gets called on a partitioned table, only its leaf
partitions.  Maybe you could try calling compute_parallel_worker()
somewhere in add_paths_to_append_rel(), which has this code to figure
out parallel_workers to use for a parallel Append path for a given
partitioned table:

/* Find the highest number of workers requested for any subpath. */
foreach(lc, partial_subpaths)
{
Path   *path = lfirst(lc);

parallel_workers = Max(parallel_workers, path->parallel_workers);
}
Assert(parallel_workers > 0);

/*
 * If the use of parallel append is permitted, always request at least
 * log2(# of children) workers.  We assume it can be useful to have
 * extra workers in this case because they will be spread out across
 * the children.  The precise formula is just a guess, but we don't
 * want to end up with a radically different answer for a table with N
 * partitions vs. an unpartitioned table with the same data, so the
 * use of some kind of log-scaling here seems to make some sense.
 */
if (enable_parallel_append)
{
parallel_workers = Max(parallel_workers,
   fls(list_length(live_childrels)));
parallel_workers = Min(parallel_workers,