Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-13 Thread Konstantin Knizhnik




I attach a small patch for postgres_fdw.c which allows concurrent
execution of aggregates by all remote servers (when they are accessed
through postgres_fdw).
I have added a "postgres_fdw.use_prefetch" GUC to enable/disable
prefetching of data in postgres_fdw.
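For illustration, such a GUC is typically defined from the extension's
_PG_init() hook (a minimal sketch only; the wiring in the attached patch
may differ):

/* Sketch: declaring postgres_fdw.use_prefetch (requires utils/guc.h). */
static bool use_prefetch = false;

void
_PG_init(void)
{
	DefineCustomBoolVariable("postgres_fdw.use_prefetch",
							 "Enables prefetching of data in postgres_fdw.",
							 NULL,				/* no long description */
							 &use_prefetch,
							 false,				/* disabled by default */
							 PGC_USERSET,
							 0,
							 NULL, NULL, NULL);	/* no hooks */
}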


Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-11 Thread Konstantin Knizhnik

On 10/27/2017 02:01 PM, Jeevan Chalke wrote:

Hi,

Attached new patch-set here. Changes include:

1. Added separate patch for costing Append node as discussed up-front in the
patch-set.
2. Since we now cost Append node, we don't need partition_wise_agg_cost_factor
GUC. So removed that. The remaining patch hence merged into main implementation
patch.
3. Updated rows in test-cases so that we will get partition-wise plans.

Thanks


I applied the partition-wise-agg-v6.tar.gz patch to master and used the shard.sh
example from https://www.postgresql.org/message-id/14577.1509723225%40localhost
The plan for count(*) is the following:

shard=# explain select count(*) from orders;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=100415.29..100415.30 rows=1 width=8)
   ->  Append  (cost=50207.63..100415.29 rows=2 width=8)
         ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
               ->  Foreign Scan on orders_0  (cost=101.00..50195.13 rows=5000 width=0)
         ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
               ->  Foreign Scan on orders_1  (cost=101.00..50195.13 rows=5000 width=0)


We really do calculate a partial aggregate for each partition, but to do so we
still have to fetch all the data from the remote host.
So for foreign partitions such a plan is absolutely inefficient.
Maybe it should be combined with some other patch?
For example, with the agg_pushdown_v4.tgz patch
https://www.postgresql.org/message-id/14577.1509723225%40localhost ?
But it does not apply on top of the partition-wise-agg-v6.tar.gz patch.
Also, postgres_fdw in 11dev is able to push down aggregates without the
agg_pushdown_v4.tgz patch.

In 0009-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch
there is the following check:

 /* Partial aggregates are not supported. */
+   if (extra->isPartial)
+   return;

If we just comment out this check, the produced plan will be the following:

shard=# explain select sum(product_id) from orders;
                           QUERY PLAN
-----------------------------------------------------------------
 Finalize Aggregate  (cost=308.41..308.42 rows=1 width=8)
   ->  Append  (cost=144.18..308.41 rows=2 width=8)
         ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_0 orders)
         ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_1 orders)
(6 rows)

And that is actually the desired plan!
Obviously such an approach will not always work: the FDW really doesn't support
partial aggregates now.
But for the most frequently used aggregates (sum, min, max, count),
aggtype == aggtranstype, and there is no difference
between partial and normal aggregate calculation.
So instead of the (extra->isPartial) condition we can add a more complex check
which traverses the pathtarget expressions and
checks whether they can be evaluated in this way. Or... extend the FDW API to
support partial aggregation.
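Such a check could look roughly like this (an illustrative sketch with a
made-up helper name; a complete version would have to walk nested
expressions, not just top-level Aggrefs):

/* Sketch: naive pushdown of a partial aggregate is safe when every
 * aggregate's transition type equals its result type, as it is for
 * sum/min/max/count on the common types. */
static bool
partial_agg_pushdown_safe(PathTarget *target)
{
	ListCell   *lc;

	foreach(lc, target->exprs)
	{
		Node	   *node = (Node *) lfirst(lc);

		if (IsA(node, Aggref) &&
			((Aggref *) node)->aggtranstype != ((Aggref *) node)->aggtype)
			return false;
	}
	return true;
}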

But even the last plan is not ideal: it will evaluate the aggregates at each
remote node sequentially.
There is a parallel append patch:
https://www.postgresql.org/message-id/CAJ3gD9ctEcrVUmpY6fq_JUB6WDKGXAGd70EY68jVFA4kxMbKeQ%40mail.gmail.com
but ... the FDW doesn't support parallel scan, so parallel append cannot be
applied in this case.
And we actually do not need parallel append with all its dynamic workers here.
We just need to start the commands at all remote servers and only after that
fetch the results (which can be done sequentially).
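In plain libpq terms the desired pattern is just this (a sketch outside
the FDW, assuming conns[] already holds established connections to the
shards; error handling omitted):

#include <libpq-fe.h>

static void
run_on_all_shards(PGconn **conns, int nconns, const char *query)
{
	int			i;

	/* Start the query on every server without waiting for results. */
	for (i = 0; i < nconns; i++)
		PQsendQuery(conns[i], query);

	/* All servers now work in parallel; collect results sequentially. */
	for (i = 0; i < nconns; i++)
	{
		PGresult   *res;

		while ((res = PQgetResult(conns[i])) != NULL)
			PQclear(res);
	}
}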

I am investigating the problem of efficient execution of OLAP queries on
sharded tables (tables with remote partitions).
After reading all these threads and the corresponding patches, it seems to me
that we already have most of the pieces of the puzzle; what we need is to put
them in the right places and maybe add the missing ones.
I wonder whether somebody is already busy with this, and whether I can somehow
help here?

Also, I am not quite sure about the best approach to parallel execution of a
distributed query at all nodes.
Should we make postgres_fdw parallel-safe and use parallel append? How
difficult would that be?
Or, in addition to parallel append, should we also have an "asynchronous
append" which would be able to initiate execution at all nodes?
It seems to be close to merge append, because it should traverse all cursors
simultaneously.

The second approach looks easier to implement. But in the case of a sharded
table, a distributed query may need to traverse both remote and local shards,
and this approach doesn't allow several local shards to be processed in
parallel.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-01 Thread Robert Haas
On Wed, Nov 1, 2017 at 6:20 PM, Jeevan Chalke
 wrote:
> Yep.
> But as David reported earlier, if we remove the first part, i.e. adding
> cpu_operator_cost per tuple, Merge Append will be preferred over an Append
> node, unlike before. And thus, I thought it better to have both, but am not so
> sure. Should we remove that part altogether, or add both in a single
> statement with updated comments?

I was only suggesting that you update the comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-01 Thread Jeevan Chalke
On Sat, Oct 28, 2017 at 3:07 PM, Robert Haas  wrote:

> On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke
>  wrote:
> > 1. Added separate patch for costing Append node as discussed up-front in
> the
> > patch-set.
> > 2. Since we now cost Append node, we don't need
> > partition_wise_agg_cost_factor
> > GUC. So removed that. The remaining patch hence merged into main
> > implementation
> > patch.
> > 3. Updated rows in test-cases so that we will get partition-wise plans.
>
> With 0006 applied, cost_merge_append() is now a little bit confused:
>
> /*
>  * Also charge a small amount (arbitrarily set equal to operator cost)
> per
>  * extracted tuple.  We don't charge cpu_tuple_cost because a
> MergeAppend
>  * node doesn't do qual-checking or projection, so it has less overhead
>  * than most plan nodes.
>  */
> run_cost += cpu_operator_cost * tuples;
>
> /* Add MergeAppend node overhead like we do it for the Append node */
> run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
>
> The first comment says that we don't add cpu_tuple_cost, and the
> second one then adds half of it anyway.
>

Yep.
But as David reported earlier, if we remove the first part, i.e. adding
cpu_operator_cost per tuple, Merge Append will be preferred over an Append
node, unlike before. And thus, I thought it better to have both, but am not so
sure. Should we remove that part altogether, or add both in a single
statement with updated comments?
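If we go with a single statement, it could be something like this (just a
sketch of one possible wording):

	/*
	 * Charge a small amount per extracted tuple, covering both the
	 * comparator work and the generic overhead of pulling tuples
	 * through the MergeAppend node.
	 */
	run_cost += (cpu_operator_cost +
				 cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR) * tuples;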


> I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR,
> because as you say it's used twice, but I don't think that should be
> exposed in cost.h; I'd make it private to costsize.c and rename it to
> something like APPEND_CPU_COST_MULTIPLIER.  The word DEFAULT, in
> particular, seems useless to me, since there's no provision for it to
> be overridden by a different value.
>

Agree. Will make that change.


>
> What testing, if any, can we think about doing with this plan to make
> sure it doesn't regress things?  For example, if we do a TPC-H run
> with partitioned tables and partition-wise join enabled, will any
> plans change with this patch?


I have tried doing this on my local development machine. For a 1GB database
size (TPC-H scale factor 1), I see no plan change with and without this
patch.

I have tried with scale factor 10, but the queries are not executing well due
to space and memory constraints. Can someone try that out?


>   Do they get faster or not?  Anyone have
> other ideas for what to test?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-28 Thread Robert Haas
On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke
 wrote:
> 1. Added separate patch for costing Append node as discussed up-front in the
> patch-set.
> 2. Since we now cost Append node, we don't need
> partition_wise_agg_cost_factor
> GUC. So removed that. The remaining patch hence merged into main
> implementation
> patch.
> 3. Updated rows in test-cases so that we will get partition-wise plans.

With 0006 applied, cost_merge_append() is now a little bit confused:

/*
 * Also charge a small amount (arbitrarily set equal to operator cost) per
 * extracted tuple.  We don't charge cpu_tuple_cost because a MergeAppend
 * node doesn't do qual-checking or projection, so it has less overhead
 * than most plan nodes.
 */
run_cost += cpu_operator_cost * tuples;

/* Add MergeAppend node overhead like we do it for the Append node */
run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;

The first comment says that we don't add cpu_tuple_cost, and the
second one then adds half of it anyway.

I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR,
because as you say it's used twice, but I don't think that should be
exposed in cost.h; I'd make it private to costsize.c and rename it to
something like APPEND_CPU_COST_MULTIPLIER.  The word DEFAULT, in
particular, seems useless to me, since there's no provision for it to
be overridden by a different value.
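Concretely, something like this (a sketch; the 0.5 value comes from the
benchmarking earlier in this thread):

/* Private to costsize.c, not exposed in cost.h. */
#define APPEND_CPU_COST_MULTIPLIER 0.5

	/* ... in cost_append() and cost_merge_append(): */
	run_cost += cpu_tuple_cost * APPEND_CPU_COST_MULTIPLIER * tuples;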

What testing, if any, can we think about doing with this plan to make
sure it doesn't regress things?  For example, if we do a TPC-H run
with partitioned tables and partition-wise join enabled, will any
plans change with this patch?  Do they get faster or not?  Anyone have
other ideas for what to test?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-27 Thread Jeevan Chalke
Hi,

Attached new patch-set here. Changes include:

1. Added separate patch for costing Append node as discussed up-front in the
patch-set.
2. Since we now cost Append node, we don't need
partition_wise_agg_cost_factor
GUC. So removed that. The remaining patch hence merged into main
implementation
patch.
3. Updated rows in test-cases so that we will get partition-wise plans.

Thanks


On Wed, Oct 18, 2017 at 9:53 AM, Dilip Kumar  wrote:

> On Tue, Oct 17, 2017 at 10:44 PM, Jeevan Chalke
>  wrote:
> >
>
> > I didn't get what you mean by regression here. Can you please explain?
> >
> > I see that the PWA plan is selected over the regular plan, when enabled, on
> > the basis of costing.
> > Regular planning needs a Result node, due to which the cost increases,
> > whereas PWA doesn't need that and thus wins.
>
> Sorry for not explaining clearly. I meant that with the normal plan the
> execution time is 263.678 ms, whereas with PWA it is 339.929 ms.
>
> I only set enable_partition_wise_agg=on and it switched to PWA, and the
> execution time increased by 30%.
> I understand that this is the worst case for PWA, where
> FinalizeAggregate is getting all the tuples.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v6.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-17 Thread Dilip Kumar
On Tue, Oct 17, 2017 at 10:44 PM, Jeevan Chalke
 wrote:
>

> I didn't get what you mean by regression here. Can you please explain?
>
> I see that the PWA plan is selected over the regular plan, when enabled, on
> the basis of costing.
> Regular planning needs a Result node, due to which the cost increases,
> whereas PWA doesn't need that and thus wins.

Sorry for not explaining clearly. I meant that with the normal plan the
execution time is 263.678 ms, whereas with PWA it is 339.929 ms.

I only set enable_partition_wise_agg=on and it switched to PWA, and the
execution time increased by 30%.
I understand that this is the worst case for PWA, where
FinalizeAggregate is getting all the tuples.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-17 Thread Jeevan Chalke
On Tue, Oct 17, 2017 at 7:13 PM, Dilip Kumar  wrote:

> On Fri, Oct 13, 2017 at 12:06 PM, Jeevan Chalke
>  wrote:
> >
> While playing around with the patch I have noticed one regression with
> the partial partition-wise aggregate.
>
> I am consistently able to reproduce this on my local machine.
>
> Scenario: Group by on non-key column and only one tuple per group.
>

I didn't get what you mean by regression here. Can you please explain?

I see that the PWA plan is selected over the regular plan, when enabled, on
the basis of costing.
Regular planning needs a Result node, due to which the cost increases, whereas
PWA doesn't need that and thus wins.


-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-17 Thread Dilip Kumar
On Fri, Oct 13, 2017 at 12:06 PM, Jeevan Chalke
 wrote:
>
While playing around with the patch I have noticed one regression with
the partial partition-wise aggregate.

I am consistently able to reproduce this on my local machine.

Scenario: Group by on non-key column and only one tuple per group.

Complete Test:

create table t(a int,b int) partition by range(a);
create table t1 partition of t for values from (1) to (10);
create table t2 partition of t for values from (10) to (20);

insert into t values (generate_series(1,19),generate_series(1, 19));
postgres=# explain analyze select sum(a) from t group by b;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=20379.55..28379.51 rows=19 width=12) (actual time=102.311..322.969 rows=19 loops=1)
   Group Key: t1.b
   ->  Merge Append  (cost=20379.55..25379.53 rows=19 width=12) (actual time=102.303..232.310 rows=19 loops=1)
         Sort Key: t1.b
         ->  Partial GroupAggregate  (cost=10189.72..11939.70 rows=9 width=12) (actual time=52.164..108.967 rows=9 loops=1)
               Group Key: t1.b
               ->  Sort  (cost=10189.72..10439.72 rows=9 width=8) (actual time=52.158..66.236 rows=9 loops=1)
                     Sort Key: t1.b
                     Sort Method: external merge  Disk: 1768kB
                     ->  Seq Scan on t1  (cost=0.00..1884.99 rows=9 width=8) (actual time=0.860..20.388 rows=9 loops=1)
         ->  Partial GroupAggregate  (cost=10189.82..11939.82 rows=10 width=12) (actual time=50.134..102.976 rows=10 loops=1)
               Group Key: t2.b
               ->  Sort  (cost=10189.82..10439.82 rows=10 width=8) (actual time=50.128..63.362 rows=10 loops=1)
                     Sort Key: t2.b
                     Sort Method: external merge  Disk: 1768kB
                     ->  Seq Scan on t2  (cost=0.00..1885.00 rows=10 width=8) (actual time=0.498..20.977 rows=10 loops=1)
 Planning time: 0.190 ms
 Execution time: 339.929 ms
(18 rows)

postgres=# set enable_partition_wise_agg=off;
SET
postgres=# explain analyze select sum(a) from t group by b;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=26116.53..29616.51 rows=19 width=12) (actual time=139.413..250.751 rows=19 loops=1)
   Group Key: t1.b
   ->  Sort  (cost=26116.53..26616.52 rows=19 width=8) (actual time=139.406..168.775 rows=19 loops=1)
         Sort Key: t1.b
         Sort Method: external merge  Disk: 3544kB
         ->  Result  (cost=0.00..5769.98 rows=19 width=8) (actual time=0.674..76.392 rows=19 loops=1)
               ->  Append  (cost=0.00..3769.99 rows=19 width=8) (actual time=0.672..40.291 rows=19 loops=1)
                     ->  Seq Scan on t1  (cost=0.00..1884.99 rows=9 width=8) (actual time=0.672..12.408 rows=9 loops=1)
                     ->  Seq Scan on t2  (cost=0.00..1885.00 rows=10 width=8) (actual time=1.407..11.689 rows=10 loops=1)
 Planning time: 0.146 ms
 Execution time: 263.678 ms
(11 rows)


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-16 Thread Jeevan Chalke
On Fri, Oct 13, 2017 at 1:13 PM, David Rowley 
wrote:

>
> I looked over the patch and saw this:
>
> @@ -1800,6 +1827,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
>   */
>   run_cost += cpu_operator_cost * tuples;
>
> + /* Add MergeAppend node overhead like we do it for the Append node */
> + run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
> +
>   path->startup_cost = startup_cost + input_startup_cost;
>   path->total_cost = startup_cost + run_cost + input_total_cost;
>  }
>
> You're doing that right after a comment that says we don't do that. It
> also does look like the "run_cost += cpu_operator_cost * tuples" is
> trying to do the same thing, so perhaps it's worth just replacing
> that, which by default will double that additional cost, although
> doing so would have the planner slightly prefer a MergeAppend to an
> Append than previously.
>

I think we can remove that code block entirely; I have already added relevant
comments around DEFAULT_APPEND_COST_FACTOR.
However, I am not sure about doing this since, as you correctly said, it may
make the planner prefer a MergeAppend to an Append. Will it be fine if we
remove that code block?


> +#define DEFAULT_APPEND_COST_FACTOR 0.5
>
> I don't really think the DEFAULT_APPEND_COST_FACTOR adds much. it
> means very little by itself. It also seems that most of the other cost
> functions just use the magic number.
>

Agree, but each of those magic numbers is used only once, in one place. Here
there are two places, so if someone wants to update the value, they need to
make sure to update it in both places. To minimize that risk, having a #define
seems better.


>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-13 Thread David Rowley
On 13 October 2017 at 19:36, Jeevan Chalke
 wrote:
> I have tried exactly the same tests to get to this factor on my local
> developer machine. And with parallelism enabled I got this number as 7.9.
> However, if I disable parallelism (and I believe David too disabled that),
> I get this number as 1.8. Whereas for 10000 rows, I get this number as 1.7.
>
> -- With Gather
> # select current_Setting('cpu_tuple_cost')::float8 / ((10633.56 * (81.035 /
> 72.450) - 10633.56)  / 1000000);
> 7.9
>
> -- Without Gather
> # select current_Setting('cpu_tuple_cost')::float8 / ((16925.01 * (172.838 /
> 131.400) - 16925.01)  / 1000000);
> 1.8
>
> -- With 10000 rows (so no Gather too)
> # select current_Setting('cpu_tuple_cost')::float8 / ((170.01 * (1.919 /
> 1.424) - 170.01)  / 10000);
> 1.7
>
> So it is not so straightforward to come up with the correct heuristic here.
> Thus using 50% of cpu_tuple_cost looks good to me here.
>
> As suggested by Ashutosh and Robert, attached is a separate small WIP patch
> for it.

Good to see it stays fairly consistent at different tuple counts, and
is not too far away from what I got on this machine.

I looked over the patch and saw this:

@@ -1800,6 +1827,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
  */
  run_cost += cpu_operator_cost * tuples;

+ /* Add MergeAppend node overhead like we do it for the Append node */
+ run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
  path->startup_cost = startup_cost + input_startup_cost;
  path->total_cost = startup_cost + run_cost + input_total_cost;
 }

You're doing that right after a comment that says we don't do that. It
also does look like the "run_cost += cpu_operator_cost * tuples" is
trying to do the same thing, so perhaps it's worth just replacing
that, which by default will double that additional cost, although
doing so would have the planner slightly prefer a MergeAppend to an
Append than previously.

+#define DEFAULT_APPEND_COST_FACTOR 0.5

I don't really think the DEFAULT_APPEND_COST_FACTOR adds much. it
means very little by itself. It also seems that most of the other cost
functions just use the magic number.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-13 Thread Jeevan Chalke
On Tue, Oct 10, 2017 at 1:31 PM, David Rowley 
wrote:

> On 10 October 2017 at 17:57, Ashutosh Bapat
>  wrote:
> > Append node just returns the result of ExecProcNode(). Charging
> > cpu_tuple_cost may make it too expensive. In other places where we
> > charge cpu_tuple_cost there's some processing done to the tuple like
> > ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
> > Append's processing of the tuple.
>
> I don't think there's any need to invent any new GUC. You could just
> divide cpu_tuple_cost by something.
>
> I did a quick benchmark on my laptop to see how much Append really
> costs, and with the standard costs the actual cost seems to be about
> cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
> realistic. create_set_projection_path() does something similar and
> brincostestimate() does some similar magic and applies 0.1 *
> cpu_operator_cost to the total cost.
>
> # create table p (a int, b int);
> # create table p1 () inherits (p);
> # insert into p1 select generate_series(1,1000000);
> # vacuum analyze p1;
> # \q
> $ echo "select count(*) from p1;" > p1.sql
> $ echo "select count(*) from p;" > p.sql
> $ pgbench -T 60 -f p1.sql -n
>
> latency average = 58.567 ms
>
> $ pgbench -T 60 -f p.sql -n
> latency average = 72.984 ms
>
> $ psql
> psql (11devel)
> Type "help" for help.
>
> # -- check the cost of the plan.
> # explain select count(*) from p1;
> QUERY PLAN
> ----------------------------------------------------------------
>  Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
>    ->  Seq Scan on p1  (cost=0.00..14425.00 rows=1000000 width=0)
> (2 rows)
>
> # -- selecting from the parent is the same due to zero Append cost.
> # explain select count(*) from p;
>    QUERY PLAN
> -----------------------------------------------------------------
>  Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
>    ->  Append  (cost=0.00..14425.00 rows=1000001 width=0)
>          ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=0)
>          ->  Seq Scan on p1  (cost=0.00..14425.00 rows=1000000 width=0)
> (4 rows)
>
> # -- extrapolate the additional time taken for the Append scan and
> # -- work out what the planner should add to the plan's cost, then
> # -- divide by the number of rows in p1 to work out the tuple cost
> # -- of pulling a row through the append.
> # select (16925.01 * (72.984 / 58.567) - 16925.01)  / 1000000;
> ?column?
> 
>  0.00416630302337493743
> (1 row)
>
> # show cpu_tuple_cost;
>  cpu_tuple_cost
> 
>  0.01
> (1 row)
>
> # -- How does that compare to the cpu_tuple_cost?
> # select current_Setting('cpu_tuple_cost')::float8 /
> 0.00416630302337493743;
> ?column?
> 
>  2.400209476818
> (1 row)
>
> Maybe it's worth trying with different row counts to see if the
> additional cost is consistent, but it's probably not worth being too
> critical here.
>

I have tried exactly the same tests to get to this factor on my local
developer machine. And with parallelism enabled I got this number as 7.9.
However, if I disable parallelism (and I believe David too disabled that),
I get this number as 1.8. Whereas for 10000 rows, I get this number as 1.7.

-- With Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((10633.56 * (81.035 /
72.450) - 10633.56)  / 1000000);
7.9

-- Without Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((16925.01 * (172.838
/ 131.400) - 16925.01)  / 1000000);
1.8

-- With 10000 rows (so no Gather too)
# select current_Setting('cpu_tuple_cost')::float8 / ((170.01 * (1.919 /
1.424) - 170.01)  / 10000);
1.7

So it is not so straightforward to come up with the correct heuristic here.
Thus using 50% of cpu_tuple_cost looks good to me here.

As suggested by Ashutosh and Robert, attached is a separate small WIP patch
for it.

I think it will be better if we take this topic to another mail thread.
Do you agree?


>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ce32b8a4..c0bf602 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1742,6 +1742,33 @@ cost_sort(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_append
+ *	  Determines and returns the cost of an Append node.
+ *
+ * Though Append doesn't do any selection or projection, it's not free.  So we
+ * try to add cost per input tuple which is arbitrarily calculated as
+ * DEFAULT_APPEND_COST_FACTOR * cpu_tuple_cost.
+ *
+ * 'input_startup_cost' is the sum of the input streams' startup costs
+ * 'input_total_cost' is the sum of the input streams' 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-11 Thread Robert Haas
On Tue, Oct 10, 2017 at 6:00 AM, Ashutosh Bapat
 wrote:
> This looks good to me. I think it should be a separate, yet very small patch.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-10 Thread Ashutosh Bapat
On Tue, Oct 10, 2017 at 1:31 PM, David Rowley
 wrote:
>
> I don't think there's any need to invent any new GUC. You could just
> divide cpu_tuple_cost by something.
>
> I did a quick benchmark on my laptop to see how much Append really
> costs, and with the standard costs the actual cost seems to be about
> cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
> realistic. create_set_projection_path() does something similar and
> brincostestimate() does some similar magic and applies 0.1 *
> cpu_operator_cost to the total cost.
>
>
> # -- How does that compare to the cpu_tuple_cost?
> # select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
> ?column?
> 
>  2.400209476818
> (1 row)
>
> Maybe it's worth trying with different row counts to see if the
> additional cost is consistent, but it's probably not worth being too
> critical here.
>

This looks good to me. I think it should be a separate, yet very small patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-10 Thread David Rowley
On 10 October 2017 at 17:57, Ashutosh Bapat
 wrote:
> Append node just returns the result of ExecProcNode(). Charging
> cpu_tuple_cost may make it too expensive. In other places where we
> charge cpu_tuple_cost there's some processing done to the tuple like
> ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
> Append's processing of the tuple.

I don't think there's any need to invent any new GUC. You could just
divide cpu_tuple_cost by something.

I did a quick benchmark on my laptop to see how much Append really
costs, and with the standard costs the actual cost seems to be about
cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
realistic. create_set_projection_path() does something similar and
brincostestimate() does some similar magic and applies 0.1 *
cpu_operator_cost to the total cost.

# create table p (a int, b int);
# create table p1 () inherits (p);
# insert into p1 select generate_series(1,1000000);
# vacuum analyze p1;
# \q
$ echo "select count(*) from p1;" > p1.sql
$ echo "select count(*) from p;" > p.sql
$ pgbench -T 60 -f p1.sql -n

latency average = 58.567 ms

$ pgbench -T 60 -f p.sql -n
latency average = 72.984 ms

$ psql
psql (11devel)
Type "help" for help.

# -- check the cost of the plan.
# explain select count(*) from p1;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
   ->  Seq Scan on p1  (cost=0.00..14425.00 rows=1000000 width=0)
(2 rows)

# -- selecting from the parent is the same due to zero Append cost.
# explain select count(*) from p;
                           QUERY PLAN
------------------------------------------------------------------
 Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
   ->  Append  (cost=0.00..14425.00 rows=1000001 width=0)
         ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=0)
         ->  Seq Scan on p1  (cost=0.00..14425.00 rows=1000000 width=0)
(4 rows)

# -- extrapolate the additional time taken for the Append scan and
# -- work out what the planner should add to the plan's cost, then
# -- divide by the number of rows in p1 to work out the tuple cost
# -- of pulling a row through the append.
# select (16925.01 * (72.984 / 58.567) - 16925.01)  / 1000000;
?column?

 0.00416630302337493743
(1 row)

# show cpu_tuple_cost;
 cpu_tuple_cost

 0.01
(1 row)

# -- How does that compare to the cpu_tuple_cost?
# select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
?column?

 2.400209476818
(1 row)

Maybe it's worth trying with different row counts to see if the
additional cost is consistent, but it's probably not worth being too
critical here.
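In general form, the extrapolation above is (the same arithmetic, restated):

# -- per-tuple Append overhead
# --   = plan_cost * (latency_with_append / latency_without_append - 1) / rows
# select 16925.01 * (72.984 / 58.567 - 1) / 1000000;
        ?column?
------------------------
 0.00416630302337493743
(1 row)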

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-10 Thread Jeevan Chalke
On Tue, Oct 10, 2017 at 10:27 AM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Tue, Oct 10, 2017 at 3:15 AM, David Rowley
>  wrote:
> > On 10 October 2017 at 01:10, Jeevan Chalke
> >  wrote:
> >> Attached new patch set having HEAD at 84ad4b0 with all these review
> points
> >> fixed. Let me know if I missed any thanks.
> >
> > I've only really skimmed over this thread and only opened the code
> > enough to extract the following:
> >
> > + /* Multiply the costs by partition_wise_agg_cost_factor. */
> > + apath->startup_cost *= partition_wise_agg_cost_factor;
> > + apath->total_cost *= partition_wise_agg_cost_factor;
> >
> > I've not studied how all the path plumbing is done, but I think
> > instead of doing this costing magic we should really stop pretending
> > that Append/MergeAppend nodes are cost-free. I think something like
> > charging cpu_tuple_cost per row expected through Append/MergeAppend
> > would be a better approach to this.
> >
> > If you perform grouping or partial grouping before the Append, then in
> > most cases the Append will receive less rows, so come out cheaper than
> > if you perform the grouping after it. I've not learned the
> > partition-wise join code enough to know if this is going to affect
> > that too, but for everything else, there should be no plan change,
> > since there's normally no alternative paths. I see there's even a
> > comment in create_append_path() which claims the zero cost is a bit
> > optimistic.
> >
>
> +1.


Yes. Ashutosh and I had discussed offlist that we will need to cost the
Append node too, as having an extra GUC to control this is not a good
idea per se. Thanks for your vote too.

I will try doing this and will see how the plans turn out with it.

> Partition-wise join will also benefit from costing Append
> processing. Number of rows * width of join result compared with the
> sum of that measure for joining relations decides whether Append node
> processes more data in Append->Join case than Join->Append case.
>
> Append node just returns the result of ExecProcNode(). Charging
> cpu_tuple_cost may make it too expensive. In other places where we
> charge cpu_tuple_cost there's some processing done to the tuple like
> ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
> Append's processing of the tuple.
>
> Maybe we should try to measure the actual time spent in the Append node
> as a fraction of say time spent in child seq scans. That might give us
> a clue as to how Append processing can be charged in terms of costing.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-09 Thread Ashutosh Bapat
On Tue, Oct 10, 2017 at 3:15 AM, David Rowley
 wrote:
> On 10 October 2017 at 01:10, Jeevan Chalke
>  wrote:
>> Attached new patch set having HEAD at 84ad4b0 with all these review points
>> fixed. Let me know if I missed any thanks.
>
> I've only really skimmed over this thread and only opened the code
> enough to extract the following:
>
> + /* Multiply the costs by partition_wise_agg_cost_factor. */
> + apath->startup_cost *= partition_wise_agg_cost_factor;
> + apath->total_cost *= partition_wise_agg_cost_factor;
>
> I've not studied how all the path plumbing is done, but I think
> instead of doing this costing magic we should really stop pretending
> that Append/MergeAppend nodes are cost-free. I think something like
> charging cpu_tuple_cost per row expected through Append/MergeAppend
> would be a better approach to this.
>
> If you perform grouping or partial grouping before the Append, then in
> most cases the Append will receive less rows, so come out cheaper than
> if you perform the grouping after it. I've not learned the
> partition-wise join code enough to know if this is going to affect
> that too, but for everything else, there should be no plan change,
> since there's normally no alternative paths. I see there's even a
> comment in create_append_path() which claims the zero cost is a bit
> optimistic.
>

+1. Partition-wise join will also benefit from costing Append
processing. Number of rows * width of join result compared with the
sum of that measure for joining relations decides whether Append node
processes more data in Append->Join case than Join->Append case.

The Append node just returns the result of ExecProcNode(). Charging
cpu_tuple_cost may make it too expensive. In other places where we
charge cpu_tuple_cost there's some processing done to the tuple, like
ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
Append's processing of the tuple.

Maybe we should try to measure the actual time spent in the Append node
as a fraction of, say, the time spent in the child seq scans. That might give
us a clue as to how Append processing can be charged in terms of costing.
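One crude way to measure that (a sketch, using nothing but EXPLAIN ANALYZE
on the earlier test tables):

-- The Append's self time is roughly its actual total time minus the
-- sum of its children's actual total times; divide by the rows pulled
-- through the Append to get a per-tuple figure.
EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM p;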

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-09 Thread David Rowley
On 10 October 2017 at 01:10, Jeevan Chalke
 wrote:
> Attached new patch set having HEAD at 84ad4b0 with all these review points
> fixed. Let me know if I missed any thanks.

I've only really skimmed over this thread and only opened the code
enough to extract the following:

+ /* Multiply the costs by partition_wise_agg_cost_factor. */
+ apath->startup_cost *= partition_wise_agg_cost_factor;
+ apath->total_cost *= partition_wise_agg_cost_factor;

I've not studied how all the path plumbing is done, but I think
instead of doing this costing magic we should really stop pretending
that Append/MergeAppend nodes are cost-free. I think something like
charging cpu_tuple_cost per row expected through Append/MergeAppend
would be a better approach to this.

If you perform grouping or partial grouping before the Append, then in
most cases the Append will receive less rows, so come out cheaper than
if you perform the grouping after it. I've not learned the
partition-wise join code enough to know if this is going to affect
that too, but for everything else, there should be no plan change,
since there's normally no alternative paths. I see there's even a
comment in create_append_path() which claims the zero cost is a bit
optimistic.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-09 Thread Jeevan Chalke
On Thu, Sep 28, 2017 at 3:12 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
> Here are comments on 0004 from last patch set. But most the comments
> still apply.
>

Thank you, Ashutosh for reviewing.


>
> Patch 0001 adds functions create_hash_agg_path() and
> create_sort_agg_path().
> Patch 0004 adds a new argument to those functions for conditions in HAVING
> clause. We should move those changes to 0001 and pass parse->havingQual to
> these functions in 0001 itself. That will keep all changes to those
> functions
> together and also make 0003 small.
>

Done.


> The prologue of try_partition_wise_grouping() mentions a restriction of
> partition keys being leading group by clauses. This restriction is not
> specified in the prologue of have_grouping_by_partkey(), which actually
> checks
> for this restriction. The requirement per prologue of that function is
> just to
> have partition keys in group clause. I think have_grouping_by_partkey() is
> correct, and we should change prologue of try_partition_wise_grouping() to
> be
> in sync with have_grouping_by_partkey().


Done.

The prologue explains why
> partition-wise aggregation/grouping would be efficient with this
> restriction,
> but it doesn't explain why partial aggregation/grouping per partition
> would be
> efficient. May be it should do that as well. Similar is the case with
> partial
> aggregation/grouping discussion in README.
>

I have tried updating it. Please check.


> +/* Do not handle grouping sets for now. */
> Is this a real restriction or just a restriction for the first cut of this
> feature?
> Can you please add more explanation? Maybe update the README as well?
>

A grouping sets plan does not work with an inheritance subtree (see the notes
in create_groupingsets_plan). Thus grouping sets are not handled here.


>
> +grouped_rel->part_scheme = input_rel->part_scheme;
> Is this true even for partial aggregates? I think not. Since group by
> clause
> does not contain partition keys, the rows from multiple partitions
> participate
> in one group and thus the partition keys of input relation do not apply to
> the
> grouped relation. In this case, it seems that the grouped rel will have
> part_rels but will not be partitioned.
>

I have removed this as your analysis is correct. grouped_rel is not
partitioned.


> +/*
> + * If there is no path for the child relation, then we cannot add
> + * aggregate path too.
> + */
> +if (input_child_rel->pathlist == NIL)
> +return;
> When can this happen? I think, similar to partition-wise join, it happens
> when there is a dummy parent relation. See [1]. If that's the case, you may
> want to do things similar to what partition-wise join is doing. If there's
> some other reason for doing this, returning from here half-way is actually a
> waste of memory and planning time. Instead, we may want to loop over the
> part_rels to find if any of those have an empty pathlist and return from
> there before doing any actual work.
>

This is a kind of can't-happen scenario, so I have converted it to an
Assert().
And yes, I am marking the grouped_rel as a dummy rel when the input rel is
dummy.


> +extra.pathTarget = child_target;
> +extra.inputRows = input_child_rel->cheapest_startup_path->rows;
> +extra.havingQual = (Node *) adjust_appendrel_attrs(root,
> +   (Node *)
> query->havingQual,
> +   nappinfos,
> +   appinfos);
> These lines are updating some fields of "extra" structure in every loop.
> The
> structure is passed to create_child_grouping_paths() in the loop and to
> add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel()
> only
> gets some member values for the last child. Is that right?


No. The patch does update those fields before calling add_paths_to_append_rel().

Should we split
> extra into two structures one to be used within the loop and one outside?
> Or
> may be send the members being updated within the loop separately?
>

I don't see any point in splitting. We need almost all the fields at child path
creation as well as at the finalization step. The patch basically just re-uses
the struct variable.


> +/*
> + * Forcibly apply scan/join target to all the Paths for the
> scan/join
> + * rel.
> + *
> [ lines clipped ]
> +if (subpath == input_child_rel->cheapest_total_path)
> +input_child_rel->cheapest_total_path = path;
> +}
> +}
> This code seems to be copied from grouping_planner() almost verbatim. Is
> there
> a way we can refactor it into a function and use it in both the places.
>

Done.
Moved this into
0003-Refactor-code-applying-scanjoin-target-to-paths-into.patch


> have_grouping_by_partkey() may use match_expr_to_partition_keys() to find
> 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-28 Thread Ashutosh Bapat
On Wed, Sep 27, 2017 at 3:42 PM, Jeevan Chalke
 wrote:
> Thanks Ashutosh for reviewing.
>
> Attached new patch-set with following changes:
>
> 1. Removed earlier 0007 and 0008 patches which were PoC for supporting
> partial aggregation over fdw. I removed them as it will be a different
> issue altogether and hence I will tackle them separately once this is
> done.
>
> This patch-set now includes support for parallel plans within partitions.
>
> Notes:
> HEAD: 59597e6
> Partition-wise Join Version: 34
>
> (First six patches 0001 - 0006, remains the same functionality-wise)
> 0007 - Refactors partial grouping paths creation into the separate function.
> 0008 - Enables parallelism within the partition-wise aggregation.
>
> This patch also includes a fix for the crash reported by Rajkumar.
> While forcibly applying the scan/join target to all the Paths for the
> scan/join rel, I was earlier using apply_projection_to_path(), which modifies
> the path in-place; this caused the crash, as the path finally chosen had been
> updated by partition-wise agg path creation. Now I have used
> create_projection_path(), like we do in the partial aggregation paths.
>
> Also, fixed issues reported by Ashutosh.

Thanks.

Here are comments on 0004 from the last patch set, but most of the comments
still apply.

Patch 0001 adds functions create_hash_agg_path() and create_sort_agg_path().
Patch 0004 adds a new argument to those functions for conditions in HAVING
clause. We should move those changes to 0001 and pass parse->havingQual to
these functions in 0001 itself. That will keep all changes to those functions
together and also make 0003 small.

The prologue of try_partition_wise_grouping() mentions a restriction of
partition keys being leading group by clauses. This restriction is not
specified in the prologue of have_grouping_by_partkey(), which actually checks
for this restriction. The requirement per prologue of that function is just to
have partition keys in group clause. I think have_grouping_by_partkey() is
correct, and we should change prologue of try_partition_wise_grouping() to be
in sync with have_grouping_by_partkey(). The prologue explains why
partition-wise aggregation/grouping would be efficient with this restriction,
but it doesn't explain why partial aggregation/grouping per partition would be
efficient. May be it should do that as well. Similar is the case with partial
aggregation/grouping discussion in README.
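In other words, the check matching have_grouping_by_partkey()'s prologue is a
simple membership test, roughly (an illustrative sketch with made-up names,
not the patch's code):

/* Sketch: every partition key expression must appear somewhere in the
 * GROUP BY expressions; order does not matter. */
static bool
partkeys_in_groupby(List *partexprs, List *groupexprs)
{
	ListCell   *lc;

	foreach(lc, partexprs)
	{
		if (!list_member(groupexprs, lfirst(lc)))
			return false;
	}
	return true;
}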

+/* Do not handle grouping sets for now. */
Is this a real restriction or just a restriction for the first cut of this
feature? Can you please add more explanation? Maybe update the README as well?

+grouped_rel->part_scheme = input_rel->part_scheme;
Is this true even for partial aggregates? I think not. Since group by clause
does not contain partition keys, the rows from multiple partitions participate
in one group and thus the partition keys of input relation do not apply to the
grouped relation. In this case, it seems that the grouped rel will have
part_rels but will not be partitioned.

+/*
+ * If there is no path for the child relation, then we cannot add
+ * aggregate path too.
+ */
+if (input_child_rel->pathlist == NIL)
+return;
When can this happen? I think, similar to partition-wise join, it happens when
there is a dummy parent relation. See [1]. If that's the case, you may want to
do things similar to what partition-wise join is doing. If there's some other
reason for doing this, returning from here half-way is actually a waste of
memory and planning time. Instead, we may want to loop over the part_rels to
find if any of those have an empty pathlist and return from there before doing
any actual work.

+extra.pathTarget = child_target;
+extra.inputRows = input_child_rel->cheapest_startup_path->rows;
+extra.havingQual = (Node *) adjust_appendrel_attrs(root,
+   (Node *)
query->havingQual,
+   nappinfos,
+   appinfos);
These lines are updating some fields of "extra" structure in every loop. The
structure is passed to create_child_grouping_paths() in the loop and to
add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel() only
gets some member values for the last child. Is that right? Should we split
extra into two structures one to be used within the loop and one outside? Or
may be send the members being updated within the loop separately?

+/*
+ * Forcibly apply scan/join target to all the Paths for the scan/join
+ * rel.
+ *
[ lines clipped ]
+if (subpath == input_child_rel->cheapest_total_path)
+input_child_rel->cheapest_total_path = path;
+}
+}
This code seems to be copied from grouping_planner() almost verbatim. Is there
a way we can 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-27 Thread Jeevan Chalke
Thanks Ashutosh for reviewing.

Attached new patch-set with following changes:

1. Removed earlier 0007 and 0008 patches which were PoC for supporting
partial aggregation over fdw. I removed them as it will be a different
issue altogether and hence I will tackle them separately once this is
done.

This patch-set now includes support for parallel plans within partitions.

Notes:
HEAD: 59597e6
Partition-wise Join Version: 34

(First six patches 0001 - 0006, remains the same functionality-wise)
0007 - Refactors partial grouping paths creation into the separate function.
0008 - Enables parallelism within the partition-wise aggregation.

This patch also includes a fix for the crash reported by Rajkumar.
While forcibly applying the scan/join target to all the Paths for the scan/join
rel, I was earlier using apply_projection_to_path(), which modifies the path
in-place; this caused the crash, as the path finally chosen had been
updated by partition-wise agg path creation. Now I have used
create_projection_path(), like we do in the partial aggregation paths.
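The pattern now used is roughly this (a sketch with illustrative variable
names):

	/* Build a fresh projection path instead of mutating subpath
	 * in place, so paths already referenced elsewhere stay valid. */
	path = (Path *) create_projection_path(root,
										   scanjoin_rel,
										   subpath,
										   scanjoin_target);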

Also, fixed issues reported by Ashutosh.

On Tue, Sep 26, 2017 at 6:16 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi Jeevan,
> I have started reviewing these patches.
>
> 0001 looks fine. There might be some changes that will be needed, but
> those will be clear when I review the patch that uses this
> refactoring.
>
> 0002
> + *
> + * If targetlist is provided, we use it else use targetlist from the root.
>   */
>  static double
>  get_number_of_groups(PlannerInfo *root,
>  double path_rows,
> -grouping_sets_data *gd)
> +grouping_sets_data *gd,
> +List *tlist)
>  {
> Query  *parse = root->parse;
> double  dNumGroups;
> +   List   *targetList = (tlist == NIL) ? parse->targetList : tlist;
>
> Maybe we should just pass the targetlist always. Instead of passing NIL,
> pass parse->targetList directly. That would save us one conditional
> assignment. Maybe passing NIL is required for the patches that use
> this refactoring, but that's not clear as is in this patch.
>

Done in attached patch-set.


>
> 0003
> In the documentation of enable_partition_wise_aggregate, we should
> probably explain why the default is off or, like the partition_wise_join
> GUC, explain the consequences of turning it off.


I have updated this. Please have a look.


> I doubt whether we could
> accept something like partition_wise_agg_cost_factor. But we can
> discuss this at a later stage. Meanwhile it may be worthwhile to fix
> the reason why we would require this GUC. If regular aggregation
> has a lower cost than partition-wise aggregation in most of the cases,
> then probably we need to fix the cost model.
>

Yep. I will have a look meanwhile.


>
> I will continue reviewing rest of the patches.
>
>
Thanks
-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v4.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-26 Thread Ashutosh Bapat
Hi Jeevan,
I have started reviewing these patches.

0001 looks fine. There might be some changes that will be needed, but
those will be clear when I review the patch that uses this
refactoring.

0002
+ *
+ * If targetlist is provided, we use it else use targetlist from the root.
  */
 static double
 get_number_of_groups(PlannerInfo *root,
 double path_rows,
-grouping_sets_data *gd)
+grouping_sets_data *gd,
+List *tlist)
 {
Query  *parse = root->parse;
double  dNumGroups;
+   List   *targetList = (tlist == NIL) ? parse->targetList : tlist;

Maybe we should just pass the targetlist always. Instead of passing NIL,
pass parse->targetList directly. That would save us one conditional
assignment. Maybe passing NIL is required for the patches that use
this refactoring, but that's not clear as is in this patch.
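Concretely (a sketch of the suggested call site, not the patch itself):

	/* Always pass an explicit target list instead of NIL: */
	dNumGroups = get_number_of_groups(root,
									  path_rows,
									  gd,
									  parse->targetList);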

0003
In the documentation of enable_partition_wise_aggregate, we should
probably explain why the default is off or, like the partition_wise_join
GUC, explain the consequences of turning it off. I doubt we could
accept something like partition_wise_agg_cost_factor. But we can
discuss this at a later stage. Meanwhile, it may be worthwhile to fix
the reason why we would require this GUC. If regular aggregation
costs less than partition-wise aggregation in most cases,
then we probably need to fix the cost model.
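
A minimal usage sketch with the GUC names from this patch-set (the table
name here is hypothetical):

SET enable_partition_wise_agg TO on;           -- off by default
SET partition_wise_agg_cost_factor TO 0.5;     -- scale costs to favour partition-wise plans
EXPLAIN SELECT a, count(*) FROM some_parted_tab GROUP BY a;
RESET partition_wise_agg_cost_factor;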

I will continue reviewing the rest of the patches.

On Mon, Sep 18, 2017 at 12:37 PM, Jeevan Chalke
 wrote:
>
>
> On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke
>  wrote:
>>
>>
>>
>> On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi
>>  wrote:
>>>
>>>
>>> Hi Jeevan,
>>>
>>> I have started testing partition-wise aggregation and got one observation;
>>> please take a look.
>>> With the v2 patch, if I change the target list order, the query does not
>>> pick the full partition-wise aggregation plan.
>>
>>
>> Thanks Rajkumar for reporting this.
>>
>> I am looking into this issue and will post updated patch with the fix.
>
>
> The logic for checking whether the partition keys lead the GROUP BY keys
> needs to be updated here. The GROUP BY expressions can appear in any order
> without affecting the final result. Thus, requiring the partition keys to
> lead the GROUP BY keys is not mandatory for full aggregation. Instead, we
> must ensure that the partition keys are part of the GROUP BY keys to compute
> full aggregation on a partition.
>
> Attached is the revised patch-set with the above fix.
>
> Also, in test-cases, I have removed DROP/ANALYZE commands on child
> relations and also removed VERBOSE from the EXPLAIN.
>
> Notes:
> HEAD: 8edacab209957520423770851351ab4013cb0167
> Partition-wise Join patch-set version: v32
>
> Thanks
>
> --
> Jeevan Chalke
> Principal Software Engineer, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
>
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-22 Thread Jeevan Chalke
On Thu, Sep 21, 2017 at 5:00 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Jeevan,
>
> While testing the latest v3 patches, I am getting a server crash when I reset
> partition_wise_agg_cost_factor; please take a look.
>
> CREATE TABLE lp (a TEXT, b FLOAT, c INT) PARTITION BY LIST(c);
> CREATE TABLE lp1 PARTITION OF lp FOR VALUES IN (10,20);
> CREATE TABLE lp2 PARTITION OF lp FOR VALUES IN (30,40);
>
> INSERT INTO lp VALUES ('a1',800, 20);
> INSERT INTO lp VALUES ('a2',1250,30);
> INSERT INTO lp VALUES ('a3',2975,20);
> INSERT INTO lp VALUES ('a3',2850,30);
>
> postgres=# SET enable_partition_wise_agg TO true;
> SET
> postgres=# SET partition_wise_agg_cost_factor TO 0.5;
> SET
> postgres=#
> postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
> BY 1,2;
>  max  |  avg
> --+
>  2975 | 2912.5
> (1 row)
>
> postgres=# RESET partition_wise_agg_cost_factor;
> RESET
> postgres=#
> postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
> BY 1,2;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
>
Oops. Will fix this.

I have added these tests to the test case, but the test case works as
expected. However, running those steps in psql reproduces the crash (though
not consistently).

Looking into it. Thanks for reporting.


> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation
>


-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-21 Thread Rajkumar Raghuwanshi
On Mon, Sep 18, 2017 at 12:37 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
> On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote:
>
>>
>>
>> On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
>> rajkumar.raghuwan...@enterprisedb.com> wrote:
>>
>>>
>>> Hi Jeevan,
>>>
>>> I have started testing partition-wise aggregation and got one observation;
>>> please take a look.
>>> With the v2 patch, if I change the target list order, the query does not
>>> pick the full partition-wise aggregation plan.
>>>
>>
>> Thanks Rajkumar for reporting this.
>>
>> I am looking into this issue and will post updated patch with the fix.
>>
>
> The logic for checking whether the partition keys lead the GROUP BY keys
> needs to be updated here. The GROUP BY expressions can appear in any order
> without affecting the final result. Thus, requiring the partition keys to
> lead the GROUP BY keys is not mandatory for full aggregation. Instead, we
> must ensure that the partition keys are part of the GROUP BY keys to compute
> full aggregation on a partition.
>
> Attached is the revised patch-set with the above fix.
>
> Also, in test-cases, I have removed DROP/ANALYZE commands on child
> relations and also removed VERBOSE from the EXPLAIN.
>
> Notes:
> HEAD: 8edacab209957520423770851351ab4013cb0167
> Partition-wise Join patch-set version: v32
>

Hi Jeevan,

While testing the latest v3 patches, I am getting a server crash when I reset
partition_wise_agg_cost_factor; please take a look.

CREATE TABLE lp (a TEXT, b FLOAT, c INT) PARTITION BY LIST(c);
CREATE TABLE lp1 PARTITION OF lp FOR VALUES IN (10,20);
CREATE TABLE lp2 PARTITION OF lp FOR VALUES IN (30,40);

INSERT INTO lp VALUES ('a1',800, 20);
INSERT INTO lp VALUES ('a2',1250,30);
INSERT INTO lp VALUES ('a3',2975,20);
INSERT INTO lp VALUES ('a3',2850,30);

postgres=# SET enable_partition_wise_agg TO true;
SET
postgres=# SET partition_wise_agg_cost_factor TO 0.5;
SET
postgres=#
postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
BY 1,2;
 max  |  avg
--+
 2975 | 2912.5
(1 row)

postgres=# RESET partition_wise_agg_cost_factor;
RESET
postgres=#
postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
BY 1,2;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-18 Thread Rajkumar Raghuwanshi
On Mon, Sep 18, 2017 at 12:37 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
>
> On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote:
>
>>
>>
>> On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
>> rajkumar.raghuwan...@enterprisedb.com> wrote:
>>
>>>
>>> Hi Jeevan,
>>>
>>> I have started testing partition-wise aggregation and got one observation;
>>> please take a look.
>>> With the v2 patch, if I change the target list order, the query does not
>>> pick the full partition-wise aggregation plan.
>>>
>>
>> Thanks Rajkumar for reporting this.
>>
>> I am looking into this issue and will post updated patch with the fix.
>>
>
> The logic for checking whether the partition keys lead the GROUP BY keys
> needs to be updated here. The GROUP BY expressions can appear in any order
> without affecting the final result. Thus, requiring the partition keys to
> lead the GROUP BY keys is not mandatory for full aggregation. Instead, we
> must ensure that the partition keys are part of the GROUP BY keys to compute
> full aggregation on a partition.
>
> Attached is the revised patch-set with the above fix.
>
> Also, in test-cases, I have removed DROP/ANALYZE commands on child
> relations and also removed VERBOSE from the EXPLAIN.
>
> Notes:
> HEAD: 8edacab209957520423770851351ab4013cb0167
> Partition-wise Join patch-set version: v32
>

Thanks for the patch. I have tested it, and the issue is fixed now.


Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-18 Thread Jeevan Chalke
On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
>
> On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
> rajkumar.raghuwan...@enterprisedb.com> wrote:
>
>>
>> Hi Jeevan,
>>
>> I have started testing partition-wise aggregation and got one observation;
>> please take a look.
>> With the v2 patch, if I change the target list order, the query does not
>> pick the full partition-wise aggregation plan.
>>
>
> Thanks Rajkumar for reporting this.
>
> I am looking into this issue and will post updated patch with the fix.
>

The logic for checking whether the partition keys lead the GROUP BY keys
needs to be updated here. The GROUP BY expressions can appear in any order
without affecting the final result. Thus, requiring the partition keys to
lead the GROUP BY keys is not mandatory for full aggregation. Instead, we
must ensure that the partition keys are part of the GROUP BY keys to compute
full aggregation on a partition.
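
As a sketch of the fixed behaviour (reusing the pagg_tab example reported
against v2, a table partitioned by RANGE(a)), both target list orderings
should now qualify for full partition-wise aggregation, since the partition
key "a" merely needs to be part of the GROUP BY keys:

EXPLAIN (COSTS OFF) SELECT a, b, count(*) FROM pagg_tab GROUP BY a, b;
EXPLAIN (COSTS OFF) SELECT b, a, count(*) FROM pagg_tab GROUP BY a, b;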

Attached is the revised patch-set with the above fix.

Also, in test-cases, I have removed DROP/ANALYZE commands on child
relations and also removed VERBOSE from the EXPLAIN.

Notes:
HEAD: 8edacab209957520423770851351ab4013cb0167
Partition-wise Join patch-set version: v32

Thanks

-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v3.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-12 Thread Jeevan Chalke
On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

>
On Fri, Sep 8, 2017 at 5:47 PM, Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
>
>> Here is the new patch-set, re-based on HEAD (f0a0c17) and the
>> latest partition-wise join (v29) patches.
>>
>
> Hi Jeevan,
>
> I have started testing partition-wise aggregation and got one observation;
> please take a look.
> With the v2 patch, if I change the target list order, the query does not
> pick the full partition-wise aggregation plan.
>

Thanks Rajkumar for reporting this.

I am looking into this issue and will post updated patch with the fix.

> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation

-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-12 Thread Rajkumar Raghuwanshi
On Fri, Sep 8, 2017 at 5:47 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

> Here is the new patch-set, re-based on HEAD (f0a0c17) and the
> latest partition-wise join (v29) patches.
>

Hi Jeevan,

I have started testing partition-wise aggregation and got one observation;
please take a look.
With the v2 patch, if I change the target list order, the query does not pick
the full partition-wise aggregation plan.

SET enable_partition_wise_agg TO true;
SET partition_wise_agg_cost_factor TO 0.5;
SET enable_partition_wise_join TO true;
SET max_parallel_workers_per_gather TO 0;

CREATE TABLE pagg_tab (a int, b int, c int) PARTITION BY RANGE(a);
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10);
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20);
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30);
INSERT INTO pagg_tab SELECT i % 30, i % 30, i % 50 FROM generate_series(0,
299) i;
ANALYZE pagg_tab;

postgres=# explain (verbose, costs off) select a,b,count(*) from pagg_tab
group by a,b order by 1,2;
  QUERY PLAN
--
 Sort
   Output: pagg_tab_p1.a, pagg_tab_p1.b, (count(*))
   Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
   ->  Append
         ->  HashAggregate
               Output: pagg_tab_p1.a, pagg_tab_p1.b, count(*)
               Group Key: pagg_tab_p1.a, pagg_tab_p1.b
               ->  Seq Scan on public.pagg_tab_p1
                     Output: pagg_tab_p1.a, pagg_tab_p1.b
         ->  HashAggregate
               Output: pagg_tab_p2.a, pagg_tab_p2.b, count(*)
               Group Key: pagg_tab_p2.a, pagg_tab_p2.b
               ->  Seq Scan on public.pagg_tab_p2
                     Output: pagg_tab_p2.a, pagg_tab_p2.b
         ->  HashAggregate
               Output: pagg_tab_p3.a, pagg_tab_p3.b, count(*)
               Group Key: pagg_tab_p3.a, pagg_tab_p3.b
               ->  Seq Scan on public.pagg_tab_p3
                     Output: pagg_tab_p3.a, pagg_tab_p3.b
(19 rows)

-- changing target list order
-- picking partial partition-wise aggregation path
postgres=# explain (verbose, costs off) select b,a,count(*) from pagg_tab
group by a,b order by 1,2;
 QUERY PLAN

 Finalize GroupAggregate
   Output: pagg_tab_p1.b, pagg_tab_p1.a, count(*)
   Group Key: pagg_tab_p1.b, pagg_tab_p1.a
   ->  Sort
         Output: pagg_tab_p1.b, pagg_tab_p1.a, (PARTIAL count(*))
         Sort Key: pagg_tab_p1.b, pagg_tab_p1.a
         ->  Append
               ->  Partial HashAggregate
                     Output: pagg_tab_p1.b, pagg_tab_p1.a, PARTIAL count(*)
                     Group Key: pagg_tab_p1.b, pagg_tab_p1.a
                     ->  Seq Scan on public.pagg_tab_p1
                           Output: pagg_tab_p1.b, pagg_tab_p1.a
               ->  Partial HashAggregate
                     Output: pagg_tab_p2.b, pagg_tab_p2.a, PARTIAL count(*)
                     Group Key: pagg_tab_p2.b, pagg_tab_p2.a
                     ->  Seq Scan on public.pagg_tab_p2
                           Output: pagg_tab_p2.b, pagg_tab_p2.a
               ->  Partial HashAggregate
                     Output: pagg_tab_p3.b, pagg_tab_p3.a, PARTIAL count(*)
                     Group Key: pagg_tab_p3.b, pagg_tab_p3.a
                     ->  Seq Scan on public.pagg_tab_p3
                           Output: pagg_tab_p3.b, pagg_tab_p3.a
(22 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-08 Thread Jeevan Chalke
On Wed, Aug 23, 2017 at 4:43 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

> Hi,
>
> Attached is the patch to implement partition-wise aggregation/grouping.
>
> As explained earlier, we produce a full aggregation for each partition when
> the partition keys lead the GROUP BY clause, and then an Append is performed.
> Otherwise, we do a partial aggregation on each partition, append the results,
> and then add a finalization step on top.
>
> I have observed that the cost estimated for partition-wise aggregation and
> the cost for the plans without partition-wise aggregation are almost the
> same. However, execution time shows significant improvement (as explained in
> my very first email) with partition-wise aggregates. The planner chooses a
> plan according to the costs, and thus most of the time the plan without
> partition-wise aggregation is chosen. Hence, to force partition-wise plans
> and for the regression runs, I have added a GUC named
> partition_wise_agg_cost_factor to adjust the costings.
>
> This feature is only used when enable_partition_wise_agg GUC is set to on.
>
> Here are the details of the patches in the patch-set:
>

Here is the new patch-set, re-based on HEAD (f0a0c17) and the
latest partition-wise join (v29) patches.


>
> 0001 - Refactors sort and hash final grouping paths into separate functions.
> Since partition-wise aggregation builds paths in the same way as
> create_grouping_paths(), path creation for sort and hash aggregation is
> separated into distinct functions. These functions are later used by the
> main partition-wise aggregation/grouping patch.
>
> 0002 - Passes targetlist to get_number_of_groups().
> We need to estimate groups for individual child relations and thus need to
> pass targetlist corresponding to the child rel.
>
> 0003 - Adds enable_partition_wise_agg and partition_wise_agg_cost_factor
> GUCs.
>
> 0004 - Implements partition-wise aggregation.
>
> 0005 - Adds test-cases.
>
> 0006 - postgres_fdw changes which enable pushing aggregation for other
> upper
> relations.
>

0007 - Provides infrastructure to allow partial aggregation
This will allow us to push partial aggregation over the FDW.
With this, one can write SUM(PARTIAL x) to get a partial-sum
result. Since PARTIAL is used in the syntax, I need to move it
to the reserved keywords category. This is a kind of PoC patch
and needs input on the approach and the way it is implemented.
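
For example, with this PoC syntax one could ask for just the partial
(transition) result of an aggregate; the query below is a sketch, with the
table name hypothetical:

SELECT sum(PARTIAL x) FROM some_tab;   -- returns a partial sum, to be combined elsewhere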

0008 - Teaches postgres_fdw to push partial aggregation
With this, we can push the aggregate to the remote server even when
the GROUP BY key does not match the partition key.


>
>
> Since this patch is highly dependent on partition-wise join [1], one needs
> to
> apply all those patches on HEAD (my repository head was at:
> 66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in
> order.
>
> Suggestions / feedback / inputs ?
>
> [1] https://www.postgresql.org/message-id/CAFjFpRd9Vqh_=-Ldv-
> XqWY006d07TJ+VXuhXCbdj=p1juky...@mail.gmail.com
>
>
>
-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v2.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Partition-wise aggregation/grouping

2017-08-23 Thread Jeevan Chalke
Hi,

Attached is the patch to implement partition-wise aggregation/grouping.

As explained earlier, we produce a full aggregation for each partition when
the partition keys lead the GROUP BY clause, and then an Append is performed.
Otherwise, we do a partial aggregation on each partition, append the results,
and then add a finalization step on top.
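
Schematically, the two plan shapes are (in the arrow notation used later in
this thread, not actual EXPLAIN output):

Full:    Append -> Agg -> Scan, one aggregate per partition
Partial: FinalizeAgg -> Append -> PartialAgg -> Scan, combined at the top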

I have observed that the cost estimated for partition-wise aggregation and the
cost for the plans without partition-wise aggregation are almost the same.
However, execution time shows significant improvement (as explained in my very
first email) with partition-wise aggregates. The planner chooses a plan
according to the costs, and thus most of the time the plan without
partition-wise aggregation is chosen. Hence, to force partition-wise plans and
for the regression runs, I have added a GUC named
partition_wise_agg_cost_factor to adjust the costings.

This feature is only used when enable_partition_wise_agg GUC is set to on.

Here are the details of the patches in the patch-set:

0001 - Refactors sort and hash final grouping paths into separate functions.
Since partition-wise aggregation builds paths in the same way as
create_grouping_paths(), path creation for sort and hash aggregation is
separated into distinct functions. These functions are later used by the main
partition-wise aggregation/grouping patch.

0002 - Passes targetlist to get_number_of_groups().
We need to estimate groups for individual child relations and thus need to
pass targetlist corresponding to the child rel.

0003 - Adds enable_partition_wise_agg and partition_wise_agg_cost_factor
GUCs.

0004 - Implements partition-wise aggregation.

0005 - Adds test-cases.

0006 - postgres_fdw changes which enable pushing aggregation for other upper
relations.


Since this patch is highly dependent on partition-wise join [1], one needs
to
apply all those patches on HEAD (my repository head was at:
66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in
order.

Suggestions / feedback / inputs ?

[1]
https://www.postgresql.org/message-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=p1juky...@mail.gmail.com


On Tue, Mar 21, 2017 at 12:47 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

> Hi all,
>
> Declarative partitioning is supported in PostgreSQL 10 and work is already in
> progress to support partition-wise joins. Here is a proposal for
> partition-wise aggregation/grouping.  Our initial performance measurement has
> shown a 7x performance improvement when partitions are on foreign servers and
> approximately 15% when partitions are local.
>
> Partition-wise aggregation/grouping computes aggregates for each partition
> separately.  If the group clause contains the partition key, all the rows
> belonging to a given group come from one partition, thus allowing
> aggregates
> to be computed completely for each partition.  Otherwise, partial
> aggregates
> computed for each partition are combined across the partitions to produce
> the
> final aggregates. This technique improves performance because:
> i. When partitions are located on foreign server, we can push down the
> aggregate to the foreign server.
> ii. If hash table for each partition fits in memory, but that for the whole
> relation does not, each partition-wise aggregate can use an in-memory hash
> table.
> iii. Aggregation at the level of partitions can exploit properties of
> partitions like indexes, their storage etc.
>
> Attached an experimental patch for the same based on the partition-wise
> join
> patches posted in [1].
>
> This patch currently implements partition-wise aggregation when group
> clause
> contains the partitioning key.  A query below, involving a partitioned
> table
> with 3 partitions containing 1M rows each, producing total 30 groups showed
> 15% improvement over non-partition-wise aggregation. Same query showed 7
> times
> improvement when the partitions were located on the foreign servers.
>
> Here is the sample plan:
>
> postgres=# set enable_partition_wise_agg to true;
> SET
> postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
>   QUERY
> PLAN
> 
> --
>  Append  (cost=5100.00..61518.90 rows=30 width=12) (actual
> time=324.837..944.804 rows=30 loops=1)
>->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual
> time=324.837..324.838 rows=10 loops=1)
>  Relations: Aggregate on (public.fplt1_p1 plt1)
>->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual
> time=309.954..309.956 rows=10 loops=1)
>  Relations: Aggregate on (public.fplt1_p2 plt1)
>->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual
> time=310.002..310.004 rows=10 loops=1)
>  Relations: Aggregate on (public.fplt1_p3 plt1)
>  Planning time: 0.370 ms
>  Execution time: 945.384 ms
> (9 rows)
>
> postgres=# set enable_partition_wise_agg to 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-05-01 Thread Robert Haas
On Fri, Apr 28, 2017 at 3:03 AM, Antonin Houska  wrote:
> I think this is not generic enough because the result of the Append plan can
> be joined to another relation. As such a join can duplicate the
> already-aggregated values, the aggregates should not be finalized below the
> top-level plan.

If the grouping key matches the partition key, then it's correct to
push the entire aggregate down, and there's probably a large
performance advantage from avoiding aggregating twice.  If the two
don't match, then pushing the aggregate down necessarily involves a
"partial" and a "finalize" stage, which may or may not be cheaper than
doing the aggregation all at once.  If you have lots of 2-row groups
with 1 row in the first branch of the append and 1 row in the second
branch of the append, breaking the aggregate into two steps is
probably going to be a loser.  If the overall number of groups is
small, it's probably going to win.  But when the grouping key matches
the partition key, so that two-stage aggregation isn't required, I
suspect the pushdown should almost always win.
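
To illustrate the two cases with a hypothetical table t partitioned by
column a:

-- grouping key matches the partition key: the whole aggregate can be
-- pushed below the Append, no second stage needed
SELECT a, count(*) FROM t GROUP BY a;

-- grouping key differs: pushdown needs Partial plus Finalize stages, which
-- wins or loses depending on how much the partial step shrinks each group
SELECT b, count(*) FROM t GROUP BY b;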

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-04-28 Thread Antonin Houska
Jeevan Chalke  wrote:

> On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska  wrote:
>
> > Robert Haas  wrote:
> > > Well, I'm confused. I see that there's a relationship between what
> > > Antonin is trying to do and what Jeevan is trying to do, but I can't
> > > figure out whether one is a subset of the other, whether they're both
> > > orthogonal, or something else. This plan looks similar to what I
> > > would expect Jeevan's patch to produce,

> > The point is that the patch Jeevan wanted to work on is actually a subset of
> > [1] combined with [2].

> It seems so, since you are targeting every relation whether or not it is
> partitioned.

Yes.

> With my patch, I am getting the following plan, where we push the entire
> aggregation below the Append.
> 
> QUERY PLAN
> --
>  Append
>    ->  HashAggregate
>          Group Key: b_1.j
>          ->  Hash Join
>                Hash Cond: (b_1.j = c_1.k)
>                ->  Seq Scan on b_1
>                ->  Hash
>                      ->  Seq Scan on c_1
>    ->  HashAggregate
>          Group Key: b_2.j
>          ->  Hash Join
>                Hash Cond: (b_2.j = c_2.k)
>                ->  Seq Scan on b_2
>                ->  Hash
>                      ->  Seq Scan on c_2
> (15 rows)

I think this is not generic enough because the result of the Append plan can
be joined to another relation. As such a join can duplicate the
already-aggregated values, the aggregates should not be finalized below the
top-level plan.

> Antonin, I have tried applying your patch on master but it doesn't
> apply. Can you please provide the HEAD commit and any other changes that
> need to be applied first?

I've lost that information. I'll post a new version to the [1] thread asap.

> What does the plan look like when the GROUP BY key does not match the
> partitioning key, i.e. GROUP BY b.v?

EXPLAIN (COSTS false)
SELECT   b.v, avg(b.v + c.v)
FROM     b
JOIN     c ON b.j = c.k
GROUP BY b.v;

   QUERY PLAN   

 Finalize HashAggregate
   Group Key: b_1.v
   ->  Append
         ->  Partial HashAggregate
               Group Key: b_1.v
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Partial HashAggregate
               Group Key: b_2.v
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2


> >  [1] https://www.postgresql.org/message-id/9666.1491295317%40localhost
> >
> >  [2] https://commitfest.postgresql.org/14/994/


-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Partition-wise aggregation/grouping

2017-04-27 Thread Jeevan Chalke
On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska  wrote:

> Robert Haas  wrote:
>
> > On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska  wrote:
> > > Attached is a diff that contains both patches merged. This is just to
> > > prove my assumption, details to be elaborated later. The scripts attached
> > > produce the following plan in my environment:
> > >
> > >    QUERY PLAN
> > > 
> > >  Parallel Finalize HashAggregate
> > >    Group Key: b_1.j
> > >    ->  Append
> > >          ->  Parallel Partial HashAggregate
> > >                Group Key: b_1.j
> > >                ->  Hash Join
> > >                      Hash Cond: (b_1.j = c_1.k)
> > >                      ->  Seq Scan on b_1
> > >                      ->  Hash
> > >                            ->  Seq Scan on c_1
> > >          ->  Parallel Partial HashAggregate
> > >                Group Key: b_2.j
> > >                ->  Hash Join
> > >                      Hash Cond: (b_2.j = c_2.k)
> > >                      ->  Seq Scan on b_2
> > >                      ->  Hash
> > >                            ->  Seq Scan on c_2
> >
> > Well, I'm confused.  I see that there's a relationship between what
> > Antonin is trying to do and what Jeevan is trying to do, but I can't
> > figure out whether one is a subset of the other, whether they're both
> > orthogonal, or something else.  This plan looks similar to what I
> > would expect Jeevan's patch to produce,
>
> The point is that the patch Jeevan wanted to work on is actually a subset of
> [1] combined with [2].
>

It seems so, since you are targeting every relation whether or not it is
partitioned, whereas I am targeting only partitioned relations in my
patch.


>
> > except I have no idea what "Parallel" would mean in a plan that contains no
> > Gather node.
>
> parallel_aware field was set mistakenly on the AggPath. Fixed patch is
> attached below, producing this plan:
>
>    QUERY PLAN
> 
>  Finalize HashAggregate
>    Group Key: b_1.j
>    ->  Append
>          ->  Partial HashAggregate
>                Group Key: b_1.j
>                ->  Hash Join
>                      Hash Cond: (b_1.j = c_1.k)
>                      ->  Seq Scan on b_1
>                      ->  Hash
>                            ->  Seq Scan on c_1
>          ->  Partial HashAggregate
>                Group Key: b_2.j
>                ->  Hash Join
>                      Hash Cond: (b_2.j = c_2.k)
>                      ->  Seq Scan on b_2
>                      ->  Hash
>                            ->  Seq Scan on c_2
>

With my patch, I am getting the following plan, where we push the entire
aggregation below the Append.

QUERY PLAN
--
 Append
   ->  HashAggregate
         Group Key: b_1.j
         ->  Hash Join
               Hash Cond: (b_1.j = c_1.k)
               ->  Seq Scan on b_1
               ->  Hash
                     ->  Seq Scan on c_1
   ->  HashAggregate
         Group Key: b_2.j
         ->  Hash Join
               Hash Cond: (b_2.j = c_2.k)
               ->  Seq Scan on b_2
               ->  Hash
                     ->  Seq Scan on c_2
(15 rows)


Antonin, I have tried applying your patch on master but it doesn't
apply. Can you please provide the HEAD commit and any other changes that
need to be applied first?

What does the plan look like when the GROUP BY key does not match the
partitioning key, i.e. GROUP BY b.v?


>
> [1] https://www.postgresql.org/message-id/9666.1491295317%40localhost
>
> [2] https://commitfest.postgresql.org/14/994/
>
> --
> Antonin Houska
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de, http://www.cybertec.at
>
>
>


-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb



Re: [HACKERS] Partition-wise aggregation/grouping

2017-04-26 Thread Robert Haas
On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska  wrote:
> Attached is a diff that contains both patches merged. This is just to prove my
> assumption, details to be elaborated later. The scripts attached produce the
> following plan in my environment:
>
>    QUERY PLAN
> 
>  Parallel Finalize HashAggregate
>    Group Key: b_1.j
>    ->  Append
>          ->  Parallel Partial HashAggregate
>                Group Key: b_1.j
>                ->  Hash Join
>                      Hash Cond: (b_1.j = c_1.k)
>                      ->  Seq Scan on b_1
>                      ->  Hash
>                            ->  Seq Scan on c_1
>          ->  Parallel Partial HashAggregate
>                Group Key: b_2.j
>                ->  Hash Join
>                      Hash Cond: (b_2.j = c_2.k)
>                      ->  Seq Scan on b_2
>                      ->  Hash
>                            ->  Seq Scan on c_2

Well, I'm confused.  I see that there's a relationship between what
Antonin is trying to do and what Jeevan is trying to do, but I can't
figure out whether one is a subset of the other, whether they're both
orthogonal, or something else.  This plan looks similar to what I
would expect Jeevan's patch to produce, except I have no idea what
"Parallel" would mean in a plan that contains no Gather node.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise aggregation/grouping

2017-04-05 Thread Antonin Houska
Antonin Houska  wrote:
> 
> Jeevan Chalke  wrote:
> 
> > Our work will overlap when we are pushing down the aggregate on partitioned
> > base relation to its children/partitions.
> > 
> > I think you should continue working on pushing down aggregates onto the
> > joins/scans, whereas I will continue my work on pushing down aggregates to
> > partitions (joins as well as single tables). Once we are done with these
> > tasks, then we may need to find a way to integrate them.
> > 
> > [1] 
> > https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com#CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com
> 
> My patch also creates (partial) aggregation paths below the Append node,
> but only expects SeqScan as input. Please check whether your patch can be
> based on this or if there's any conflict.

Well, I haven't imposed any explicit restriction on the kind of path to be
aggregated below the Append path. Maybe the only thing to do is to merge my
patch with the "partition-wise join" patch (which I haven't checked yet).

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Partition-wise aggregation/grouping

2017-04-05 Thread Antonin Houska
The promised new version of my patch is here:

https://www.postgresql.org/message-id/9666.1491295317%40localhost

Jeevan Chalke  wrote:

> On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska  wrote:
> 
>  Jeevan Chalke  wrote:
> 
> IIUC, it seems that you are trying to push down the aggregation into the
> joining relations. So basically you are converting
> Agg -> Join -> {scan1, scan2} into
> FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
> In addition to that, your patch pushes aggregates on a base rel to its
> children, if any.
> 
> Whereas what I propose here is pushing down aggregation below the Append
> node, keeping the join/scan as is. So basically I am converting
> Agg -> Append -> Join -> {scan1, scan2} into
> Append -> Agg -> Join -> {scan1, scan2}.
> This will require partition-wise join as posted in [1].
> But I am planning to make this work for partitioned relations and not for
> generic inheritance.
> 
> I treat these two as separate strategies/paths to be considered while planning.
> 
> Our work will overlap when we are pushing down the aggregate on a partitioned
> base relation to its children/partitions.
> 
> I think you should continue working on pushing down aggregates onto the
> joins/scans, whereas I will continue my work on pushing down aggregates to
> partitions (joins as well as single tables). Once we are done with these
> tasks, then we may need to find a way to integrate them.
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com#CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com

My patch also creates (partial) aggregation paths below the Append node,
but only expects SeqScan as input. Please check whether your patch can be
based on this or if there's any conflict.

(I'll probably be unable to respond before Monday 04/17.)

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Partition-wise aggregation/grouping

2017-03-23 Thread Jeevan Chalke
On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska  wrote:

> Jeevan Chalke  wrote:
>
> > Declarative partitioning is supported in PostgreSQL 10 and work is already
> > in progress to support partition-wise joins. Here is a proposal for
> > partition-wise aggregation/grouping. Our initial performance measurement
> > has shown a 7x performance improvement when partitions are on foreign
> > servers and approximately 15% when partitions are local.
> >
> > Partition-wise aggregation/grouping computes aggregates for each partition
> > separately. If the group clause contains the partition key, all the rows
> > belonging to a given group come from one partition, thus allowing
> > aggregates to be computed completely for each partition. Otherwise,
> > partial aggregates computed for each partition are combined across the
> > partitions to produce the final aggregates. This technique improves
> > performance because:
> >
> > i. When partitions are located on foreign servers, we can push down the
> > aggregate to the foreign server.
> >
> > ii. If the hash table for each partition fits in memory, but that for the
> > whole relation does not, each partition-wise aggregate can use an
> > in-memory hash table.
> >
> > iii. Aggregation at the level of partitions can exploit properties of
> > partitions like indexes, their storage, etc.
>
> I suspect this overlaps with
>
> https://www.postgresql.org/message-id/29111.1483984605%40localhost
>
> I'm working on the next version of the patch, which will be able to aggregate
> the result of both base relation scans and joins. I'm trying hard to make the
> next version available before an urgent vacation that I'll have to take at a
> random date between today and early April. I suggest that we coordinate the
> effort; it's a lot of work in any case.
>

IIUC, it seems that you are trying to push down the aggregation into the
joining relations. So basically you are converting
Agg -> Join -> {scan1, scan2} into
FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
In addition to that, your patch pushes aggregates on a base rel to its
children, if any.

Whereas what I propose here is pushing down aggregation below the Append
node, keeping the join/scan as is. So basically I am converting
Agg -> Append -> Join -> {scan1, scan2} into
Append -> Agg -> Join -> {scan1, scan2}.
This will require partition-wise join as posted in [1].
But I am planning to make this work for partitioned relations and not for
generic inheritance.

I treat these two as separate strategies/paths to be considered while
planning.

Our work will overlap when we are pushing down the aggregate on a partitioned
base relation to its children/partitions.

I think you should continue working on pushing down aggregates onto the
joins/scans, whereas I will continue my work on pushing down aggregates to
partitions (joins as well as single tables). Once we are done with these
tasks, then we may need to find a way to integrate them.

[1]
https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com#CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com



>
> --
> Antonin Houska
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de, http://www.cybertec.at
>
>



-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2017-03-21 Thread Antonin Houska
Jeevan Chalke  wrote:

> Declarative partitioning is supported in PostgreSQL 10 and work is already in
> progress to support partition-wise joins. Here is a proposal for 
> partition-wise
> aggregation/grouping. Our initial performance measurement has shown a 7x
> performance improvement when partitions are on foreign servers and
> approximately 15% when partitions are local.
> 
> Partition-wise aggregation/grouping computes aggregates for each partition
> separately. If the group clause contains the partition key, all the rows
> belonging to a given group come from one partition, thus allowing aggregates
> to be computed completely for each partition. Otherwise, partial aggregates
> computed for each partition are combined across the partitions to produce the
> final aggregates. This technique improves performance because:

> i. When partitions are located on foreign server, we can push down the
> aggregate to the foreign server.

> ii. If hash table for each partition fits in memory, but that for the whole
> relation does not, each partition-wise aggregate can use an in-memory hash
> table.

> iii. Aggregation at the level of partitions can exploit properties of
> partitions like indexes, their storage etc.

I suspect this overlaps with

https://www.postgresql.org/message-id/29111.1483984605%40localhost

I'm working on the next version of the patch, which will be able to aggregate
the result of both base relation scans and joins. I'm trying hard to make the
next version available before an urgent vacation that I'll have to take at a
random date between today and early April. I suggest that we coordinate the
effort; it's a lot of work in any case.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

