Re: [HACKERS] Possible optimization on Function Scan

2016-09-07 Thread Andres Freund
Hi,

On 2016-09-07 15:29:08 -0500, Jim Nasby wrote:
> I was a bit surprised to discover the difference below in calling an SRF as
> part of a target list vs part of the from clause. The from clause generates
> a Function Scan, which (apparently blindly) builds a tuplestore. Is there a
> relatively easy way to either transform this type of query so the SRF is
> back in a target list, or teach Function Scan that it doesn't always need to
> create a tuplestore? It would be nice if we could just not use a tuplestore
> at all (depending on the planner to add a Materialize node if necessary),
> but AIUI functions can directly return a tuplestore, so I guess that's not
> an option...

I've recently implemented ValuePerCall support for SRFs in FROM:
http://archives.postgresql.org/message-id/20160827214829.zo2dfb5jaikii5nw%40alap3.anarazel.de

One message upthread, at
https://www.postgresql.org/message-id/20160822214023.aaxz5l4igypowyri%40alap3.anarazel.de
there are before/after performance numbers showing that removing the
materialization fixes the issue.

Andres




[HACKERS] Possible optimization on Function Scan

2016-09-07 Thread Jim Nasby
I was a bit surprised to discover the difference below in calling an SRF 
as part of a target list vs part of the from clause. The from clause 
generates a Function Scan, which (apparently blindly) builds a 
tuplestore. Is there a relatively easy way to either transform this type 
of query so the SRF is back in a target list, or teach Function Scan 
that it doesn't always need to create a tuplestore? It would be nice if 
we could just not use a tuplestore at all (depending on the planner to 
add a Materialize node if necessary), but AIUI functions can directly 
return a tuplestore, so I guess that's not an option...



~@decina/45678# explain (analyze,verbose,buffers) select count(*) from (select generate_series(1,)) c;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.51..17.52 rows=1 width=8) (actual time=27085.104..27085.104 rows=1 loops=1)
   Output: count(*)
   ->  Result  (cost=0.00..5.01 rows=1000 width=4) (actual time=0.007..14326.945 rows= loops=1)
         Output: generate_series(1, )
 Planning time: 0.125 ms
 Execution time: 27085.153 ms
(6 rows)

Time: 27087.624 ms
~@decina/45678# explain (analyze,verbose,buffers) select count(*) from generate_series(1,);
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Aggregate  (cost=12.50..12.51 rows=1 width=8) (actual time=57968.811..57968.812 rows=1 loops=1)
   Output: count(*)
   Buffers: temp read=170900 written=170899
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0) (actual time=22407.515..44908.001 rows= loops=1)
         Output: generate_series
         Function Call: generate_series(1, )
         Buffers: temp read=170900 written=170899
 Planning time: 0.060 ms
 Execution time: 58054.981 ms
(9 rows)

Time: 58055.929 ms



--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] possible optimization: push down aggregates

2014-08-28 Thread Craig Ringer
On 08/28/2014 03:46 AM, Claudio Freire wrote:
> You can't with mean and stddev, only with associative aggregates.
>
> That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.

You could with a new helper function to merge the temporary states for
each scan though.

In the case of mean, for example, it'd just mean adding the counts and sums.
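
A minimal sketch of such a merge helper for mean (illustrative only: the
type and function names here are invented, and a real implementation would
live in the aggregate machinery rather than in SQL):

-- partial state for mean: a (count, sum) pair
create type avg_state as (n bigint, s numeric);

-- merging two partial states is just component-wise addition
create function avg_state_merge(a avg_state, b avg_state)
returns avg_state
language sql immutable as $$
    select row((a).n + (b).n, (a).s + (b).s)::avg_state;
$$;

-- the final mean is sum/count of the merged state
create function avg_state_final(a avg_state)
returns numeric
language sql immutable as $$
    select (a).s / (a).n;
$$;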

However, I'm not sure how interesting that is without the ability to
execute the subplans in parallel.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Pavel Stehule
Hi

one user asked about using partitioning for faster aggregate queries.

I found that there is no such optimization.

create table x1(a int, d date);
create table x_1 ( check(d = '2014-01-01'::date)) inherits(x1);
create table x_2 ( check(d = '2014-01-02'::date)) inherits(x1);
create table x_3 ( check(d = '2014-01-03'::date)) inherits(x1);

When I have this schema, the optimizer does this:

postgres=# explain verbose select max(a) from x1 group by d order by d;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 GroupAggregate  (cost=684.79..750.99 rows=200 width=8)
   Output: max(x1.a), x1.d
   Group Key: x1.d
   ->  Sort  (cost=684.79..706.19 rows=8561 width=8)
         Output: x1.d, x1.a
         Sort Key: x1.d
         ->  Append  (cost=0.00..125.60 rows=8561 width=8)
               ->  Seq Scan on public.x1  (cost=0.00..0.00 rows=1 width=8)
                     Output: x1.d, x1.a
               ->  Seq Scan on public.x_1  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_1.d, x_1.a
               ->  Seq Scan on public.x_2  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_2.d, x_2.a
               ->  Seq Scan on public.x_3  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_3.d, x_3.a
               ->  Seq Scan on public.x_4  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_4.d, x_4.a
 Planning time: 0.333 ms

It can be reduced to:

  sort by d
  Append
    Aggregate (a), d
      seq scan from x_1
    Aggregate (a), d
      seq scan from x_2

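For illustration, a manual rewrite with that shape (assuming the x_*
partitions above, where each partition holds a single value of d, so the
per-partition groups cannot overlap):

select d, max(a) from x_1 group by d
union all
select d, max(a) from x_2 group by d
union all
select d, max(a) from x_3 group by d
order by d;
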
Are there some plans to use partitioning for aggregation?

Regards

Pavel


Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Merlin Moncure
On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
> one user asked about using partitioning for faster aggregate queries.
>
> [... schema and EXPLAIN output snipped; see the original message above ...]
>
> Are there some plans to use partitioning for aggregation?

Besides min/max, what other aggregates (mean/stddev come to mind)
would you optimize and how would you determine which ones could be?
Where is that decision made?

For example, could user defined aggregates be pushed down if you had a
reaggregation routine broken out from the main one?

merlin




Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Claudio Freire
On Wed, Aug 27, 2014 at 4:41 PM, Merlin Moncure mmonc...@gmail.com wrote:
> On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> [... original message and plan snipped ...]
>>
>> Are there some plans to use partitioning for aggregation?
>
> Besides min/max, what other aggregates (mean/stddev come to mind)
> would you optimize and how would you determine which ones could be?
> Where is that decision made?


You can't with mean and stddev, only with associative aggregates.

That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.




Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Pavel Stehule
2014-08-27 21:41 GMT+02:00 Merlin Moncure mmonc...@gmail.com:

> On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> [... original message and plan snipped ...]
>>
>> Are there some plans to use partitioning for aggregation?
>
> Besides min/max, what other aggregates (mean/stddev come to mind)
> would you optimize and how would you determine which ones could be?
> Where is that decision made?


I think all aggregates are possible.

When you have partitions by column X, then you have natural sets by X,
so you can directly calculate any aggregate on any column when the GROUP
BY clause is GROUP BY X, can't you?

Probably some similar optimizations are possible when you have GROUP BY
X, Y -- at minimum you have more sets, and you can do the aggregations on
smaller sets.

Pavel



> For example, could user defined aggregates be pushed down if you had a
> reaggregation routine broken out from the main one?
>
> merlin



Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Pavel Stehule
2014-08-27 21:46 GMT+02:00 Claudio Freire klaussfre...@gmail.com:

> On Wed, Aug 27, 2014 at 4:41 PM, Merlin Moncure mmonc...@gmail.com wrote:
>> On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>>> [... original message and plan snipped ...]
>>>
>>> Are there some plans to use partitioning for aggregation?
>>
>> Besides min/max, what other aggregates (mean/stddev come to mind)
>> would you optimize and how would you determine which ones could be?
>> Where is that decision made?
>
> You can't with mean and stddev, only with associative aggregates.
>
> That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.


I don't think so.

If I have partitions by X, and my query has the group by clause GROUP BY
X, then I can calculate any aggregate.
Pavel


Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Merlin Moncure
On Wed, Aug 27, 2014 at 2:46 PM, Claudio Freire klaussfre...@gmail.com wrote:
> On Wed, Aug 27, 2014 at 4:41 PM, Merlin Moncure mmonc...@gmail.com wrote:
>> On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>>> [... original message and plan snipped ...]
>>>
>>> Are there some plans to use partitioning for aggregation?
>>
>> Besides min/max, what other aggregates (mean/stddev come to mind)
>> would you optimize and how would you determine which ones could be?
>> Where is that decision made?
>
> You can't with mean and stddev, only with associative aggregates.

associative bit just makes it easier (which is important of course!).
mean for example can be pushed down if the 'pushed down' aggregates
return the count to the reaggregator so that you can weight the
final average.  that's a lot more complicated though.
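
A sketch of that weighting, reusing the x_* partitions from upthread
(illustrative only; each subselect stands in for a pushed-down aggregate
that returns its count alongside its mean):

select sum(n * m) / sum(n) as global_mean
from (
    select count(a) as n, avg(a) as m from x_1
    union all
    select count(a) as n, avg(a) as m from x_2
) partials;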

merlin




Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
> associative bit just makes it easier (which is important of course!).
> mean for example can be pushed down if the 'pushed down' aggregates
> return the count to the reaggregator so that you can weight the
> final average.  that's a lot more complicated though.

The real question is what you're expecting to get out of such an
optimization.  If the aggregate has to visit all rows then it's
not apparent to me that any win emerges from the extra complication.

We do already have optimization of min/max across inheritance trees,
and that's certainly a win because you don't have to visit all rows.

regression=# create table pp(f1 int unique);
CREATE TABLE
regression=# create table cc(unique(f1)) inherits(pp);
CREATE TABLE
regression=# create table cc2(unique(f1)) inherits(pp);
CREATE TABLE
regression=# explain select max(f1) from pp;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Result  (cost=0.51..0.52 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.46..0.51 rows=1 width=4)
           ->  Merge Append  (cost=0.46..267.71 rows=4777 width=4)
                 Sort Key: pp.f1
                 ->  Index Only Scan Backward using pp_f1_key on pp  (cost=0.12..8.14 rows=1 width=4)
                       Index Cond: (f1 IS NOT NULL)
                 ->  Index Only Scan Backward using cc_f1_key on cc  (cost=0.15..85.94 rows=2388 width=4)
                       Index Cond: (f1 IS NOT NULL)
                 ->  Index Only Scan Backward using cc2_f1_key on cc2  (cost=0.15..85.94 rows=2388 width=4)
                       Index Cond: (f1 IS NOT NULL)
 Planning time: 0.392 ms
(12 rows)

That doesn't currently extend to the GROUP BY case unfortunately.

regards, tom lane




Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Pavel Stehule
2014-08-27 22:27 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

> Merlin Moncure mmonc...@gmail.com writes:
>> associative bit just makes it easier (which is important of course!).
>> mean for example can be pushed down if the 'pushed down' aggregates
>> return the count to the reaggregator so that you can weight the
>> final average.  that's a lot more complicated though.
>
> The real question is what you're expecting to get out of such an
> optimization.  If the aggregate has to visit all rows then it's
> not apparent to me that any win emerges from the extra complication.


I expect it to remove the hashing or sorting part of the aggregation. It
can reduce the aggregation to a seq scan only.

Pavel



> We do already have optimization of min/max across inheritance trees,
> and that's certainly a win because you don't have to visit all rows.
>
> [... example snipped ...]
>
> That doesn't currently extend to the GROUP BY case unfortunately.
>
> regards, tom lane



Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Tomas Vondra
On 27 August 2014, 21:41, Merlin Moncure wrote:
> On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> Are there some plans to use partitioning for aggregation?
>
> Besides min/max, what other aggregates (mean/stddev come to mind)
> would you optimize and how would you determine which ones could be?
> Where is that decision made?
>
> For example, could user defined aggregates be pushed down if you had a
> reaggregation routine broken out from the main one?

I think that what Pavel suggests is that when you are aggregating by

GROUP BY x

and 'x' happens to be used for partitioning (making it impossible for
groups from different partitions to overlap), then it's perfectly fine to
perform the aggregation per partition, and just append the results.

If you need sorted output, you can sort the results (assuming the
cardinality of the output is much lower than that of the actual data).

This "append first, then aggregate" may be what causes the switch to sort
(out of fear that the number of groups will exceed work_mem), while we
could just as well process each partition with a separate hash aggregate.

Tomas





Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Merlin Moncure
On Wed, Aug 27, 2014 at 3:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Merlin Moncure mmonc...@gmail.com writes:
>> [...]
>
> The real question is what you're expecting to get out of such an
> optimization.  If the aggregate has to visit all rows then it's
> not apparent to me that any win emerges from the extra complication.
>
> We do already have optimization of min/max across inheritance trees,
> and that's certainly a win because you don't have to visit all rows.
>
> [... example snipped ...]
>
> That doesn't currently extend to the GROUP BY case unfortunately.

Yeah: I was overthinking it.  My mind was on parallel processing of
the aggregate (which is not what Pavel was proposing) because that
just happens to be what I'm working on currently -- using dblink to
decompose various aggregates and distribute the calculation across
servers.  Wouldn't it be nice to have the server do that itself, I
impulsively thought.

merlin




Re: [HACKERS] possible optimization: push down aggregates

2014-08-27 Thread Claudio Freire
On Wed, Aug 27, 2014 at 6:46 PM, Merlin Moncure mmonc...@gmail.com wrote:

> Yeah: I was overthinking it.  My mind was on parallel processing of
> the aggregate (which is not what Pavel was proposing) because that
> just happens to be what I'm working on currently -- using dblink to
> decompose various aggregates and distribute the calculation across
> servers.  Wouldn't it be nice to have the server do that itself, I
> impulsively thought.

But you'd have part of it too. Because then you'd have semantically
independent parallel nodes in the plan that do some meaningful data
wrangling and spit out little output, whereas the previous plan did not
do much with the data and spat out loads of rows as a result. This is a
big preliminary step toward parallel execution, really.




[HACKERS] Possible Optimization?

2004-09-24 Thread Rod Taylor
It would appear that region_id = parent_id is not internally converted
to region_id = 1129, despite parent_id being enforced to 1129 at the top
level.

In this case, it makes a performance difference of about 4x (2 minutes
vs. 30 seconds).

The reason I didn't do this myself upfront is that parent_id is
calculated by a function which I didn't want to call twice. I've split
the query into 2 parts as a result.
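
A sketch of that split (compute_parent() is a hypothetical stand-in for
the function that yields parent_id):

-- Step 1: evaluate the function once, instead of calling it twice.
SELECT compute_parent() AS parent_id;   -- suppose it returns 1129

-- Step 2: re-issue the main query with the literal substituted, so the
-- inner comparison becomes region_id = 1129 directly -- the conversion
-- the planner does not make on its own.
SELECT region_id, region_title
  FROM bric_extension.region_in_region
 WHERE parent_id = 1129
   AND class = (SELECT region_class
                  FROM region_classes
                 WHERE "order" > (SELECT "order"
                                    FROM region
                                    JOIN region_classes
                                      ON (region_class = class)
                                   WHERE region_id = 1129)
                 ORDER BY "order"
                 LIMIT 1);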

Plans attached from PostgreSQL 7.4.5.

QUERY:
SELECT region_id, region_title
  FROM bric_extension.region_in_region
 WHERE parent_id = 1129
   AND class = (SELECT region_class
                  FROM region_classes
                 WHERE "order" > (SELECT "order"
                                    FROM region
                                    JOIN region_classes
                                      ON (region_class = class)
                                   WHERE region_id = parent_id) -- 1129
                 ORDER BY "order"
                 LIMIT 1);

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan region_in_region  (cost=1455.18..1455.21 rows=1 width=45) (actual time=27966.381..27966.400 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=1.65..1.65 rows=1 width=36) (actual time=1.449..1.453 rows=1 loops=1)
           InitPlan
             ->  Hash Join  (cost=1.06..1.32 rows=1 width=4) (actual time=0.598..0.931 rows=1 loops=1)
                   Hash Cond: ("outer".region_class = ("inner".class)::text)
                   ->  Subquery Scan region_classes  (cost=0.00..0.21 rows=7 width=36) (actual time=0.121..0.549 rows=7 loops=1)
                         ->  Append  (cost=0.00..0.14 rows=7 width=0) (actual time=0.103..0.452 rows=7 loops=1)
                               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.092..0.106 rows=1 loops=1)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.072..0.076 rows=1 loops=1)
                               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.027..0.041 rows=1 loops=1)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.012..0.016 rows=1 loops=1)
                               ->  Subquery Scan "*SELECT* 3"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.026..0.041 rows=1 loops=1)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.010..0.015 rows=1 loops=1)
                               ->  Subquery Scan "*SELECT* 4"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.026..0.039 rows=1 loops=1)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.011..0.016 rows=1 loops=1)
                               ->  Subquery Scan "*SELECT* 5"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.027..0.041 rows=1 loops=1)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.012..0.017 rows=1 loops=1)
                               ->  Subquery Scan "*SELECT* 6"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.026..0.040 rows=1 loops=1)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.012..0.017 rows=1 loops=1)
                               ->  Subquery Scan "*SELECT* 7"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.025..0.039 rows=1 loops=1)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.012..0.016 rows=1 loops=1)
                   ->  Hash  (cost=1.06..1.06 rows=1 width=14) (actual time=0.125..0.125 rows=0 loops=1)
                         ->  Seq Scan on region  (cost=0.00..1.06 rows=1 width=14) (actual time=0.065..0.097 rows=1 loops=1)
                               Filter: (region_id = 1129::numeric)
           ->  Sort  (cost=0.33..0.34 rows=7 width=36) (actual time=1.434..1.434 rows=1 loops=1)
                 Sort Key: "order"
                 ->  Subquery Scan region_classes  (cost=0.00..0.23 rows=7 width=36) (actual time=1.102..1.364 rows=4 loops=1)
                       ->  Append  (cost=0.00..0.16 rows=7 width=0) (actual time=1.083..1.300 rows=4 loops=1)
                             ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.976..0.976 rows=0 loops=1)
                                   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.963..0.963 rows=0 loops=1)
                                         One-Time Filter: (10 > $0)
                             ->  Subquery Scan "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.025..0.025 rows=0 loops=1)
                                   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)