Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread Amit Langote
On 2015/09/28 20:58, David Rowley wrote:
> On 28 September 2015 at 23:17, Amit Langote 
> wrote:
>> Moreover, would partial aggregation work below Append?
>>
> 
> Do you mean for cases like:
> 
> create table a as select x.x a from generate_series(1,100) x(x);
> select sum(a) from (select a from a union all select a from a) a;
> 
> to allow the aggregation to happen before the append?
> 

Yes.

> On testing this I do see that writing the query as:
> 
> select sum(a) from (select sum(a) a from a union all select sum(a) from a)
> a;
> 
> causes it to execute marginally faster. 174.280 ms vs 153.498 ms on my
> laptop.
> However pushing aggregation below Append nodes is not something I'm aiming
> to do for this patch.

I see. I recall reading in archives that pushing aggregates below append
was not found to make much difference as your little test suggests, too.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread David Rowley
On 28 September 2015 at 23:17, Amit Langote 
wrote:

> On 2015/09/28 17:04, David Rowley wrote:
> > On 28 September 2015 at 20:36, Amit Langote <
> langote_amit...@lab.ntt.co.jp>
> > wrote:
> >
> >>
> >> Did you perhaps attach a version of the patch you didn't intend to?
> >>
> >
> > Oops. It seems so.
> >
> > Please find the correct version attached.
>
> Thanks, this one works fine.
>
> By the way, you may have noticed that the append_rel_list would be broken
> if the proposed optimization is applied to a appendrel parent.
>
> CREATE TABLE sale_1() INHERITS(sale);
> CREATE TABLE sale_2() INHERITS(sale);
>
> EXPLAIN SELECT count(*) FROM sale;
>   QUERY PLAN
> --
>  Finalize Aggregate  (cost=0.01..0.02 rows=1 width=0)
>->  Result  (cost=0.00..0.01 rows=1 width=0)
>  One-Time Filter: false
> (3 rows)
>
>
Thanks. I've changed this locally to disable the optimisation in this case.


> Moreover, would partial aggregation work below Append?
>

Do you mean for cases like:

create table a as select x.x a from generate_series(1,100) x(x);
select sum(a) from (select a from a union all select a from a) a;

to allow the aggregation to happen before the append?

On testing this I do see that writing the query as:

select sum(a) from (select sum(a) a from a union all select sum(a) from a)
a;

causes it to execute marginally faster. 174.280 ms vs 153.498 ms on my
laptop.
However pushing aggregation below Append nodes is not something I'm aiming
to do for this patch.

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread Amit Langote
On 2015/09/28 17:04, David Rowley wrote:
> On 28 September 2015 at 20:36, Amit Langote 
> wrote:
> 
>>
>> Did you perhaps attach a version of the patch you didn't intend to?
>>
> 
> Oops. It seems so.
> 
> Please find the correct version attached.

Thanks, this one works fine.

By the way, you may have noticed that the append_rel_list would be broken
if the proposed optimization is applied to a appendrel parent.

CREATE TABLE sale_1() INHERITS(sale);
CREATE TABLE sale_2() INHERITS(sale);

EXPLAIN SELECT count(*) FROM sale;
  QUERY PLAN
--
 Finalize Aggregate  (cost=0.01..0.02 rows=1 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
 One-Time Filter: false
(3 rows)

Moreover, would partial aggregation work below Append?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread David Rowley
On 28 September 2015 at 20:36, Amit Langote 
wrote:

>
> Did you perhaps attach a version of the patch you didn't intend to?
>

Oops. It seems so.

Please find the correct version attached.

Thanks for checking and letting me know.

--
David Rowley   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Training & Services


group_before_join_2015-09-28_c059c53.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread Amit Langote
On 2015/09/28 13:31, David Rowley wrote:
> I've been spending time working on allowing the planner to perform
> aggregation before the final join relation is created.
>
...

>
> The patch is however so far capable of giving us extremely nice performance
> improvements for some (likely artificial) queries.
> 
> Let's look at a quick example:
> 
> CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT
> NULL, PRIMARY KEY(product_id));
> CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code);
> -- create small list of products
> INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM
> generate_series(1,100) g(id);
> 
> CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity
> INT NOT NULL);
> 
> INSERT INTO sale (sale_id, product_id,quantity) SELECT
> x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM
> generate_series(1,1) x(x);
> 
> ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id);
> 
> test=# SELECT count(sale.sale_id) FROM sale, product;
> count
> -
>  100
> (1 row)
> Time: 10323.053 ms
> 
> 
> And if I disable the optimisation:
> 
> test=# set enable_earlygrouping = off;
> SET
> Time: 0.302 ms
> test=# SELECT count(sale.sale_id) FROM sale, product;
> count
> -
>  100
> (1 row)
> Time: 775790.764 ms
> 
> So, in this probably rather unlikely query, we get something around a 7500%
> performance increase. Of course as the ratio of groups per underlying
> tuples increase, the performance increase will tail off.
> 
> The explain output from the optimised version is as follows:
> 
>  QUERY PLAN
> 
>  Finalize Aggregate  (cost=1790544.37..1790544.38 rows=1 width=4)
>->  Nested Loop  (cost=1790541.10..1790544.12 rows=100 width=4)
>  ->  Partial Aggregate  (cost=1790541.10..1790541.11 rows=1 width=4)
>->  Seq Scan on sale  (cost=0.00..1540541.08 rows=10008
> width=4)
>  ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)
> 
> 

Did you perhaps attach a version of the patch you didn't intend to?

I get the following plan and hence a different result from what's shown above:

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
   QUERY PLAN


 Aggregate  (cost=17909.27..17909.28 rows=1 width=4)
   ->  Nested Loop  (cost=17906.00..17909.02 rows=100 width=4)
 ->  Aggregate  (cost=17906.00..17906.01 rows=1 width=4)
   ->  Seq Scan on sale  (cost=0.00..15406.00 rows=100
width=4)
 ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
 count
---
   100
(1 row)

postgres=# set enable_earlygrouping = off;
SET

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
QUERY PLAN
---
 Aggregate  (cost=1515408.25..1515408.26 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..1265408.25 rows=1 width=4)
 ->  Seq Scan on sale  (cost=0.00..15406.00 rows=100 width=4)
 ->  Materialize  (cost=0.00..2.50 rows=100 width=0)
   ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)
(5 rows)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
   count
---
 1
(1 row)

Am I missing something?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers