On 8 March 2016 at 11:15, David Rowley <david.row...@2ndquadrant.com> wrote:
> The setrefs.c parts of the patch remain completely broken. I've not
> had time to look at this again yet, sorry.

Ok, so again, apologies for previously sending such a broken patch.
I've since managed to free up a bit of time to work on this, which now
consists of a good bit more than the sum total of my weekly lunch

The attached patch is a much more complete patch, and quite possible
now review worthy.

I set about solving the setrefs.c problem by inventing a PartialAggref
node type which wraps up Aggrefs when they're in a agg node which has
finalizeAggs = false. This node type exists all the way until executor
init time, where it's then removed and replaced with the underlying
Aggref. This seems to solve the targetlist return type issue.
I'd really like some feedback on this method of solving that problem.

I've also fixed numerous other bugs, including the HAVING clause problem.

Things left to do:
1. Make a final pass of the patch not at 3am.
2. Write some tests.
3. I've probably missed a few places that should handle T_PartialAggref.

A couple of things which I'm not 100% happy with.

1. make_partialgroup_input_target() doing lookups to the syscache.
Perhaps this job can be offloaded to a new function in a more suitable
location. Ideally the Aggref would already store the required
information, but I don't see a great place to be looking that up.
2. I don't really like how I had to add tlist to
create_grouping_paths(), but I didn't want to go to the trouble of
calculating the partial agg PathTarget if Parallel Aggregation is not
possible, as this does do syscache lookups, so it's not free, so I'd
rather only do it if we're actually going to add some parallel paths.
3. Something about the force_parallel_mode GUC. I'll think about this
when I start to think about how to test this, as likely I'll need
that, else I'd have to create tables bigger than we'd want to in the
regression tests.

I've also attached an .sql file with an aggregate function aimed to
test the new PartialAggref stuff works properly, as previously it
seemed to work by accident with sum(int).

Just some numbers to maybe make this more interesting:

create table t (num int not null, one int not null, ten int not null,
hundred int not null, thousand int not null, tenk int not null,
hundredk int not null, million int not null);
insert into t select
x,1,x%10+1,x%100+1,x%1000+1,x%10000+1,x%100000+1,x%1000000 from

-- Serial Plan
# explain select sum(num) from t;
                            QUERY PLAN
 Aggregate  (cost=198530.00..198530.01 rows=1 width=8)
   ->  Seq Scan on t  (cost=0.00..173530.00 rows=10000000 width=4)

# select sum(num) from t;
(1 row)
Time: 1036.119 ms

# set max_parallel_degree=4;

-- Parallel Plan
# explain select sum(num) from t;
                                      QUERY PLAN
 Finalize Aggregate  (cost=105780.52..105780.53 rows=1 width=8)
   ->  Gather  (cost=105780.00..105780.51 rows=5 width=8)
         Number of Workers: 4
         ->  Partial Aggregate  (cost=104780.00..104780.01 rows=1 width=8)
               ->  Parallel Seq Scan on t  (cost=0.00..98530.00
rows=2500000 width=4)
(5 rows)

# select sum(num) from t;
(1 row)

Time: 379.117 ms

I'll try and throw a bit parallel aggregate work to a 4 socket / 64
core server which I have access to... just for fun.

Reviews are now welcome.

 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: parallel_aggregation_cc3763e_2016-03-11.patch
Description: Binary data

Attachment: parallel_aggregation_test.sql
Description: Binary data

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to