Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs

2016-04-13 Thread Robert Haas
On Tue, Apr 12, 2016 at 5:38 PM, David Rowley wrote:
>>> One small point which I was a little unsure of in the attached is,
>>> should the "if (aggref->aggdirectargs)" part of
>>> count_agg_clauses_walker() be within the "if
>>> (!context->combineStates)". I simply couldn't decide. We currently
>>> have no aggregates which this affects anyway, per; select * from
>>> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
>>> I've left it outwith.
>>
>> The direct arguments would be evaluated in the worker, but not in the
>> leader, right?  Or am I confused?
>
> That seems right, but I just can't think of how it's possible to
> parallelise these aggregates anyway.

Well, if you could ensure that each worker would see a whole group,
you could do it, I think.  But it's probably fine to just leave this
for now.  It's not like it can't be changed if somebody figures out
some cool thing to do in this area.
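
For context, the spot in question in count_agg_clauses_walker() has roughly
this shape (a sketch only, with assumed local names such as argcosts and
costs; it is not the committed code):

    if (!context->combineStates)
    {
        /* ... charge the transition function once per input row ... */

        if (aggref->aggdirectargs)
        {
            QualCost    argcosts;

            /* direct (WITHIN GROUP) arguments are evaluated once per group */
            cost_qual_eval_node(&argcosts, (Node *) aggref->aggdirectargs,
                                context->root);
            costs->transCost.startup += argcosts.startup + argcosts.per_tuple;
        }
    }

The open question above is simply whether the inner "if" belongs inside or
outside the !combineStates test.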

>>> Another thing I thought of is that it's not too nice that I have to
>>> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
>>> was tempted to invent some bitmask flags for this, then modify
>>> create_agg_path() to use the same flags, but I thought I'd better not
>>> cause too much churn with this patch.
>>
>> I'm kinda tempted to say this should be using an enum.  I note that
>> serialStates has a subtly different meaning here than in some other
>> places where you have used the same term.
>
> hmm, I'm not sure how it's subtly different. Do you mean the
> preference towards costing the finalfn when finalizeAggs is true, and
> ignoring the serialfn in this case? nodeAgg.c should do the same,
> although it'll deserialize in such a case. We can never finalize and
> serialize in the same node.

I mean that, IIUC, in some other places where you use serialStates,
true means that (de)serialization is known to be needed.  Here,
however, it only means it might be needed, contingent on whether the
serial/deserial functions are actually present.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs

2016-04-12 Thread David Rowley
On 13 April 2016 at 08:52, Robert Haas  wrote:
> On Sun, Apr 10, 2016 at 8:47 AM, David Rowley wrote:
>> I realised a few days ago that the parallel aggregate code does not
>> cost for the combine, serialisation and deserialisation functions at
>> all.
>
> Oops.
>
>> I've attached a patch which fixes this.
>
> I've committed this patch.  I wonder if it's going to produce compiler
> warnings for some people, complaining about possible use of an
> uninitialized variable.  That would kind of suck.  I don't much mind
> having to insert a dummy assignment to shut the compiler up; a smarter
> compiler will just throw it out anyway.  I'm less enthused about a
> dummy MemSet.  The compiler is less likely to be able to get rid of
> that, and it's more expensive if it doesn't.  But let's see what
> happens.

Thanks for committing.

I wondered that too, so checked a couple of compilers and got no
warnings, but the buildfarm should let us know. The other option would
be to palloc() them, and have them set to NULL initially... that's not
very nice either... Another option would be to protect the final
parallel path generation with if (grouped_rel->partial_pathlist &&
grouped_rel->consider_parallel). I'd imagine any compiler smart enough
to work out that uninitialised is not possible would also be able to
remove the check for grouped_rel->consider_parallel, but *shrug*, I
don't often look at the assembly that compilers generate, so I might
be giving them too much credit.
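
In sketch form, the options discussed here look like this (variable names are
illustrative, not taken from the committed code):

    AggClauseCosts partial_costs;   /* only needed on the parallel path */

    /* Option 1: dummy MemSet -- always paid for, though cheap */
    MemSet(&partial_costs, 0, sizeof(AggClauseCosts));

    /* Option 2: dummy scalar assignment -- a smarter compiler discards the
     * dead store entirely */
    double      hashaggtablesize = 0;

    /* Option 3: guard the later use so the compiler can prove initialisation:
     *   if (grouped_rel->partial_pathlist && grouped_rel->consider_parallel)
     *       ... generate the parallel grouping paths ...
     */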

>> One small point which I was a little unsure of in the attached is,
>> should the "if (aggref->aggdirectargs)" part of
>> count_agg_clauses_walker() be within the "if
>> (!context->combineStates)". I simply couldn't decide. We currently
>> have no aggregates which this affects anyway, per; select * from
>> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
>> I've left it outwith.
>
> The direct arguments would be evaluated in the worker, but not in the
> leader, right?  Or am I confused?

That seems right, but I just can't think of how it's possible to
parallelise these aggregates anyway.

>> Another thing I thought of is that it's not too nice that I have to
>> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
>> was tempted to invent some bitmask flags for this, then modify
>> create_agg_path() to use the same flags, but I thought I'd better not
>> cause too much churn with this patch.
>
> I'm kinda tempted to say this should be using an enum.  I note that
> serialStates has a subtly different meaning here than in some other
> places where you have used the same term.

hmm, I'm not sure how it's subtly different. Do you mean the
preference towards costing the finalfn when finalizeAggs is true, and
ignoring the serialfn in this case? nodeAgg.c should do the same,
although it'll deserialize in such a case. We can never finalize and
serialize in the same node.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs

2016-04-12 Thread Robert Haas
On Sun, Apr 10, 2016 at 8:47 AM, David Rowley wrote:
> I realised a few days ago that the parallel aggregate code does not
> cost for the combine, serialisation and deserialisation functions at
> all.

Oops.

> I've attached a patch which fixes this.

I've committed this patch.  I wonder if it's going to produce compiler
warnings for some people, complaining about possible use of an
uninitialized variable.  That would kind of suck.  I don't much mind
having to insert a dummy assignment to shut the compiler up; a smarter
compiler will just throw it out anyway.  I'm less enthused about a
dummy MemSet.  The compiler is less likely to be able to get rid of
that, and it's more expensive if it doesn't.  But let's see what
happens.

> One small point which I was a little unsure of in the attached is,
> should the "if (aggref->aggdirectargs)" part of
> count_agg_clauses_walker() be within the "if
> (!context->combineStates)". I simply couldn't decide. We currently
> have no aggregates which this affects anyway, per; select * from
> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
> I've left it outwith.

The direct arguments would be evaluated in the worker, but not in the
leader, right?  Or am I confused?

> Another thing I thought of is that it's not too nice that I have to
> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
> was tempted to invent some bitmask flags for this, then modify
> create_agg_path() to use the same flags, but I thought I'd better not
> cause too much churn with this patch.

I'm kinda tempted to say this should be using an enum.  I note that
serialStates has a subtly different meaning here than in some other
places where you have used the same term.
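
A hypothetical shape for that enum (names invented here for illustration; this
is not what the patch does) might be:

    /* replaces the three bools: combineStates, finalizeAggs, serialStates */
    typedef enum AggStageMode
    {
        AGGSTAGE_SIMPLE,            /* transfn + finalfn, no combine/(de)serial */
        AGGSTAGE_PARTIAL,           /* transfn only, emit transition states */
        AGGSTAGE_PARTIAL_SERIAL,    /* transfn + serialfn (states cross processes) */
        AGGSTAGE_FINAL_DESERIAL     /* deserialfn + combinefn + finalfn */
    } AggStageMode;

The same value could then be handed to both count_agg_clauses() and
create_agg_path().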

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs

2016-04-10 Thread David Rowley
Hi,

I realised a few days ago that the parallel aggregate code does not
cost for the combine, serialisation and deserialisation functions at
all.

I've attached a patch which fixes this.
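
For readers without the attachment, the shape of the missing costing is
roughly the following (a sketch only, with assumed variable names; it is not
the attached patch):

    if (context->combineStates)
    {
        /* this Agg merges partial states: combinefn runs per input row */
        costs->transCost.per_tuple +=
            get_func_cost(aggcombinefn) * cpu_operator_cost;

        /* states arriving from another process must be deserialised first */
        if (context->serialStates && OidIsValid(aggdeserialfn))
            costs->transCost.per_tuple +=
                get_func_cost(aggdeserialfn) * cpu_operator_cost;
    }
    else
    {
        /* ... the pre-existing transition/final function costing ... */

        /* partial states leaving this process must be serialised */
        if (context->serialStates && OidIsValid(aggserialfn))
            costs->finalCost +=
                get_func_cost(aggserialfn) * cpu_operator_cost;
    }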

One small point which I was a little unsure of in the attached is,
should the "if (aggref->aggdirectargs)" part of
count_agg_clauses_walker() be within the "if
(!context->combineStates)". I simply couldn't decide. We currently
have no aggregates which this affects anyway, per; select * from
pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
I've left it outwith.

Another thing I thought of is that it's not too nice that I have to
pass 3 bools to count_agg_clauses() in order to tell it what to do. I
was tempted to invent some bitmask flags for this, then modify
create_agg_path() to use the same flags, but I thought I'd better not
cause too much churn with this patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggs_costs_fix.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-03-21 Thread James Sewell
Good news!

On Tuesday, 22 March 2016, David Rowley wrote:

> On 22 March 2016 at 02:35, Robert Haas wrote:
> > I have committed this after changing some of the comments.
> >
> > There might still be bugs ... but I don't see them.  And the speedups
> > look very impressive.
> >
> > Really nice work, David.
>
> Thanks for that, and thank you for taking the time to carefully review
> it and commit it.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>


-- 

James Sewell,
PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099



Re: [HACKERS] Parallel Aggregate

2016-03-21 Thread David Rowley
On 22 March 2016 at 02:35, Robert Haas  wrote:
> I have committed this after changing some of the comments.
>
> There might still be bugs ... but I don't see them.  And the speedups
> look very impressive.
>
> Really nice work, David.

Thanks for that, and thank you for taking the time to carefully review
it and commit it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-21 Thread Robert Haas
On Sun, Mar 20, 2016 at 7:30 PM, David Rowley wrote:
> An updated patch is attached. This hopefully addresses your concerns
> with the comment, and also the estimate_hashagg_tablesize() NULL
> checking.

I have committed this after changing some of the comments.

There might still be bugs ... but I don't see them.  And the speedups
look very impressive.

Really nice work, David.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-21 Thread Robert Haas
On Sun, Mar 20, 2016 at 11:24 PM, Alvaro Herrera wrote:
> David Rowley wrote:
>> On 21 March 2016 at 15:48, Alvaro Herrera  wrote:
>> > David Rowley wrote:
>> >
>> >> I've rewritten the comment to become:
>> >>
>> >> /*
>> >> * Providing that the estimated size of the hashtable does not exceed
>> >> * work_mem, we'll generate a HashAgg Path, although if we were unable
>> >> * to sort above, then we'd better generate a Path, so that we at least
>> >> * have one.
>> >> */
>> >>
>> >> How about that?
>> >
>> > I think "Providing" should be "Provided".
>>
>> Both make sense, although I do only see instances of "Provided that"
>> in the source.
>
> Interesting.  "Providing that" seems awkward to me, and I had only seen
> the other wording thus far, but
> http://english.stackexchange.com/questions/149459/what-is-the-difference-between-providing-that-and-provided-that
> explains that I'm wrong.

Well, my instincts are the same as yours, actually...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-20 Thread David Rowley
On 21 March 2016 at 15:48, Alvaro Herrera  wrote:
> David Rowley wrote:
>
>> I've rewritten the comment to become:
>>
>> /*
>> * Providing that the estimated size of the hashtable does not exceed
>> * work_mem, we'll generate a HashAgg Path, although if we were unable
>> * to sort above, then we'd better generate a Path, so that we at least
>> * have one.
>> */
>>
>> How about that?
>
> I think "Providing" should be "Provided".

Both make sense, although I do only see instances of "Provided that"
in the source.
I'm not opposed to changing it, it just does not seem worth emailing a
complete patch to do that, but let me know if you feel differently.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-20 Thread Alvaro Herrera
David Rowley wrote:
 
> I've rewritten the comment to become:
> 
> /*
> * Providing that the estimated size of the hashtable does not exceed
> * work_mem, we'll generate a HashAgg Path, although if we were unable
> * to sort above, then we'd better generate a Path, so that we at least
> * have one.
> */
> 
> How about that?

I think "Providing" should be "Provided".

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-20 Thread Tomas Vondra

Hi,

On 03/21/2016 12:30 AM, David Rowley wrote:

> On 21 March 2016 at 09:47, Tomas Vondra  wrote:
>> ...
>> I'm not sure changing the meaning of enable_hashagg like this is a
>> good idea. It worked as a hard switch before, while with this
>> change that would not be the case. Or more accurately - it would
>> not be the case for aggregates, but it would still work the old way
>> for other types of plans. Not sure that's a particularly good
>> idea.
>
> Hmm, I don't see how it was a hard switch before. If we were unable
> to sort by the group by clause then hashagg would magically be
> enabled.

Sure, but that's not what I meant by the "hard switch" (sorry for the
inaccuracy). Let's assume we can actually do the sorted aggregate. In
that case enable_hashagg=off entirely skipped the hashagg path,
irrespective of the cost or whatever. With the new code we will add
the path, and it may actually win on the basis of cost (e.g. if there's
enable_sort=off at the same time).

But I'm not convinced this is actually wrong; I merely pointed out the
behavior is not exactly the same and may have unintended consequences.

> The reason I did this was to simplify the logic in
> create_grouping_paths(). What difference do you imagine that there
> actually is here?

That the hashed path may win over the sorted path, as explained above.

> The only thing I can think of is; we now generate a hashagg path
> where we previously didn't. This has disable_cost added to the
> startup_cost so is quite unlikely to win. Perhaps there is some
> differences if someone did SET enable_sort = false; SET
> enable_hashagg = false; I'm not sure if we should be worried there
> though. Also maybe there's going to be a difference if the plan
> costings were so high that disable_cost was drowned out by the other
> costings.

Ah, I see you came to the same conclusion ...

> Apart from that, it would actually be nice to be consistent with
> these enable_* GUCs, as to my knowledge the others all just add
> disable_cost to the startup_cost of the path.

Perhaps.

>> What about introducing a GUC to enable parallel aggregate, while
>> still allowing other types of parallel nodes (I guess that would be
>> handy for other types of parallel nodes - it's a bit blunt tool,
>> but tweaking max_parallel_degree is even blunter)? e.g.
>> enable_parallelagg?
>
> Haribabu had this in his version of the patch and I didn't really
> understand the need for it, I assumed it was for testing only. We
> don't have enable_parallelseqscan, and would we plan on adding GUCs
> each time we enable a node for parallelism? I really don't think so,
> we already have parallel hash join and nested loop join without GUCs
> to disable them. I see no reason to add them there, and I also don't
> here.

I'm not so sure about that. I certainly think we'll be debugging queries
in the not so distant future, wishing for such GUCs.

>> I do also have a question about parallel aggregate vs. work_mem.
>> Nowadays we mostly say to users a query may allocate a multiple of
>> work_mem, up to one per node in the plan. Apparently with parallel
>> aggregate we'll have to say "multiplied by number of workers", because
>> each aggregate worker may allocate up to hashaggtablesize. Is that
>> reasonable? Shouldn't we restrict the total size of hash tables in
>> all workers somehow?
>
> I did think about this, but thought, either;
>
> 1) that a project wide decision should be made on how to handle this,
> not just for parallel aggregate, but parallel hash join too, which as
> I understand it, for now it builds an identical hashtable per worker.
>
> 2) the limit is per node, per connection, and parallel aggregates have
> multiple connections, so we might not be breaking our definition of
> how to define work_mem, since we're still limited by max_connections
> anyway.

I do agree this is not just a question for parallel aggregate, and that
perhaps we're not breaking the definition. But I think in practice we'll
be hitting the memory limits much more often, because parallel queries
are very synchronized (running the same node on all parallel workers at
exactly the same time).

Not sure if we have to reflect that in the planner, but it will probably
impact what work_mem values are considered safe.

>> create_grouping_paths also contains this comment:
>>
>>  /*
>>   * Generate HashAgg Path providing the estimated hash table size is not
>>   * too big, although if no other Paths were generated above, then we'll
>>   * begrudgingly generate one so that we actually have a Path to work
>>   * with.
>>   */
>>
>> I'm not sure this is a particularly clear comment; I think the old one was
>> way more informative despite being a single line:
>>
>> /* Consider reasons to disable hashing, but only if we can sort instead */
>
> hmm, I find it quite clear, but perhaps that's because I wrote the
> code. I'm not really sure what you're not finding clear about it to be
> honest. Tom's original comment was quite generic to allow for more
> reasons, but I removed one of those reasons by simplifying the logic
> around

Re: [HACKERS] Parallel Aggregate

2016-03-20 Thread David Rowley
On 21 March 2016 at 09:47, Tomas Vondra  wrote:
> On 03/20/2016 09:58 AM, David Rowley wrote:
>> Thank you for the reviews. The only thing I can think to mention which
>> I've not already is that I designed estimate_hashagg_tablesize() to be
>> reusable in various places in planner.c, yet I've only made use of it
>> in create_grouping_paths(). I would imagine that it might be nice to
>> also modify the other places which perform a similar calculation to
>> use that function, but I don't think that needs to be done for this
>> patch... perhaps a follow-on cleanup.
>
> Hmmm, so how many places that could use the new function are there? I've
> only found these two:
>
>create_distinct_paths (planner.c)
>choose_hashed_setop (prepunion.c)
>
> That doesn't seem like an extremely high number, so perhaps doing it in this
> patch would be fine. However if the function is defined as static,
> choose_hashed_setop won't be able to use it anyway (well, it'll have to move
> the prototype into a header and remove the static).
>
> I wonder why we would need to allow cost_agg=NULL, though? I mean, if there
> are no costing information, wouldn't it be better to either raise an error,
> or at the very least do something like:
>
> } else
> hashentrysize += hash_agg_entry_size(0);
>
> which is what create_distinct_paths does?

Yes, it should do that... My mistake. I've ended up just removing the
NULL check as I don't want to touch create_distinct_paths() in this
patch. I'd rather leave that as a small cleanup patch for later,
although the code in create_distinct_paths() is more simple without
the aggregate sizes being calculated, so there's perhaps less of a
need to use the helper function there. If that cleanup patch
materialises then the else hashentrysize += hash_agg_entry_size(0);
can be added with that patch.

> I'm not sure changing the meaning of enable_hashagg like this is a good
> idea. It worked as a hard switch before, while with this change that would
> not be the case. Or more accurately - it would not be the case for
> aggregates, but it would still work the old way for other types of plans.
> Not sure that's a particularly good idea.

Hmm, I don't see how it was a hard switch before. If we were unable to
sort by the group by clause then hashagg would magically be enabled.
The reason I did this was to simplify the logic in
create_grouping_paths(). What difference do you imagine that there
actually is here?

The only thing I can think of is; we now generate a hashagg path where
we previously didn't. This has disable_cost added to the startup_cost
so is quite unlikely to win. Perhaps there is some differences if
someone did SET enable_sort = false; SET enable_hashagg = false; I'm
not sure if we should be worried there though. Also maybe there's
going to be a difference if the plan costings were so high that
disable_cost was drowned out by the other costings.

Apart from that, it would actually be nice to be consistent with these
enable_* GUCs, as to my knowledge the others all just add disable_cost
to the startup_cost of the path.
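
For reference, that pattern looks roughly like this elsewhere in costsize.c,
e.g. in cost_seqscan():

    /* the path is still generated; it is just made expensive enough that it
     * will not normally be chosen */
    if (!enable_seqscan)
        startup_cost += disable_cost;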

> What about introducing a GUC to enable parallel aggregate, while still
> allowing other types of parallel nodes (I guess that would be handy for
> other types of parallel nodes - it's a bit blunt tool, but tweaking
> max_parallel_degree is even blunter)? e.g. enable_parallelagg?

Haribabu had this in his version of the patch and I didn't really
understand the need for it, I assumed it was for testing only. We
don't have enable_parallelseqscan, and would we plan on adding GUCs
each time we enable a node for parallelism? I really don't think so,
we already have parallel hash join and nested loop join without GUCs
to disable them. I see no reason to add them there, and I also don't
here.

> I do also have a question about parallel aggregate vs. work_mem. Nowadays we
> mostly say to users a query may allocate a multiple of work_mem, up to one
> per node in the plan. Apparently with parallel aggregate we'll have to say
> "multiplied by number of workers", because each aggregate worker may
> allocate up to hashaggtablesize. Is that reasonable? Shouldn't we restrict
> the total size of hash tables in all workers somehow?

I did think about this, but thought, either;

1) that a project wide decision should be made on how to handle this,
not just for parallel aggregate, but parallel hash join too, which as
I understand it, for now it builds an identical hashtable per worker.
2) the limit is per node, per connection, and parallel aggregates have
multiple connections, so we might not be breaking our definition of
how to define work_mem, since we're still limited by max_connections
anyway.

> create_grouping_paths also contains this comment:
>
>  /*
>   * Generate HashAgg Path providing the estimated hash table size is not
>   * too big, although if no other Paths were generated above, then we'll
>   * begrudgingly generate one so that we actually have 

Re: [HACKERS] Parallel Aggregate

2016-03-20 Thread Tomas Vondra

Hi,

On 03/20/2016 09:58 AM, David Rowley wrote:

> On 20 March 2016 at 03:19, Robert Haas  wrote:
>> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley wrote:
>>> Updated patch is attached.
>>
>> I think this looks structurally correct now, and I think it's doing
>> the right thing as far as parallelism is concerned.  I don't see any
>> obvious problems in the rest of it, either, but I haven't thought
>> hugely deeply about the way you are doing the costing, nor have I
>> totally convinced myself that all of the PathTarget and setrefs stuff
>> is correct.  But I think it's probably pretty close.  I'll study it
>> some more next week.
>
> Thank you for the reviews. The only thing I can think to mention which
> I've not already is that I designed estimate_hashagg_tablesize() to be
> reusable in various places in planner.c, yet I've only made use of it
> in create_grouping_paths(). I would imagine that it might be nice to
> also modify the other places which perform a similar calculation to
> use that function, but I don't think that needs to be done for this
> patch... perhaps a follow-on cleanup.


Hmmm, so how many places that could use the new function are there? I've 
only found these two:


   create_distinct_paths (planner.c)
   choose_hashed_setop (prepunion.c)

That doesn't seem like an extremely high number, so perhaps doing it in 
this patch would be fine. However if the function is defined as static, 
choose_hashed_setop won't be able to use it anyway (well, it'll have to 
move the prototype into a header and remove the static).


I wonder why we would need to allow cost_agg=NULL, though? I mean, if 
there are no costing information, wouldn't it be better to either raise 
an error, or at the very least do something like:


} else
hashentrysize += hash_agg_entry_size(0);

which is what create_distinct_paths does?

I'm not sure changing the meaning of enable_hashagg like this is a good 
idea. It worked as a hard switch before, while with this change that 
would not be the case. Or more accurately - it would not be the case for 
aggregates, but it would still work the old way for other types of 
plans. Not sure that's a particularly good idea.


What about introducing a GUC to enable parallel aggregate, while still 
allowing other types of parallel nodes (I guess that would be handy for 
other types of parallel nodes - it's a bit blunt tool, but tweaking 
max_parallel_degree is even blunter)? e.g. enable_parallelagg?


I do also have a question about parallel aggregate vs. work_mem. 
Nowadays we mostly say to users a query may allocate a multiple of 
work_mem, up to one per node in the plan. Apparently with parallel 
aggregate we'll have to say "multiplied by number of workers", because 
each aggregate worker may allocate up to hashaggtablesize. Is that 
reasonable? Shouldn't we restrict the total size of hash tables in all 
workers somehow?
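
Illustrative arithmetic only (nworkers here is just a placeholder for the
number of parallel workers):

    /* work_mem is an int GUC measured in kB; each participating process
     * (nworkers workers plus the leader) may build its own hash table of up
     * to that size, so a single parallel HashAggregate could use roughly: */
    Size    worst_case_bytes = (Size) work_mem * 1024L * (nworkers + 1);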


create_grouping_paths also contains this comment:

 /*
  * Generate HashAgg Path providing the estimated hash table size is not
  * too big, although if no other Paths were generated above, then we'll
  * begrudgingly generate one so that we actually have a Path to work
  * with.
  */

I'm not sure this is a particularly clear comment; I think the old one
was way more informative despite being a single line:


/* Consider reasons to disable hashing, but only if we can sort instead */

BTW, create_grouping_paths has probably grown to a size where splitting it
into smaller pieces would be helpful?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-20 Thread Robert Haas
On Fri, Mar 18, 2016 at 9:16 AM, David Rowley wrote:
> I've attached an updated patch.

This looks substantially better than earlier versions, although I
haven't studied every detail of it yet.

+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial

I understand why partial aggregation doesn't work if you have an ORDER
BY clause attached to the aggregate itself, but it's not so obvious to
me that using DISTINCT should rule it out.  I guess we can do it that
way for now, but it seems aggregate-specific - e.g. AVG() can't cope
with DISTINCT, but MIN() or MAX() wouldn't care.  Maybe MIN() and
MAX() are the outliers in this regard, but they are a pretty common
case.

+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate

I think you should adjust the punctuation to "with a single executor
Agg node; however, there are".  And maybe drop the word "executor".

And on the next line, I'd add a comma: "such as parallel aggregation,
when we want".

            astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
-           if (parent && IsA(parent, AggState))
+           if (!aggstate || !IsA(aggstate, AggState))
            {
-               AggState   *aggstate = (AggState *) parent;
-
-               aggstate->aggs = lcons(astate, aggstate->aggs);
-               aggstate->numaggs++;
+               /* planner messed up */
+               elog(ERROR, "Aggref found in non-Agg plan node");
            }
-           else
+           if (aggref->aggpartial == aggstate->finalizeAggs)
            {
                /* planner messed up */
-               elog(ERROR, "Aggref found in non-Agg plan node");
+               if (aggref->aggpartial)
+                   elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+               else
+                   elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
            }
+           aggstate->aggs = lcons(astate, aggstate->aggs);
+           aggstate->numaggs++;

This seems like it involves more code rearrangement than is really
necessary here.

+    * Partial paths in the input rel could allow us to perform aggregation in
+    * parallel, set_grouped_rel_consider_parallel() will determine if it's
+    * going to be safe to do so.

Change comma to semicolon or period.

/*
 * Generate a HashAgg Path atop of the cheapest partial path, once
 * again, we'll only do this if it looks as though the hash table won't
 * exceed work_mem.
 */

Same here.  Commas are not the way to connect two independent sentences.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-20 Thread David Rowley
On 20 March 2016 at 03:19, Robert Haas  wrote:
> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley wrote:
>> Updated patch is attached.
>
> I think this looks structurally correct now, and I think it's doing
> the right thing as far as parallelism is concerned.  I don't see any
> obvious problems in the rest of it, either, but I haven't thought
> hugely deeply about the way you are doing the costing, nor have I
> totally convinced myself that all of the PathTarget and setrefs stuff
> is correct.  But I think it's probably pretty close.  I'll study it
> some more next week.

Thank you for the reviews. The only thing I can think to mention which
I've not already is that I designed estimate_hashagg_tablesize() to be
reusable in various places in planner.c, yet I've only made use of it
in create_grouping_paths(). I would imagine that it might be nice to
also modify the other places which perform a similar calculation to
use that function, but I don't think that needs to be done for this
patch... perhaps a follow-on cleanup.
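
The calculation being centralised has roughly this shape (a sketch based on
this discussion, not necessarily the exact committed function):

    static Size
    estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
                               double dNumGroups)
    {
        Size        hashentrysize;

        /* per-entry size: grouping key tuple plus per-aggregate transition state */
        hashentrysize = MAXALIGN(path->pathtarget->width) +
            MAXALIGN(SizeofMinimalTupleHeader);
        hashentrysize += agg_costs->transitionSpace;
        hashentrysize += hash_agg_entry_size(agg_costs->numAggs);

        return hashentrysize * dNumGroups;
    }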

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread David Rowley
On 17 March 2016 at 01:19, Amit Kapila  wrote:
> Few assorted comments:
>
> 1.
>   /*
> + * Determine if it's possible to perform aggregation in parallel using
> + * multiple worker processes. We can permit this when there's at least one
> + * partial_path in input_rel, but not if the query has grouping sets,
> + * (although this likely just requires a bit more thought). We must also
> + * ensure that any aggregate functions which are present in either the
> + * target list, or in the HAVING clause all support parallel mode.
> + */
> + can_parallel = false;
> +
> + if ((parse->hasAggs || parse->groupClause != NIL) &&
> + input_rel->partial_pathlist != NIL &&
> + parse->groupingSets == NIL &&
> + root->glob->parallelModeOK)
>
> I think here you need to use has_parallel_hazard() with second parameter as
> false to ensure expressions are parallel safe. glob->parallelModeOK flag
> indicates that there is no parallel unsafe expression, but it can still
> contain parallel restricted expression.

Yes, I'd not gotten to fixing that per Robert's original comment about
it, but I think I have now.

> 2.
>  AggPath *
>  create_agg_path(PlannerInfo *root,
> @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>                  List *groupClause,
>                  List *qual,
>                  const AggClauseCosts *aggcosts,
> -                double numGroups)
> +                double numGroups,
> +                bool combineStates,
> +                bool finalizeAggs)
>
> Don't you need to set parallel_aware flag in this function as we do for
> create_seqscan_path()?

I don't really know the answer to that... I mean there's nothing
special done in nodeAgg.c if the node is running in a worker or in the
main process. So I guess the only difference is that EXPLAIN will read
"Parallel Partial (Hash|Group)Aggregate" instead of "Partial
(Hash|Group)Aggregate", is that desired? What's the logic behind
having "Parallel" in EXPLAIN?

> 3.
> postgres=# explain select count(*) from t1;
>                                       QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=45420.57..45420.58 rows=1 width=8)
>    ->  Gather  (cost=45420.35..45420.56 rows=2 width=8)
>          Number of Workers: 2
>          ->  Partial Aggregate  (cost=44420.35..44420.36 rows=1 width=8)
>                ->  Parallel Seq Scan on t1  (cost=0.00..44107.88 rows=124988 width=0)
> (5 rows)
>
> Isn't it better to call it Parallel Aggregate instead of Partial
> Aggregate?  Initially, we had kept Partial for seqscan, but later on we
> changed to Parallel Seq Scan, so I am not able to see why it is better to
> call it Partial in the case of Aggregates.

I already commented on this.

> 4.
> + /*
> +  * Likewise for any partial paths, although this case is more simple as
> +  * we don't track the cheapest path.
> +  */
> + foreach(lc, current_rel->partial_pathlist)
> + {
> +     Path   *subpath = (Path *) lfirst(lc);
> +
> +     Assert(subpath->param_info == NULL);
> +     lfirst(lc) = apply_projection_to_path(root, current_rel,
> +                                           subpath, scanjoin_target);
> + }
>
>
> Can't we do this by teaching apply_projection_to_path() as done in the
> latest patch posted by me to push down the target list beneath workers [1].

Probably, but I'm not sure I want to go changing that now. The patch
is useful without it, so perhaps it can be a follow-on fix.

> 5.
> + /*
> +  * If we find any aggs with an internal transtype then we must ensure
> +  * that pointers to aggregate states are not passed to other processes,
> +  * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
> +  */
> + if (aggform->aggtranstype == INTERNALOID)
> +     context->allowedtype = PAT_INTERNAL_ONLY;
>
> In the above comment, you have referred to "maximum degree", which is not
> making much sense to me. If it is not a typo, then can you clarify the same?

hmm. I don't quite understand the confusion. Perhaps you think of
"degree" in the parallel sense? This is talking about the levels of
degree of partial aggregation, which has nothing to do with parallel
aggregation; parallel aggregation just requires that to work. "Degree"
in this sense was just meaning that PAT_ANY is the highest degree,
PAT_INTERNAL_ONLY lesser so, etc. I thought the "degree" thing was
explained ok in the comment for aggregates_allow_partial(), but perhaps
I should just remove it, if it's confusing.

Changed to:
/*
* If we find any aggs with an internal transtype then we must ensure
* that pointers to aggregate states are not passed to other processes,
* therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
*/

> 6.
> + * fix_combine_agg_expr
> + *  Like fix_upper_expr() but additionally adjusts the Aggref->args of
> + *  Aggrefs so that they references the corresponding Aggref in the subplan.
> + */
> +static Node *
> +fix_combine_agg_expr(PlannerInfo *root,
> +                     Node *node,
> +                     indexed_tlist *subplan_itlist,
> +                     Index newvarno,
> +                     int rtoffset)
> +{

Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread David Rowley
On 19 March 2016 at 06:15, Robert Haas  wrote:
> On Fri, Mar 18, 2016 at 9:16 AM, David Rowley wrote:
>> I've attached an updated patch.
>
> This looks substantially better than earlier versions, although I
> haven't studied every detail of it yet.
>
> + * Partial aggregation requires that each aggregate does not have a DISTINCT 
> or
> + * ORDER BY clause, and that it also has a combine function set. Since 
> partial
>
> I understand why partial aggregation doesn't work if you have an ORDER
> BY clause attached to the aggregate itself, but it's not so obvious to
> me that using DISTINCT should rule it out.  I guess we can do it that
> way for now, but it seems aggregate-specific - e.g. AVG() can't cope
> with DISTINCT, but MIN() or MAX() wouldn't care.  Maybe MIN() and
> MAX() are the outliers in this regard, but they are a pretty common
> case.

hmm? We'd have no way to ensure that two worker processes aggregated
only values which other workers didn't.
Of course this just happens to be equivalent for MIN() and MAX(), but
today we don't attempt to transform MIN(DISTINCT col) to MIN(col), so
I see no reason at all why this patch should go and add something
along those lines. Perhaps it's something for the future though,
although it's certainly not anything specific to parallel aggregate.
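
Concretely, the sort of test involved looks something like this
(aggregates_allow_partial() is named elsewhere in this thread; the body and
the PAT_DISABLED value here are illustrative only):

    /* an aggregate with DISTINCT or an ORDER BY on its arguments cannot be
     * split into per-worker partial states, since no single worker sees the
     * whole input to de-duplicate or order it */
    if (aggref->aggdistinct != NIL || aggref->aggorder != NIL)
    {
        context->allowedtype = PAT_DISABLED;    /* hypothetical "no partial agg" value */
        return true;                            /* stop walking */
    }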

> + * An Aggref can operate in one of two modes. Normally an aggregate 
> function's
> + * value is calculated with a single executor Agg node, however there are
> + * times, such as parallel aggregation when we want to calculate the 
> aggregate
>
> I think you should adjust the punctuation to "with a single executor
> Agg node; however, there are".  And maybe drop the word "executor".
>
> And on the next line, I'd add a comma: "such as parallel aggregation,
> when we want".

Fixed. Although I've revised that block a bit after getting rid of aggpartial.

>
>             astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
> -           if (parent && IsA(parent, AggState))
> +           if (!aggstate || !IsA(aggstate, AggState))
>             {
> -               AggState   *aggstate = (AggState *) parent;
> -
> -               aggstate->aggs = lcons(astate, aggstate->aggs);
> -               aggstate->numaggs++;
> +               /* planner messed up */
> +               elog(ERROR, "Aggref found in non-Agg plan node");
>             }
> -           else
> +           if (aggref->aggpartial == aggstate->finalizeAggs)
>             {
>                 /* planner messed up */
> -               elog(ERROR, "Aggref found in non-Agg plan node");
> +               if (aggref->aggpartial)
> +                   elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
> +               else
> +                   elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
>             }
> +           aggstate->aggs = lcons(astate, aggstate->aggs);
> +           aggstate->numaggs++;
>
> This seems like it involves more code rearrangement than is really
> necessary here.

This is mostly gone, as after removing aggpartial some of these checks
are not possible. I just have some additional code:

Aggref   *aggref = (Aggref *) node;

if (aggstate->finalizeAggs &&
aggref->aggoutputtype != aggref->aggtype)
{
/* planner messed up */
elog(ERROR, "Aggref aggoutputtype must match aggtype");
}

But nothing to sanity check non-finalize nodes.

>
> +    * Partial paths in the input rel could allow us to perform aggregation in
> +    * parallel, set_grouped_rel_consider_parallel() will determine if it's
> +    * going to be safe to do so.
>
> Change comma to semicolon or period.

Changed.

> /*
>  * Generate a HashAgg Path atop of the cheapest partial path, once
>  * again, we'll only do this if it looks as though the hash table won't
>  * exceed work_mem.
>  */
>
> Same here.  Commas are not the way to connect two independent sentences.

Changed.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread David Rowley
On 18 March 2016 at 20:25, Amit Kapila  wrote:
> Few more comments:
>
> 1.
>
> + if (parse->groupClause)
> + path = (Path *) create_sort_path(root,
> + grouped_rel,
> + path,
> + root->group_pathkeys,
> + -1.0);
>
> For final path, why do you want to sort just for group by case?

If there's no GROUP BY then there will only be a single group, this
does not require sorting, e.g SELECT SUM(col) from sometable;

I added the comment:

/*
* Gather is always unsorted, so we'll need to sort, unless there's
* no GROUP BY clause, in which case there will only be a single
* group.
*/


> 2.
> + path = (Path *) create_gather_path(root, partial_grouped_rel, path,
> +   NULL, &total_groups);
> +
> + if (parse->groupClause)
> + path = (Path *) create_sort_path(root,
> + grouped_rel,
> + path,
> + root->group_pathkeys,
> + -1.0);
> +
> + if (parse->hasAggs)
> + add_path(grouped_rel, (Path *)
> + create_agg_path(root,
> + grouped_rel,
> + path,
> + target,
> + parse->groupClause ? AGG_SORTED : AGG_PLAIN,
> + parse->groupClause,
> + (List *) parse->havingQual,
> + _costs,
> + partial_grouped_rel->rows,
> + true,
> + true));
> + else
> + add_path(grouped_rel, (Path *)
> + create_group_path(root,
> +  grouped_rel,
> +  path,
> +  target,
> +  parse->groupClause,
> +  (List *) parse->havingQual,
> +  total_groups));
>
> In above part of patch, it seems you are using number of groups
> differenetly; for create_group_path() and create_gather_path(), you have
> used total_groups whereas for create_agg_path() partial_grouped_rel->rows is
> used, is there a reason for the same?

That's a mistake... too much code shuffling yesterday it seems.

> 3.
> + if (grouped_rel->partial_pathlist)
> + {
> + Path   *path = (Path *) linitial(grouped_rel->partial_pathlist);
> + double total_groups;
> +
> + total_groups = path->rows * path->parallel_degree;
> + path = (Path *) create_gather_path(root, partial_grouped_rel, path,
> +   NULL, &total_groups);
>
> A. Won't passing partial_grouped_rel lead to incomplete information required
> by create_gather_path() w.r.t the case of parameterized path info?

There should be no parameterized path info after joins are over, but
nevertheless I took your advice about passing PathTarget to
create_gather_path(), so this partial_grouped_rel no longer exists.

> B. You have mentioned that passing grouped_rel will make gather path contain
> the information of final path target, but what is the problem with that? I
> mean to ask why Gather node is required to contain partial path target
> information instead of final path target.

Imagine a query such as: SELECT col,SUM(this) FROM sometable GROUP BY
col HAVING SUM(somecolumn) > 0;

In this case SUM(somecolumn) won't be in the final PathTarget. The
partial grouping target will contain the Aggref from the HAVING
clause. The other difference with the partial aggregate PathTarget is
that the Aggrefs return the partial state in exprType() rather than
the final value's type, which is required so the executor knows how to
form and deform tuples, plus many other things.

> C. Can we consider passing pathtarget to create_gather_path() as that seems
> to save us from inventing new UpperRelationKind? If you are worried about
> adding the new parameter (pathtarget) to create_gather_path(), then I think
> we are already passing it in many other path generation functions, so why
> not for gather path generation as well?

That's a better idea... Changed to that...

> 4A.
> Overall function create_grouping_paths() looks better than previous, but I
> think still it is difficult to read.  I think it can be improved by
> generating partial aggregate paths separately as we do for nestloop join
> refer function consider_parallel_nestloop

hmm, perhaps the partial path generation could be moved off to another
static function, although we'd need to pass quite a few parameters to
it, like can_sort, can_hash, partial_grouping_target, grouped_rel,
root. Perhaps it's worth doing, but we still need the
partial_grouping_target for the Gather node, so it's not like that
other function can do all of the parallel stuff... We'd still need
some knowledge of that in create_grouping_paths()

> 4B.
> Rather than directly using create_gather_path(), can't we use
> generate_gather_paths as for all places where we generate gather node,
> generate_gather_paths() is used.

I don't think this is a good fit here, although it would be nice as it
would save having to special case generating the final aggregate paths
on the top of the partial paths. It does not seem that nice as it's
not really that clear if we need to make a combine aggregate node, or
a normal aggregate node on the path. The only way to determine that
would be by checking if it was a GatherPath or not, and that does not
seem like a nice way to go about doing that. Someone might go and
invent something new like MergeGather one day.


> 5.
> +make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
> {

Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread David Rowley
On 17 March 2016 at 18:05, David Rowley  wrote:
> Updated patch attached.

Please disregard
0001-Allow-aggregation-to-happen-in-parallel_2016-03-17.patch. This
contained a badly thought through last minute change to how the Gather
path is generated and is broken.

I played around with ways of generating the Gather node as
create_gather_path() is not really geared up for what's needed here,
since grouped_rel cannot be passed into create_gather_path() since it
contains the final aggregate PathTarget rather than the partial
PathTarget, and also incorrect row estimates. I've ended up with an
extra double *rows argument in this function to make it possible to
override the rows from rel. I'm not sure how this'll go down... It
does not seem perfect.

In the end I've now added a new upper planner type to allow me to
create a RelOptInfo for the partial aggregate relation, so that I can
pass create_gather_path() a relation with the correct PathTarget. This
seemed better than borrowing grouped_rel, then overriding the
reltarget after create_gather_path() returned. Although I'm willing to
consider the option that someone will disagree with how I've done
things here.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Allow-aggregation-to-happen-in-parallel_2016-03-18.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Robert Haas
On Wed, Mar 16, 2016 at 6:49 AM, David Rowley wrote:
> On 16 March 2016 at 15:04, Robert Haas  wrote:
>> I don't think I'd be objecting if you made PartialAggref a real
>> alternative to Aggref.  But that's not what you've got here.  A
>> PartialAggref is just a wrapper around an underlying Aggref that
>> changes the interpretation of it - and I think that's not a good idea.
>> If you want to have Aggref and PartialAggref as truly parallel node
>> types, that seems cool, and possibly better than what you've got here
>> now.  Alternatively, Aggref can do everything.  But I don't think we
>> should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.

Cool!  Why not initialize aggpartialtype always?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Haribabu Kommi
On Thu, Mar 17, 2016 at 2:13 PM, James Sewell  wrote:
>
> Hi again,
>
> This is probably me missing something, but is there a reason parallel 
> aggregate doesn't seem to ever create append nodes containing Index scans?
>
> SET random_page_cost TO 0.2;
> SET max_parallel_degree TO 8;
>
> postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
>                                          QUERY PLAN
> ---------------------------------------------------------------------------------------------
>  Finalize GroupAggregate  (cost=310596.32..310598.03 rows=31 width=16)
>    Group Key: view_time_day
>    ->  Sort  (cost=310596.32..310596.79 rows=186 width=16)
>          Sort Key: view_time_day
>          ->  Gather  (cost=310589.00..310589.31 rows=186 width=16)
>                Number of Workers: 5
>                ->  Partial HashAggregate  (cost=310589.00..310589.31 rows=31 width=16)
>                      Group Key: view_time_day
>                      ->  Parallel Seq Scan on base  (cost=0.00..280589.00 rows=600 width=12)
>
>
> SET max_parallel_degree TO 0;
>
> postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
>                                          QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..600085.92 rows=31 width=16)
>    Group Key: view_time_day
>    ->  Index Only Scan using base_view_time_day_count_i_idx on base  (cost=0.56..450085.61 rows=3000 width=12)
> (3 rows)


To get a good parallelism benefit, the workers have to execute most of
the plan in parallel. If we run only some part of the upper plan in
parallel, we may not get much benefit. At present only the seq scan node
is capable of parallelism at the scan node level; an index scan is not
possible as of now. So, based on the overall cost of the parallel
aggregate + parallel seq scan, that plan is chosen.

If the index scan is made parallel-capable in future, it is possible that
a parallel aggregate + parallel index scan plan may be chosen.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread David Rowley
On 17 March 2016 at 00:57, Robert Haas  wrote:
> On Wed, Mar 16, 2016 at 6:49 AM, David Rowley wrote:
>> On 16 March 2016 at 15:04, Robert Haas  wrote:
>>> I don't think I'd be objecting if you made PartialAggref a real
>>> alternative to Aggref.  But that's not what you've got here.  A
>>> PartialAggref is just a wrapper around an underlying Aggref that
>>> changes the interpretation of it - and I think that's not a good idea.
>>> If you want to have Aggref and PartialAggref as truly parallel node
>>> types, that seems cool, and possibly better than what you've got here
>>> now.  Alternatively, Aggref can do everything.  But I don't think we
>>> should go with this wrapper concept.
>>
>> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
>> quite happy with how it turned out. I made
>> search_indexed_tlist_for_partial_aggref() to follow-on the series of
>> other search_indexed_tlist_for_* functions and have made it behave the
>> same way, by returning the newly created Var instead of doing that in
>> fix_combine_agg_expr_mutator(), as the last version did.
>>
>> Thanks for the suggestion.
>>
>> New patch attached.
>
> Cool!  Why not initialize aggpartialtype always?

Because the follow-on patch sets that to either the serialtype or the
aggtranstype, depending on if serialisation is required. Serialisation
is required for parallel aggregate, but if we're performing the
partial agg in the main process, then we'd not need to do that. This
could be solved by adding more fields to AggRef to cover the
aggserialtype and perhaps expanding aggpartial into an enum mode which
allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
attention to the mode and return 1 of the 3 possible types.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Robert Haas
On Wed, Mar 16, 2016 at 5:05 PM, David Rowley wrote:
>> Cool!  Why not initialize aggpartialtype always?
>
> Because the follow-on patch sets that to either the serialtype or the
> aggtranstype, depending on if serialisation is required. Serialisation
> is required for parallel aggregate, but if we're performing the
> partial agg in the main process, then we'd not need to do that. This
> could be solved by adding more fields to AggRef to cover the
> aggserialtype and perhaps expanding aggpartial into an enum mode which
> allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
> attention to the mode and return 1 of the 3 possible types.

Urk.  That might still be better than what you have right now, but
it's obviously not great.  How about ditching aggpartialtype and
adding aggoutputtype instead?  Then you can always initialize that to
whatever it's supposed to be based on the type of aggregation you are
doing, and exprType() can simply return that field.
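
In sketch form, the relevant arm of exprType()'s switch would then simply be
(aggoutputtype being the proposed new field):

    case T_Aggref:
        /* whatever this Aggref actually emits: aggtype normally, or the
         * transition/serialised type when it produces partial states */
        type = ((const Aggref *) expr)->aggoutputtype;
        break;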

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Alvaro Herrera
I read this a bit, as an exercise to try to follow parallel query a bit.
I think the most interesting thing I have to say is that the new error
messages in ExecInitExpr do not conform to our style.  Probably just
downcase everything except Aggref and you're done, since they're
can't-happen conditions anyway.  The comments below are mostly as I
learn how the whole thing works so if I'm talking nonsense, I'm happy to
be ignored or, better yet, educated.

I think the way we set the "consider_parallel" flag is a bit odd (we
just "return" out of the function in the cases were it mustn't be set);
but that mechanism is already part of set_rel_consider_parallel and
similar to (but not quite like) longstanding routines such as
set_rel_width, so nothing new in this patch.  I find this a bit funny
coding, but then this is the planner so maybe it's okay.

I think the comment on search_indexed_tlist_for_partial_aggref is a bit
bogus; it says it returns an existing Var, but what it does is
manufacture one itself.  I *think* the code is all right, but the
comment seems misleading.

In set_combineagg_references(), there are two calls to
fix_combine_agg_expr(); I think the one hanging on the
search_indexed_tlist_for_sortgroupref call is useless; you could just
have the "if newexpr != NULL" in the outer block (after initializing to
NULL before the ressortgroupref check).

set_combineagg_references's comment says it does something "similar to
set_upper_references, and additionally" some more stuff, but reading the
code for both functions I think that's not quite true.  I think the
comment should say that both functions are parallel, but one works for
partial aggs and the other doesn't.  Actually, what happens if you feed
an agg plan with combineStates to set_upper_references?  If it still
works but the result is not optimal, maybe we should check against that
case, so as to avoid the case where somebody hacks this further and the
plans are suddenly not agg-combined anymore.  What I actually expect to
happen is that something would explode during execution; in that case
perhaps it's better to add a comment?  (In further looking at other
setrefs.c similar functions, maybe it's fine the way you have it.)

Back at make_partialgroup_input_target, the comment says "so that they
can be found later in Combine Aggregate nodes during setrefs".  I think
it's better to be explicit and say ".. can be found later during
set_combineagg_references" or something.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread David Rowley
On 17 March 2016 at 03:47, Robert Haas  wrote:
> More review comments:
>
> /*
> +* Likewise for any partial paths, although this case
> is more simple as
> +* we don't track the cheapest path.
> +*/
>
> I think in the process of getting rebased over the rapidly-evolving
> underlying substructure, this comment no longer makes much sense where
> it is in the file. IIUC, the comment is referring back to "Forcibly
> apply that target to all the Paths for the scan/join rel", but there's
> now enough other stuff in the middle that it doesn't really make sense
> any more.  And actually, I think you should move the code up higher,
> not change the comment.  This belongs before setting
> root->upper_targets[foo].

Yes, that's not where it was originally... contents may settle during transit...

> The logic in create_grouping_paths() is too ad-hoc and, as Amit and I
> have both complained about, wrong in detail because it doesn't call
> has_parallel_hazard anywhere.  Basically, you have the wrong design.
> There shouldn't be any need to check parallelModeOK here.  Rather,
> what you should be doing is setting consider_parallel to true or false
> on the upper rel.  See set_rel_consider_parallel for how this is set
> for base relations, set_append_rel_size() for append relations, and
> perhaps most illustratively build_join_rel() for join relations.  You
> should have some equivalent of this logic for upper rels, or at least
> the upper rels you care about:
>
>if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
> !has_parallel_hazard((Node *) restrictlist, false))
> joinrel->consider_parallel = true;
>
> Then, you can consider parallel aggregation if consider_parallel is
> true and any other conditions that you care about are also met.
>
> I think that the way you are considering sorted aggregation, hashed
> aggregation, and mixed strategies does not make very much sense.  It
> seems to me that what you should do is:
>
> 1. Figure out the cheapest PartialAggregate path.  You will need to
> compare the costs of (a) a hash aggregate, (b) an explicit sort +
> group aggregate, and (c) grouping a presorted path.  (We can
> technically skip (c) for right now since it can't occur.)  I would go
> ahead and use add_partial_path() on these to stuff them into the
> partial_pathlist for the upper rel.
>
> 2. Take the first (cheapest) path in the partial_pathlist and stick a
> Gather node on top of it.  Store this in a local variable, say,
> partial_aggregate_path.
>
> 3. Construct a finalize-hash-aggregate path for partial_aggregate_path
> and also a sort+finalize-group/plain-aggregate path for
> partial_aggregate_path, and add each of those to the upper rel.  They
> will either beat out the non-parallel paths or they won't.
>
> The point is that the decision as to whether to use hashing or sorting
> below the Gather is completely independent from the choice of which
> one to use above the Gather.  Pick the best strategy below the Gather;
> then pick the best strategy to stick on top of that above the Gather.

Good point. I've made local alterations to the patch so that partial
paths are now generated on the grouped_rel.
I also got rid of the enable_hashagg test in create_grouping_paths().
The use of this seemed rather old-school planner coding. Instead I
altered cost_agg() to add disable_cost appropriately, which simplifies
the logic of when and when not to consider hash aggregate paths. The
only downside of this that I can see is that the hash agg Path is
still generated when enable_hashagg is off, and that means a tiny bit
more work creating the path and calling add_path() for it.
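For what it's worth, the disable_cost approach amounts to something of this shape in cost_agg() (a sketch of the idea rather than the exact hunk):

else
{
	/* must be AGG_HASHED */
	startup_cost = input_total_cost;
	if (!enable_hashagg)
		startup_cost += disable_cost;	/* path is still built, just priced to lose */
	startup_cost += aggcosts->transCost.startup;
	startup_cost += aggcosts->transCost.per_tuple * input_tuples;
	...
}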

This change also allows nodeGroup to be used for parallel aggregate
when there are no aggregate functions. This was a bit broken in the
old version.

>  * This is very similar to make_group_input_target(), only we do not recurse
>  * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
>  * also add any Aggrefs which are located in the HAVING clause into the
>  * PathTarget.
>  *
>  * Aggrefs are also setup into partial mode and the partial return types are
>  * set to become the type of the aggregate transition state rather than the
>  * aggregate function's return type.
>
> This comment isn't very helpful because it tells you what the function
> does, which you can find out anyway from reading the code.  What it
> should do is explain why it does it.  Just to take one particular
> point, why would we not want to recurse into Aggrefs in this case?

Good point. I've updated it to:

/*
 * make_partialgroup_input_target
 *  Generate appropriate PathTarget for input for Partial Aggregate nodes.
 *
 * Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
 * we need these to remain intact so that they can be found later in  Combine
 * Aggregate nodes during setrefs. Vars will be still 

Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread David Rowley
On 17 March 2016 at 01:29, Robert Haas  wrote:
> On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila  wrote:
>> Isn't it better to call it Parallel Aggregate instead of Partial
>> Aggregate?  Initially, we kept Partial for seqscan, but later on we
>> changed to Parallel Seq Scan, so I am not able to see why it is better to
>> call it Partial in the case of Aggregates.
>
> I think partial is the right terminology.  Unlike a parallel
> sequential scan, a partial aggregate isn't parallel-aware and could be
> used in contexts having nothing to do with parallelism.  It's just
> that it outputs transition values instead of a finalized value.

+1  the reason the partial aggregate patches have been kept separate
from the parallel aggregate patches is that partial aggregate will
serve for many other purposes. Parallel Aggregate is just one of many
possible use cases for this, so it makes little sense to give it a
name according to a single use case.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Robert Haas
On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
 wrote:
> Updated patch is attached.

I think this looks structurally correct now, and I think it's doing
the right thing as far as parallelism is concerned.  I don't see any
obvious problems in the rest of it, either, but I haven't thought
hugely deeply about the way you are doing the costing, nor have I
totally convinced myself that all of the PathTarget and setrefs stuff
is correct.  But I think it's probably pretty close.  I'll study it
some more next week.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Amit Kapila
On Thu, Mar 17, 2016 at 6:41 PM, David Rowley 
wrote:
>
> On 18 March 2016 at 01:22, Amit Kapila  wrote:
> > On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> >  wrote:
>
> Updated patch is attached. Thanks for the re-review.
>

Few more comments:

1.

+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);

For the final path, why do you want to sort just for the GROUP BY case?

2.
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+   NULL, &total_groups);
+
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+  grouped_rel,
+  path,
+  target,
+  parse->groupClause,
+  (List *) parse->havingQual,
+  total_groups));

In the above part of the patch, it seems you are using the number of groups
differently: for create_group_path() and create_gather_path() you have
used total_groups, whereas for create_agg_path()
partial_grouped_rel->rows is used. Is there a reason for that?


3.
+ if (grouped_rel->partial_pathlist)
+ {
+ Path   *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double  total_groups;
+
+ total_groups = path->rows * path->parallel_degree;
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+                                    NULL, &total_groups);

A. Won't passing partial_grouped_rel lead to incomplete information
required by create_gather_path() w.r.t the case of parameterized path info?
B. You have mentioned that passing grouped_rel will make gather path
contain the information of final path target, but what is the problem with
that? I mean to ask why Gather node is required to contain partial path
target information instead of final path target.
C. Can we consider passing pathtarget to create_gather_path() as that seems
to save us from inventing new UpperRelationKind? If you are worried about
adding the new parameter (pathtarget) to create_gather_path(), then I think
we are already passing it in many other path generation functions, so why
not for gather path generation as well?

4A.
Overall, the function create_grouping_paths() looks better than before, but I
think it is still difficult to read.  I think it can be improved by
generating the partial aggregate paths separately, as we do for nestloop
joins; refer to the function consider_parallel_nestloop().

4B.
Rather than directly using create_gather_path(), can't we use
generate_gather_paths()?  In all other places where we generate a Gather
node, generate_gather_paths() is used.

5.
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
{
..
..
+ foreach(lc, final_target->exprs)
+ {
+     Expr   *expr = (Expr *) lfirst(lc);
+
+     i++;
+
+     if (parse->groupClause)
+     {
+         Index sgref = final_target->sortgrouprefs[i];
+
+         if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+             != NULL)
+         {
+             /*
+              * It's a grouping column, so add it to the input target as-is.
+              */
+             add_column_to_pathtarget(input_target, expr, sgref);
+             continue;
+         }
+     }
+
+     /*
+      * Non-grouping column, so just remember the expression for later
+      * call to pull_var_clause.
+      */
+     non_group_cols = lappend(non_group_cols, expr);
+ }
..
}

Do we want to achieve something different in the above foreach loop than in
the similar loop in make_group_input_target()?  If not, then why are they not
exactly the same?

6.
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;

Can't we use return set_pathtarget_cost_width() directly rather than
fetching it in input_target and then returning input_target?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Amit Kapila
On Wed, Mar 16, 2016 at 4:19 PM, David Rowley 
wrote:
>
> On 16 March 2016 at 15:04, Robert Haas  wrote:
> > I don't think I'd be objecting if you made PartialAggref a real
> > alternative to Aggref.  But that's not what you've got here.  A
> > PartialAggref is just a wrapper around an underlying Aggref that
> > changes the interpretation of it - and I think that's not a good idea.
> > If you want to have Aggref and PartialAggref as truly parallel node
> > types, that seems cool, and possibly better than what you've got here
> > now.  Alternatively, Aggref can do everything.  But I don't think we
> > should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.
>

Few assorted comments:

1.
  /*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK)

I think here you need to use has_parallel_hazard() with the second parameter as
false to ensure the expressions are parallel safe. The glob->parallelModeOK flag
indicates that there is no parallel-unsafe expression, but the query can still
contain parallel-restricted expressions.
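In other words, the test would become something like this (a sketch only; the exact node trees to check, and names such as "target" below, are assumptions rather than code from the patch):

can_parallel = false;

if ((parse->hasAggs || parse->groupClause != NIL) &&
	input_rel->partial_pathlist != NIL &&
	parse->groupingSets == NIL &&
	root->glob->parallelModeOK &&
	/* "target" here stands for the grouping PathTarget; name assumed */
	!has_parallel_hazard((Node *) target->exprs, false) &&
	!has_parallel_hazard(parse->havingQual, false))
	can_parallel = true;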


2.
 AggPath *
 create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
  List *groupClause,
  List *qual,
  const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)

Don't you need to set parallel_aware flag in this function as we do for
create_seqscan_path()?

3.
postgres=# explain select count(*) from t1;
                                  QUERY PLAN
------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=45420.57..45420.58 rows=1 width=8)
   ->  Gather  (cost=45420.35..45420.56 rows=2 width=8)
         Number of Workers: 2
         ->  Partial Aggregate  (cost=44420.35..44420.36 rows=1 width=8)
               ->  Parallel Seq Scan on t1  (cost=0.00..44107.88 rows=124988 width=0)
(5 rows)

Isn't it better to call it Parallel Aggregate instead of Partial
Aggregate?  Initially, we kept Partial for seqscan, but later on we
changed to Parallel Seq Scan, so I am not able to see why it is better to
call it Partial in the case of Aggregates.

4.
  /*
+ * Likewise for any partial paths, although this case is more simple as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+     Path   *subpath = (Path *) lfirst(lc);
+
+     Assert(subpath->param_info == NULL);
+     lfirst(lc) = apply_projection_to_path(root, current_rel,
+                                           subpath, scanjoin_target);
+ }
+


Can't we do this by teaching apply_projection_to_path() as done in the
latest patch posted by me to push down the target list beneath workers [1].

5.
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+     context->allowedtype = PAT_INTERNAL_ONLY;

In the above comment, you have referred to "maximum degree", which is not making
much sense to me. If it is not a typo, then can you clarify what it means?

6.
+ * fix_combine_agg_expr
+ *  Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ *  Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+   Node *node,
+   indexed_tlist *subplan_itlist,
+   Index newvarno,
+   int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)

Don't we want to handle the case of context->subplan_itlist->has_non_vars
as it is handled in fix_upper_expr_mutator()?  If not, then I think adding
the reason for that in a comment above the function would be better.

7.
tlist.c

+}
\ No newline at end of file

Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Robert Haas
On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila  wrote:
> Isn't it better to call it Parallel Aggregate instead of Partial
> Aggregate?  Initially, we kept Partial for seqscan, but later on we
> changed to Parallel Seq Scan, so I am not able to see why it is better to
> call it Partial in the case of Aggregates.

I think partial is the right terminology.  Unlike a parallel
sequential scan, a partial aggregate isn't parallel-aware and could be
used in contexts having nothing to do with parallelism.  It's just
that it outputs transition values instead of a finalized value.
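To illustrate that distinction with something self-contained (ordinary C, nothing to do with the actual executor code): for avg(), a partial step emits a (sum, count) transition state, a combine step merges states, and only the finalize step produces the user-visible value.

#include <stdio.h>

typedef struct AvgState { double sum; long count; } AvgState;

/* per-row transition: what a Partial Aggregate accumulates */
static void trans(AvgState *state, double value)
{
	state->sum += value;
	state->count += 1;
}

/* merge two transition states: what the combining aggregate does */
static void combine(AvgState *dst, const AvgState *src)
{
	dst->sum += src->sum;
	dst->count += src->count;
}

/* finalize: turn the merged state into the aggregate's result */
static double final_avg(const AvgState *state)
{
	return state->count ? state->sum / state->count : 0.0;
}

int main(void)
{
	/* two "workers", each aggregating part of the input */
	AvgState w1 = {0, 0}, w2 = {0, 0};
	double part1[] = {1.0, 2.0, 3.0};
	double part2[] = {4.0, 5.0};

	for (int i = 0; i < 3; i++) trans(&w1, part1[i]);
	for (int i = 0; i < 2; i++) trans(&w2, part2[i]);

	/* the leader combines the partial states and finalizes */
	combine(&w1, &w2);
	printf("avg = %g\n", final_avg(&w1));	/* prints 3 */
	return 0;
}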

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Robert Haas
On Wed, Mar 16, 2016 at 7:57 AM, Robert Haas  wrote:
> On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
>  wrote:
>> On 16 March 2016 at 15:04, Robert Haas  wrote:
>>> I don't think I'd be objecting if you made PartialAggref a real
>>> alternative to Aggref.  But that's not what you've got here.  A
>>> PartialAggref is just a wrapper around an underlying Aggref that
>>> changes the interpretation of it - and I think that's not a good idea.
>>> If you want to have Aggref and PartialAggref as truly parallel node
>>> types, that seems cool, and possibly better than what you've got here
>>> now.  Alternatively, Aggref can do everything.  But I don't think we
>>> should go with this wrapper concept.
>>
>> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
>> quite happy with how it turned out. I made
>> search_indexed_tlist_for_partial_aggref() to follow-on the series of
>> other search_indexed_tlist_for_* functions and have made it behave the
>> same way, by returning the newly created Var instead of doing that in
>> fix_combine_agg_expr_mutator(), as the last version did.
>>
>> Thanks for the suggestion.
>>
>> New patch attached.
>
> Cool!  Why not initialize aggpartialtype always?

More review comments:

/*
+* Likewise for any partial paths, although this case
is more simple as
+* we don't track the cheapest path.
+*/

I think in the process of getting rebased over the rapidly-evolving
underlying substructure, this comment no longer makes much sense where
it is in the file. IIUC, the comment is referring back to "Forcibly
apply that target to all the Paths for the scan/join rel", but there's
now enough other stuff in the middle that it doesn't really make sense
any more.  And actually, I think you should move the code up higher,
not change the comment.  This belongs before setting
root->upper_targets[foo].

The logic in create_grouping_paths() is too ad-hoc and, as Amit and I
have both complained about, wrong in detail because it doesn't call
has_parallel_hazard anywhere.  Basically, you have the wrong design.
There shouldn't be any need to check parallelModeOK here.  Rather,
what you should be doing is setting consider_parallel to true or false
on the upper rel.  See set_rel_consider_parallel for how this is set
for base relations, set_append_rel_size() for append relations, and
perhaps most illustratively build_join_rel() for join relations.  You
should have some equivalent of this logic for upper rels, or at least
the upper rels you care about:

   if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
!has_parallel_hazard((Node *) restrictlist, false))
joinrel->consider_parallel = true;

Then, you can consider parallel aggregation if consider_parallel is
true and any other conditions that you care about are also met.

I think that the way you are considering sorted aggregation, hashed
aggregation, and mixed strategies does not make very much sense.  It
seems to me that what you should do is:

1. Figure out the cheapest PartialAggregate path.  You will need to
compare the costs of (a) a hash aggregate, (b) an explicit sort +
group aggregate, and (c) grouping a presorted path.  (We can
technically skip (c) for right now since it can't occur.)  I would go
ahead and use add_partial_path() on these to stuff them into the
partial_pathlist for the upper rel.

2. Take the first (cheapest) path in the partial_pathlist and stick a
Gather node on top of it.  Store this in a local variable, say,
partial_aggregate_path.

3. Construct a finalize-hash-aggregate path for partial_aggregate_path
and also a sort+finalize-group/plain-aggregate path for
partial_aggregate_path, and add each of those to the upper rel.  They
will either beat out the non-parallel paths or they won't.

The point is that the decision as to whether to use hashing or sorting
below the Gather is completely independent from the choice of which
one to use above the Gather.  Pick the best strategy below the Gather;
then pick the best strategy to stick on top of that above the Gather.
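Schematically, and with the argument lists heavily abbreviated, that looks something like the following (the make_finalize_* helpers are placeholders for whatever path-construction calls end up being used, not real functions):

/* 1. partial-aggregate candidates go into the upper rel's partial_pathlist */
add_partial_path(grouped_rel, (Path *) partial_hashagg_path);             /* (a) */
add_partial_path(grouped_rel, (Path *) sort_plus_partial_groupagg_path);  /* (b) */

/* 2. Gather on top of the cheapest one (partial_pathlist is cost-ordered) */
cheapest_partial = (Path *) linitial(grouped_rel->partial_pathlist);
partial_aggregate_path = (Path *) create_gather_path(root, grouped_rel,
                                                     cheapest_partial, NULL,
                                                     &total_groups);

/* 3. finalize above the Gather both ways and let add_path() keep the winners */
add_path(grouped_rel, (Path *) make_finalize_hashagg(partial_aggregate_path));
add_path(grouped_rel, (Path *) make_sort_plus_finalize_groupagg(partial_aggregate_path));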

 * This is very similar to make_group_input_target(), only we do not recurse
 * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
 * also add any Aggrefs which are located in the HAVING clause into the
 * PathTarget.
 *
 * Aggrefs are also setup into partial mode and the partial return types are
 * set to become the type of the aggregate transition state rather than the
 * aggregate function's return type.

This comment isn't very helpful because it tells you what the function
does, which you can find out anyway from reading the code.  What it
should do is explain why it does it.  Just to take one particular
point, why would we not want to recurse into Aggrefs in this case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread Amit Kapila
On Thu, Mar 17, 2016 at 10:35 AM, David Rowley 
wrote:
>
> On 17 March 2016 at 01:19, Amit Kapila  wrote:
> > Few assorted comments:
> >
> > 2.
> >  AggPath *
> >  create_agg_path(PlannerInfo *root,
> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
> >
> > List *groupClause,
> >   List *qual,
> >   const AggClauseCosts
> > *aggcosts,
> > - double numGroups)
> > + double numGroups,
> > +
> > bool combineStates,
> > + bool finalizeAggs)
> >
> > Don't you need to set parallel_aware flag in this function as we do for
> > create_seqscan_path()?
>
> I don't really know the answer to that... I mean there's nothing
> special done in nodeAgg.c if the node is running in a worker or in the
> main process.
>

On thinking about it again, I think it is okay to set the parallel_aware flag
to false.  This flag indicates whether that particular node has any parallelism
behaviour, which is true for seqscan, but I think not for a partial aggregate
node.

Few other comments on latest patch:

1.

+ /*
+ * XXX does this need estimated for each partial path, or are they
+ * all going to be the same anyway?
+ */
+ dNumPartialGroups = get_number_of_groups(root,
+                                          clamp_row_est(partial_aggregate_path->rows),
+                                          rollup_lists,
+                                          rollup_groupclauses);

For considering partial groups, do we need the rollup-related lists?

2.
+ hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
+                                               &agg_costs,
+                                               dNumPartialGroups);
+
+ /*
+ * Generate a hashagg Path, if we can, but we'll skip this if the hash
+ * table looks like it'll exceed work_mem.
+ */
+ if (can_hash && hashaggtablesize < work_mem * 1024L)


The hash table size should be estimated only if can_hash is true.

3.
+ foreach(lc, grouped_rel->partial_pathlist)
+ {
+     Path   *path = (Path *) lfirst(lc);
+     double  total_groups;
+
+     total_groups = path->parallel_degree * path->rows;
+
+     path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+                                        &total_groups);

Do you need to perform it for each partial path, or just do it for the
first partial path?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread James Sewell
Hi again,

This is probably me missing something, but is there a reason parallel
aggregate doesn't seem to ever create append nodes containing Index scans?

SET random_page_cost TO 0.2;
SET max_parallel_degree TO 8;

postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=310596.32..310598.03 rows=31 width=16)
   Group Key: view_time_day
   ->  Sort  (cost=310596.32..310596.79 rows=186 width=16)
         Sort Key: view_time_day
         ->  Gather  (cost=310589.00..310589.31 rows=186 width=16)
               Number of Workers: 5
               ->  Partial HashAggregate  (cost=310589.00..310589.31 rows=31 width=16)
                     Group Key: view_time_day
                     ->  Parallel Seq Scan on base  (cost=0.00..280589.00 rows=600 width=12)


SET max_parallel_degree TO 0;

postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
QUERY PLAN
---
 GroupAggregate  (cost=0.56..600085.92 rows=31 width=16)
   Group Key: view_time_day
   ->  Index Only Scan using base_view_time_day_count_i_idx on base
 (cost=0.56..450085.61 rows=3000 width=12)
(3 rows)


Cheers,


James Sewell,
Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000   W www.lisasoft.com   F (+61) 3 8370 8099


On Thu, Mar 17, 2016 at 8:08 AM, David Rowley 
wrote:

> On 17 March 2016 at 01:29, Robert Haas  wrote:
> > On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila 
> wrote:
> >> Isn't it better to call it Parallel Aggregate instead of Partial
> >> Aggregate?  Initially, we kept Partial for seqscan, but later on we
> >> changed to Parallel Seq Scan, so I am not able to see why it is
> >> better to
> >> call it Partial in the case of Aggregates.
> >
> > I think partial is the right terminology.  Unlike a parallel
> > sequential scan, a partial aggregate isn't parallel-aware and could be
> > used in contexts having nothing to do with parallelism.  It's just
> > that it outputs transition values instead of a finalized value.
>
> +1  the reason the partial aggregate patches have been kept separate
> from the parallel aggregate patches is that partial aggregate will
> serve for many other purposes. Parallel Aggregate is just one of many
> possible use cases for this, so it makes little sense to give it a
> name according to a single use case.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



Re: [HACKERS] Parallel Aggregate

2016-03-18 Thread David Rowley
On 18 March 2016 at 01:22, Amit Kapila  wrote:
> On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
>  wrote:
>>
>> On 17 March 2016 at 01:19, Amit Kapila  wrote:
>> > Few assorted comments:
>> >
>> > 2.
>> >  AggPath *
>> >  create_agg_path(PlannerInfo *root,
>> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>> >
>> > List *groupClause,
>> >   List *qual,
>> >   const AggClauseCosts
>> > *aggcosts,
>> > - double numGroups)
>> > + double numGroups,
>> > +
>> > bool combineStates,
>> > + bool finalizeAggs)
>> >
>> > Don't you need to set parallel_aware flag in this function as we do for
>> > create_seqscan_path()?
>>
>> I don't really know the answer to that... I mean there's nothing
>> special done in nodeAgg.c if the node is running in a worker or in the
>> main process.
>>
>
> On again thinking about it, I think it is okay to set parallel_aware flag as
> false.  This flag means whether that particular node has any parallelism
> behaviour which is true for seqscan, but I think not for partial aggregate
> node.
>
> Few other comments on latest patch:
>
> 1.
>
> + /*
> + * XXX does this need estimated for each partial path, or are they
> + * all
> going to be the same anyway?
> + */
> + dNumPartialGroups = get_number_of_groups(root,
> +
> clamp_row_est(partial_aggregate_path->rows),
> +
> rollup_lists,
> +
> rollup_groupclauses);
>
> For considering partial groups, do we need to rollup related lists?

No it doesn't you're right. I did mean to remove these, but they're
NIL anyway. Seems better to remove them to prevent confusion.

> 2.
> + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
> +
>  _costs,
> +
>  dNumPartialGroups);
> +
> + /*
> + * Generate a
> hashagg Path, if we can, but we'll skip this if the hash
> + * table looks like it'll exceed work_mem.
> +
> */
> + if (can_hash && hashaggtablesize < work_mem * 1024L)
>
>
> hash table size should be estimated only if can_hash is true.

Good point. Changed.
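For reference, the reordered check presumably ends up shaped something like this (just a sketch; the costs argument name, agg_costs, is an assumption, since that token was mangled in the quoted fragment):

if (can_hash)
{
	hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
												  &agg_costs,	/* name assumed */
												  dNumPartialGroups);

	/* generate a partial HashAggregate path only if it should fit in work_mem */
	if (hashaggtablesize < work_mem * 1024L)
	{
		/* ... build and add_partial_path() the HashAggregate path ... */
	}
}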

>
> 3.
> + foreach(lc, grouped_rel->partial_pathlist)
> + {
> + Path   *path =
> (Path *) lfirst(lc);
> + double total_groups;
> +
> + total_groups = path-
>>parallel_degree * path->rows;
> +
> + path = (Path *) create_gather_path(root, grouped_rel, path,
> NULL,
> +   _groups);
>
> Do you need to perform it foreach partial path or just do it for
> firstpartial path?

That's true. The order here does not matter since we're passing
directly into a Gather node, so it's wasteful to consider anything
apart from the cheapest path. -- Fixed.
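In other words, rather than looping, just take the head of the list; something like this (a sketch mirroring the fragment quoted above, with the stripped &total_groups argument written out):

/* partial_pathlist is kept sorted by cost, so the first entry is cheapest */
Path	   *cheapest_partial_path = (Path *) linitial(grouped_rel->partial_pathlist);
double		total_groups = cheapest_partial_path->parallel_degree *
						   cheapest_partial_path->rows;

path = (Path *) create_gather_path(root, grouped_rel, cheapest_partial_path,
								   NULL, &total_groups);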

There was also a missing hash table size check on the Finalize
HashAggregate Path consideration. I've added that now.

Updated patch is attached. Thanks for the re-review.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Allow-aggregation-to-happen-in-parallel_2016-03-18a.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-18 Thread David Rowley
On 19 March 2016 at 05:46, Robert Haas  wrote:
> On Wed, Mar 16, 2016 at 5:05 PM, David Rowley
>  wrote:
>>> Cool!  Why not initialize aggpartialtype always?
>>
>> Because the follow-on patch sets that to either the serialtype or the
>> aggtranstype, depending on if serialisation is required. Serialisation
>> is required for parallel aggregate, but if we're performing the
>> partial agg in the main process, then we'd not need to do that. This
>> could be solved by adding more fields to AggRef to cover the
>> aggserialtype and perhaps expanding aggpartial into an enum mode which
>> allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
>> attention to the mode and return 1 of the 3 possible types.
>
> Urk.  That might still be better than what you have right now, but
> it's obviously not great.  How about ditching aggpartialtype and
> adding aggoutputtype instead?  Then you can always initialize that to
> whatever it's supposed to be based on the type of aggregation you are
> doing, and exprType() can simply return that field.

hmm, that might be better, but it kinda leaves aggpartial without much
of a job to do. The only code which depends on that is the sanity
checks that I added in execQual.c, and it does not really seem worth
keeping it for that. The only sanity check that I can think to do here
is if (aggstate->finalizeAggs && aggref->aggoutputtype !=
aggref->aggtype) -- we have a problem. Obviously we can't check that
for non-finalize nodes since the aggtype can match the aggoutputtype
for legitimate reasons.



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-18 Thread David Rowley
On 19 March 2016 at 09:53, Alvaro Herrera  wrote:
> I read this a bit, as an exercise to try to follow parallel query a bit.

Thanks for taking a look at this.

> I think the most interesting thing I have to say is that the new error
> messages in ExecInitExpr do not conform to our style.  Probably just
> downcase everything except Aggref and you're done, since they're
> can't-happen conditions anyway.  The comments below are mostly as I
> learn how the whole thing works so if I'm talking nonsense, I'm happy to
> be ignored or, better yet, educated.

Per a comment above from Robert I ended up just removing most of that
code due to the removal of Aggref->aggpartial. It did not seem worth
keeping aggpartial around just for these errors either, but let me
know if you think otherwise.


> I think the way we set the "consider_parallel" flag is a bit odd (we
> just "return" out of the function in the cases were it mustn't be set);
> but that mechanism is already part of set_rel_consider_parallel and
> similar to (but not quite like) longstanding routines such as
> set_rel_width, so nothing new in this patch.  I find this a bit funny
> coding, but then this is the planner so maybe it's okay.

hmm, I think it's very similar to set_rel_consider_parallel(), as that
just does a "return" for the non-supported cases, and if it makes it to the
end of the function consider_parallel is set to true.

Would you rather see;

/* we can do nothing in parallel if there's no aggregates or group by */
if (!parse->hasAggs && parse->groupClause == NIL)
grouped_rel->consider_parallel = false;

/* grouping sets are currently not supported by parallel aggregate */
else if (parse->groupingSets)
grouped_rel->consider_parallel = false;

...?

> I think the comment on search_indexed_tlist_for_partial_aggref is a bit
> bogus; it says it returns an existing Var, but what it does is
> manufacture one itself.  I *think* the code is all right, but the
> comment seems misleading.

Ok, I've changed it to:

search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist

which seems to be equivalent to search_indexed_tlist_for_non_var()'s comment.

> In set_combineagg_references(), there are two calls to
> fix_combine_agg_expr(); I think the one hanging on the
> search_indexed_tlist_for_sortgroupref call is useless; you could just
> have the "if newexpr != NULL" in the outer block (after initializing to
> NULL before the ressortgroupref check).

Yes, but see set_upper_references(), it just follows the pattern there
but calls fix_combine_agg_expr() instead of fix_upper_expr(). For
simplicity of review it seems to be nice to keep it following the same
pattern.

> set_combineagg_references's comment says it does something "similar to
> set_upper_references, and additionally" some more stuff, but reading the
> code for both functions I think that's not quite true.  I think the
> comment should say that both functions are parallel, but one works for
> partial aggs and the other doesn't.

Ok, I've changed the comment to:
/*
 * set_combineagg_references
 *  This does a similar job to set_upper_references(), but treats Aggrefs
 *  in a different way. Here we transform the Aggref's args to suit the
 *  combine aggregate phase. This means that the Aggref->args are converted
 *  to reference the corresponding aggregate function in the subplan rather
 *  than simple Var(s), as would be the case for a non-combine aggregate
 *  node.
 */

> Actually, what happens if you feed
> an agg plan with combineStates to set_upper_references?  If it still
> works but the result is not optimal, maybe we should check against that
> case, so as to avoid the case where somebody hacks this further and the
> plans are suddenly not agg-combined anymore.  What I actually expect to
> happen is that something would explode during execution; in that case
> perhaps it's better to add a comment?  (In further looking at other
> setrefs.c similar functions, maybe it's fine the way you have it.)

This simply won't work as this is the code which causes the
sum((sum(num))) in the combine aggregate's target list.
The normal code is trying to look for Vars, but this code is trying to
find that sum(num) inside the subplan target list.

Example EXPLAIN VERBOSE output:
Finalize Aggregate  (cost=105780.67..105780.68 rows=1 width=8)
  Output: pg_catalog.sum((sum(num)))

Try commenting out;

if (aggplan->combineStates)
set_combineagg_references(root, plan, rtoffset);
else

to see the horrors that ensue. It'll basically turn aggregate
functions into rather inefficient random number generators. This is
because the combine Aggref->args still point to "num" instead of
sum(num).

> Back at make_partialgroup_input_target, the comment says "so that they
> can be found later in Combine Aggregate nodes during setrefs".  I think
> it's better to be explicit and say ".. can be found later during
> set_combineagg_references" or something.

Changed. Thanks for the review.

Re: [HACKERS] Parallel Aggregate

2016-03-16 Thread David Rowley
On 16 March 2016 at 15:04, Robert Haas  wrote:
> I don't think I'd be objecting if you made PartialAggref a real
> alternative to Aggref.  But that's not what you've got here.  A
> PartialAggref is just a wrapper around an underlying Aggref that
> changes the interpretation of it - and I think that's not a good idea.
> If you want to have Aggref and PartialAggref as truly parallel node
> types, that seems cool, and possibly better than what you've got here
> now.  Alternatively, Aggref can do everything.  But I don't think we
> should go with this wrapper concept.

Ok, I've now gotten rid of the PartialAggref node, and I'm actually
quite happy with how it turned out. I made
search_indexed_tlist_for_partial_aggref() to follow-on the series of
other search_indexed_tlist_for_* functions and have made it behave the
same way, by returning the newly created Var instead of doing that in
fix_combine_agg_expr_mutator(), as the last version did.

Thanks for the suggestion.

New patch attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Allow-aggregation-to-happen-in-parallel_2016-03-16.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 9:23 PM, David Rowley
 wrote:
>>> Should I update the patch to use the method you describe?
>>
>> Well, my feeling is that is going to make this a lot smaller and
>> simpler, so I think so.  But if you disagree strongly, let's discuss
>> further.
>
> Not strongly. It means that Aggref will need another field to store
> the transtype and/or the serialtype (for the follow-on patches in
> Combining Aggregates thread)
> The only other issue which I don't think I've addressed yet is target
> list width estimates. Probably that can just pay attention to the
> aggpartial flag too, and if I fix that then likely the Aggref needs to
> carry around a aggtransspace field too, which we really only need to
> populate when the Aggref is in partial mode... it would be wasteful to
> bother looking that up from the catalogs if we're never going to use
> the Aggref in partial mode, and it seems weird to leave it as zero and
> only populate it when we need it. So... if I put on my object oriented
> hat and think about this My brain says that Aggref should inherit
> from PartialAggref and that's what I have now... or at least some
> C variant. So I really do think it's cleaner and easier to understand
> by keeping the ParialAggref.
>
> I see on looking at exprType() that we do have other node types which
> conditionally return a different type, but seeing that does not fill
> me with joy.

I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref.  But that's not what you've got here.  A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now.  Alternatively, Aggref can do everything.  But I don't think we
should go with this wrapper concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread David Rowley
On 16 March 2016 at 14:08, Robert Haas  wrote:
> On Tue, Mar 15, 2016 at 8:55 PM, David Rowley
>  wrote:
>> On 16 March 2016 at 13:42, Robert Haas  wrote:
>>> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
>>>  wrote:
 On 16 March 2016 at 12:58, Robert Haas  wrote:
> ...and why would one be true and the other false?

 One would be the combine aggregate (having aggpartial = false), and
 the one in the subnode would be the partial aggregate (having
 aggpartial = true)
 Notice in create_grouping_paths() I build a partial aggregate version
 of the PathTarget named partial_group_target, this one goes into the
 partial agg node, and Gather node. In this case the aggpartial will be
 set differently for the Aggrefs in each of the two PathTargets, so it
 would not be possible in setrefs.c to find the correct target list
 entry in the subnode by using equal(). It'll just end up triggering
 the elog(ERROR, "Aggref not found in subplan target list"); error.
>>>
>>> OK, I get it now.  I still don't like it very much.  There's no
>>> ironclad requirement that we use equal() here as opposed to some
>>> bespoke comparison function with the exact semantics we need, and ISTM
>>> that getting rid of PartialAggref would shrink this patch down quite a
>>> bit.
>>
>> Well that might work. I'd not thought of doing it that way. The only
>> issue that I can foresee with that is that when new fields are added
>> to Aggref in the future, we might miss updating that custom comparison
>> function to include them.
>
> That's true, but it doesn't seem like that big a deal.  A code comment
> in the Aggref definition seems like sufficient insurance against such
> a mistake.
>
>> Should I update the patch to use the method you describe?
>
> Well, my feeling is that is going to make this a lot smaller and
> simpler, so I think so.  But if you disagree strongly, let's discuss
> further.

Not strongly. It means that Aggref will need another field to store
the transtype and/or the serialtype (for the follow-on patches in
Combining Aggregates thread)
The only other issue which I don't think I've addressed yet is target
list width estimates. Probably that can just pay attention to the
aggpartial flag too, and if I fix that then likely the Aggref needs to
carry around an aggtransspace field too, which we really only need to
populate when the Aggref is in partial mode... it would be wasteful to
bother looking that up from the catalogs if we're never going to use
the Aggref in partial mode, and it seems weird to leave it as zero and
only populate it when we need it. So... if I put on my object oriented
hat and think about this, my brain says that Aggref should inherit
from PartialAggref and that's what I have now... or at least some
C variant. So I really do think it's cleaner and easier to understand
by keeping the ParialAggref.

I see on looking at exprType() that we do have other node types which
conditionally return a different type, but seeing that does not fill
me with joy.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 8:55 PM, David Rowley
 wrote:
> On 16 March 2016 at 13:42, Robert Haas  wrote:
>> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
>>  wrote:
>>> On 16 March 2016 at 12:58, Robert Haas  wrote:
 ...and why would one be true and the other false?
>>>
>>> One would be the combine aggregate (having aggpartial = false), and
>>> the one in the subnode would be the partial aggregate (having
>>> aggpartial = true)
>>> Notice in create_grouping_paths() I build a partial aggregate version
>>> of the PathTarget named partial_group_target, this one goes into the
>>> partial agg node, and Gather node. In this case the aggpartial will be
>>> set differently for the Aggrefs in each of the two PathTargets, so it
>>> would not be possible in setrefs.c to find the correct target list
>>> entry in the subnode by using equal(). It'll just end up triggering
>>> the elog(ERROR, "Aggref not found in subplan target list"); error.
>>
>> OK, I get it now.  I still don't like it very much.  There's no
>> ironclad requirement that we use equal() here as opposed to some
>> bespoke comparison function with the exact semantics we need, and ISTM
>> that getting rid of PartialAggref would shrink this patch down quite a
>> bit.
>
> Well that might work. I'd not thought of doing it that way. The only
> issue that I can foresee with that is that when new fields are added
> to Aggref in the future, we might miss updating that custom comparison
> function to include them.

That's true, but it doesn't seem like that big a deal.  A code comment
in the Aggref definition seems like sufficient insurance against such
a mistake.

> Should I update the patch to use the method you describe?

Well, my feeling is that is going to make this a lot smaller and
simpler, so I think so.  But if you disagree strongly, let's discuss
further.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread David Rowley
On 16 March 2016 at 13:42, Robert Haas  wrote:
> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
>  wrote:
>> On 16 March 2016 at 12:58, Robert Haas  wrote:
>>> ...and why would one be true and the other false?
>>
>> One would be the combine aggregate (having aggpartial = false), and
>> the one in the subnode would be the partial aggregate (having
>> aggpartial = true)
>> Notice in create_grouping_paths() I build a partial aggregate version
>> of the PathTarget named partial_group_target, this one goes into the
>> partial agg node, and Gather node. In this case the aggpartial will be
>> set differently for the Aggrefs in each of the two PathTargets, so it
>> would not be possible in setrefs.c to find the correct target list
>> entry in the subnode by using equal(). It'll just end up triggering
>> the elog(ERROR, "Aggref not found in subplan target list"); error.
>
> OK, I get it now.  I still don't like it very much.  There's no
> ironclad requirement that we use equal() here as opposed to some
> bespoke comparison function with the exact semantics we need, and ISTM
> that getting rid of PartialAggref would shrink this patch down quite a
> bit.

Well that might work. I'd not thought of doing it that way. The only
issue that I can foresee with that is that when new fields are added
to Aggref in the future, we might miss updating that custom comparison
function to include them.

Should I update the patch to use the method you describe?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
 wrote:
> On 16 March 2016 at 12:58, Robert Haas  wrote:
>> On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
>>  wrote:
>>> On 16 March 2016 at 11:00, Robert Haas  wrote:
 I don't see why we would need to leave aggpartial out of the equals()
 check.  I must be missing something.
>>>
>>> See fix_combine_agg_expr_mutator()
>>>
>>> This piece of code:
>>>
>>> /*
>>> * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
>>> * we need to look into the PartialAggref to find the Aggref within.
>>> */
>>> foreach(lc, context->subplan_itlist->tlist)
>>> {
>>> PartialAggref *paggref;
>>> tle = (TargetEntry *) lfirst(lc);
>>> paggref = (PartialAggref *) tle->expr;
>>>
>>> if (IsA(paggref, PartialAggref) &&
>>> equal(paggref->aggref, aggref))
>>> break;
>>> }
>>>
>>> if equals() compared the aggpartial then this code would fail to find
>>> the Aggref in the subnode due to the aggpartial field being true on
>>> one and false on the other Aggref.
>>
>> ...and why would one be true and the other false?
>
> One would be the combine aggregate (having aggpartial = false), and
> the one in the subnode would be the partial aggregate (having
> aggpartial = true)
> Notice in create_grouping_paths() I build a partial aggregate version
> of the PathTarget named partial_group_target, this one goes into the
> partial agg node, and Gather node. In this case the aggpartial will be
> set differently for the Aggrefs in each of the two PathTargets, so it
> would not be possible in setrefs.c to find the correct target list
> entry in the subnode by using equal(). It'll just end up triggering
> the elog(ERROR, "Aggref not found in subplan target list"); error.

OK, I get it now.  I still don't like it very much.  There's no
ironclad requirement that we use equal() here as opposed to some
bespoke comparison function with the exact semantics we need, and ISTM
that getting rid of PartialAggref would shrink this patch down quite a
bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread David Rowley
On 16 March 2016 at 12:58, Robert Haas  wrote:
> On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
>  wrote:
>> On 16 March 2016 at 11:00, Robert Haas  wrote:
>>> I don't see why we would need to leave aggpartial out of the equals()
>>> check.  I must be missing something.
>>
>> See fix_combine_agg_expr_mutator()
>>
>> This piece of code:
>>
>> /*
>> * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
>> * we need to look into the PartialAggref to find the Aggref within.
>> */
>> foreach(lc, context->subplan_itlist->tlist)
>> {
>> PartialAggref *paggref;
>> tle = (TargetEntry *) lfirst(lc);
>> paggref = (PartialAggref *) tle->expr;
>>
>> if (IsA(paggref, PartialAggref) &&
>> equal(paggref->aggref, aggref))
>> break;
>> }
>>
>> if equals() compared the aggpartial then this code would fail to find
>> the Aggref in the subnode due to the aggpartial field being true on
>> one and false on the other Aggref.
>
> ...and why would one be true and the other false?

One would be the combine aggregate (having aggpartial = false), and
the one in the subnode would be the partial aggregate (having
aggpartial = true)
Notice in create_grouping_paths() I build a partial aggregate version
of the PathTarget named partial_group_target, this one goes into the
partial agg node, and Gather node. In this case the aggpartial will be
set differently for the Aggrefs in each of the two PathTargets, so it
would not be possible in setrefs.c to find the correct target list
entry in the subnode by using equal(). It'll just end up triggering
the elog(ERROR, "Aggref not found in subplan target list"); error.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
 wrote:
> On 16 March 2016 at 11:00, Robert Haas  wrote:
>> I don't see why we would need to leave aggpartial out of the equals()
>> check.  I must be missing something.
>
> See fix_combine_agg_expr_mutator()
>
> This piece of code:
>
> /*
> * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
> * we need to look into the PartialAggref to find the Aggref within.
> */
> foreach(lc, context->subplan_itlist->tlist)
> {
> PartialAggref *paggref;
> tle = (TargetEntry *) lfirst(lc);
> paggref = (PartialAggref *) tle->expr;
>
> if (IsA(paggref, PartialAggref) &&
> equal(paggref->aggref, aggref))
> break;
> }
>
> if equals() compared the aggpartial then this code would fail to find
> the Aggref in the subnode due to the aggpartial field being true on
> one and false on the other Aggref.

...and why would one be true and the other false?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread David Rowley
On 16 March 2016 at 11:00, Robert Haas  wrote:
> I don't see why we would need to leave aggpartial out of the equals()
> check.  I must be missing something.

See fix_combine_agg_expr_mutator()

This piece of code:

/*
* Aggrefs for partial aggregates are wrapped up in a PartialAggref,
* we need to look into the PartialAggref to find the Aggref within.
*/
foreach(lc, context->subplan_itlist->tlist)
{
PartialAggref *paggref;
tle = (TargetEntry *) lfirst(lc);
paggref = (PartialAggref *) tle->expr;

if (IsA(paggref, PartialAggref) &&
equal(paggref->aggref, aggref))
break;
}

if equals() compared the aggpartial then this code would fail to find
the Aggref in the subnode due to the aggpartial field being true on
one and false on the other Aggref.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 5:50 PM, David Rowley
 wrote:
>> I still think that's solving the problem the wrong way.  Why can't
>> exprType(), when applied to the Aggref, do something like this?
>>
>> {
>> Aggref *aref = (Aggref *) expr;
>> if (aref->aggpartial)
>> return aref->aggtranstype;
>> else
>> return aref->aggtype;
>> }
>>
>> The obvious answer is "well, because those fields don't exist in
>> Aggref".  But shouldn't they?  From here, it looks like PartialAggref
>> is a cheap hack around not having whacked Aggref around hard for
>> partial aggregation.
>
> We could do it that way if we left the aggpartial field out of the
> equals() check, but I think we go at length to not do that. Just look
> at what's done for all of the location fields. In any case if we did
> that then it might not actually be what we want all of the time...
> Perhaps in some cases we'd want equals() to return false when the
> aggpartial does not match, and in other cases we'd want it to return
> true. There's no way to control that behaviour, so to get around the
> setrefs.c problem I created the wrapper node type, which I happen to
> think is quite clean. Just see Tom's comments about Haribabu's temp
> fix for the problem where he put some hacks into the equals for aggref
> in [1].

I don't see why we would need to leave aggpartial out of the equals()
check.  I must be missing something.

 I don't see where this applies has_parallel_hazard or anything
 comparable to the aggregate functions.  I think it needs to do that.
>>>
>>> Not sure what you mean here.
>>
>> If the aggregate isn't parallel-safe, you can't do this optimization.
>> For example, imagine an aggregate function written in PL/pgsql that
>> for some reason writes data to a side table.  It's
>> has_parallel_hazard's job to check the parallel-safety properties of
>> the functions used in the query.
>
> Sorry, I do know what you mean by that. I might have been wrong to
> assume that the parallelModeOK check did this. I will dig into how
> that is set exactly.

Hmm, sorry, I wasn't very accurate there.  The parallelModeOK check
will indeed handle the case where there are parallel-unsafe functions,
but it will not handle the case where there are parallel-restricted
functions.  In that latter case, the query can still use parallelism
someplace, but the parallel-restricted functions cannot be executed
beneath the Gather.
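
Concretely, the sort of guard I'm talking about would look something
like this -- just a sketch, assuming the existing has_parallel_hazard()
helper, with the surrounding variable names made up rather than taken
from the patch:

    /*
     * Sketch only, not patch code.  Partial aggregation runs the
     * aggregate expressions and any HAVING qual beneath the Gather,
     * so parallel-restricted functions must be ruled out as well as
     * parallel-unsafe ones; parallelModeOK alone only covers the
     * latter.
     */
    if (has_parallel_hazard((Node *) target->exprs, false) ||
        has_parallel_hazard(parse->havingQual, false))
        can_parallel_agg = false;    /* hypothetical flag */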

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread David Rowley
On 16 March 2016 at 09:23, Robert Haas  wrote:
> On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
>  wrote:
>> A comment does explain this, but perhaps it's not good enough, so I've
>> rewritten it to become.
>>
>> /*
>>  * PartialAggref
>>  *
>>  * When partial aggregation is required in a plan, the nodes from the partial
>>  * aggregate node, up until the finalize aggregate node must pass the 
>> partially
>>  * aggregated states up the plan tree. In regards to target list construction
>>  * in setrefs.c, this requires that exprType() returns the state's type 
>> rather
>>  * than the final aggregate value's type, and since exprType() for Aggref is
>>  * coded to return the aggtype, this is not correct for us. We can't fix this
>>  * by going around modifying the Aggref to change its return type as 
>> setrefs.c
>>  * requires searching for that Aggref using equals() which compares all 
>> fields
>>  * in Aggref, and changing the aggtype would cause such a comparison to fail.
>>  * To get around this problem we wrap the Aggref up in a PartialAggref, this
>>  * allows exprType() to return the correct type and we can handle a
>>  * PartialAggref in setrefs.c by just peeking inside the PartialAggref to 
>> check
>>  * the underlying Aggref. The PartialAggref lives until executor start-up,
>>  * where it's removed and replaced with its underlying Aggref.
>>  */
>> typedef struct PartialAggref
>>
>> does that help explain it better?
>
> I still think that's solving the problem the wrong way.  Why can't
> exprType(), when applied to the Aggref, do something like this?
>
> {
> Aggref *aref = (Aggref *) expr;
> if (aref->aggpartial)
> return aref->aggtranstype;
> else
> return aref->aggtype;
> }
>
> The obvious answer is "well, because those fields don't exist in
> Aggref".  But shouldn't they?  From here, it looks like PartialAggref
> is a cheap hack around not having whacked Aggref around hard for
> partial aggregation.

We could do it that way if we left the aggpartial field out of the
equals() check, but I think we go to some lengths to not do that. Just look
at what's done for all of the location fields. In any case if we did
that then it might not actually be what we want all of the time...
Perhaps in some cases we'd want equals() to return false when the
aggpartial does not match, and in other cases we'd want it to return
true. There's no way to control that behaviour, so to get around the
setrefs.c problem I created the wrapper node type, which I happen to
think is quite clean. Just see Tom's comments about Haribabu's temp
fix for the problem where he put some hacks into the equals for aggref
in [1].
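
(For anyone not following what I mean by the location fields:
equalfuncs.c compares nodes field by field but deliberately ignores
parse locations via a dedicated macro. A sketch in that style --
abridged, not the actual file contents -- looks like this:

    static bool
    _equalAggref(const Aggref *a, const Aggref *b)
    {
        COMPARE_SCALAR_FIELD(aggfnoid);
        COMPARE_SCALAR_FIELD(aggtype);
        COMPARE_NODE_FIELD(args);
        /* ... the remaining fields are compared the same way ... */
        COMPARE_LOCATION_FIELD(location);    /* parse location ignored */

        return true;
    }

An aggpartial field would need similar special-casing if equal() were
ever meant to ignore it.)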

>>> I don't see where this applies has_parallel_hazard or anything
>>> comparable to the aggregate functions.  I think it needs to do that.
>>
>> Not sure what you mean here.
>
> If the aggregate isn't parallel-safe, you can't do this optimization.
> For example, imagine an aggregate function written in PL/pgsql that
> for some reason writes data to a side table.  It's
> has_parallel_hazard's job to check the parallel-safety properties of
> the functions used in the query.

Sorry, I do know what you mean by that. I might have been wrong to
assume that the parallelModeOK check did this. I will dig into how
that is set exactly.

>>> +   /* XXX +1 ? do we expect the main process to actually do real work? 
>>> */
>>> +   numPartialGroups = Min(numGroups, subpath->rows) *
>>> +   (subpath->parallel_degree + 
>>> 1);
>>> I'd leave out the + 1, but I don't think it matters much.
>>
>> Actually I meant to ask you about this. I see that subpath->rows is
>> divided by the Path's parallel_degree, but it seems the main process
>> does some work too, which is why I added + 1. During my tests,
>> using a query which produced 10 groups with 4 workers, I noticed
>> that Gather was getting 50 groups back rather than 40. I assumed this
>> was because the main process is helping too, but from my reading of the
>> parallel query threads I believe it's because the Gather, instead
>> of sitting around idle, tries to do a bit of work too if it appears
>> that nothing else is happening quickly enough. I should probably go
>> read nodeGather.c to learn that though.
>
> Yes, the main process does do some work, but less and less as the
> query gets more complicated.  See comments in cost_seqscan().

Thanks

[1] http://www.postgresql.org/message-id/10158.1457329...@sss.pgh.pa.us

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread Robert Haas
On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
 wrote:
>> More generally, why are we inventing PartialAggref
>> instead of reusing Aggref?  The code comments seem to contain no
>> indication as to why we shouldn't need all the same details for
>> PartialAggref that we do for Aggref, instead of only a small subset of
>> them.  Hmm... actually, it looks like PartialAggref is intended to
>> wrap Aggref, but that seems like a weird design.  Why not make Aggref
>> itself DTRT?   There's not enough explanation in the patch of what is
>> going on here and why.
>
> A comment does explain this, but perhaps it's not good enough, so I've
> rewritten it to become.
>
> /*
>  * PartialAggref
>  *
>  * When partial aggregation is required in a plan, the nodes from the partial
>  * aggregate node, up until the finalize aggregate node must pass the 
> partially
>  * aggregated states up the plan tree. In regards to target list construction
>  * in setrefs.c, this requires that exprType() returns the state's type rather
>  * than the final aggregate value's type, and since exprType() for Aggref is
>  * coded to return the aggtype, this is not correct for us. We can't fix this
>  * by going around modifying the Aggref to change its return type as 
> setrefs.c
>  * requires searching for that Aggref using equals() which compares all fields
>  * in Aggref, and changing the aggtype would cause such a comparison to fail.
>  * To get around this problem we wrap the Aggref up in a PartialAggref, this
>  * allows exprType() to return the correct type and we can handle a
>  * PartialAggref in setrefs.c by just peeking inside the PartialAggref to 
> check
>  * the underlying Aggref. The PartialAggref lives until executor start-up,
>  * where it's removed and replaced with its underlying Aggref.
>  */
> typedef struct PartialAggref
>
> does that help explain it better?

I still think that's solving the problem the wrong way.  Why can't
exprType(), when applied to the Aggref, do something like this?

{
Aggref *aref = (Aggref *) expr;
if (aref->aggpartial)
return aref->aggtranstype;
else
return aref->aggtype;
}

The obvious answer is "well, because those fields don't exist in
Aggref".  But shouldn't they?  From here, it looks like PartialAggref
is a cheap hack around not having whacked Aggref around hard for
partial aggregation.

>> I don't see where this applies has_parallel_hazard or anything
>> comparable to the aggregate functions.  I think it needs to do that.
>
> Not sure what you mean here.

If the aggregate isn't parallel-safe, you can't do this optimization.
For example, imagine an aggregate function written in PL/pgsql that
for some reason writes data to a side table.  It's
has_parallel_hazard's job to check the parallel-safety properties of
the functions used in the query.

>> +   /* XXX +1 ? do we expect the main process to actually do real work? 
>> */
>> +   numPartialGroups = Min(numGroups, subpath->rows) *
>> +   (subpath->parallel_degree + 
>> 1);
>> I'd leave out the + 1, but I don't think it matters much.
>
> Actually I meant to ask you about this. I see that subpath->rows is
> divided by the Path's parallel_degree, but it seems the main process
> does some work too, which is why I added + 1. During my tests,
> using a query which produced 10 groups with 4 workers, I noticed
> that Gather was getting 50 groups back rather than 40. I assumed this
> was because the main process is helping too, but from my reading of the
> parallel query threads I believe it's because the Gather, instead
> of sitting around idle, tries to do a bit of work too if it appears
> that nothing else is happening quickly enough. I should probably go
> read nodeGather.c to learn that though.

Yes, the main process does do some work, but less and less as the
query gets more complicated.  See comments in cost_seqscan().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-15 Thread David Rowley
On 15 March 2016 at 13:48, David Rowley  wrote:
> An updated patch will follow soon.

I've attached an updated patch which addresses some of Robert's and
Tomas' concerns.
I've not done anything about the exprCollation() yet, as I'm still
unsure of what it should do. I just don't see why returning the
Aggref's collation is correct, and we have nothing else to return.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Allow-aggregation-to-happen-in-parallel_2016-03-16.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread David Rowley
On 15 March 2016 at 11:24, James Sewell  wrote:
> On Tuesday, 15 March 2016, Robert Haas  wrote:
>>
>> > Does the cost of the aggregate function come into this calculation at
>> > all? In PostGIS land, much smaller numbers of rows can generate loads
>> > that would be effective to parallelize (worker time much >> than
>> > startup cost).
>>
>> Unfortunately, no - only the table size.  This is a problem, and needs
>> to be fixed.  However, it's probably not going to get fixed for 9.6.
>> :-(
>
>
> Any chance of getting a GUC (say min_parallel_degree) added to allow setting
> the initial value of parallel_degree, then changing the small relation check
> to also pass if parallel_degree > 1?
>
> That way you could set min_parallel_degree on a query by query basis if you
> are running aggregates which you know will take a lot of CPU.
>
> I suppose it wouldn't make much sense at all to set globally though, so it
> could just confuse matters.

I agree that it would be nice to have more influence on this decision,
but let's start a new thread for that. I don't want this one getting
bloated with debates on that. It's not code I'm planning on going
anywhere near for this patch.

I'll start a thread...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread David Rowley
On 15 March 2016 at 11:39, Tomas Vondra  wrote:
> I've looked at this patch today. The patch seems quite solid, but I do have
> a few minor comments (or perhaps questions, given that this is the first
> time I looked at the patch).
>
> 1) exprCollation contains this bit:
> ---
>
> case T_PartialAggref:
> coll = InvalidOid; /* XXX is this correct? */
> break;
>
> I doubt this is the right thing to do. Can we actually get to this piece of
> code? I haven't tried too hard, but regression tests don't seem to trigger
> this piece of code.

Thanks for looking at this.

Yeah, it's there because it is being called during setrefs.c, in
makeVarFromTargetEntry() via fix_combine_agg_expr_mutator(), so it's
required when building the new target list for Aggref->args to point
to the underlying Aggref's Var.

As for the collation, I'm still not convinced whether it's right or wrong.
I know offlist you mentioned string_agg() and sorting, but
there's no code that'll sort the agg's state. The only possible thing
that gets sorted there is the group by key.

> Moreover, if we're handling PartialAggref in exprCollation(), maybe we
> should handle it also in exprInputCollation and exprSetCollation?

Hmm, maybe; I'm not sure about that. I don't see where we'd call
exprSetCollation() for this, but I think I need to look at
exprInputCollation().

> And if we really need the collation there, why not to fetch the actual
> collation from the nested Aggref? Why should it be InvalidOid?

It seems quite random to me to do that. If the trans type is bytea,
why would it be useful to inherit the collation from the aggregate?
I'm not confident I'm right with InvalidOid... I just don't think we
can pretend the collation is the same as the Aggref's.


> 2) partial_aggregate_walker
> ---
>
> I think this should follow the naming convention that clearly identifies the
> purpose of the walker, not what kind of nodes it is supposed to walk. So it
> should be:
>
> aggregates_allow_partial_walker

Agreed and changed.

> 3) create_parallelagg_path
> --
>
> I do agree with the logic that under-estimates are more dangerous than
> over-estimates, so the current estimate is safer. But I think this would be
> a good place to apply the formula I proposed a few days ago (or rather the
> one Dean Rasheed proposed in response).
>
> That is, we do know that there are numGroups groups in total, and if each
> parallel worker sees subpath->rows rows, then it's expected to see
>
> sel = (subpath->rows / rel->tuples);
> perGroup = (rel->tuples / numGroups);
> workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
> numPartialGroups = numWorkers * workerGroups;
>
> It's probably better to see Dean's message from 13/3.

I think what I have works well when there's a small number of groups,
as there's a good chance that each worker will see at least 1 tuple
from each group. However I understand that will become increasingly
unlikely with a larger number of groups, which is why I capped it to
the total input rows, but even in cases before that cap is reached I
think it will still overestimate. I'd need to analyze the code above
to understand it better, but my initial reaction is that you're
probably right; I just don't think I want to inherit the fight for
this. Perhaps it's better to wait until the GROUP BY estimate improvement
patch gets in and then change this, or if this gets in first, you can
include the change in your patch. I'm not trying to brush off the
work, I just would rather it didn't delay parallel aggregate.
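
For anyone who wants to see how far apart the two estimates can get,
here's a small standalone toy that computes both -- the input numbers
are completely made up and this is not code from the patch:

/* toy comparison only; build with: cc est.c -lm -o est */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double  rel_tuples = 10000000.0;    /* assumed table size */
    int     numWorkers = 4;
    double  subpath_rows = rel_tuples / numWorkers;   /* rows per worker */
    double  numGroups = 5000000.0;      /* assumed total number of groups */

    /* current estimate: assume every worker may see every group, capped */
    double  patch_est = fmin(numGroups, subpath_rows) * numWorkers;

    /* proposed estimate: groups one worker is expected to actually see */
    double  sel = subpath_rows / rel_tuples;
    double  perGroup = rel_tuples / numGroups;
    double  workerGroups = numGroups * (1.0 - pow(1.0 - sel, perGroup));
    double  proposed_est = numWorkers * workerGroups;

    printf("current estimate:  %.0f partial groups\n", patch_est);
    printf("proposed estimate: %.0f partial groups\n", proposed_est);
    return 0;
}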

> 4) Is clauses.h the right place for PartialAggType?

I'm not sure that it is, to be honest. I just put it there because the
patch never persists a PartialAggType value in any struct field to be
checked later in some other file. In all places
where we use PartialAggType we're also calling
aggregates_allow_partial(), which does require clauses.h. So that's
why it ended up there... I think I'll leave it there until someone
gives me a good reason to move it.

An updated patch will follow soon.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread David Rowley
On 15 March 2016 at 08:53, Robert Haas  wrote:
> I haven't fully studied every line of this yet, but here are a few comments:
>
> +   case T_PartialAggref:
> +   coll = InvalidOid; /* XXX is this correct? */
> +   break;
>
> I doubt it.

Thanks for looking at this.

Yeah, I wasn't so sure of the collation thing either, so I stuck a
reminder on there. The way I'm seeing it at the moment is that since
the partial aggregate state is never displayed to the user, and we never
perform equality comparisons on it (since HAVING is applied in the
final aggregate stage), my line of thought was that the collation
should not matter. Over on the combine aggregate states thread I'm
doing work to make the standard serialize functions use bytea, and
bytea doesn't allow collations:

# create table c (b bytea collate "en_NZ");
ERROR:  collations are not supported by type bytea
LINE 1: create table c (b bytea collate "en_NZ");

I previously did think of reusing the Aggref's collation, but I ended
up leaning towards the more "does not matter" side of the argument. Of
course, I may have completely failed to think of some important
reason, which is why I left that comment; it might provoke some
thought from someone else with more collation knowledge.

> More generally, why are we inventing PartialAggref
> instead of reusing Aggref?  The code comments seem to contain no
> indication as to why we shouldn't need all the same details for
> PartialAggref that we do for Aggref, instead of only a small subset of
> them.  Hmm... actually, it looks like PartialAggref is intended to
> wrap Aggref, but that seems like a weird design.  Why not make Aggref
> itself DTRT?   There's not enough explanation in the patch of what is
> going on here and why.

A comment does explain this, but perhaps it's not good enough, so I've
rewritten it to become.

/*
 * PartialAggref
 *
 * When partial aggregation is required in a plan, the nodes from the partial
 * aggregate node, up until the finalize aggregate node must pass the partially
 * aggregated states up the plan tree. In regards to target list construction
 * in setrefs.c, this requires that exprType() returns the state's type rather
 * than the final aggregate value's type, and since exprType() for Aggref is
 * coded to return the aggtype, this is not correct for us. We can't fix this
 * by going around modifying the Aggref to change its return type as setrefs.c
 * requires searching for that Aggref using equals() which compares all fields
 * in Aggref, and changing the aggtype would cause such a comparison to fail.
 * To get around this problem we wrap the Aggref up in a PartialAggref, this
 * allows exprType() to return the correct type and we can handle a
 * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
 * the underlying Aggref. The PartialAggref lives until executor start-up,
 * where it's removed and replaced with its underlying Aggref.
 */
typedef struct PartialAggref

does that help explain it better?
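
To make the mechanism a little more concrete, the exprType() side of it
amounts to something like the following -- a sketch of the shape of the
change only, and the field name here is an assumption rather than
necessarily what the patch uses:

    case T_PartialAggref:
        {
            /* report the transition state's type, not the final type */
            PartialAggref *paggref = (PartialAggref *) expr;

            type = paggref->aggtranstype;   /* assumed field name */
        }
        break;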


> }
> +   if (can_parallel)
> +   {
>
> Seems like a blank line would be in order.

Fixed.

> I don't see where this applies has_parallel_hazard or anything
> comparable to the aggregate functions.  I think it needs to do that.

Not sure what you mean here.

>
> +   /* XXX +1 ? do we expect the main process to actually do real work? */
> +   numPartialGroups = Min(numGroups, subpath->rows) *
> +   (subpath->parallel_degree + 
> 1);
> I'd leave out the + 1, but I don't think it matters much.

Actually I meant to ask you about this. I see that subpath->rows is
divided by the Path's parallel_degree, but it seems the main process
does some work too, which is why I added + 1. During my tests,
using a query which produced 10 groups with 4 workers, I noticed
that Gather was getting 50 groups back rather than 40. I assumed this
was because the main process is helping too, but from my reading of the
parallel query threads I believe it's because the Gather, instead
of sitting around idle, tries to do a bit of work too if it appears
that nothing else is happening quickly enough. I should probably go
read nodeGather.c to learn that though.

In the meantime I've removed the + 1, as it's not correct to do
subpath->rows * (subpath->parallel_degree + 1), since it was divided
by subpath->parallel_degree in the first place; we'd end up with an
extra worker's worth of rows for queries which estimate a larger
number of groups than partial path rows.


> +   aggstate->finalizeAggs == true)
>
> We usually just say if (a) not if (a == true) when it's a boolean.
> Similarly !a rather than a == false.

Hmm, thanks. It appears that I've not been all that consistent in that
area. I didn't know that was the convention. I see that some of my ways
have crept into the explain.c 

Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread James Sewell
On Tue, Mar 15, 2016 at 9:32 AM, Robert Haas  wrote:

>
> I kind of doubt this would work well, but somebody could write a patch
> for it and try it out.


OK I'll give this a go today and report back.

Would the eventual plan be to use pg_proc.procost for the functions from
each aggregate concerned? If so I might have a peek at that too, although I
imagine I won't get far.

Cheers,



Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Tomas Vondra

Hi,

On 03/13/2016 10:44 PM, David Rowley wrote:

On 12 March 2016 at 16:31, David Rowley  wrote:

I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.


The attached patch fixes a harmless compiler warning about a
possible uninitialised variable.


I've looked at this patch today. The patch seems quite solid, but I do 
have a few minor comments (or perhaps questions, given that this is the 
first time I looked at the patch).


1) exprCollation contains this bit:
---

case T_PartialAggref:
coll = InvalidOid; /* XXX is this correct? */
break;

I doubt this is the right thing to do. Can we actually get to this piece 
of code? I haven't tried too hard, but regression tests don't seem to 
trigger this piece of code.


Moreover, if we're handling PartialAggref in exprCollation(), maybe we 
should handle it also in exprInputCollation and exprSetCollation?


And if we really need the collation there, why not to fetch the actual 
collation from the nested Aggref? Why should it be InvalidOid?



2) partial_aggregate_walker
---

I think this should follow the naming convention that clearly identifies 
the purpose of the walker, not what kind of nodes it is supposed to 
walk. So it should be:


aggregates_allow_partial_walker


3) create_parallelagg_path
--

I do agree with the logic that under-estimates are more dangerous than 
over-estimates, so the current estimate is safer. But I think this would 
be a good place to apply the formula I proposed a few days ago (or 
rather the one Dean Rasheed proposed in response).


That is, we do know that there are numGroups groups in total, and if each 
parallel worker sees subpath->rows rows, then it's expected to see

sel = (subpath->rows / rel->tuples);
perGroup = (rel->tuples / numGroups);
workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
numPartialGroups = numWorkers * workerGroups;

It's probably better to see Dean's message from 13/3.

4) Is clauses.h the right place for PartialAggType?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 6:24 PM, James Sewell  wrote:
> Any chance of getting a GUC (say min_parallel_degree) added to allow setting
> the initial value of parallel_degree, then changing the small relation check
> to also pass if parallel_degree > 1?
>
> That way you could set min_parallel_degree on a query by query basis if you
> are running aggregates which you know will take a lot of CPU.
>
> I suppose it wouldn't make much sense at all to set globally though, so it
> could just confuse matters.

I kind of doubt this would work well, but somebody could write a patch
for it and try it out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread James Sewell
On Tuesday, 15 March 2016, Robert Haas  wrote:
>
> > Does the cost of the aggregate function come into this calculation at
> > all? In PostGIS land, much smaller numbers of rows can generate loads
> > that would be effective to parallelize (worker time much >> than
> > startup cost).
>
> Unfortunately, no - only the table size.  This is a problem, and needs
> to be fixed.  However, it's probably not going to get fixed for 9.6.
> :-(
>

Any chance of getting a GUC (say min_parallel_degree) added to allow
setting the initial value of parallel_degree, then changing the small
relation check to also pass if parallel_degree > 1?

That way you could set min_parallel_degree on a query by query basis if you
are running aggregates which you know will take a lot of CPU.

I suppose it wouldn't make much sense at all to set globally though, so it
could just confuse matters.

Cheers,



Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 3:56 PM, Paul Ramsey  wrote:
> On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
>  wrote:
>> On 14 March 2016 at 14:52, James Sewell  wrote:
>>> One question - how is the upper limit of workers chosen?
>>
>> See create_parallel_paths() in allpaths.c. Basically the bigger the
>> relation (in pages) the more workers will be allocated, up until
>> max_parallel_degree.
>
> Does the cost of the aggregate function come into this calculation at
> all? In PostGIS land, much smaller numbers of rows can generate loads
> that would be effective to parallelize (worker time much >> than
> startup cost).

Unfortunately, no - only the table size.  This is a problem, and needs
to be fixed.  However, it's probably not going to get fixed for 9.6.
:-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Paul Ramsey
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
 wrote:
> On 14 March 2016 at 14:52, James Sewell  wrote:
>> One question - how is the upper limit of workers chosen?
>
> See create_parallel_paths() in allpaths.c. Basically the bigger the
> relation (in pages) the more workers will be allocated, up until
> max_parallel_degree.

Does the cost of the aggregate function come into this calculation at
all? In PostGIS land, much smaller numbers of rows can generate loads
that would be effective to parallelize (worker time much >> than
startup cost).

P


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Robert Haas
On Sun, Mar 13, 2016 at 5:44 PM, David Rowley
 wrote:
> On 12 March 2016 at 16:31, David Rowley  wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.

I haven't fully studied every line of this yet, but here are a few comments:

+   case T_PartialAggref:
+   coll = InvalidOid; /* XXX is this correct? */
+   break;

I doubt it.  More generally, why are we inventing PartialAggref
instead of reusing Aggref?  The code comments seem to contain no
indication as to why we shouldn't need all the same details for
PartialAggref that we do for Aggref, instead of only a small subset of
them.  Hmm... actually, it looks like PartialAggref is intended to
wrap Aggref, but that seems like a weird design.  Why not make Aggref
itself DTRT?   There's not enough explanation in the patch of what is
going on here and why.

}
+   if (can_parallel)
+   {

Seems like a blank line would be in order.

I don't see where this applies has_parallel_hazard or anything
comparable to the aggregate functions.  I think it needs to do that.

+   /* XXX +1 ? do we expect the main process to actually do real work? */
+   numPartialGroups = Min(numGroups, subpath->rows) *
+   (subpath->parallel_degree + 1);

I'd leave out the + 1, but I don't think it matters much.

+   aggstate->finalizeAggs == true)

We usually just say if (a) not if (a == true) when it's a boolean.
Similarly !a rather than a == false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 17:05, James Sewell  wrote:
>
> Hi again,
>
> I've been playing around with inheritance combined with this patch. Currently 
> it looks like you are taking max(parallel_degree) from all the child tables 
> and using that for the number of workers.
>
> For large machines it makes much more sense to use sum(parallel_degree) - but 
> I've just seen this comment in the code:
>
> /*
>  * Decide what parallel degree to request for this append path.  For
>  * now, we just use the maximum parallel degree of any member.  It
>  * might be useful to use a higher number if the Append node were
>  * smart enough to spread out the workers, but it currently isn't.
>  */
>
> Does this mean that even though we are aggregating in parallel, we are only 
> operating on one child table at a time currently?

There is nothing in the planner yet, or any patch that I know of to
push the Partial Aggregate node to below an Append node. That will
most likely come in 9.7.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
On Mon, Mar 14, 2016 at 3:05 PM, David Rowley 
wrote:

>
> Things to try:
> 1. alter table a add column ts_date date; update a set ts_date =
> date_trunc('DAY',ts); vacuum full analyze ts;
> 2. or, create index on a (date_trunc('DAY',ts)); analyze a;
> 3. or for testing, set the work_mem higher.
>
>
Ah, that makes sense.

Tried with a BTREE index, and it works perfectly, but the index is 428MB
- which is a bit rough.

Removed that and put on a BRIN index instead: same result for only 48kB - perfect!

Thanks for the help,

James



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi again,

I've been playing around with inheritance combined with this patch.
Currently it looks like you are taking max(parallel_degree) from all the
child tables and using that for the number of workers.

For large machines it makes much more sense to use sum(parallel_degree) -
but I've just seen this comment in the code:

/*
 * Decide what parallel degree to request for this append path.  For
 * now, we just use the maximum parallel degree of any member.  It
 * might be useful to use a higher number if the Append node were
 * smart enough to spread out the workers, but it currently isn't.
 */

Does this mean that even though we are aggregating in parallel, we are only
operating on one child table at a time currently?

Cheers,

James Sewell,
 Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 2:39 PM, James Sewell 
wrote:

> Cool,
>
> I've been testing how this works with partitioning (which seems to be
> strange, but I'll post separately about that) and something odd seems to be
> going on now with the parallel triggering:
>
> postgres=# create table a as select * from base_p2015_11;
> SELECT 2000
>
> postgres=# select * from a limit 1;
>  ts | count |  a  |  b   |  c   |  d   | e
> +---+-+--+--+--+---
>  2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
> (1 row)
>
> postgres-# \d a
>  Table "datamart_owner.a"
>  Column |Type | Modifiers
> +-+---
>  ts | timestamp without time zone |
>  count  | integer |
>  a  | integer |
>  b  | integer |
>  c  | integer |
>  d  | integer |
>  e  | integer |
>
> postgres=# select pg_size_pretty(pg_relation_size('a'));
>  pg_size_pretty
> 
>  1149 MB
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
>   QUERY PLAN
>
> --
>  Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
>Number of Workers: 5
>->  Partial HashAggregate  (cost=217059.08..217061.58
> rows=200 width=16)
>  Group Key: date_trunc('DAY'::text, ts)
>  ->  Parallel Seq Scan on a  (cost=0.00..197059.06
> rows=405 width=12)
> (9 rows)
>
> postgres=# analyze a;
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
> QUERY PLAN
> --
>  GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
> (5 rows)
>
> Unsure what's happening here.
>
>
>
> James Sewell,
> PostgreSQL Team Lead / Solutions Architect
> __
>
>
> Level 2, 50 Queen St, Melbourne VIC 3000
>
> *P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099
>
>
> On Mon, Mar 14, 2016 at 1:31 PM, David Rowley <
> david.row...@2ndquadrant.com> wrote:
>
>> On 14 March 2016 at 14:52, James Sewell 
>> wrote:
>> > One question - how is the upper limit of workers chosen?
>>
>> See create_parallel_paths() in allpaths.c. Basically the bigger the
>> relation (in pages) the more workers will be allocated, up until
>> max_parallel_degree.
>>
>> There is also a comment in that function which states:
>> /*
>> * Limit the degree of parallelism logarithmically based on the size of the
>> * relation.  This probably needs to be a good deal more sophisticated,
>> but we
>> * need something here for now.
>> */
>>
>> So this will likely see some revision at some point, after 9.6.
>>
>> --
>>  David Rowley   http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 16:39, James Sewell  wrote:
>
> I've been testing how this works with partitioning (which seems to be 
> strange, but I'll post separately about that) and something odd seems to be 
> going on now with the parallel triggering:
>
> postgres=# create table a as select * from base_p2015_11;
> SELECT 2000
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
>   QUERY PLAN
> --
>  Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
>Number of Workers: 5
>->  Partial HashAggregate  (cost=217059.08..217061.58 rows=200 
> width=16)
>  Group Key: date_trunc('DAY'::text, ts)
>  ->  Parallel Seq Scan on a  (cost=0.00..197059.06 
> rows=405 width=12)
> (9 rows)
>
> postgres=# analyze a;
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
> QUERY PLAN
> --
>  GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
> (5 rows)
>
> Unsure what's happening here.

This just comes down to the fact that PostgreSQL is quite poor at
estimating the number of groups that will be produced by the
expression date_trunc('DAY',ts). Due to lack of stats when you run the
query before ANALYZE, PostgreSQL just uses a hardcoded guess of 200,
which it thinks will fit quite nicely in the HashAggregate node's hash
table. After you run ANALYZE this estimate goes up to 2024, and
the grouping planner thinks that's a little too much to be storing in a
hash table, based on the size of your work_mem setting, so it
uses a Sort plan instead.

Things to try:
1. alter table a add column ts_date date; update a set ts_date =
date_trunc('DAY',ts); vacuum full analyze a;
2. or, create index on a (date_trunc('DAY',ts)); analyze a;
3. or for testing, set the work_mem higher.

So, basically, it's no fault of this patch. It's just that there's no
really good way for the planner to estimate something like
date_trunc('DAY',ts) without either adding a column which explicitly
stores that value (1), or collecting stats on the expression (2), or
teaching the planner about the internals of that function, which is
likely just never going to happen. (3) is just going to make the
outlook of a hash plan look a little brighter, although you'll likely
need a work_mem of over 1GB to make the plan change.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Cool,

I've been testing how this works with partitioning (which seems to be
strange, but I'll post separately about that) and something odd seems to be
going on now with the parallel triggering:

postgres=# create table a as select * from base_p2015_11;
SELECT 2000

postgres=# select * from a limit 1;
 ts | count |  a  |  b   |  c   |  d   | e
+---+-+--+--+--+---
 2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
(1 row)

postgres-# \d a
 Table "datamart_owner.a"
 Column |Type | Modifiers
+-+---
 ts | timestamp without time zone |
 count  | integer |
 a  | integer |
 b  | integer |
 c  | integer |
 d  | integer |
 e  | integer |

postgres=# select pg_size_pretty(pg_relation_size('a'));
 pg_size_pretty

 1149 MB

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
  QUERY PLAN
--
 Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
 Sort Key: (date_trunc('DAY'::text, ts))
 ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
   Number of Workers: 5
   ->  Partial HashAggregate  (cost=217059.08..217061.58
rows=200 width=16)
 Group Key: date_trunc('DAY'::text, ts)
 ->  Parallel Seq Scan on a  (cost=0.00..197059.06
rows=405 width=12)
(9 rows)

postgres=# analyze a;

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
QUERY PLAN
--
 GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
 Sort Key: (date_trunc('DAY'::text, ts))
 ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
(5 rows)

Unsure what's happening here.



James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 1:31 PM, David Rowley 
wrote:

> On 14 March 2016 at 14:52, James Sewell  wrote:
> > One question - how is the upper limit of workers chosen?
>
> See create_parallel_paths() in allpaths.c. Basically the bigger the
> relation (in pages) the more workers will be allocated, up until
> max_parallel_degree.
>
> There is also a comment in that function which states:
> /*
> * Limit the degree of parallelism logarithmically based on the size of the
> * relation.  This probably needs to be a good deal more sophisticated, but
> we
> * need something here for now.
> */
>
> So this will likely see some revision at some point, after 9.6.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 14:52, James Sewell  wrote:
> One question - how is the upper limit of workers chosen?

See create_parallel_paths() in allpaths.c. Basically the bigger the
relation (in pages) the more workers will be allocated, up until
max_parallel_degree.

There is also a comment in that function which states:
/*
* Limit the degree of parallelism logarithmically based on the size of the
* relation.  This probably needs to be a good deal more sophisticated, but we
* need something here for now.
*/

So this will likely see some revision at some point, after 9.6.
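
To give a rough feel for the scaling, the logic boils down to something
like this toy -- the constants here are assumptions for illustration,
not the exact values used in allpaths.c:

/* toy only; build with: cc degree.c -o degree */
#include <stdio.h>

static int
guess_parallel_degree(double pages, int max_parallel_degree)
{
    double  threshold = 1000;   /* assumed minimum relation size, in pages */
    int     degree;

    if (pages < threshold)
        return 0;               /* too small to bother parallelising */

    degree = 1;
    while (pages >= threshold * 3 && degree < max_parallel_degree)
    {
        degree++;               /* each extra worker needs ~3x more pages */
        threshold *= 3;
    }
    return degree;
}

int
main(void)
{
    double  sizes[] = {500, 5000, 50000, 500000, 5000000};

    for (int i = 0; i < 5; i++)
        printf("%8.0f pages -> %d workers\n", sizes[i],
               guess_parallel_degree(sizes[i], 8));
    return 0;
}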

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi,

Happy to test, really looking forward to seeing this stuff in core.

The explain analyze is below:

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16) (actual time=2282.092..2282.202 rows=15 loops=1)
  Group Key: (date_trunc('DAY'::text, pageview_start_tstamp))
  ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16) (actual time=2281.749..2282.060 rows=105 loops=1)
        Number of Workers: 6
        ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16) (actual time=2276.879..2277.030 rows=15 loops=7)
              Group Key: date_trunc('DAY'::text, pageview_start_tstamp)
              ->  Parallel Seq Scan on celebrus_fact_agg_1_p2015_12  (cost=0.00..743769.76 rows=4221741 width=12) (actual time=0.066..1631.650 rows=3618887 loops=7)

One question - how is the upper limit of workers chosen?


James Sewell,
Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 12:30 PM, David Rowley  wrote:

> On 14 March 2016 at 14:16, James Sewell  wrote:
>
>> I've done some testing with one of my data sets in an 8VPU virtual
>> environment and this is looking really, really good.
>>
>> My test query is:
>>
>> SELECT pageview, sum(pageview_count)
>> FROM fact_agg_2015_12
>> GROUP BY date_trunc('DAY'::text, pageview);
>>
>> The query returns 15 rows. The fact_agg table is 5398MB and holds around
>> 25 million records.
>>
>> Explain with a max_parallel_degree of 8 tells me that the query will
>> only use 6 background workers. I have no indexes on the table currently.
>>
>> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>>Group Key: (date_trunc('DAY'::text, pageview))
>>->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>>  Number of Workers: 6
>>  ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
>> width=16)
>>Group Key: date_trunc('DAY'::text, pageview)
>>->  Parallel Seq Scan on fact_agg_2015_12
>>  (cost=0.00..743769.76 rows=4221741 width=12)
>>
>
> Great! Thanks for testing this.
>
> If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual
> number of Gather rows come out at 105? I'd just like to get an idea of
> whether my cost estimates for the Gather are going to be accurate for real
> world data sets.
>
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 14:16, James Sewell  wrote:

> I've done some testing with one of my data sets in an 8VPU virtual
> environment and this is looking really, really good.
>
> My test query is:
>
> SELECT pageview, sum(pageview_count)
> FROM fact_agg_2015_12
> GROUP BY date_trunc('DAY'::text, pageview);
>
> The query returns 15 rows. The fact_agg table is 5398MB and holds around
> 25 million records.
>
> Explain with a max_parallel_degree of 8 tells me that the query will only
> use 6 background workers. I have no indexes on the table currently.
>
> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>Group Key: (date_trunc('DAY'::text, pageview))
>->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>  Number of Workers: 6
>  ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
> width=16)
>Group Key: date_trunc('DAY'::text, pageview)
>->  Parallel Seq Scan on fact_agg_2015_12
>  (cost=0.00..743769.76 rows=4221741 width=12)
>

Great! Thanks for testing this.

If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual
number of Gather rows come out at 105? I'd just like to get an idea of
whether my cost estimates for the Gather are going to be accurate for real
world data sets.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi,

I've done some testing with one of my data sets in an 8VPU virtual
environment and this is looking really, really good.

My test query is:

SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);

The query returns 15 rows. The fact_agg table is 5398MB and holds around 25
million records.

Explain with a max_parallel_degree of 8 tells me that the query will only
use 6 background workers. I have no indexes on the table currently.

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
   Group Key: (date_trunc('DAY'::text, pageview))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
 Number of Workers: 6
 ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
width=16)
   Group Key: date_trunc('DAY'::text, pageview)
   ->  Parallel Seq Scan on fact_agg_2015_12
 (cost=0.00..743769.76 rows=4221741 width=12)


I am getting the following timings (everything was cached before I started
testing). I didn't average the runtime, but I ran each one three times and
took the middle value.

max_parallel_degree    runtime
0                      11693.537 ms
1                       6387.937 ms
2                       4328.629 ms
3                       3292.376 ms
4                       2743.148 ms
5                       2278.449 ms
6                       2000.599 ms


I'm pretty happy!

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 8:44 AM, David Rowley 
wrote:

> On 12 March 2016 at 16:31, David Rowley 
> wrote:
> > I've attached an updated patch which is based on commit 7087166,
> > things are really changing fast in the grouping path area at the
> > moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread Haribabu Kommi
On Mon, Mar 14, 2016 at 8:44 AM, David Rowley
 wrote:
> On 12 March 2016 at 16:31, David Rowley  wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.

The setrefs.c fix for updating the finalize-aggregate target list is nice.
I tested all the float aggregates and they are working fine.

Overall the patch is fine. I will do some tests and provide an update later.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 12 March 2016 at 16:31, David Rowley  wrote:
> I've attached an updated patch which is based on commit 7087166,
> things are really changing fast in the grouping path area at the
> moment, but hopefully the dust is starting to settle now.

The attached patch fixes a harmless compiler warning about a possible
uninitialised variable.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_015971a_2016-03-14.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-11 Thread David Rowley
On 11 March 2016 at 03:39, David Rowley  wrote:
> A couple of things which I'm not 100% happy with.
>
> 1. make_partialgroup_input_target() doing lookups to the syscache.
> Perhaps this job can be offloaded to a new function in a more suitable
> location. Ideally the Aggref would already store the required
> information, but I don't see a great place to be looking that up.

I've made some changes and moved this work off to a new function in tlist.c.

> 2. I don't really like how I had to add tlist to
> create_grouping_paths(), but I didn't want to go to the trouble of
> calculating the partial agg PathTarget if Parallel Aggregation is not
> possible, as this does do syscache lookups, so it's not free, so I'd
> rather only do it if we're actually going to add some parallel paths.

This is now fixed. The solution was much easier after 49635d7b.

> 3. Something about the force_parallel_mode GUC. I'll think about this
> when I start to think about how to test this, as likely I'll need
> that, else I'd have to create tables bigger than we'd want to in the
> regression tests.

On further analysis it seems that this GUC does not do what I thought
it did, which is probably why Robert said that I don't need to think about
it here. The GUC just seems to add a Gather node at the base of the
plan tree, when possible, which leaves me a bit lost when it comes to
how to write tests for this... It seems like I need to add at least
136k rows to a test table to get a Parallel Aggregate plan, which I
think is a no-go for the regression test suite... and that's with
parallel_setup_cost=0.

It would be nice if that GUC just, when enabled, preferred the
cheapest parallel plan (when available), rather than hacking a
Gather node into the plan's root. This should have the same result in
many cases anyway, and would allow me to test this without generating
oversized tables in the regression tests.

I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.

This patch also fixes a couple of bugs: one in the cost estimate for
the number of groups that will be produced by the final aggregate, and
a missing copyObject() in the setrefs.c code which caused a "Var not
found in targetlist" problem for plans with more than one partial
aggregate node... I had to modify the planner to get it to add an
additional aggregate node to test this (a separate test patch for this
is attached).

Comments/Reviews/Testing all welcome.

Thanks

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_8a26089_2016-03-12.patch
Description: Binary data


parallel_aggregation_extra_combine_node.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-10 Thread David Rowley
On 8 March 2016 at 11:15, David Rowley  wrote:
> The setrefs.c parts of the patch remain completely broken. I've not
> had time to look at this again yet, sorry.

Ok, so again, apologies for previously sending such a broken patch.
I've since managed to free up a bit of time to work on this, which now
consists of a good bit more than the sum total of my weekly lunch
hours.

The attached patch is a much more complete patch, and quite possible
now review worthy.

I set about solving the setrefs.c problem by inventing a PartialAggref
node type which wraps up Aggrefs when they're in an agg node that has
finalizeAggs = false. This node type exists all the way until executor
init time, where it's then removed and replaced with the underlying
Aggref. This seems to solve the targetlist return type issue.
I'd really like some feedback on this method of solving that problem.

I've also fixed numerous other bugs, including the HAVING clause problem.

Things left to do:
1. Make a final pass of the patch not at 3am.
2. Write some tests.
3. I've probably missed a few places that should handle T_PartialAggref.

A couple of things which I'm not 100% happy with.

1. make_partialgroup_input_target() doing lookups to the syscache.
Perhaps this job can be offloaded to a new function in a more suitable
location. Ideally the Aggref would already store the required
information, but I don't see a great place to be looking that up.
2. I don't really like how I had to add tlist to
create_grouping_paths(), but I didn't want to go to the trouble of
calculating the partial agg PathTarget if Parallel Aggregation is not
possible, as this does do syscache lookups, so it's not free, so I'd
rather only do it if we're actually going to add some parallel paths.
3. Something about the force_parallel_mode GUC. I'll think about this
when I start to think about how to test this, as likely I'll need
that, else I'd have to create tables bigger than we'd want to in the
regression tests.

I've also attached an .sql file with an aggregate function aimed at
testing that the new PartialAggref stuff works properly, as previously
it seemed to work by accident with sum(int).

Just some numbers to maybe make this more interesting:

create table t (num int not null, one int not null, ten int not null,
hundred int not null, thousand int not null, tenk int not null,
hundredk int not null, million int not null);
insert into t select
x,1,x%10+1,x%100+1,x%1000+1,x%10000+1,x%100000+1,x%1000000 from
generate_series(1,10000000)x(x);

-- Serial Plan
# explain select sum(num) from t;
QUERY PLAN
---
 Aggregate  (cost=198530.00..198530.01 rows=1 width=8)
   ->  Seq Scan on t  (cost=0.00..173530.00 rows=10000000 width=4)

# select sum(num) from t;
  sum

 50000005000000
(1 row)
Time: 1036.119 ms

# set max_parallel_degree=4;

-- Parallel Plan
# explain select sum(num) from t;
  QUERY PLAN
--
 Finalize Aggregate  (cost=105780.52..105780.53 rows=1 width=8)
   ->  Gather  (cost=105780.00..105780.51 rows=5 width=8)
 Number of Workers: 4
 ->  Partial Aggregate  (cost=104780.00..104780.01 rows=1 width=8)
   ->  Parallel Seq Scan on t  (cost=0.00..98530.00
rows=2500000 width=4)
(5 rows)

# select sum(num) from t;
  sum

 50000005000000
(1 row)

Time: 379.117 ms

I'll try and throw a bit of parallel aggregate work at a 4 socket / 64
core server which I have access to... just for fun.

Reviews are now welcome.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_cc3763e_2016-03-11.patch
Description: Binary data


parallel_aggregation_test.sql
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-03-10 Thread Amit Langote
On Thu, Mar 10, 2016 at 10:55 PM, Robert Haas  wrote:
> On Thu, Mar 10, 2016 at 6:42 AM, David Rowley
>  wrote:
>> The one reason that I asked about force_parallel_mode was that I
>> assumed there was some buildfarm member running somewhere that
>> switches this on and runs the regression tests. I figured that if it
>> exists for other parallel features, then it probably should for this
>> too. Can you explain why you think this should be handled differently?
>
> Yeah, I think Noah set up such a buildfarm member, but I can't
> remember the name of it off-hand.

mandrill [1]?

Thanks,
Amit

[1] http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=mandrill&br=HEAD




Re: [HACKERS] Parallel Aggregate

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 6:42 AM, David Rowley
 wrote:
> Hmm, it appears I only looked as far as the enum declaration, which I
> expected to have something. Perhaps I'm just not used to looking up
> the manual for things relating to code.

I don't mind adding some comments there, it just didn't seem all that
important to me.  Feel free to propose something.

> The one reason that I asked about force_parallel_mode was that I
> assumed there was some buildfarm member running somewhere that
> switches this on and runs the regression tests. I figured that if it
> exists for other parallel features, then it probably should for this
> too. Can you explain why you think this should be handled differently?

Yeah, I think Noah set up such a buildfarm member, but I can't
remember the name of it off-hand.

I think running the tests with this patch and
force_parallel_mode=regress, max_parallel_degree>0 is a good idea, but
I don't expect it to turn up too much.  That configuration is mostly
designed to test whether the basic parallelism infrastructure works or
breaks things.  It's not intended to test whether your parallel query
plans are any good - you have to write your own tests for that.
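
In other words, roughly this configuration while the tests run (values
illustrative):

set force_parallel_mode = regress;
set max_parallel_degree = 2;
-- "regress" behaves like "on" but hides the added Gather node in EXPLAIN
-- output, so the expected regression files still match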

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-10 Thread David Rowley
On 9 March 2016 at 04:06, Robert Haas  wrote:
> On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
>  wrote:
>> 3. Nothing in create_grouping_paths() looks at the force_parallel_mode
>> GUC. I had a quick look at this GUC and was a bit surprised to see 3
>> possible states, but no explanation of what they do, so I've not added
>> code which pays attention to this setting yet. I'd imagine this is
>> just a matter of skipping serial path generation when parallel is
>> possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
>> what FORCE_PARALLEL_REGRESS is for yet.
>
> The GUC is documented just like all the other GUCs are documented.
> Maybe that's not enough, but I don't think "no explanation of what
> they do" is accurate.  But I don't see why this patch should need to
> care about force_parallel_mode at all.  force_parallel_mode is about
> making queries that wouldn't choose to run in parallel do on their own
> do so anyway, whereas this patch is about making more queries able to
> do more work in parallel.

Hmm, it appears I only looked as far as the enum declaration, which I
expected to have something. Perhaps I'm just not used to looking up
the manual for things relating to code.

The one reason that I asked about force_parallel_mode was that I
assumed there was some buildfarm member running somewhere that
switches this on and runs the regression tests. I figured that if it
exists for other parallel features, then it probably should for this
too. Can you explain why you think this should be handled differently?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-09 Thread Haribabu Kommi
On Mon, Mar 7, 2016 at 4:39 PM, Tom Lane  wrote:
> Haribabu Kommi  writes:
>> 2. Temporary fix for float aggregate types in _equalAggref because of
>> a change in aggtype to trans type, otherwise the parallel aggregation
>> plan failure in set_plan_references. whenever the aggtype is not matched,
>> it verifies with the trans type also.
>
> That is a completely unacceptable kluge.  Quite aside from being ugly as
> sin, it probably breaks more things than it fixes, first because it breaks
> the fundamental semantics of equal() across the board, and second because
> it puts catalog lookups into equal(), which *will* cause problems.  You
> can not expect that this will get committed, not even as a "temporary fix".

I am not able to find a better solution to handle this problem. I will provide
the details of the problem and why I made the change, so if you can provide
some pointers on where to look, that would be helpful.

In parallel aggregate, the aggregate operation is divided into two steps:
partial aggregate and finalize aggregate. The partial aggregate is executed
in the worker and returns the transition data, which is of type
aggtranstype. This can work fine even if we don't change the targetlist
Aggref return type from aggtype to aggtranstype for aggregates whose
aggtype is a variable-length data type: the output slot is generated with a
variable-length type, so even if we send aggtranstype data, which is also
variable length, this can work.

But when it comes to the float aggregates, the aggtype is a fixed-length
type and the aggtranstype is a variable-length data type. So if we try to
change the aggtype of an Aggref to aggtranstype in the set_plan_references
function, only the partial aggregate targetlist gets changed, because
set_plan_references works from the top of the plan.

To avoid this problem, I changed the targetlist type during partial
aggregate path generation itself, and that leads to a failure in the
_equalAggref function in set_plan_references. That is why I put in the
temporary fix.
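
For reference, the mismatch I am describing is visible directly in the
catalogs; for example (query is only an illustration):

select p.proname,
       p.prorettype::regtype   as aggtype,
       a.aggtranstype::regtype as aggtranstype
from pg_aggregate a
join pg_proc p on p.oid = a.aggfnoid
where p.proname in ('avg', 'stddev_samp')
  and a.aggtranstype = 'float8[]'::regtype;
-- aggtype is fixed length (float8) while aggtranstype is variable
-- length (float8[]), which is exactly the problematic combination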

Do you have any pointers on handling this problem?

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Parallel Aggregate

2016-03-09 Thread Robert Haas
On Tue, Mar 8, 2016 at 4:26 PM, David Rowley
 wrote:
>> The first one in the list will be the cheapest; why not just look at
>> that?  Sorted partial paths are interesting if some subsequent path
>> construction step can make use of that sort ordering, but they're
>> never interesting from the point of view of matching the query's
>> pathkeys because of the fact that Gather is order-destroying.
>
> In this case a sorted partial path is useful as the partial agg node
> sits below Gather. The sorted input is very interesting for the
> partial agg node with a strategy of AGG_SORTED. In most cases with
> parallel aggregate it's the partial stage that will take the most
> time, so if we do get pre-sorted partial paths, this will be very good
> indeed for parallel agg.

OK.  So then you probably want to consider the cheapest one, which
will be first.  And then, if there's one that has the pathkeys you
want, also consider that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-08 Thread David Rowley
On 9 March 2016 at 04:06, Robert Haas  wrote:
> On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
>  wrote:
>> My concerns are:
>> 1. Since there's no cheapest_partial_path in RelOptInfo the code is
>> currently considering every partial_path for parallel hash aggregate.
>> With normal aggregation we only ever use the cheapest path, so this
>> may not be future proof. As of today we do only have at most one
>> partial path in the list, but there's no reason to code this with that
>> assumption. I didn't put in much effort to improve this as I see code
>> in generate_gather_paths() which also makes assumptions about there
>> just being 1 partial path. Perhaps we should expand RelOptInfo to
>> track the cheapest partial path? or maybe allpaths.c should have a
>> function to fetch the cheapest out of the list?
>
> The first one in the list will be the cheapest; why not just look at
> that?  Sorted partial paths are interesting if some subsequent path
> construction step can make use of that sort ordering, but they're
> never interesting from the point of view of matching the query's
> pathkeys because of the fact that Gather is order-destroying.

In this case a sorted partial path is useful as the partial agg node
sits below Gather. The sorted input is very interesting for the
partial agg node with a strategy of AGG_SORTED. In most cases with
parallel aggregate it's the partial stage that will take the most
time, so if we do get pre-sorted partial paths, this will be very good
indeed for parallel agg.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
 wrote:
> My concerns are:
> 1. Since there's no cheapest_partial_path in RelOptInfo the code is
> currently considering every partial_path for parallel hash aggregate.
> With normal aggregation we only ever use the cheapest path, so this
> may not be future proof. As of today we do only have at most one
> partial path in the list, but there's no reason to code this with that
> assumption. I didn't put in much effort to improve this as I see code
> in generate_gather_paths() which also makes assumptions about there
> just being 1 partial path. Perhaps we should expand RelOptInfo to
> track the cheapest partial path? or maybe allpaths.c should have a
> function to fetch the cheapest out of the list?

The first one in the list will be the cheapest; why not just look at
that?  Sorted partial paths are interesting if some subsequent path
construction step can make use of that sort ordering, but they're
never interesting from the point of view of matching the query's
pathkeys because of the fact that Gather is order-destroying.

> 3. Nothing in create_grouping_paths() looks at the force_parallel_mode
> GUC. I had a quick look at this GUC and was a bit surprised to see 3
> possible states, but no explanation of what they do, so I've not added
> code which pays attention to this setting yet. I'd imagine this is
> just a matter of skipping serial path generation when parallel is
> possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
> what FORCE_PARALLEL_REGRESS is for yet.

The GUC is documented just like all the other GUCs are documented.
Maybe that's not enough, but I don't think "no explanation of what
they do" is accurate.  But I don't see why this patch should need to
care about force_parallel_mode at all.  force_parallel_mode is about
making queries that wouldn't choose to run in parallel do on their own
do so anyway, whereas this patch is about making more queries able to
do more work in parallel.

> The setrefs.c parts of the patch remain completely broken. I've not
> had time to look at this again yet, sorry.

I hope you get time soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-07 Thread David Rowley
On 5 March 2016 at 07:25, Robert Haas  wrote:
> On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
>> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
>> plans. e.g it could be an idea to perform Partial Hash Aggregate ->
>> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
>> stage. I just thought doing this is more complex than what's really
>> needed, but if someone can think of a case where this would be a great
>> win then I'll listen, but you have to remember we don't have any
>> pre-sorted partial paths at this stage, so an explicit sort is
>> required *always*. This might change if someone invented partial btree
>> index scans... but until then...
>
> Actually, Rahila Syed is working on that.  But it's not done yet, so
> presumably will not go into 9.6.
>
> I don't really see the logic of this, though.  Currently, Gather
> destroys the input ordering, so it seems preferable for the
> finalize-aggregates stage to use a hash aggregate whenever possible,
> whatever the partial-aggregate stage did.  Otherwise, we need an
> explicit sort.  Anyway, it seems like the two stages should be costed
> and decided on their own merits - there's no reason to chain the two
> decisions together.

Thanks for looking at this.
I've attached an updated patch which re-bases the whole patch on top
of the upper planner changes which have just been committed.
In this version create_grouping_paths() does now consider mixed
strategies of hashed and sorted, although I have a few concerns with
the code that I've written. I'm solely posting this early to minimise
any duplicate work.

My concerns are:
1. Since there's no cheapest_partial_path in RelOptInfo the code is
currently considering every partial_path for parallel hash aggregate.
With normal aggregation we only ever use the cheapest path, so this
may not be future proof. As of today we do only have at most one
partial path in the list, but there's no reason to code this with that
assumption. I didn't put in much effort to improve this as I see code
in generate_gather_paths() which also makes assumptions about there
just being 1 partial path. Perhaps we should expand RelOptInfo to
track the cheapest partial path? or maybe allpaths.c should have a
function to fetch the cheapest out of the list?

2. In mixed parallel aggregate mode, when the query has no aggregate
functions, the code currently will use a nodeAgg for AGG_SORTED
strategy rather than a nodeGroup, as it would in serial agg mode. This
probably needs to be changed.

3. Nothing in create_grouping_paths() looks at the force_parallel_mode
GUC. I had a quick look at this GUC and was a bit surprised to see 3
possible states, but no explanation of what they do, so I've not added
code which pays attention to this setting yet. I'd imagine this is
just a matter of skipping serial path generation when parallel is
possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
what FORCE_PARALLEL_REGRESS is for yet.

The setrefs.c parts of the patch remain completely broken. I've not
had time to look at this again yet, sorry.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_cc75f61_2016-03-08.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-03-06 Thread David Rowley
On 7 March 2016 at 18:19, Haribabu Kommi  wrote:
> Here I attached update patch with further changes,
> 1. Explain plan changes for parallel grouping

Perhaps someone might disagree with me, but I'm not all that sure I
really get the need for that. With nodeAgg.c we're doing something
fundamentally different in Partial mode than we are in Finalize mode,
which is why I originally wanted to give an indication of that in
explain.c. For a query with no aggregate functions using nodeGroup.c
there is no special handling in the executor for the partial and final
stages, so I really don't see why we need to give the impression in
EXPLAIN that there is.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-06 Thread Tom Lane
Haribabu Kommi  writes:
> 2. Temporary fix for float aggregate types in _equalAggref because of
> a change in aggtype to trans type, otherwise the parallel aggregation
> plan failure in set_plan_references. whenever the aggtype is not matched,
> it verifies with the trans type also.

That is a completely unacceptable kluge.  Quite aside from being ugly as
sin, it probably breaks more things than it fixes, first because it breaks
the fundamental semantics of equal() across the board, and second because
it puts catalog lookups into equal(), which *will* cause problems.  You
can not expect that this will get committed, not even as a "temporary fix".

regards, tom lane




Re: [HACKERS] Parallel Aggregate

2016-03-06 Thread Haribabu Kommi
On Sun, Mar 6, 2016 at 10:21 PM, Haribabu Kommi
 wrote:
>
> Pending:
> 1. Explain plan needs to be corrected for parallel grouping similar like
> parallel aggregate.

Here I attached an updated patch with further changes:
1. Explain plan changes for parallel grouping

2. A temporary fix for float aggregate types in _equalAggref, needed because
of the change from aggtype to trans type; otherwise the parallel aggregation
plan fails in set_plan_references. Whenever the aggtype does not match,
it also checks against the trans type.

3. Generates a parallel path for every partial path and adds it to the
path_list; the plan is then chosen based on the cheapest_path.


To apply this patch, first apply the patch in [1]

[1] - http://www.postgresql.org/message-id/14172.1457228...@sss.pgh.pa.us


Regards,
Hari Babu
Fujitsu Australia


parallelagg_v2.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-03-06 Thread Haribabu Kommi
On Fri, Mar 4, 2016 at 3:00 PM, David Rowley
 wrote:
> On 17 February 2016 at 17:50, Haribabu Kommi  wrote:
>> Here I attached a draft patch based on previous discussions. It still needs
>> better comments and optimization.
>
> Over in [1] Tom posted a large change to the grouping planner which
> causes large conflict with the parallel aggregation patch. I've been
> looking over Tom's patch and reading the related thread and I've
> observed 3 things:
>
> 1. Parallel Aggregate will be much easier to write and less code to
> base it up top of Tom's upper planner changes. The latest patch does
> add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't
> be required after Tom pushes the changes to the upper planner.
> 2. If we apply parallel aggregate before Tom's upper planner changes
> go in, then Tom needs to reinvent it again when rebasing his patch.
> This seems senseless, so this is why I did this work.
> 3. Based on the thread, most people are leaning towards getting Tom's
> changes in early to allow a bit more settle time before beta, and
> perhaps also to allow other patches to go in after (e.g this)
>
> So, I've done a bit of work and I've rewritten the parallel aggregate
> code to base it on top of Tom's patch posted in [1]. There's a few
> things that are left unsolved at this stage.
>
> 1. exprType() for Aggref still returns the aggtype, where it needs to
> return the trans type for partial agg nodes, this need to return the
> trans type rather than the aggtype. I had thought I might fix this by
> adding a proxy node type that sits in the targetlist until setrefs.c
> where it can be plucked out and replaced by the Aggref. I need to
> investigate this further.
> 2. There's an outstanding bug relating to HAVING clause not seeing the
> right state of aggregation and returning wrong results. I've not had
> much time to look into this yet, but I suspect its an existing bug
> that's already in master from my combine aggregate patch. I will
> investigate this on Sunday.
>

Thanks for updating the patch. Here I attached an updated patch
with the additional changes:

1. Parallel aggregation now works with expressions involving aggregate
functions as well.
2. Aggref returns the trans type instead of the agg type; this change adds
support for parallel aggregation of the float aggregates, but it still needs
a fix in the _equalAggref function.

Pending:
1. The EXPLAIN plan needs to be corrected for parallel grouping, similar to
parallel aggregate.

To apply this patch, first apply the patch in [1]

[1] - http://www.postgresql.org/message-id/14172.1457228...@sss.pgh.pa.us

Regards,
Hari Babu
Fujitsu Australia


parallelagg_v1.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
 wrote:
> On 17 February 2016 at 17:50, Haribabu Kommi  wrote:
>> Here I attached a draft patch based on previous discussions. It still needs
>> better comments and optimization.
>
> Over in [1] Tom posted a large change to the grouping planner which
> causes large conflict with the parallel aggregation patch. I've been
> looking over Tom's patch and reading the related thread and I've
> observed 3 things:
>
> 1. Parallel Aggregate will be much easier to write and less code to
> base it up top of Tom's upper planner changes. The latest patch does
> add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't
> be required after Tom pushes the changes to the upper planner.
> 2. If we apply parallel aggregate before Tom's upper planner changes
> go in, then Tom needs to reinvent it again when rebasing his patch.
> This seems senseless, so this is why I did this work.
> 3. Based on the thread, most people are leaning towards getting Tom's
> changes in early to allow a bit more settle time before beta, and
> perhaps also to allow other patches to go in after (e.g this)
>
> So, I've done a bit of work and I've rewritten the parallel aggregate
> code to base it on top of Tom's patch posted in [1].

Great!

> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
> plans. e.g it could be an idea to perform Partial Hash Aggregate ->
> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
> stage. I just thought doing this is more complex than what's really
> needed, but if someone can think of a case where this would be a great
> win then I'll listen, but you have to remember we don't have any
> pre-sorted partial paths at this stage, so an explicit sort is
> required *always*. This might change if someone invented partial btree
> index scans... but until then...

Actually, Rahila Syed is working on that.  But it's not done yet, so
presumably will not go into 9.6.

I don't really see the logic of this, though.  Currently, Gather
destroys the input ordering, so it seems preferable for the
finalize-aggregates stage to use a hash aggregate whenever possible,
whatever the partial-aggregate stage did.  Otherwise, we need an
explicit sort.  Anyway, it seems like the two stages should be costed
and decided on their own merits - there's no reason to chain the two
decisions together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-03 Thread David Rowley
On 17 February 2016 at 17:50, Haribabu Kommi  wrote:
> Here I attached a draft patch based on previous discussions. It still needs
> better comments and optimization.

Over in [1] Tom posted a large change to the grouping planner which
causes large conflict with the parallel aggregation patch. I've been
looking over Tom's patch and reading the related thread and I've
observed 3 things:

1. Parallel Aggregate will be much easier to write, and less code, when
based on top of Tom's upper planner changes. The latest patch does
add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't
be required after Tom pushes the changes to the upper planner.
2. If we apply parallel aggregate before Tom's upper planner changes
go in, then Tom needs to reinvent it again when rebasing his patch.
This seems senseless, so this is why I did this work.
3. Based on the thread, most people are leaning towards getting Tom's
changes in early to allow a bit more settle time before beta, and
perhaps also to allow other patches to go in after (e.g this)

So, I've done a bit of work and I've rewritten the parallel aggregate
code to base it on top of Tom's patch posted in [1]. There's a few
things that are left unsolved at this stage.

1. exprType() for Aggref still returns the aggtype, where it needs to
return the trans type for partial agg nodes, this need to return the
trans type rather than the aggtype. I had thought I might fix this by
adding a proxy node type that sits in the targetlist until setrefs.c
where it can be plucked out and replaced by the Aggref. I need to
investigate this further.
2. There's an outstanding bug relating to HAVING clause not seeing the
right state of aggregation and returning wrong results. I've not had
much time to look into this yet, but I suspect it's an existing bug
that's already in master from my combine aggregate patch. I will
investigate this on Sunday.

In regards to the patch, there's a few things worth mentioning here:

1. I've had to add a parallel_degree parameter to create_group_path()
and create_agg_path(). I think Tom is going to make changes to his
patch so that the Path's parallel_degree is propagated to subnodes,
this should allow me to remove this parameter and just use
parallel_degree the one from the subpath.
2. I had to add a new parameter to pass an optional row estimate to
cost_gather() as I don't have a RelOptInfo available to get a row
estimate from which represents the state after partial aggregation. I
thought this change was ok, but I'll listen to anyone who thinks of a
better way to do it.
3. The code never attempts to mix and match Grouping Agg and Hash Agg
plans. e.g it could be an idea to perform Partial Hash Aggregate ->
Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
stage. I just thought doing this is more complex than what's really
needed, but if someone can think of a case where this would be a great
win then I'll listen, but you have to remember we don't have any
pre-sorted partial paths at this stage, so an explicit sort is
required *always*. This might change if someone invented partial btree
index scans... but until then...

Due to the outstanding issues above, I feel like I might be posting
the patch a little early, but this is quite a hot area of the code at the
moment and I wanted to post for transparency.

To apply the patch please apply [1] first.

[1] http://www.postgresql.org/message-id/3795.1456689...@sss.pgh.pa.us

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_d6850a9f_2016-03-04.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-02-16 Thread Haribabu Kommi
On Sat, Feb 13, 2016 at 3:51 PM, Robert Haas  wrote:
> On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi
>  wrote:
>> Here I attached updated patch with the corrections.
>
> So, what about the main patch, for parallel aggregation itself?  I'm
> reluctant to spend too much time massaging combine functions if we
> don't have the infrastructure to use them.

Here I attached a draft patch based on previous discussions. It still needs
better comments and optimization.

Overview:
1. Before creating the plan for the best path, verify whether a parallel
aggregate plan is possible. If it is possible, check whether it is cheaper
than the normal aggregate plan. If parallel is cheaper, then replace the best
path with the cheapest_partial_path.

2. While generating the parallel aggregate plan, first generate the targetlist
of the partial aggregate by generating bare aggregate references and GROUP BY
expressions.

3. Change the aggref->aggtype to aggtranstype in the partial aggregate
targetlist so that proper tuple data is returned from the worker.

4. Generate the partial aggregate node using the generated targetlist.

5. Add Gather and finalize aggregate nodes on top of the partial aggregate plan.
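
With those steps, the plan for a simple case should come out roughly in this
shape (using the tbl table from my earlier tests; costs omitted and labels
approximate):

explain select sum(f1) from tbl;
--  Finalize Aggregate
--    ->  Gather
--          ->  Partial Aggregate
--                ->  Parallel Seq Scan on tbl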

To do:
1. Optimize the aggregate cost calculation mechanism; currently it is invoked
many times.
2. Better comments, etc.

Please verify whether the patch is going in the right direction, as per your
expectations.

Regards,
Hari Babu
Fujitsu Australia


parallelagg_poc_v7.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-02-12 Thread Robert Haas
On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi
 wrote:
> Here I attached updated patch with the corrections.

So, what about the main patch, for parallel aggregation itself?  I'm
reluctant to spend too much time massaging combine functions if we
don't have the infrastructure to use them.

This patch removes the comment from float8_regr_sxx in pg_proc.h for
no apparent reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-02-08 Thread Haribabu Kommi
On Mon, Feb 8, 2016 at 9:01 AM, Robert Haas  wrote:
>  On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi
>  wrote:
>>  [ new patch ]
>
> This patch contains a number of irrelevant hunks that really ought not
> to be here and make the patch harder to understand, like this:
>
> -* Generate appropriate target list for
> scan/join subplan; may be
> -* different from tlist if grouping or
> aggregation is needed.
> +* Generate appropriate target list for
> subplan; may be different from
> +* tlist if grouping or aggregation is needed.
>
> Please make a habit of getting rid of that sort of thing before submitting.

Sure. I will take care of such things in future.

> Generally, I'm not quite sure I understand the code here.  It seems to
> me that what we ought to be doing is that grouping_planner, right
> after considering using a presorted path (that is, just after the if
> (sorted_path) block between lines 1822-1850), ought to then consider
> using a partial path.  For the moment, it need not consider the
> possibility that there may be a presorted partial path, because we
> don't have any way to generate those yet.  (I have plans to fix that,
> but not in time for 9.6.)  So it can just consider doing a Partial
> Aggregate on the cheapest partial path using an explicit sort, or
> hashing; then, above the Gather, it can finalize either by hashing or
> by sorting and grouping.
>
> The trick is that there's no path representation of an aggregate, and
> there won't be until Tom finishes his upper planner path-ification
> work.  But it seems to me we can work around that.  Set best_path to
> the cheapest partial path, add a partial aggregate rather than a
> regular one around where it says "Insert AGG or GROUP node if needed,
> plus an explicit sort step if necessary", and then push a Gather node
> and a Finalize Aggregate onto the result.

Thanks, I will update the patch accordingly. Along with those changes,
I will try to calculate the cost involved in the normal aggregate without
generating the plan, and compare it against the parallel plan cost before
generating the actual plan, because with a smaller number of groups the
normal aggregate performs better than the parallel aggregate in tests.


Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Parallel Aggregate

2016-02-07 Thread Robert Haas
On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi
 wrote:
> On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
>  wrote:
>> Here I attached updated patch with additional combine function for
>> two stage aggregates also.
>
> A wrong combine function was added in pg_aggregate.h in the earlier
> patch that leading to
> initdb problem. Corrected one is attached.

I'm not entirely sure I know what's going on here, but I'm pretty sure
that it makes no sense for the new float8_pl function to reject
non-aggregate callers at the beginning and then have a comment at the
end indicating what it does when not invoked as an aggregate.
Similarly for the other new function.

It would be a lot more clear what this patch was trying to accomplish
if the new functions had header comments explaining their purpose -
not what they do, but why they exist.

float8_regr_pl is labeled in pg_proc.h as an aggregate transition
function, but I'm wondering if it should say combine function.

The changes to pg_aggregate.h include a large number of
whitespace-only changes which are unacceptable.  Please change only
the lines that need to be changed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-02-07 Thread Robert Haas
 On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi
 wrote:
>  [ new patch ]

This patch contains a number of irrelevant hunks that really ought not
to be here and make the patch harder to understand, like this:

-* Generate appropriate target list for
scan/join subplan; may be
-* different from tlist if grouping or
aggregation is needed.
+* Generate appropriate target list for
subplan; may be different from
+* tlist if grouping or aggregation is needed.

Please make a habit of getting rid of that sort of thing before submitting.

Generally, I'm not quite sure I understand the code here.  It seems to
me that what we ought to be doing is that grouping_planner, right
after considering using a presorted path (that is, just after the if
(sorted_path) block between lines 1822-1850), ought to then consider
using a partial path.  For the moment, it need not consider the
possibility that there may be a presorted partial path, because we
don't have any way to generate those yet.  (I have plans to fix that,
but not in time for 9.6.)  So it can just consider doing a Partial
Aggregate on the cheapest partial path using an explicit sort, or
hashing; then, above the Gather, it can finalize either by hashing or
by sorting and grouping.

The trick is that there's no path representation of an aggregate, and
there won't be until Tom finishes his upper planner path-ification
work.  But it seems to me we can work around that.  Set best_path to
the cheapest partial path, add a partial aggregate rather than a
regular one around where it says "Insert AGG or GROUP node if needed,
plus an explicit sort step if necessary", and then push a Gather node
and a Finalize Aggregate onto the result.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-02-07 Thread Haribabu Kommi
On Mon, Feb 8, 2016 at 2:00 AM, Robert Haas  wrote:
> On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi
>  wrote:
>> On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
>>  wrote:
>>> Here I attached updated patch with additional combine function for
>>> two stage aggregates also.
>>
>> A wrong combine function was added in pg_aggregate.h in the earlier
>> patch that leading to
>> initdb problem. Corrected one is attached.
>
> I'm not entirely sure I know what's going on here, but I'm pretty sure
> that it makes no sense for the new float8_pl function to reject
> non-aggregate callers at the beginning and then have a comment at the
> end indicating what it does when not invoked as an aggregate.
> Similarly for the other new function.
>
> It would be a lot more clear what this patch was trying to accomplish
> if the new functions had header comments explaining their purpose -
> not what they do, but why they exist.

I added some header comments explaining the need for these functions
and when they will be used. These combine functions are necessary
for float4 and float8 to support parallel aggregation.

> float8_regr_pl is labeled in pg_proc.h as an aggregate transition
> function, but I'm wondering if it should say combine function.

corrected.

> The changes to pg_aggregate.h include a large number of
> whitespace-only changes which are unacceptable.  Please change only
> the lines that need to be changed.

I tried to align the other rows after adding the new combine function,
which led to the whitespace problem; I will take care of such things in future.
Here I attached an updated patch with the corrections.

Regards,
Hari Babu
Fujitsu Australia


additional_combine_fns_v3.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-01-24 Thread Haribabu Kommi
On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
 wrote:
>
> Here I attached updated patch with additional combine function for
> two stage aggregates also.

A wrong combine function was added to pg_aggregate.h in the earlier
patch, leading to an initdb problem. A corrected version is attached.

Regards,
Hari Babu
Fujitsu Australia


additional_combine_fns_v2.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-01-22 Thread David Rowley
On 22 January 2016 at 17:25, Haribabu Kommi  wrote:
> Along with these changes, I added a float8 combine function to see
> how it works under parallel aggregate, it is working fine for float4, but
> giving small data mismatch with float8 data type.
>
> postgres=# select avg(f3), avg(f4) from tbl;
>avg|   avg
> --+--
>  1.1002384186 | 100.12344879
> (1 row)
>
> postgres=# set enable_parallelagg = true;
> SET
> postgres=# select avg(f3), avg(f4) from tbl;
>avg|   avg
> --+--
>  1.1002384186 | 100.12344918
> (1 row)
>
> Column - f3 - float4
> Column - f4 - float8
>
> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp
> aggregates. Any special care is needed for float8 datatype?

I'm not sure if this is what's going on here, as I don't really know
the range of numbers that you've used to populate f4 with. It would be
good to know, does "f4" contain negative values too?

It's not all that hard to demonstrate the instability of addition with
float8. Take the following example:

create table d (d float8);
insert into d 
values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);

# select sum(d order by random()) from d;
 sum
-
   0
(1 row)

same query, once more.

# select sum(d order by random()) from d;
   sum
--
 2.3e-013
(1 row)

Here the result just depends on the order which the numbers have been
added. You may need to execute a few more times to see the result
change.

Perhaps a good test would be to perform a sum(f4 order by random()) in
serial mode, and see if you're getting a stable result from the
numbers that you have populated the table with.
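
Something like this is what I have in mind (a sketch, using the tbl/f4 names
from your example; max_parallel_degree = 0 just keeps the plan serial):

set max_parallel_degree = 0;
select sum(f4 order by random()) from tbl;
select sum(f4 order by random()) from tbl;
-- run it a handful of times; if the trailing digits move around between runs
-- then the instability is inherent in summing the data in different orders,
-- not something introduced by parallel aggregation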

If that's the only problem at play here, then I for one am not worried
about it, as the instability already exists today depending on which
path is chosen to scan the relation. For example an index scan is
likely not to return rows in the same order as a seq scan.

We do also warn about this in the manual: "Inexact means that some
values cannot be converted exactly to the internal format and are
stored as approximations, so that storing and retrieving a value might
show slight discrepancies. Managing these errors and how they
propagate through calculations is the subject of an entire branch of
mathematics and computer science and will not be discussed here,
except for the following points:" [1]


[1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html
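
A trivial way to see that inexactness at work, independent of aggregation or
row ordering (one-line sketch):

select 0.1::float8 + 0.2::float8 = 0.3::float8;   -- returns false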

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-01-22 Thread Haribabu Kommi
On Fri, Jan 22, 2016 at 10:13 PM, David Rowley
 wrote:
> On 22 January 2016 at 17:25, Haribabu Kommi  wrote:
>> Along with these changes, I added a float8 combine function to see
>> how it works under parallel aggregate, it is working fine for float4, but
>> giving small data mismatch with float8 data type.
>>
>> postgres=# select avg(f3), avg(f4) from tbl;
>>avg|   avg
>> --+--
>>  1.1002384186 | 100.12344879
>> (1 row)
>>
>> postgres=# set enable_parallelagg = true;
>> SET
>> postgres=# select avg(f3), avg(f4) from tbl;
>>avg|   avg
>> --+--
>>  1.1002384186 | 100.12344918
>> (1 row)
>>
>> Column - f3 - float4
>> Column - f4 - float8
>>
>> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp
>> aggregates. Any special care is needed for float8 datatype?
>
> I'm not sure if this is what's going on here, as I don't really know
> the range of numbers that you've used to populate f4 with. It would be
> good to know, does "f4" contain negative values too?

No negative values are present in the f4 column.
Following are the SQL statements,

create table tbl(f1 int, f2 char(100), f3 float4, f4 float8);
insert into tbl values(generate_series(1,10), 'Fujitsu', 1.1, 100.12345);


> It's not all that hard to demonstrate the instability of addition with
> float8. Take the following example:
>
> create table d (d float8);
> insert into d 
> values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);
>
> # select sum(d order by random()) from d;
>  sum
> -
>0
> (1 row)
>
> same query, once more.
>
> # select sum(d order by random()) from d;
>sum
> --
>  2.3e-013
> (1 row)
>
> Here the result just depends on the order which the numbers have been
> added. You may need to execute a few more times to see the result
> change.
>
> Perhaps a good test would be to perform a sum(f4 order by random()) in
> serial mode, and see if you're getting a stable result from the
> numbers that you have populated the table with.
>
> If that's the only problem at play here, then I for one am not worried
> about it, as the instability already exists today depending on which
> path is chosen to scan the relation. For example an index scan is
> likely not to return rows in the same order as a seq scan.
>
> We do also warn about this in the manual: "Inexact means that some
> values cannot be converted exactly to the internal format and are
> stored as approximations, so that storing and retrieving a value might
> show slight discrepancies. Managing these errors and how they
> propagate through calculations is the subject of an entire branch of
> mathematics and computer science and will not be discussed here,
> except for the following points:" [1]
>
>
> [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html
>

Thanks for the detailed explanation. Now I understand.

Here I attached an updated patch with the additional combine functions for
two-stage aggregates as well.


Regards,
Hari Babu
Fujitsu Australia


additional_combine_fns_v1.patch
Description: Binary data



Re: [HACKERS] Parallel Aggregate

2016-01-21 Thread David Rowley
On 21 January 2016 at 18:26, Haribabu Kommi  wrote:
> Here I attached update parallel aggregate patch on top of recent commits
> of combine aggregate and parallel join patch. It still lacks of cost 
> comparison
> code to compare both parallel and normal aggregates.

Thanks for the updated patch.

I'm just starting to look over this now.

# create table t1 as select x from generate_series(1,1000000) x(x);
# vacuum ANALYZE t1;
# set max_parallel_degree =8;
# explain select sum(x) from t1;
   QUERY PLAN
-
 Aggregate  (cost=9633.33..9633.34 rows=1 width=4)
   ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4)
(2 rows)

I'm not quite sure what's happening here yet as I've not ran it
through my debugger, but how can we have a Parallel Seq Scan without a
Gather node? It appears to give correct results, so I can only assume
it's not actually a parallel scan at all.

Let's check:

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
-+--
 t1  |0
(1 row)

# explain analyze select sum(x) from t1;
QUERY PLAN
--
 Aggregate  (cost=9633.33..9633.34 rows=1 width=4) (actual
time=161.820..161.821 rows=1 loops=1)
   ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667
width=4) (actual time=0.051..85.348 rows=1000000 loops=1)
 Planning time: 0.040 ms
 Execution time: 161.861 ms
(4 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
-+--
 t1  |1
(1 row)

Only 1 scan.


# explain analyze select * from t1 where x=1;
   QUERY PLAN

 Gather  (cost=1000.00..10633.43 rows=1 width=4) (actual
time=0.231..49.105 rows=1 loops=1)
   Number of Workers: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=0 width=4)
(actual time=29.060..45.302 rows=0 loops=3)
 Filter: (x = 1)
 Rows Removed by Filter: 333333
 Planning time: 0.049 ms
 Execution time: 51.438 ms
(7 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
-+--
 t1  |4
(1 row)

3 more scans. This one seems to actually be parallel, and makes sense
based on "Number of Workers: 2"


Also looking at the patch:

+bool
+aggregates_allow_partial(Node *clause)
+{

In the latest patch that I sent on the combine aggregates thread:
http://www.postgresql.org/message-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=rzq3kg6rnpn-fzzhddi...@mail.gmail.com
I made it so there's 3 possible return values from this function. As
your patch stands now, if I create an aggregate function with an
INTERNAL state with a combine function set, then this patch might try
to parallel aggregate that and pass around the pointer to the internal
state in the Tuple going from the worker to the main process; when the
main process dereferences this pointer we'll get a segmentation
violation. So I'd say you should maybe use a modified version of my
latest aggregates_allow_partial() and check for PAT_ANY, and only
parallelise the aggregate if you get that value.  If the use of
partial aggregate was within a single process then you could be quite
content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic
that checks for serial and deserial functions, since that's not in
yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates
are found which have combine functions set.
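
To be clear about the dangerous case, I mean an aggregate declared along these
lines (the function names are made up for the example; the point is only the
INTERNAL transition type combined with a combine function):

create aggregate my_collect(anyelement)
(
    sfunc       = my_collect_trans,    -- hypothetical transition fn over an internal state
    stype       = internal,
    combinefunc = my_collect_combine,  -- hypothetical combine fn, internal -> internal
    finalfunc   = my_collect_final     -- hypothetical final fn producing the real result
);
-- shipping that internal state (a bare pointer) from worker to leader in a
-- tuple is what would crash, hence PAT_INTERNAL_ONLY rather than PAT_ANY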


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-01-21 Thread Haribabu Kommi
On Fri, Jan 22, 2016 at 7:44 AM, David Rowley
 wrote:
> On 21 January 2016 at 18:26, Haribabu Kommi  wrote:
>> Here I attached update parallel aggregate patch on top of recent commits
>> of combine aggregate and parallel join patch. It still lacks of cost 
>> comparison
>> code to compare both parallel and normal aggregates.
>
> Thanks for the updated patch.
>
> I'm just starting to look over this now.
>
> # create table t1 as select x from generate_series(1,1000000) x(x);
> # vacuum ANALYZE t1;
> # set max_parallel_degree =8;
> # explain select sum(x) from t1;
>QUERY PLAN
> -
>  Aggregate  (cost=9633.33..9633.34 rows=1 width=4)
>->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4)
> (2 rows)
>
> I'm not quite sure what's happening here yet as I've not ran it
> through my debugger, but how can we have a Parallel Seq Scan without a
> Gather node? It appears to give correct results, so I can only assume
> it's not actually a parallel scan at all.
>
> Let's check:
>
> # select relname,seq_scan from pg_stat_user_tables where relname ='t1';
>  relname | seq_scan
> -+--
>  t1  |0
> (1 row)
>
> # explain analyze select sum(x) from t1;
> QUERY PLAN
> --
>  Aggregate  (cost=9633.33..9633.34 rows=1 width=4) (actual
> time=161.820..161.821 rows=1 loops=1)
>->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667
> width=4) (actual time=0.051..85.348 rows=1000000 loops=1)
>  Planning time: 0.040 ms
>  Execution time: 161.861 ms
> (4 rows)
>
> # select relname,seq_scan from pg_stat_user_tables where relname ='t1';
>  relname | seq_scan
> -+--
>  t1  |1
> (1 row)
>
> Only 1 scan.
>
>
> # explain analyze select * from t1 where x=1;
>QUERY PLAN
> 
>  Gather  (cost=1000.00..10633.43 rows=1 width=4) (actual
> time=0.231..49.105 rows=1 loops=1)
>Number of Workers: 2
>->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=0 width=4)
> (actual time=29.060..45.302 rows=0 loops=3)
>  Filter: (x = 1)
>  Rows Removed by Filter: 333333
>  Planning time: 0.049 ms
>  Execution time: 51.438 ms
> (7 rows)
>
> # select relname,seq_scan from pg_stat_user_tables where relname ='t1';
>  relname | seq_scan
> -+--
>  t1  |4
> (1 row)
>
> 3 more scans. This one seems to actually be parallel, and makes sense
> based on "Number of Workers: 2"


The problem was that the Gather path generated from the partial path list was
not getting added to the path list; because of that, there is a mismatch
between the sorted path and cheapest_path, which leads to a wrong plan.

As a temporary fix, I marked the sorted_path and cheapest_path as the same,
and it works fine.


> Also looking at the patch:
>
> +bool
> +aggregates_allow_partial(Node *clause)
> +{
>
> In the latest patch that I sent on the combine aggregates thread:
> http://www.postgresql.org/message-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=rzq3kg6rnpn-fzzhddi...@mail.gmail.com
> I made it so there's 3 possible return values from this function. As
> your patch stands now, if I create an aggregate function with an
> INTERNAL state with a combine function set, then this patch might try
> to parallel aggregate that and pass around the pointer to the internal
> state in the Tuple going from the worker to the main process, when the
> main process dereferences this pointer we'll get a segmentation
> violation. So I'd say you should maybe use a modified version of my
> latest aggregates_allow_partial() and check for PAT_ANY, and only
> parallelise the aggregate if you get that value.  If the use of
> partial aggregate was within a single process then you could be quite
> content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic
> that checks for serial and deserial functions, since that's not in
> yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates
> are found which have combine functions set.
>

I took the suggested code changes from the combine aggregate patch and
changed mine accordingly.

Along with these changes, I added a float8 combine function to see
how it works under parallel aggregate; it is working fine for float4, but
gives a small data mismatch with the float8 data type.

postgres=# select avg(f3), avg(f4) from tbl;
   avg|   avg
--+--
 1.1002384186 | 100.12344879
(1 row)

postgres=# set enable_parallelagg = true;
SET
postgres=# select avg(f3), avg(f4) from tbl;
   avg|   avg

Re: [HACKERS] Parallel Aggregate

2016-01-20 Thread Haribabu Kommi
On Thu, Dec 24, 2015 at 5:12 AM, Robert Haas  wrote:
> On Mon, Dec 21, 2015 at 6:38 PM, David Rowley
>  wrote:
>> On 22 December 2015 at 04:16, Paul Ramsey  wrote:
>>>
>>> Shouldn’t parallel aggregate come into play regardless of scan
>>> selectivity?
>>
>> I'd say that the costing should take into account the estimated number of
>> groups.
>>
>> The more tuples that make it into each group, the more attractive parallel
>> grouping should seem. In the extreme case if there's 1 tuple per group, then
>> it's not going to be of much use to use parallel agg, this would be similar
>> to a scan with 100% selectivity. So perhaps the costings for it can be
>> modeled around the parallel scan costing, but using the estimated groups
>> instead of the estimated tuples.
>
> Generally, the way that parallel costing is supposed to work (with the
> parallel join patch, anyway) is that you've got the same nodes costed
> the same way you would otherwise, but the row counts are lower because
> you're only processing 1/Nth of the rows.  That's probably not exactly
> the whole story here, but it's something to think about.

Here I have attached an updated parallel aggregate patch on top of the
recent combine aggregate and parallel join commits. It still lacks the cost
comparison code to compare parallel and normal aggregate plans.


Regards,
Hari Babu
Fujitsu Australia


parallelagg_poc_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2015-12-23 Thread Robert Haas
On Mon, Dec 21, 2015 at 6:38 PM, David Rowley
 wrote:
> On 22 December 2015 at 04:16, Paul Ramsey  wrote:
>>
>> Shouldn’t parallel aggregate come into play regardless of scan
>> selectivity?
>
> I'd say that the costing should take into account the estimated number of
> groups.
>
> The more tuples that make it into each group, the more attractive parallel
> grouping should seem. In the extreme case if there's 1 tuple per group, then
> it's not going to be of much use to use parallel agg, this would be similar
> to a scan with 100% selectivity. So perhaps the costings for it can be
> modeled around the parallel scan costing, but using the estimated groups
> instead of the estimated tuples.

Generally, the way that parallel costing is supposed to work (with the
parallel join patch, anyway) is that you've got the same nodes costed
the same way you would otherwise, but the row counts are lower because
you're only processing 1/Nth of the rows.  That's probably not exactly
the whole story here, but it's something to think about.
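
As a back-of-envelope illustration (the numbers are assumptions, not taken
from the patch): if the table holds 1,000,000 rows and the cost model
counts two workers as full shares plus a partial share for the
participating leader, each process is costed at roughly 1,000,000 / 2.4
rows:

-- Toy arithmetic only; the 0.3-per-worker leader discount is an assumption
-- borrowed from the 9.6 cost model rather than from this patch.
SELECT round(1000000 / (2 + (1.0 - 0.3 * 2))) AS estimated_rows_per_process;
-- => 416667, the kind of per-process estimate a Parallel Seq Scan reports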

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2015-12-21 Thread David Rowley
On 22 December 2015 at 04:16, Paul Ramsey  wrote:

> Shouldn’t parallel aggregate come into play regardless of scan selectivity?
>

I'd say that the costing should take into account the estimated number of
groups.

The more tuples that make it into each group, the more attractive parallel
grouping should seem. In the extreme case if there's 1 tuple per group,
then it's not going to be of much use to use parallel agg, this would be
similar to a scan with 100% selectivity. So perhaps the costings for it can
be modeled around the parallel scan costing, but using the estimated
groups instead of the estimated tuples.
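
The estimated group count is already visible in EXPLAIN output: the row
estimate on the aggregate node is the planner's guess at the number of
groups, and rows-per-group is just the scan estimate divided by that. A
quick way to eyeball the figures such a costing would be built on (table
and column names below are made up):

-- The aggregate node's rows= figure is the estimated number of groups;
-- the scan node's rows= figure is the estimated input tuples.
EXPLAIN SELECT ten, count(*) FROM some_big_table GROUP BY ten;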


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

