On 15 March 2016 at 11:39, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > I've looked at this patch today. The patch seems quite solid, but I do have > a few minor comments (or perhaps questions, given that this is the first > time I looked at the patch). > > 1) exprCollation contains this bit: > ----------------------------------- > > case T_PartialAggref: > coll = InvalidOid; /* XXX is this correct? */ > break; > > I doubt this is the right thing to do. Can we actually get to this piece of > code? I haven't tried too hard, but regression tests don't seem to trigger > this piece of code.
Thanks for looking at this. Yeah, it's there because it it being called during setrefs.c in makeVarFromTargetEntry() via fix_combine_agg_expr_mutator(), so it's required when building the new target list for Aggref->args to point to the underlying Aggref's Var. As for the collation, I'm still not convinced if it's right or wrong. I know offlist you mentioned about string_agg() and sorting, but there' no code that'll sort the agg's state. The only possible thing that gets sorted there is the group by key. > Moreover, if we're handling PartialAggref in exprCollation(), maybe we > should handle it also in exprInputCollation and exprSetCollation? hmm, maybe, that I'm not sure about. I don't see where we'd call exprSetCollation() for this, but I think I need to look at exprInputCollation() > And if we really need the collation there, why not to fetch the actual > collation from the nested Aggref? Why should it be InvalidOid? It seems quite random to me to do that. If the trans type is bytea, why would it be useful to inherit the collation from the aggregate? I'm not confident I'm right with InvalidOId... I just don't think we can pretend the collation is the same as the Aggref's. > 2) partial_aggregate_walker > --------------------------- > > I think this should follow the naming convention that clearly identifies the > purpose of the walker, not what kind of nodes it is supposed to walk. So it > should be: > > aggregates_allow_partial_walker Agreed and changed. > 3) create_parallelagg_path > -------------------------- > > I do agree with the logic that under-estimates are more dangerous than > over-estimates, so the current estimate is safer. But I think this would be > a good place to apply the formula I proposed a few days ago (or rather the > one Dean Rasheed proposed in response). > > That is, we do know that there are numGroups in total, and each parallel > worker sees subpath->rows then it's expected to see > > sel = (subpath->rows / rel->tuples); > perGroup = (rel->tuples / numGroups); > workerGroups = numGroups * (1 - powl(1 - s, perGroup)); > numPartialGroups = numWorkers * workerGroups > > It's probably better to see Dean's message from 13/3. I think what I have works well when there's a small number of groups, as there's a good chance that each worker will see at least 1 tuple from each group. However I understand that will become increasingly unlikely with a larger number of groups, which is why I capped it to the total input rows, but even in cases before that cap is reached I think it will still overestimate. I'd need to analyze the code above to understand it better, but my initial reaction is that, you're probably right, but I don't think I want to inherit the fight for this. Perhaps it's better to wait until GROUP BY estimate improvement patch gets in, and change this, or if this gets in first, then you can include this change in your patch. I'm not trying to brush off the work, I just would rather it didn't delay parallel aggregate. > 4) Is clauses.h the right place for PartialAggType? I'm not sure that it is to be honest. I just put it there because the patch never persisted the value of a PartialAggType in any struct field anywhere and checks it later in some other file. In all places where we use PartialAggType we're also calling aggregates_allow_partial(), which does require clauses.h. So that's why it ended up there... I think I'll leave it there until someone gives me a good reason to move it. An updated patch will follow soon. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers