Re: [HACKERS] DML and column cound in aggregated subqueries
On 2016-10-31 09:35:57 -0400, Tom Lane wrote: > Andres Freund writes: > > this doesn't look right. The ctid shouldn't be in the aggregate output - > > after all it's pretty much meaningless here. > > I suspect it's being added to support EvalPlanQual row re-fetches. Hm, that doesn't seem particularly meaningful though? I wonder if it's not actually more an accident than anything. Looks like preprocess_rowmarks() adds them pretty unconditionally to everything for DML. > > Casting a wider net: find_hash_columns() and it's subroutines seem like > > pretty much dead code, including outdated comments (look for "SQL99 > > semantics"). > > Hm, maybe. In principle the planner could do that instead, but it doesn't > look like it actually does at the moment; I'm not seeing any distinction > in tlist-building for AGG_HASHED vs other cases in create_agg_plan(). Isn't the important part what's inside Agg->numCols/grpColIdx/grpOperators? Those should Because those are the ones that are minimal, and it looks we do the right thing there already (and if not, we'd be in trouble anyways afaics). We already only store columns that find_hash_columns() figures out to be required in the hashtable, but afaics we can instead just iterate over grpColIdx[] and just store the ones in there. The reason I'm looking into this in the first place is that the slot access inside execGrouping turns out to be pretty expensive, especially because all the tuples have nulls, so we can't even skip columns and such. I wrote code to create a new tupledesc which only containing the columns referenced by grpColIdx, and asserted that it's the same set that find_hash_columns() - but that's not always the case, but afaics spuriously. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DML and column cound in aggregated subqueries
Andres Freund writes: > this doesn't look right. The ctid shouldn't be in the aggregate output - > after all it's pretty much meaningless here. I suspect it's being added to support EvalPlanQual row re-fetches. > Casting a wider net: find_hash_columns() and it's subroutines seem like > pretty much dead code, including outdated comments (look for "SQL99 > semantics"). Hm, maybe. In principle the planner could do that instead, but it doesn't look like it actually does at the moment; I'm not seeing any distinction in tlist-building for AGG_HASHED vs other cases in create_agg_plan(). As an example: regression=# explain verbose select avg(ten), hundred from tenk1 group by 2; QUERY PLAN --- HashAggregate (cost=508.00..509.25 rows=100 width=36) Output: avg(ten), hundred Group Key: tenk1.hundred -> Seq Scan on public.tenk1 (cost=0.00..458.00 rows=1 width=8) Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, tw othousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 (5 rows) If you wanted to forgo find_hash_columns() then it would be important for the seqscan to output a minimal tlist rather than try to optimize away its projection step. I'm not that excited about making such a change in terms of performance: you'd essentially be skipping a handmade projection step inside nodeAgg at the cost of probably adding one at the input node, as in this example. But it might be worth doing anyway just on the grounds that this ought to be the planner's domain not a custom hack in the executor. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers