We already allow a SELECT's target list to contain non-aggregated columns
in a GROUP BY query in cases where the non-aggregated column is
functionally dependent on the GROUP BY clause.

For example a query such as;

SELECT p.product_id,p.description, SUM(s.quantity)
FROM product p
INNER JOIN sale s ON p.product_id = s.product_id
GROUP BY p.product_id;

is perfectly fine in PostgreSQL, as p.description is functionally dependent
on p.product_id (assuming product_id is the PRIMARY KEY of product).

It seems that there's no shortage of relational databases in existence
today which don't support this. These databases would require the GROUP BY
clause to include the p.description column too.

It seems rather unfortunate that people who migrate applications to
PostgreSQL may not be aware that we support this, as currently if we
needlessly include the p.description column, PostgreSQL naively includes
this column while grouping. These people could well be incurring a
performance penalty due to our planner not removing the useless items from
the list, as if the primary key is present, then including any other
columns won't cause splitting of the groups any further, all other columns
from the *same relation* can simply be removed from the GROUP BY clause.

There are in fact also two queries in TPC-H (Q10 and Q18) which are written
to include all of the non-aggregated column in the GROUP BY list. During a
recent test I witnessed a 50% gain in performance in Q10 by removing the
unneeded columns from the GROUP BY clause.

I've attached a patch which implements this in PostgreSQL.

The patch may need a little more work in order to adjust the targetlist's
tleSortGroupRefs to remove invalid ones and perhaps also remove the gaps.

I'm posting this now so that I can gauge the community interest in this.

Is it something that we'd like to have in PostgreSQL?

 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: prune_group_by_clause_2027f512_2015-12-01.patch
Description: Binary data

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to