On Wed, Dec 28, 2022 at 6:18 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > This patch is aimed at being smarter about cases where we have > redundant GROUP BY entries, for example > > SELECT ... WHERE a.x = b.y GROUP BY a.x, b.y; > > It's clearly not necessary to perform grouping using both columns. > Grouping by either one alone would produce the same results, > assuming compatible equality semantics. I'm not sure how often > such cases arise in the wild; but we have about ten of them in our > regression tests, which makes me think it's worth the trouble to > de-duplicate as long as it doesn't cost too much. And it doesn't, > because PathKey construction already detects exactly this sort of > redundancy. We need only do something with the knowledge.
While we are here, I wonder if we can do the same trick for distinctClause, to cope with cases like select distinct a.x, b.y from a, b where a.x = b.y; And there is case from regression test 'select_distinct.sql' that can benefit from this optimization. -- -- Check mentioning same column more than once -- EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM (SELECT DISTINCT two, four, two FROM tenk1) ss; Thanks Richard