On Wed, Dec 28, 2022 at 6:18 AM Tom Lane <t...@sss.pgh.pa.us> wrote:

> This patch is aimed at being smarter about cases where we have
> redundant GROUP BY entries, for example
>
> SELECT ... WHERE a.x = b.y GROUP BY a.x, b.y;
>
> It's clearly not necessary to perform grouping using both columns.
> Grouping by either one alone would produce the same results,
> assuming compatible equality semantics.  I'm not sure how often
> such cases arise in the wild; but we have about ten of them in our
> regression tests, which makes me think it's worth the trouble to
> de-duplicate as long as it doesn't cost too much.  And it doesn't,
> because PathKey construction already detects exactly this sort of
> redundancy.  We need only do something with the knowledge.


While we are here, I wonder if we can do the same trick for
distinctClause, to cope with cases like

    select distinct a.x, b.y from a, b where a.x = b.y;

And there is case from regression test 'select_distinct.sql' that can
benefit from this optimization.

    --
    -- Check mentioning same column more than once
    --

    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT count(*) FROM
      (SELECT DISTINCT two, four, two FROM tenk1) ss;

Thanks
Richard

Reply via email to