Richard Guo <guofengli...@gmail.com> writes:
> On Wed, Dec 28, 2022 at 6:18 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> This patch is aimed at being smarter about cases where we have
>> redundant GROUP BY entries, for example
>> SELECT ... WHERE a.x = b.y GROUP BY a.x, b.y;

> While we are here, I wonder if we can do the same trick for
> distinctClause, to cope with cases like
>     select distinct a.x, b.y from a, b where a.x = b.y;

We do that already, no?

regression=# create table foo (x int, y int);
CREATE TABLE
regression=# explain select distinct * from foo where x = 1;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Unique  (cost=38.44..38.50 rows=11 width=8)
   ->  Sort  (cost=38.44..38.47 rows=11 width=8)
         Sort Key: y
         ->  Seq Scan on foo  (cost=0.00..38.25 rows=11 width=8)
               Filter: (x = 1)
(5 rows)

regression=# explain select distinct * from foo where x = y;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Unique  (cost=38.44..38.50 rows=11 width=8)
   ->  Sort  (cost=38.44..38.47 rows=11 width=8)
         Sort Key: x
         ->  Seq Scan on foo  (cost=0.00..38.25 rows=11 width=8)
               Filter: (x = y)
(5 rows)

But if you do

regression=# explain select * from foo where x = y group by x, y;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Group  (cost=38.44..38.52 rows=11 width=8)
   Group Key: x, y
   ->  Sort  (cost=38.44..38.47 rows=11 width=8)
         Sort Key: x
         ->  Seq Scan on foo  (cost=0.00..38.25 rows=11 width=8)
               Filter: (x = y)
(6 rows)

then you can see that the Sort step knows it need only consider
one column even though the Group step considers both.

                        regards, tom lane


Reply via email to