On Tue, 10 Mar 2020 at 21:19, Andy Fan <zhihui.fan1...@gmail.com> wrote:
>
>> SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.
>
> Actually I want to keep the distinct for this case now.  One reason is
> that only 1 row is returned, so erasing the distinct can't help much.
> The more important reason is that Query->hasAggs is true for
> "select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10)
> from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;"
> (this SQL came from postgres_fdw.sql).
I think that sort of view is part of the problem here.  If you want to
invent some new way to detect uniqueness that does not count that case,
then we have more code with more possible places to have bugs.

FWIW, query_is_distinct_for() does detect that case with:

	/*
	 * If we have no GROUP BY, but do have aggregates or HAVING, then the
	 * result is at most one row so it's surely unique, for any operators.
	 */
	if (query->hasAggs || query->havingQual)
		return true;

which can be seen by the fact that the following finds the unique join
on t2:

postgres=# explain verbose select * from t1 inner join (select count(*) c from t1) t2 on t1.a=t2.c;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Hash Join  (cost=41.91..84.25 rows=13 width=12)
   Output: t1.a, (count(*))
   Inner Unique: true
   Hash Cond: (t1.a = (count(*)))
   ->  Seq Scan on public.t1  (cost=0.00..35.50 rows=2550 width=4)
         Output: t1.a
   ->  Hash  (cost=41.89..41.89 rows=1 width=8)
         Output: (count(*))
         ->  Aggregate  (cost=41.88..41.88 rows=1 width=8)
               Output: count(*)
               ->  Seq Scan on public.t1 t1_1  (cost=0.00..35.50 rows=2550 width=0)
                     Output: t1_1.a
(12 rows)

It would be very simple to add an empty List of UniqueKeys to the GROUP
BY's RelOptInfo to indicate that all expressions are unique.  That way,
any code that checks whether some of the RelOptInfo's unique keys are a
subset of the expressions it would like to know are unique will get a
match.

It does not really matter how much effort is saved in your example
above.  The UniqueKey infrastructure won't know how much effort properly
adding all the uniquekeys will save.  It should just add all the keys it
can and let whichever code cares about that reap the benefits.