Re: [PATCH] Erase the distinctClause if the result is unique by definition

Andy Fan Tue, 10 Mar 2020 01:21:04 -0700

Hi David:


> 3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
> rel itself (I've not looked for the best location in detail),
> determine if the join can cause rows to be duplicated. If it can't,
> then add the UniqueKeys from that rel.


I have some concerns about this method,  maybe I misunderstand
something, if so, please advise.

In my current implementation, it calculates the uniqueness for each
BaseRel only, but in your way,  looks we need to calculate the
UniquePathKey for both BaseRel and JoinRel.   This makes more
difference for multi table join.    Another concern is UniquePathKey
is designed for a general purpose,  we need to maintain it no matter
distinctClause/groupbyClause.


>  For example: SELECT * FROM t1
> INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
> {t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
> since it's an eqijoin and t1.unique has a unique index).
>

Thanks for raising this.  My current rule requires *every* relation yields
a
unique result and *no matter* with the join method.  Actually I want to make
the rule less strict, for example, we  may just need 1 table yields unique
result
and the final result will be unique as well under some join type.

As for the t1 INNER JOIN t2 ON t1.unique = t2.not_unique;  looks we can't
remove the distinct based on this.

create table m1(a int primary key, b int);
create table m2(a int primary key, b int);
insert into m1 values(1, 1), (2, 1);
insert into m2 values(1, 1), (2, 1);
select distinct m1.a from m1, m2 where m1.a = m2.b;



> SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.
>

Actually I want to keep the distinct for this case now.  One reason is
there are only 1
row returned, so the distinct erasing can't help much.   The more important
reason is
Query->hasAggs is true for "select distinct  (select count(*) filter (where
t2.c2 = 6
and t2.c1 < 10) from ft1 t1 where t1.c1 = 6)  from ft2 t2 where t2.c2 % 6 =
0 order by 1;"
(this sql came from postgres_fdw.sql).

There's a separate effort in
> https://commitfest.postgresql.org/27/1741/ to implement some parts of
> the uniquekeys idea.  However the implementation currently only covers
> adding the unique keys to Paths, not to RelOptInfos


Thanks for this info.  I guess this patch is not merged so far, but looks
the cfbot
for my patch [1]  failed due to this :(   search
"explain (costs off) select distinct on(pk1) pk1, pk2 from
select_distinct_a;"


> I also believe that the existing code in analyzejoins.c should be
> edited to make use of unique keys rather than how it looks at unique
> indexes and group by / distinct clauses.
>
>  I can do this after we have agreement on the UniquePath.

For my cbbot failure, another strange thing is "A" appear ahead of "a" after
the order by..  Still didn't find out why.

[1]
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.83298

Regards
Andy Fan

Re: [PATCH] Erase the distinctClause if the result is unique by definition

Reply via email to