Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve join cardinality estimates
......................................................................


Patch Set 3:

> Patch Set 1:
>
> Thanks for the reply Aman. I wonder if we assume uniform distribution of 
> values, and the RHS's cardinality is less than or equal to the LHS's NDV then 
> does it matter if there are duplications on the right side?
>
> E.g. let's assume the following:
>
>  LHS cardinality is 1000
>  LHS NDV is 10
>
>  RHS cardinality is 5
>  RHS NDV is unknown
>
> If RHS has 5 distinct values, then the selectivity of it is 50%, so the 
> JOIN's output cardinality should be 500.
> If RHS has the same value 5 times, then the selectivity is 10%, but the 
> multiplication factor is 5x, so the JOIN's output cardinality should be again 
> 500.

I was tied up with some higher priority tasks. I just uploaded a patchset with 
reworked estimation logic and added tests accordingly. As noted in 
the commit message, given the plan changes, we should do a performance run. I 
will try to get that going.

With regard to your example, under the assumptions you have stated, yes, the 
inner join's output cardinality estimates would be the same in both cases 
because the max NDV is 10 either way. In my previous comment I was comparing 
the inner join and the left semi join in the presence of duplicates. If there 
are duplicates in the RHS, then in the worst case the inner join cardinality 
could be N x M, whereas for a left semi join it cannot exceed N (N and M are 
the left and right input cardinalities).
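
To make the arithmetic concrete, here is a rough sketch. It is not the planner 
code from this patch. The function names are hypothetical, and the estimate 
used is the textbook |L| x |R| / max(NDV_L, NDV_R) formula, which I am assuming 
here purely for illustration. The row-counting helpers just count matches on 
small in-memory lists so the inner join and left semi join can be compared.

```python
from collections import Counter


def estimate_inner_join(lhs_card, lhs_ndv, rhs_card, rhs_ndv):
    # Assumed textbook estimate: |L| * |R| / max(NDV_L, NDV_R).
    # Illustrative only; not necessarily what the patch implements.
    return lhs_card * rhs_card / max(lhs_ndv, rhs_ndv)


def inner_join_count(lhs, rhs):
    # Each LHS row produces one output row per matching RHS row.
    rhs_counts = Counter(rhs)
    return sum(rhs_counts[v] for v in lhs)


def left_semi_join_count(lhs, rhs):
    # Each LHS row is emitted at most once, so the result never exceeds len(lhs).
    rhs_keys = set(rhs)
    return sum(1 for v in lhs if v in rhs_keys)


# The example from the quoted comment: 1000 LHS rows, 10 uniformly
# distributed distinct values, and 5 RHS rows.
lhs = [i % 10 for i in range(1000)]
rhs_distinct = [0, 1, 2, 3, 4]   # RHS NDV = 5
rhs_duplicates = [0] * 5         # RHS NDV = 1

print(estimate_inner_join(1000, 10, 5, 5))        # 500.0
print(estimate_inner_join(1000, 10, 5, 1))        # 500.0
print(inner_join_count(lhs, rhs_distinct))        # 500
print(inner_join_count(lhs, rhs_duplicates))      # 500
print(left_semi_join_count(lhs, rhs_distinct))    # 500
print(left_semi_join_count(lhs, rhs_duplicates))  # 100

# Worst case with duplicates on both sides: N = 4, M = 3.
print(inner_join_count([7] * 4, [7] * 3))         # 12 (N x M)
print(left_semi_join_count([7] * 4, [7] * 3))     # 4  (bounded by N)
```

In this toy data the inner join count is 500 whether or not the RHS has 
duplicates, while the left semi join count drops to 100 in the duplicate 
case, which is the difference I was referring to.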


--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 19 May 2021 06:13:28 +0000
Gerrit-HasComments: No
