Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12535 )

Change subject: IMPALA-8014: Revise FK/PK cardinality estimation
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12535/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/12535/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java@350
PS4, Line 350:  // The code has decided that the RHS is a dimension (detail, 
FK) table, and
             :     // that the LHS is a fact (master, FK) table
> isn't it the opposite?
Done


http://gerrit.cloudera.org:8080/#/c/12535/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java@416
PS4, Line 416: fkCard = Math.max(fkCard, largestKeyCard);
> Sorry, I meant you are multiplying all lhsNdv() for each slot and comparing
Right. Very confusing. This code is only for the compound key case. We are 
dealing with the lack of an NDV for the key as a whole. (This is where we can 
very much use Anurag's FK/PK enhancement.)

Note the step above in which we constrain the fk cardinality to be no greater 
than the master table cardinality. This can occur, as in our tests, when the fk 
consists of (id, int_col), so NDV(compound key) = NDV(id) * NDV(int_col) > 
master tale cardinality. So, we constrain the foreign key to be no greater than 
the master table (based on our M:1 assumption.)

But, then we have to deal with the case that our guess is wrong: the FK 
cardinality really is greater. Even if the two columns are perfectly 
correlated, their joint NDV has to be at least as large as the largest 
individual NDV.

To handle the reality of messy schemas and stats, we:

* Compute the "raw" joint NDV
* Constrain it by the detail table size (can't have more keys than rows)
* Constrain it by the master table size (can't have more FKs than PKs, ideally)
* Constrain it by largest key NDV size (can't have fewer compound keys than 
column NDV, even if that is greater than master table size)

I *think* this is right. The above messy adjustments are the result of 
inspecting tests that changed. But please do think it through to be certain.



--
To view, visit http://gerrit.cloudera.org:8080/12535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0db82564596b0a9271b89b8e3f7a3cf92d306da
Gerrit-Change-Number: 12535
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 08 Mar 2019 20:31:26 +0000
Gerrit-HasComments: Yes

Reply via email to