Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19682 )
Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java@496
PS6, Line 496: if (corrfactor > 0) cumulativeSel *= (((double) joinCard/lhsCard)/rhsCard);
> This would be more readable as (double) joinCard/(lhsCard*rhsCard);
Not computing (lhsCard * rhsCard) was intentional, both here and in other places. Both values are of type long, so multiplying them can overflow for large cardinalities. A prior version of the patch used checkedMultiply(), which detects the overflow, but based on Csaba's comment I changed it to divide by lhsCard and rhsCard separately.

Regarding how the correlation factor is applied, I had a slightly different interpretation. Suppose there are 4 equijoin conjuncts c1, c2, c3, c4 with selectivities s1, s2, s3, s4. The correlation factor does not necessarily represent a pairwise correlation. If it did, we would get (s1 * s2)/CF for the first two; then, considering s3, we would get (((s1 * s2)/CF) * s3)/CF, and so on. But c1 and c2 may not be correlated with each other at all; perhaps only one of them is correlated with c3. Using a single denominator, i.e. (s1 * s2 * s3 * s4)/CF, lets us set CF to whatever factor represents whether 2, 3, or all 4 of the conjuncts are correlated. It also makes the behavior easier to explain in the documentation.
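To make the two points in this reply concrete, here is a minimal standalone sketch. It is not Impala's JoinNode code: the class JoinSelectivitySketch and the methods conjunctSelectivity, naiveSelectivity, combinedSelectivity and pairwiseSelectivity are hypothetical names. The sketch assumes the correlation factor is a divisor (CF >= 1), as in the (s1 * s2)/CF notation above, and the clamp of the result to 1.0 is an added assumption.

```java
import java.util.List;

// Hypothetical sketch, not Impala's actual JoinNode implementation.
public class JoinSelectivitySketch {

  // Dividing by lhsCard and rhsCard one at a time (in double arithmetic)
  // never forms the long product lhsCard * rhsCard, so it cannot overflow.
  static double conjunctSelectivity(long joinCard, long lhsCard, long rhsCard) {
    return ((double) joinCard / lhsCard) / rhsCard;
  }

  // The suggested "more readable" form: lhsCard * rhsCard is evaluated as a
  // long first and silently wraps around once it exceeds Long.MAX_VALUE.
  static double naiveSelectivity(long joinCard, long lhsCard, long rhsCard) {
    return (double) joinCard / (lhsCard * rhsCard);
  }

  // Single-denominator scheme: multiply all per-conjunct selectivities, then
  // divide once by one correlation factor. Clamping to 1.0 is an assumption.
  static double combinedSelectivity(List<Double> sels, double correlationFactor) {
    double cumulativeSel = 1.0;
    for (double s : sels) {
      cumulativeSel *= s;
    }
    return Math.min(1.0, cumulativeSel / correlationFactor);
  }

  // Pairwise alternative: divide by the factor again after folding in each
  // additional conjunct, i.e. (((s1 * s2)/CF) * s3)/CF and so on.
  static double pairwiseSelectivity(List<Double> sels, double correlationFactor) {
    double sel = sels.get(0);
    for (int i = 1; i < sels.size(); i++) {
      sel = (sel * sels.get(i)) / correlationFactor;
    }
    return Math.min(1.0, sel);
  }

  public static void main(String[] args) {
    long joinCard = 1_000_000_000_000_000_000L; // 1e18, fits in a long
    long lhsCard = 5_000_000_000L;
    long rhsCard = 4_000_000_000L; // lhsCard * rhsCard = 2e19 > Long.MAX_VALUE
    System.out.println("separate division: " + conjunctSelectivity(joinCard, lhsCard, rhsCard));
    System.out.println("naive product:     " + naiveSelectivity(joinCard, lhsCard, rhsCard));

    List<Double> sels = List.of(0.1, 0.1, 0.1);
    System.out.println("single denominator: " + combinedSelectivity(sels, 2.0));
    System.out.println("pairwise:           " + pairwiseSelectivity(sels, 2.0));
  }
}
```

With these inputs the separate-division form yields 0.05, while the naive form divides by a wrapped-around product and returns a value far from 0.05. For three conjuncts of selectivity 0.1 and CF = 2, the single denominator gives about 0.0005, whereas the pairwise scheme divides twice and gives about 0.00025.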
--
To view, visit http://gerrit.cloudera.org:8080/19682
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35
Gerrit-Change-Number: 19682
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Mon, 10 Apr 2023 23:37:40 +0000
Gerrit-HasComments: Yes
