Alex Behm has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. ......................................................................
Patch Set 2: (9 comments) Responding to comments. New tests are still WIP. http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS2, Line 27: import org.apache.impala.analysis.SlotDescriptor; > Is this used? No. Done. PS2, Line 40: import org.apache.kudu.shaded.com.google.common.base.Joiner; > replace Sorry, IDE. Done. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: Line 253: if (fkPkEqJoinConjuncts_ != null) { > My concern was that if we have compelling evidence that a FK/PK relationshi Makes sense, thanks. In that case the adjustment based on the NDV ratio in L312 would be wrong and lead to underestimation. http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS2, Line 79: - > nit: remove Done PS2, Line 80: ordered based on the tuple ids > I am not sure I get what that ordering is. Do you mean grouped? Reworded with "grouped by" PS2, Line 202: based on the single most : * selective join predicate. We do not attempt to estimate the joint selectivity of : * multiple join predicates to avoid underestimation. > Much better. Thanks Done PS2, Line 280: fkPkCandidate.get(0) > Can you plz explain this? I could be wrong about this but it seems that the Might have misunderstood your comment, but I'll give it a shot. In L277 we iterate over all groups of conjuncts that belong to the same joined tuple id pair. For each group, we compute the join NDV of the rhs slots and compare it to the number of rows in the rhs table. I didn't follow your comment about processing order. I don't think the processing order matters for what we want to check. Are you asking why it's ok to get(0)? All entries belong to the same tuple id pair, so the num rows is the same for all of them. Added a comment, but not sure if it addresses your concern. PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0); > Are we certain this is a valid precondition check? Why isn't a 0 rhsCard po Maybe you misread the code: It's greater or equal to zero, so zero is valid. PS2, Line 337: we adjustments > nit: grammar Done -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com> Gerrit-HasComments: Yes