Alex Behm has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
......................................................................


Patch Set 2:

(9 comments)

Responding to comments. New tests are still WIP.

http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

PS2, Line 27: import org.apache.impala.analysis.SlotDescriptor;
> Is this used?
No. Done.


PS2, Line 40: import org.apache.kudu.shaded.com.google.common.base.Joiner;
> replace
Sorry, IDE. Done.


http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

Line 253:     if (fkPkEqJoinConjuncts_ != null) {
> My concern was that if we have compelling evidence that a FK/PK relationshi
Makes sense, thanks. In that case the adjustment based on the NDV ratio in L312 
would be wrong and lead to underestimation.


http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS2, Line 79: -
> nit: remove
Done


PS2, Line 80: ordered based on the tuple ids
> I am not sure I get what that ordering is. Do you mean grouped?
Reworded with "grouped by"


PS2, Line 202: based on the single most
             :    * selective join predicate. We do not attempt to estimate the 
joint selectivity of
             :    * multiple join predicates to avoid underestimation.
> Much better. Thanks
Done


PS2, Line 280: fkPkCandidate.get(0)
> Can you plz explain this? I could be wrong about this but it seems that the
Might have misunderstood your comment, but I'll give it a shot.

In L277 we iterate over all groups of conjuncts that belong to the same joined 
tuple id pair. For each group, we compute the join NDV of the rhs slots and 
compare it to the number of rows in the rhs table.

I didn't follow your comment about processing order. I don't think the 
processing order matters for what we want to check.

Are you asking why it's ok to get(0)? All entries belong to the same tuple id 
pair, so the num rows is the same for all of them.

Added a comment, but not sure if it addresses your concern.


PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0);
> Are we certain this is a valid precondition check? Why isn't a 0 rhsCard po
Maybe you misread the code: It's greater or equal to zero, so zero is valid.


PS2, Line 337: we adjustments
> nit: grammar
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to