Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 )
Change subject: IMPALA-10681: Improve join cardinality estimates ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@38 PS3, Line 38: TODO: We would want to run a performance test to validate : the plan changes for TPC-DS at a sufficiently high scale factor. Just out of curiosity I did a single node perf run on TPCDS with scale 10 bin/single_node_perf_run.py --scale 10 --iterations 10 --table_formats parquet/snap --workloads tpcds <git hash> And looked at the changed queries: Q4, Q11: no real change Q5: ~8% improvement Q54: ~5% improvement Q71: ~15% improvement Q74: ~10% improvement But of course, it'd be more interesting to see the results at higher scale on a real cluster. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@1859 PS3, Line 1859: public Expr getSlotDescFirstSourceExpr() { nit: could you please add a brief comment for this method? http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@306 PS3, Line 306: getOtherJoinCardinality nit: could you mention this calculation in the doc comment of this method? http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@447 PS3, Line 447: after IMPALA-7310 is fixed I think this part is fixed by https://gerrit.cloudera.org/#/c/16349/ http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@507 PS3, Line 507: return new EqJoinConjunctScanSlots(eqJoinConjunct, lhsScanSlot, rhsScanSlot); : } else { nit: Since we return in the 'true' branch, we don't need to nest the statements in an 'else' block. I.e. to reduce nesting, we can just have: if (hasLhs && hasRhs) { return new EqJoinConjunctScanSlots(eqJoinConjunct, lhsScanSlot, rhsScanSlot); } Expr lhsExpr = eqJoinConjunct.getChild(0); ... http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@591 PS3, Line 591: public static final class NdvAndRowCountStats { nit: Could you please add a brief comment? -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 19 May 2021 15:29:11 +0000 Gerrit-HasComments: Yes
