Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12136 )
Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output ...................................................................... Patch Set 13: (7 comments) Addressed review comments, rebased on latest master, and fixed a potentially unstable test. Will attempt to remove the range checks for cardinality. Please hold off on another review until we see the results. http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@100 PS12, Line 100: * Return a list of partitions left after applying the conjuncts. Please note > Needs an update to explain the Pair<> Done http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1230 PS12, Line 1230: if (partitionConjuncts_ != null && !partitionConjuncts_.isEmpty()) { > Looks like Greg had a different opinion on the jira when I mentioned separa Agreed. I added this partly because of Greg's request (and your response), but also because I found it very helpful to understand the cardinality in a plan. We have a table with 11K rows. The scan node had no (visible) predicates, yet claimed to return 5K rows. The "5/11 partitions" should have clued me in. Had to debug the code to see why that was the case. Once these partition predicates appeared, however, it was obvious what was happening. So, is that a good reason to include this info in the EXPLAIN output? IMHO it is, what do you think? Discovered one reason that we did not include them: we had some partitioning tests that use the system time for a partitioning value. As a result, the predicate changes on each run and it is hard to verify the test results. Simply changed it to use a constant date/time to work around that issue. http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1283 PS12, Line 1283: Pair<List<? extends FeFsPartition>, List<Expr>> pair = > line too long (110 > 90) Done http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1331 PS12, Line 1331: scanNode.init(analyzer); > nit: Can we force this in the c'tor above? Setting it separately seems weir Done http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@541 PS12, Line 541: if (!testOptions.contains(PlannerTestOption.VALIDATE_CARDINALITY)) { > Did you run into some cases where this was required to make tests pass? Or Mostly it is to reduce redundant output. By definition, an exchange just passes along the rows from its input. So, repeating the row width and cardinality for an Exchange adds no information. Added a comment to explain this thought. http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java File fe/src/test/java/org/apache/impala/testutil/TestUtils.java: http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@302 PS4, Line 302: } > Just for reference, notice that the old code consumed three two per pass: o Argh.. That should be "..two tokens per pass..." http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java File fe/src/test/java/org/apache/impala/testutil/TestUtils.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@207 PS12, Line 207: If found, then parse the values and compare them. Report a match : * if the values are within 5%, else report a mismatch. > Yeah I'm not sure if we expect the cardinality estimates to be exactly repr On my first attempt, I enabled cardinality checks for all tests. I saw some variation. Then, when running on the Jenkins server, I found that the metadata actually differs from what we have on dev machines, so I disabled cardinality checks by default, turning them on only for selected tests where they add value. So, with that revision, it may be that I can remove this code. I'll commit the other review comments and do a commit. Then, I'll remove this code, do another commit, and do a pre-review test run. If it all passes, then we'll know that this check is redundant and can stay removed. Else, I'll back out the change and have some concrete evidence about the problems we need to solve. -- To view, visit http://gerrit.cloudera.org:8080/12136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Gerrit-Change-Number: 12136 Gerrit-PatchSet: 13 Gerrit-Owner: Paul Rogers <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 08 Jan 2019 02:01:10 +0000 Gerrit-HasComments: Yes
