Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12136 )

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
......................................................................


Patch Set 13:

(7 comments)

Addressed review comments, rebased on latest master, and fixed a potentially 
unstable test.

Will attempt to remove the range checks for cardinality. Please hold off on 
another review until we see the results.

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@100
PS12, Line 100:    * Return a list of partitions left after applying the 
conjuncts. Please note
> Needs an update to explain the Pair<>
Done


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1230
PS12, Line 1230:       if (partitionConjuncts_ != null && 
!partitionConjuncts_.isEmpty()) {
> Looks like Greg had a different opinion on the jira when I mentioned separa
Agreed. I added this partly because of Greg's request (and your response), but 
also because I found it very helpful to understand the cardinality in a plan. 
We have a table with 11K rows. The scan node had no (visible) predicates, yet 
claimed to return 5K rows. The "5/11 partitions" should have clued me in. Had 
to debug the code to see why that was the case. Once these partition predicates 
appeared, however, it was obvious what was happening.

So, is that a good reason to include this info in the EXPLAIN output? IMHO it 
is, what do you think?

Discovered one reason that we did not include them: we had some partitioning 
tests that use the system time for a partitioning value. As a result, the 
predicate changes on each run and it is hard to verify the test results. Simply 
changed it to use a constant date/time to work around that issue.


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1283
PS12, Line 1283:     Pair<List<? extends FeFsPartition>, List<Expr>> pair =
> line too long (110 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1331
PS12, Line 1331: scanNode.init(analyzer);
> nit: Can we force this in the c'tor above? Setting it separately seems weir
Done


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@541
PS12, Line 541:       if 
(!testOptions.contains(PlannerTestOption.VALIDATE_CARDINALITY)) {
> Did you run into some cases where this was required to make tests pass? Or
Mostly it is to reduce redundant output. By definition, an exchange just passes 
along the rows from its input. So, repeating the row width and cardinality for 
an Exchange adds no information.

Added a comment to explain this thought.


http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@302
PS4, Line 302:         }
> Just for reference, notice that the old code consumed three two per pass: o
Argh.. That should be "..two tokens per pass..."


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@207
PS12, Line 207:  If found, then parse the values and compare them. Report a 
match
              :    * if the values are within 5%, else report a mismatch.
> Yeah I'm not sure if we expect the cardinality estimates to be exactly repr
On my first attempt, I enabled cardinality checks for all tests. I saw some 
variation. Then, when running on the Jenkins server, I found that the metadata 
actually differs from what we have on dev machines, so I disabled cardinality 
checks by default, turning them on only for selected tests where they add value.

So, with that revision, it may be that I can remove this code. I'll commit the 
other review comments and do a commit. Then, I'll remove this code, do another 
commit, and do a pre-review test run. If it all passes, then we'll know that 
this check is redundant and can stay removed. Else, I'll back out the change 
and have some concrete evidence about the problems we need to solve.



--
To view, visit http://gerrit.cloudera.org:8080/12136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Gerrit-Change-Number: 12136
Gerrit-PatchSet: 13
Gerrit-Owner: Paul Rogers <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:01:10 +0000
Gerrit-HasComments: Yes

Reply via email to