Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12974 )

Change subject: IMPALA-7608: Estimate row count from file size when no stats 
available
......................................................................


Patch Set 10:

(6 comments)

Mainly comments about comments and some cleanup.

One administrative thing - it would have been a little easier to review PS10 if 
you did the rebase in a separate patchset. The diff from PS9->PS10 was noisy 
because of unrelated changes in query-options.cc picked up from the rebase.

http://gerrit.cloudera.org:8080/#/c/12974/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12974/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@179
PS10, Line 179:   private static double ESTIMATED_COMPRESSION_FACTOR_LEGACY = 3.58;// to change
Can you remove these "to change" comments? I don't think they help much.


http://gerrit.cloudera.org:8080/#/c/12974/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12974/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@225
PS10, Line 225:   // functional.alltypesmixedformat is a table of 4 partitions,
I have some nits about these test comments. I think they should be javadoc 
comments, just for consistency. The text is also wrapped at < 90 characters - 
in some cases the comment would fit on fewer lines.


http://gerrit.cloudera.org:8080/#/c/12974/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@243
PS10, Line 243:   // True cardinality of tpch_text_gzip.lineitem is 6,001,215.
Thanks for these comments about the cardinality. They are really helpful for 
understanding the test.


http://gerrit.cloudera.org:8080/#/c/12974/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@516
PS10, Line 516:     // Estimated cardinality of the NestedLoopJoinNode is 550,564 = 742 * 742.
I guess we're not so good at estimating cardinality of tiny parquet files 
because of the footer?
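
To make the footer hypothesis concrete, here is a rough sketch. It assumes the 
estimate scales linearly with total file bytes. The class name, method names, 
and byte counts are hypothetical; only the 3.58 factor comes from the constant 
quoted above, and this is not the planner's actual code.

```java
public class FooterOverheadSketch {
  // Value quoted in this review for ESTIMATED_COMPRESSION_FACTOR_LEGACY.
  static final double COMPRESSION_FACTOR = 3.58;

  // Hypothetical estimator: rows ~= fileBytes * factor / avgRowBytes.
  static long estimateRows(long fileBytes, double avgRowBytes) {
    return Math.round(fileBytes * COMPRESSION_FACTOR / avgRowBytes);
  }

  // Relative over-estimate caused by counting footer bytes as row data.
  static double relativeError(long dataBytes, long footerBytes, double avgRowBytes) {
    long withFooter = estimateRows(dataBytes + footerBytes, avgRowBytes);
    long dataOnly = estimateRows(dataBytes, avgRowBytes);
    return (withFooter - dataOnly) / (double) dataOnly;
  }
}
```

Under these assumptions a fixed-size footer matters only when it is comparable 
in size to the row data, which would explain why tiny files are the bad case.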


http://gerrit.cloudera.org:8080/#/c/12974/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@564
PS10, Line 564:   //TODO: It seems that the cardinality of the SelectNode should be 1 instead
Nice catch. Can you file a JIRA for this and mention it in the TODO? This is 
much better for tracking purposes - bugs that are only tracked by TODOs in the 
code tend to be forgotten easily.

And consider fixing it in a separate commit. It seems like an oversight. I 
think this code should probably have a max(1, ...) to avoid setting it to 0:

      cardinality_ =
          Math.round(((double) getChild(0).cardinality_) * computeSelectivity());
      Preconditions.checkState(cardinality_ >= 0);
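
A minimal standalone sketch of what the max(1, ...) clamp could look like. The 
class and method names are hypothetical, and it ignores any special handling 
the real SelectNode may need (for example an unknown child cardinality), so 
treat it as an illustration rather than the fix itself.

```java
public class SelectCardinalitySketch {
  // Hypothetical stand-in for the cardinality computation quoted above.
  static long estimate(long childCardinality, double selectivity) {
    long rounded = Math.round(((double) childCardinality) * selectivity);
    // Clamp so that rounding a small product never yields 0 rows.
    return Math.max(1, rounded);
  }

  public static void main(String[] args) {
    // A small child with a selective predicate rounds to 0 without the clamp.
    System.out.println(estimate(2, 0.1));
    System.out.println(estimate(1000, 0.5));
  }
}
```

Note that this clamps unconditionally, so a child cardinality of 0 would also 
become 1. Whether that is desirable is a question for the follow-up change.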


http://gerrit.cloudera.org:8080/#/c/12974/10/testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle-hdfs-num-rows-est-disabled.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle-hdfs-num-rows-est-disabled.test:

PS10:
I think you can revert this file to its original name (same for the other 
similar cases where gerrit shows a rename).

Encoding all of the options in the file name isn't really scalable, so I'd 
prefer default-join-distr-mode-shuffle.test for conciseness.



--
To view, visit http://gerrit.cloudera.org:8080/12974
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a
Gerrit-Change-Number: 12974
Gerrit-PatchSet: 10
Gerrit-Owner: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Mon, 10 Jun 2019 18:07:43 +0000
Gerrit-HasComments: Yes
