Alex Behm has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS2, Line 77: TTableStats tableStats_
> nit: a bit safer to initialize this here and set now rows to -1.
Created in c'tor. There is no constructor that allows setting both values to 
-1. And only setting one seems more dangerous (bytes would be 0).


PS2, Line 196: /**
             :    * Returns the value of the ROW_COUNT constant, or -1 if not 
found.
             :    */
> Remove, not needed.
Happy to remove, but seems non-obvious that -1 represents "not found"? Sure we 
should remove?


http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS2, Line 128: Input cardinality based on the partition row counts and based on 
extrapolation
             :   // using the table-level row count and raw data size. -1 if 
invalid.
> Remove the part that explains where this is derived from; doesn't really he
Done


PS2, Line 664: Chose
> Choose (typo)
Done


PS2, Line 664:  Chose between the extrapolated row count and the one based on 
stored stats.
> I think that requires some further explanation. Something along the lines o
I like the function idea. Moved into separate function. Function comment should 
address your concern regarding the comment.


PS2, Line 672: if (!partitions_.isEmpty() && numPartitionsMissingStats_ == 
partitions_.size()
             :         && extrapolatedNumRows_ == -1) {
             :       // if none of the partitions knew its number of rows, and 
extrapolation was
             :       // not possible, we fall back on the table stats
             :       cardinality_ = tbl_.getNumRows();
             :     }
> I think it might help readability if you put this inside the if block in L6
Done. Added tests for this cardinality behavior.


http://gerrit.cloudera.org:8080/#/c/6840/2/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

PS2, Line 268: 61
> Is this ok that this changed?
This is actually the same value as in master. Our planner test infra has a bug 
that causes row-size to incorrectly be ignored in the comparison, so this file 
was never updated.
I filed IMPALA-5341


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-HasComments: Yes

Reply via email to