Alex Behm has posted comments on this change. Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. ......................................................................
Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS2, Line 77: TTableStats tableStats_ > nit: a bit safer to initialize this here and set now rows to -1. Created in c'tor. There is no constructor that allows setting both values to -1. And only setting one seems more dangerous (bytes would be 0). PS2, Line 196: /** : * Returns the value of the ROW_COUNT constant, or -1 if not found. : */ > Remove, not needed. Happy to remove, but seems non-obvious that -1 represents "not found"? Sure we should remove? http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS2, Line 128: Input cardinality based on the partition row counts and based on extrapolation : // using the table-level row count and raw data size. -1 if invalid. > Remove the part that explains where this is derived from; doesn't really he Done PS2, Line 664: Chose > Choose (typo) Done PS2, Line 664: Chose between the extrapolated row count and the one based on stored stats. > I think that requires some further explanation. Something along the lines o I like the function idea. Moved into separate function. Function comment should address your concern regarding the comment. PS2, Line 672: if (!partitions_.isEmpty() && numPartitionsMissingStats_ == partitions_.size() : && extrapolatedNumRows_ == -1) { : // if none of the partitions knew its number of rows, and extrapolation was : // not possible, we fall back on the table stats : cardinality_ = tbl_.getNumRows(); : } > I think it might help readability if you put this inside the if block in L6 Done. Added tests for this cardinality behavior. http://gerrit.cloudera.org:8080/#/c/6840/2/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: PS2, Line 268: 61 > Is this ok that this changed? This is actually the same value as in master. Our planner test infra has a bug that causes row-size to incorrectly be ignored in the comparison, so this file was never updated. I filed IMPALA-5341 -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-HasComments: Yes
