Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. ......................................................................
Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS2, Line 77: TTableStats tableStats_ nit: a bit safer to initialize this here and set now rows to -1. PS2, Line 196: /** : * Returns the value of the ROW_COUNT constant, or -1 if not found. : */ Remove, not needed. http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS2, Line 128: Input cardinality based on the partition row counts and based on extrapolation : // using the table-level row count and raw data size. -1 if invalid. Remove the part that explains where this is derived from; doesn't really help. Simply state what these two variables represent. PS2, Line 664: Chose Choose (typo) PS2, Line 664: Chose between the extrapolated row count and the one based on stored stats. I think that requires some further explanation. Something along the lines of "if table stats (row count, bytes) are available, use that information to estimate cardinality using extrapolation, otherwise use the exact row counts from table/partition stats". Also, I would put L664-705 in a separate function responsible solely for estimating and setting cardinality, but feel free to ignore this if you don't like it. PS2, Line 672: if (!partitions_.isEmpty() && numPartitionsMissingStats_ == partitions_.size() : && extrapolatedNumRows_ == -1) { : // if none of the partitions knew its number of rows, and extrapolation was : // not possible, we fall back on the table stats : cardinality_ = tbl_.getNumRows(); : } I think it might help readability if you put this inside the if block in L666-668. Also, do we have a test for this case? http://gerrit.cloudera.org:8080/#/c/6840/2/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: PS2, Line 268: 61 Is this ok that this changed? -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-HasComments: Yes
