Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. ......................................................................
Patch Set 1: (11 comments) First pass on main classes. I haven't looked at the tests yet. http://gerrit.cloudera.org:8080/#/c/6840/1/be/src/common/global-flags.cc File be/src/common/global-flags.cc: PS1, Line 137: tables What does this mean? I thought we only extrapolate row count of partitions. http://gerrit.cloudera.org:8080/#/c/6840/1/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: PS1, Line 36: DECLARE_bool nit: they seem to be grouped by type http://gerrit.cloudera.org:8080/#/c/6840/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS1, Line 140: HDFS How about S3? Does it make sense to rename the total_hdfs_bytes into total_bytes? http://gerrit.cloudera.org:8080/#/c/6840/1/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS1, Line 493: !is_incremental nit: blending code in comments :) "is_incremental is false" http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: PS1, Line 587: hdfsTable nit: inline http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS1, Line 185: private long totalHdfsBytes_; Do we still need this? PS1, Line 1328: tableStats_.num_rows getNumRows()? PS1, Line 1912: getExtrapolatedNumRows(p.getSize()) Why do we need to compute this if we already have p.getNumRows() and is >= 1? PS1, Line 1958: tableStats_.num_rows getNumRows()? http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS1, Line 76: // estimated number of rows in table; -1: unknown. update comment http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS1, Line 616: computeStats(Analyzer analyzer) { The logic in this function is now very hard to follow. It's not clear when the extrapolation is used vs the exact row counts. I think the problem is that this function blends the computation of the cardinality (and other stats) with the logic of using exact vs extrapolated counts. -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-HasComments: Yes
