[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

Dimitris Tsirogiannis (Code Review) Fri, 12 May 2017 17:22:40 -0700

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
......................................................................



Patch Set 1:

(11 comments)

First pass on main classes. I haven't looked at the tests yet.

http://gerrit.cloudera.org:8080/#/c/6840/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

PS1, Line 137: tables
What does this mean? I thought we only extrapolate row count of partitions.


http://gerrit.cloudera.org:8080/#/c/6840/1/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

PS1, Line 36: DECLARE_bool
nit: they seem to be grouped by type


http://gerrit.cloudera.org:8080/#/c/6840/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS1, Line 140: HDFS
How about S3? Does it make sense to rename the total_hdfs_bytes into 
total_bytes?


http://gerrit.cloudera.org:8080/#/c/6840/1/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS1, Line 493: !is_incremental
nit: blending code in comments :) "is_incremental is false"


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

PS1, Line 587: hdfsTable
nit: inline


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 185: private long totalHdfsBytes_;
Do we still need this?


PS1, Line 1328: tableStats_.num_rows
getNumRows()?


PS1, Line 1912: getExtrapolatedNumRows(p.getSize())
Why do we need to compute this if we already have p.getNumRows() and is >= 1?


PS1, Line 1958: tableStats_.num_rows
getNumRows()?


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS1, Line 76: // estimated number of rows in table; -1: unknown.
update comment


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS1, Line 616: computeStats(Analyzer analyzer) {
The logic in this function is now very hard to follow. It's not clear when the 
extrapolation is used vs the exact row counts. I think the problem is that this 
function blends the computation of the cardinality (and other stats) with the 
logic of using exact vs extrapolated counts.


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

Reply via email to