Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11193 )

Change subject: IMPALA-7425: Change incremental stats to pull from catalogd.
......................................................................


Patch Set 9:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc@61
PS9, Line 61:       {"getDbs", "([B)[B", &get_dbs_id_}, {"getFunctions", 
"([B)[B", &get_functions_id_},
nit:looks like your editor still reformatted this list, might cause conflicts


http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@207
PS9, Line 207: an
nit typo


http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@211
PS9, Line 211: DEFINE_bool(pull_incremental_statistics, false,
should we consider changing this to true if we're pretty confident it works? 
(and make the flag a 'hidden' flag if we're only putting it in for "just in 
case" rollback capability). Making it hidden would make it easier to remove the 
flag later so we can remove the old code path.


http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc@627
PS9, Line 627:       const_cast<char*>("NativeGetPartialCatalogObject"), 
const_cast<char*>("([B)[B"),
same as elsewhere, your editor is reformat-happy


http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@382
PS9, Line 382:   // The CatalogService service ID this request came from.
hm, earlier comment still relevant - request didn't come from a catalog service


http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@389
PS9, Line 389: // statistic is returned. Partitions are identified by their 
partitioning column
Curious why you chose to identify them by list<string> instead of just their 
string names? I think given thrift's encoding this will be much larger on the 
wire


http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@396
PS9, Line 396:   2: optional map<list<string>, CatalogObjects.TPartitionStats> 
partition_stats
will be interesting to see how this interacts with Bharath's change


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@483
PS9, Line 483: NULL
not your change, but it's odd to me that we use NULL here whereas the default 
null partition key value is something liek __NULL__. Would we have problems 
computing stats on a table which actually has as string 'NULL' as a partition 
value?


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@618
PS9, Line 618: partitionIds
if you just passed in the List<FeFsPartition> here, would that work? You no 
longer need to use their IDs because you're fetching everything from catalogd.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@661
PS9, Line 661:           Preconditions.checkArgument(part instanceof 
FeFsPartition);
maybe irrelevant if you make the above change to just pass through the 
List<FeFsPartition>, but I think this downcast is incorrect in the case of 
LocalCatalog, where the PrunablePartition doesn't downcast directly to 
FeFsPartition. You have to "load" the partition.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@665
PS9, Line 665: remoteStats.isSetIntermediate_col_stats
the doc on the thrift RPC says that it would return a "default stats object" 
for partitions with no stats.. what's the point of doing that if you ignore 
such default ones?


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@670
PS9, Line 670:       } else {
is it possible to invert this condition so you have the "return empty" case up 
above, and you can un-indent all the code above?


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
File fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java@95
PS9, Line 95:    *  a list of partitioning key values as strings.
nit: extra spacing after *s


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@423
PS9, Line 423:  +
this should be a comma, no?


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@428
PS9, Line 428:     FeFsTable fsTable = (FeFsTable) table;
since this runs on the catalog, you could avoid some of the downcasts below and 
cast this to 'HdfsTable' instead of the FeFsTable interface.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2079
PS9, Line 2079:         Preconditions.checkState(p instanceof HdfsPartition);
maybe this check should actually be an && above? ie if it's LocalCatalog, then 
hasIncrementalStats() already should represent the 'remote' state (ie that the 
stats are available). Or, we could add some validation to ensure that you dont 
set both --use_local_catalog and --pull_incremental_statistics at startup (I 
see you have such a check in the RPC)


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@330
PS9, Line 330: GetPartitionStats
at INFO level I think we should make this more user-consumable and not 
reference internal function names. Maybe just say "Fetching partition stats 
from catalogd" or something?


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@356
PS9, Line 356: %s partitions
stats for %s partitions



--
To view, visit http://gerrit.cloudera.org:8080/11193
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Gerrit-Change-Number: 11193
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 31 Aug 2018 05:07:56 +0000
Gerrit-HasComments: Yes

Reply via email to