Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11193 )
Change subject: IMPALA-7425: Change incremental stats to pull from catalogd. ...................................................................... Patch Set 9: (18 comments) http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc@61 PS9, Line 61: {"getDbs", "([B)[B", &get_dbs_id_}, {"getFunctions", "([B)[B", &get_functions_id_}, nit:looks like your editor still reformatted this list, might cause conflicts http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@207 PS9, Line 207: an nit typo http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@211 PS9, Line 211: DEFINE_bool(pull_incremental_statistics, false, should we consider changing this to true if we're pretty confident it works? (and make the flag a 'hidden' flag if we're only putting it in for "just in case" rollback capability). Making it hidden would make it easier to remove the flag later so we can remove the old code path. http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc@627 PS9, Line 627: const_cast<char*>("NativeGetPartialCatalogObject"), const_cast<char*>("([B)[B"), same as elsewhere, your editor is reformat-happy http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@382 PS9, Line 382: // The CatalogService service ID this request came from. hm, earlier comment still relevant - request didn't come from a catalog service http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@389 PS9, Line 389: // statistic is returned. Partitions are identified by their partitioning column Curious why you chose to identify them by list<string> instead of just their string names? I think given thrift's encoding this will be much larger on the wire http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@396 PS9, Line 396: 2: optional map<list<string>, CatalogObjects.TPartitionStats> partition_stats will be interesting to see how this interacts with Bharath's change http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@483 PS9, Line 483: NULL not your change, but it's odd to me that we use NULL here whereas the default null partition key value is something liek __NULL__. Would we have problems computing stats on a table which actually has as string 'NULL' as a partition value? http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@618 PS9, Line 618: partitionIds if you just passed in the List<FeFsPartition> here, would that work? You no longer need to use their IDs because you're fetching everything from catalogd. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@661 PS9, Line 661: Preconditions.checkArgument(part instanceof FeFsPartition); maybe irrelevant if you make the above change to just pass through the List<FeFsPartition>, but I think this downcast is incorrect in the case of LocalCatalog, where the PrunablePartition doesn't downcast directly to FeFsPartition. You have to "load" the partition. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@665 PS9, Line 665: remoteStats.isSetIntermediate_col_stats the doc on the thrift RPC says that it would return a "default stats object" for partitions with no stats.. what's the point of doing that if you ignore such default ones? http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@670 PS9, Line 670: } else { is it possible to invert this condition so you have the "return empty" case up above, and you can un-indent all the code above? http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java File fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java@95 PS9, Line 95: * a list of partitioning key values as strings. nit: extra spacing after *s http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@423 PS9, Line 423: + this should be a comma, no? http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@428 PS9, Line 428: FeFsTable fsTable = (FeFsTable) table; since this runs on the catalog, you could avoid some of the downcasts below and cast this to 'HdfsTable' instead of the FeFsTable interface. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2079 PS9, Line 2079: Preconditions.checkState(p instanceof HdfsPartition); maybe this check should actually be an && above? ie if it's LocalCatalog, then hasIncrementalStats() already should represent the 'remote' state (ie that the stats are available). Or, we could add some validation to ensure that you dont set both --use_local_catalog and --pull_incremental_statistics at startup (I see you have such a check in the RPC) http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@330 PS9, Line 330: GetPartitionStats at INFO level I think we should make this more user-consumable and not reference internal function names. Maybe just say "Fetching partition stats from catalogd" or something? http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@356 PS9, Line 356: %s partitions stats for %s partitions -- To view, visit http://gerrit.cloudera.org:8080/11193 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e Gerrit-Change-Number: 11193 Gerrit-PatchSet: 9 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Fri, 31 Aug 2018 05:07:56 +0000 Gerrit-HasComments: Yes
