Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11193 )
Change subject: IMPALA-7425: Change incremental stats to pull from catalogd. ...................................................................... Patch Set 9: (18 comments) http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc@61 PS9, Line 61: {"getDbs", "([B)[B", &get_dbs_id_}, {"getFunctions", "([B)[B", &get_functions_id_}, > nit:looks like your editor still reformatted this list, might cause conflic Done. that was clang. http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@207 PS9, Line 207: an > nit typo Done http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@211 PS9, Line 211: DEFINE_bool(pull_incremental_statistics, false, > should we consider changing this to true if we're pretty confident it works So by default, we'd get this feature and compressed stats. then to try v2, one would restart an impalad with --use_local_catalog and flip the hidden flag for this to false? The other consideration I had was for backports. I would think the user would want to explicitly enable it. I am fine either way-- let me know if you were thinking of other cases. http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc@627 PS9, Line 627: const_cast<char*>("NativeGetPartialCatalogObject"), const_cast<char*>("([B)[B"), > same as elsewhere, your editor is reformat-happy Done. http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@382 PS9, Line 382: // The CatalogService service ID this request came from. > hm, earlier comment still relevant - request didn't come from a catalog ser missed this. agreed, can remove this when fetching all. the other place I saw this guard used was for making multiple requests to the catalogd (multiple prioritized loads). however, in this case, its a single request. http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@389 PS9, Line 389: // statistic is returned. Partitions are identified by their partitioning column > Curious why you chose to identify them by list<string> instead of just thei Which string name are you referring to? For example FeCatalogUtils.getPartitionName repeats the schema for each partition, so I'd expect that to be larger, right? Either way, the key size should be dominated by the stats for the common case. I've updated this to be a single string, so I can tune it some more. For example I can concat the values and factor out the schema (or just leave the schema out). http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@396 PS9, Line 396: 2: optional map<list<string>, CatalogObjects.TPartitionStats> partition_stats > will be interesting to see how this interacts with Bharath's change an option here is to send it over compressed. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@483 PS9, Line 483: NULL > not your change, but it's odd to me that we use NULL here whereas the defau good point. added a todo to look into this. I expect this would have come up by now. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@618 PS9, Line 618: partitionIds > if you just passed in the List<FeFsPartition> here, would that work? You no good call! simplified nicely. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@661 PS9, Line 661: Preconditions.checkArgument(part instanceof FeFsPartition); > maybe irrelevant if you make the above change to just pass through the List Done http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@665 PS9, Line 665: remoteStats.isSetIntermediate_col_stats > the doc on the thrift RPC says that it would return a "default stats object yup, this was left-over from the prev. version. updated the comment and the catalogd side. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@670 PS9, Line 670: } else { > is it possible to invert this condition so you have the "return empty" case Done http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java File fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java@95 PS9, Line 95: * a list of partitioning key values as strings. > nit: extra spacing after *s Done http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@423 PS9, Line 423: + > this should be a comma, no? Done http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@428 PS9, Line 428: FeFsTable fsTable = (FeFsTable) table; > since this runs on the catalog, you could avoid some of the downcasts below removed some of these downcasts. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2079 PS9, Line 2079: Preconditions.checkState(p instanceof HdfsPartition); > maybe this check should actually be an && above? ie if it's LocalCatalog, t I expect this may change a little when merging, but yeah, makes sense to pull this to the if clause. http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@330 PS9, Line 330: GetPartitionStats > at INFO level I think we should make this more user-consumable and not refe Done http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@356 PS9, Line 356: %s partitions > stats for %s partitions Done -- To view, visit http://gerrit.cloudera.org:8080/11193 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e Gerrit-Change-Number: 11193 Gerrit-PatchSet: 9 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Fri, 31 Aug 2018 08:50:03 +0000 Gerrit-HasComments: Yes
