Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11193 )
Change subject: IMPALA-7425: Change incremental stats to pull from catalogd. ...................................................................... Patch Set 4: (30 comments) made my way through most of these. there are several smallish changes in catalogservicecatalog and two larger changes coming (fetch all, simplify the flag). this will also need a sync w/ bharath's change. http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h@116 PS4, Line 116: Get partition statistics for the partitions specified in : /// TGetPartitionStatsRequest. > nit: single line? Done http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc@210 PS4, Line 210: If used, the flag must " : "be set on both catalogd and all impalad coordinators."); > How about we call it "pull_incremental_statistics_ondemand" or some such th There's some redundancy there so I'm leaning towards keeping it simpler. http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc@210 PS4, Line 210: If used, the flag must " : "be set on both catalogd and all impalad coordinators."); > Do we have any way to emit warnings if this setting ends up inconsistent in Since this is for v1 metadata, I think it would be even cleaner if it was specified just at the catalogd and broadcast to coordinators as a TCatalog property. It can be thought of as a catalogd property, so if its present, the impalad must pull. I'll change it and see how it looks. With regards to flag consistency, there are a number of other examples here that can lead to inconsistencies (and some where its unclear where the flag should go). Probably worth either checking/handling uniformly and/or deprecating. http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@549 PS4, Line 549: extern "C" JNIEXPORT jbyteArray JNICALL > side node gripe: we should figure out some way to refactor all of these alm yes! http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@560 PS4, Line 560: status.AddDetail("Error making an RPC call to Catalog server."); > Is this needed? I think DoRPC has a reasonable status error message if it f indeed, good point. added a todo to get rid of it for prioritized load as well. http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@561 PS4, Line 561: ; > return early? no, we're adding the status here, which is part of the returned result, then serializing the result on L564. http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@606 PS4, Line 606: {const_cast<char*>("NativeFeTestInit"), const_cast<char*>("()V"), > nit: the reformatting of all the unrelated code is going to make this a pai yeah, this was clang. will reduce it. http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@380 PS4, Line 380: // identified by their id. The caller's table_version_number is used by the catalog > Yep, it'd be nice to just fetch the aggregate. AFAICT we just need the aggr several things here :-) There is a partition spec for compute incremental stats. The partitions *not* in this spec are the ones whose stats we're fetching and for the rest, we're computing them. I'd expect that we'll fetch most and compute few, but one can misuse the spec or it could be the first time incr stats is run. If that's the case, then we'll spend more time computing anyways, so perhaps less of an overall issue if we fetch more than needed. In either case, yes, it'll simplify the request if we fetch all, and I can get rid of the versioning stuff below. http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@381 PS4, Line 381: if the caller and callee versions are the same. > This could probably be annoying to the users. Commands like "refresh tbl pa see replies to first comment. http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@385 PS4, Line 385: // The CatalogService service ID this request came from. > +1 the replies to first comment. http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@a513 PS4, Line 513: > nit: some unrelated removal of blank lines might make merging conflicts har Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@449 PS4, Line 449: new HashSet<Long>() > use Collections.emptySet() here to emphasize that it's only an in-parameter Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452 PS4, Line 452: partitionStats != null > why not make partitionStats always non-null and just return an empty map in Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@453 PS4, Line 453: partStats != null && partStats.isSetIntermediate_col_stats() : && !partStats.intermediate_col_stats.isEmpty() > I think my patch simplifies this ok, will look out for it (or the merge will replace it) http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@497 PS4, Line 497: relevant > should this be _other_ partitions? ie we need to fetch stats from the ones Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@595 PS4, Line 595: getPartitionStats > maybe rename to "getOrFetchPartitionStats" to emphasize that this one is th renamed. there's a reference to table_. added a todo to pull this out to make these static. http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@600 PS4, Line 600: maxNumStats > this is more of an "expected" than actually a hard max, right? Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@601 PS4, Line 601: checkState > if this precondition fails, it's actually a bad argument, not a bad state, Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@634 PS4, Line 634: ps > I think if you return an empty map here it'll avoid some null checks up abo Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@644 PS4, Line 644: Exception > this will also catch the above AnalysisException, resulting in a doubly-wra went with the propagate. http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@645 PS4, Line 645: : > nit: no need for the ':' in the exception message itself. When printed it w removed. http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@392 PS4, Line 392: catalog_service_id != catalogServiceId_ > this should be a .equals() rather than reference-equals, no? Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@393 PS4, Line 393: () > %s not () whoops, forgot to fill these in. http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1768 PS4, Line 1768: (statsSizeEstimate < BackendConfig.INSTANCE.getIncStatsMaxSize() : && !BackendConfig.INSTANCE.pullIncrementalStatistics()); > can we factor this out into a method for readability? Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java File fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java@211 PS4, Line 211: N/A > Add a clear description? Done. its a config issue if we get here so let me know if the update is what you were thinking here. http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@333 PS4, Line 333: LOG.info( : String.format > nit: weird formatting Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java@276 PS4, Line 276: Lists.<String>newArrayList > nit: Collections.emptyList Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java@279 PS4, Line 279: Lists.newArrayList > ImmutableList.of(e.getMessage()) Done http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java@280 PS4, Line 280: throw e; > don't you _not_ want to re-throw, so you can serialize it back into the res yup, this was off. http://gerrit.cloudera.org:8080/#/c/11193/4/tests/custom_cluster/test_pull_stats.py File tests/custom_cluster/test_pull_stats.py: http://gerrit.cloudera.org:8080/#/c/11193/4/tests/custom_cluster/test_pull_stats.py@30 PS4, Line 30: @CustomClusterTestSuite.with_args("--pull_incremental_statistics=true", > this would read better if you also use the impalad_args= kwarg even though Done -- To view, visit http://gerrit.cloudera.org:8080/11193 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e Gerrit-Change-Number: 11193 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Tue, 28 Aug 2018 08:23:07 +0000 Gerrit-HasComments: Yes
