Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11193 )
Change subject: IMPALA-7425: Change incremental stats to pull from catalogd. ...................................................................... Patch Set 4: (14 comments) We probably need to sync up on the conflicts with IMPALA-7424 patch. http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog-server.cc@149 PS4, Line 149: ; Can you log the request too? Contains table info etc http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h@116 PS4, Line 116: Get partition statistics for the partitions specified in : /// TGetPartitionStatsRequest. nit: single line? http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc@210 PS4, Line 210: If used, the flag must " : "be set on both catalogd and all impalad coordinators."); > Do we have any way to emit warnings if this setting ends up inconsistent in How about we call it "pull_incremental_statistics_ondemand" or some such thing? http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@560 PS4, Line 560: status.AddDetail("Error making an RPC call to Catalog server."); Is this needed? I think DoRPC has a reasonable status error message if it fails http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@561 PS4, Line 561: ; return early? http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@380 PS4, Line 380: // identified by their id. The caller's table_version_number is used by the catalog > per offline chat earlier, I think it makes sense to just always fetch all p Yep, it'd be nice to just fetch the aggregate. AFAICT we just need the aggregate. http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@381 PS4, Line 381: if the caller and callee versions are the same. This could probably be annoying to the users. Commands like "refresh tbl partition p" also update the version. If we end up taking this path, we need to make sure to document this (behavioral regression) http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@385 PS4, Line 385: // The CatalogService service ID this request came from. > but the request came from an impalad.. maybe clearer to be a little more sp +1 http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@447 PS4, Line 447: // Get incremental statistics from all relevant partitions. I think we need more counters/timeline stuff around these RPCs in the incremental stats query timeline (in runtime profiles). Also may be add a TODO that they can be done in parallel with the incremental stats child queries. http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@453 PS4, Line 453: partStats != null && partStats.isSetIntermediate_col_stats() : && !partStats.intermediate_col_stats.isEmpty() I think my patch simplifies this http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@422 PS4, Line 422: / Table is of the wrong type. : if (!(table instanceof FeFsTable)) { : throw new CatalogException("Partition statistics can only be requested for FS" : + " tables, type is: " + table.getClass().getCanonicalName()); : } Can we make this a preconditions check? We do such checks in ComputeStatsStmt analysis http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524 PS4, Line 524: // When incremental statistics exceed a size threshold or when they are pulled : // directly from catalogd by impalads, incremental statistics will not be present in : // the impalad's representation of the partition. However, its useful to know whether : // incremental statistics are present at catalogd. Catalogd marks whether incremental : // statistics are stored, which is recorded in this member in the impalad partition : // representation. : private boolean incrementalStatsAtCatalog_ = false; My patch has similar logic around this. We should coordinate as which one should be merged first so that other patch should be rebased accordingly. https://gerrit.cloudera.org/#/c/11341/ http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1768 PS4, Line 1768: (statsSizeEstimate < BackendConfig.INSTANCE.getIncStatsMaxSize() : && !BackendConfig.INSTANCE.pullIncrementalStatistics()); can we factor this out into a method for readability? http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java File fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java@211 PS4, Line 211: N/A Add a clear description? -- To view, visit http://gerrit.cloudera.org:8080/11193 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e Gerrit-Change-Number: 11193 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Tue, 28 Aug 2018 05:58:35 +0000 Gerrit-HasComments: Yes
