Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11193 )

Change subject: IMPALA-7425: Change incremental stats to pull from catalogd.
......................................................................


Patch Set 4:

(14 comments)

We probably need to sync up on the conflicts with IMPALA-7424 patch.

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog-server.cc@149
PS4, Line 149: ;
Can you log the request too? Contains table info etc


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h@116
PS4, Line 116: Get partition statistics for the partitions specified in
             :   /// TGetPartitionStatsRequest.
nit: single line?


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc@210
PS4, Line 210: If used, the flag must "
             :     "be set on both catalogd and all impalad coordinators.");
> Do we have any way to emit warnings if this setting ends up inconsistent in
How about we call it "pull_incremental_statistics_ondemand" or some such thing?


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@560
PS4, Line 560: status.AddDetail("Error making an RPC call to Catalog server.");
Is this needed? I think DoRPC has a reasonable status error message if it fails


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@561
PS4, Line 561: ;
return early?


http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@380
PS4, Line 380: // identified by their id. The caller's table_version_number is 
used by the catalog
> per offline chat earlier, I think it makes sense to just always fetch all p
Yep, it'd be nice to just fetch the aggregate. AFAICT we just need the 
aggregate.


http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@381
PS4, Line 381: if the caller and callee versions are the same.
This could probably be annoying to the users. Commands like "refresh tbl 
partition p" also update the version. If we end up taking this path, we need to 
make sure to document this (behavioral regression)


http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@385
PS4, Line 385:   // The CatalogService service ID this request came from.
> but the request came from an impalad.. maybe clearer to be a little more sp
+1


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@447
PS4, Line 447:         // Get incremental statistics from all relevant 
partitions.
I think we need more counters/timeline stuff around these RPCs in the 
incremental stats query timeline (in runtime profiles).

Also may be add a TODO that they can be done in parallel with the incremental 
stats child queries.


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@453
PS4, Line 453: partStats != null && partStats.isSetIntermediate_col_stats()
             :               && !partStats.intermediate_col_stats.isEmpty()
I think my patch simplifies this


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@422
PS4, Line 422: / Table is of the wrong type.
             :     if (!(table instanceof FeFsTable)) {
             :       throw new CatalogException("Partition statistics can only 
be requested for FS"
             :           + " tables, type is: " + 
table.getClass().getCanonicalName());
             :     }
Can we make this a preconditions check? We do such checks in ComputeStatsStmt 
analysis


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524
PS4, Line 524: // When incremental statistics exceed a size threshold or when 
they are pulled
             :   // directly from catalogd by impalads, incremental statistics 
will not be present in
             :   // the impalad's representation of the partition. However, its 
useful to know whether
             :   // incremental statistics are present at catalogd. Catalogd 
marks whether incremental
             :   // statistics are stored, which is recorded in this member in 
the impalad partition
             :   // representation.
             :   private boolean incrementalStatsAtCatalog_ = false;
My patch has similar logic around this. We should coordinate as which one 
should be merged first so that other patch should be rebased accordingly.

https://gerrit.cloudera.org/#/c/11341/


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1768
PS4, Line 1768: (statsSizeEstimate < BackendConfig.INSTANCE.getIncStatsMaxSize()
              :             && 
!BackendConfig.INSTANCE.pullIncrementalStatistics());
can we factor this out into a method for readability?


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java@211
PS4, Line 211: N/A
Add a clear description?



--
To view, visit http://gerrit.cloudera.org:8080/11193
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Gerrit-Change-Number: 11193
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Tue, 28 Aug 2018 05:58:35 +0000
Gerrit-HasComments: Yes

Reply via email to