Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11193 )

Change subject: IMPALA-7425: Change incremental stats to pull from catalogd.
......................................................................


Patch Set 4:

(30 comments)

made my way through most of these. there are several smallish changes in 
catalogservicecatalog and two larger changes coming (fetch all, simplify the 
flag). this will also need a sync w/ bharath's change.

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/catalog/catalog.h@116
PS4, Line 116: Get partition statistics for the partitions specified in
             :   /// TGetPartitionStatsRequest.
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc@210
PS4, Line 210: If used, the flag must "
             :     "be set on both catalogd and all impalad coordinators.");
> How about we call it "pull_incremental_statistics_ondemand" or some such th
There's some redundancy there so I'm leaning towards keeping it simpler.


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/common/global-flags.cc@210
PS4, Line 210: If used, the flag must "
             :     "be set on both catalogd and all impalad coordinators.");
> Do we have any way to emit warnings if this setting ends up inconsistent in
Since this is for v1 metadata, I think it would be even cleaner if it was 
specified just at the catalogd and broadcast to coordinators as a TCatalog 
property. It can be thought of as a catalogd property, so if its present, the 
impalad must pull. I'll change it and see how it looks.

With regards to flag consistency, there are a number of other examples here 
that can lead to inconsistencies (and some where its unclear where the flag 
should go). Probably worth either checking/handling uniformly and/or 
deprecating.


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@549
PS4, Line 549: extern "C" JNIEXPORT jbyteArray JNICALL
> side node gripe: we should figure out some way to refactor all of these alm
yes!


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@560
PS4, Line 560: status.AddDetail("Error making an RPC call to Catalog server.");
> Is this needed? I think DoRPC has a reasonable status error message if it f
indeed, good point. added a todo to get rid of it for prioritized load as well.


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@561
PS4, Line 561: ;
> return early?
no, we're adding the status here, which is part of the returned result, then 
serializing the result on L564.


http://gerrit.cloudera.org:8080/#/c/11193/4/be/src/service/fe-support.cc@606
PS4, Line 606:     {const_cast<char*>("NativeFeTestInit"), 
const_cast<char*>("()V"),
> nit: the reformatting of all the unrelated code is going to make this a pai
yeah, this was clang. will reduce it.


http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@380
PS4, Line 380: // identified by their id. The caller's table_version_number is 
used by the catalog
> Yep, it'd be nice to just fetch the aggregate. AFAICT we just need the aggr
several things here :-)
There is a partition spec for compute incremental stats. The partitions *not* 
in this spec are the ones whose stats we're fetching and for the rest, we're 
computing them. I'd expect that we'll fetch most and compute few, but one can 
misuse the spec or it could be the first time incr stats is run. If that's the 
case, then we'll spend more time computing anyways, so perhaps less of an 
overall issue if we fetch more than needed. In either case, yes, it'll simplify 
the request if we fetch all, and I can get rid of the versioning stuff below.


http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@381
PS4, Line 381: if the caller and callee versions are the same.
> This could probably be annoying to the users. Commands like "refresh tbl pa
see replies to first comment.


http://gerrit.cloudera.org:8080/#/c/11193/4/common/thrift/CatalogService.thrift@385
PS4, Line 385:   // The CatalogService service ID this request came from.
> +1
the replies to first comment.


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@a513
PS4, Line 513:
> nit: some unrelated removal of blank lines might make merging conflicts har
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@449
PS4, Line 449: new HashSet<Long>()
> use Collections.emptySet() here to emphasize that it's only an in-parameter
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452
PS4, Line 452: partitionStats != null
> why not make partitionStats always non-null and just return an empty map in
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@453
PS4, Line 453: partStats != null && partStats.isSetIntermediate_col_stats()
             :               && !partStats.intermediate_col_stats.isEmpty()
> I think my patch simplifies this
ok, will look out for it (or the merge will replace it)


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@497
PS4, Line 497: relevant
> should this be _other_ partitions? ie we need to fetch stats from the ones
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@595
PS4, Line 595: getPartitionStats
> maybe rename to "getOrFetchPartitionStats" to emphasize that this one is th
renamed. there's a reference to table_. added a todo to pull this out to make 
these static.


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@600
PS4, Line 600: maxNumStats
> this is more of an "expected" than actually a hard max, right?
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@601
PS4, Line 601: checkState
> if this precondition fails, it's actually a bad argument, not a bad state,
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@634
PS4, Line 634: ps
> I think if you return an empty map here it'll avoid some null checks up abo
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@644
PS4, Line 644: Exception
> this will also catch the above AnalysisException, resulting in a doubly-wra
went with the propagate.


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@645
PS4, Line 645: :
> nit: no need for the ':' in the exception message itself. When printed it w
removed.


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@392
PS4, Line 392: catalog_service_id != catalogServiceId_
> this should be a .equals() rather than reference-equals, no?
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@393
PS4, Line 393: ()
> %s not ()
whoops, forgot to fill these in.


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1768
PS4, Line 1768: (statsSizeEstimate < BackendConfig.INSTANCE.getIncStatsMaxSize()
              :             && 
!BackendConfig.INSTANCE.pullIncrementalStatistics());
> can we factor this out into a method for readability?
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java@211
PS4, Line 211: N/A
> Add a clear description?
Done. its a config issue if we get here so let me know if the update is what 
you were thinking here.


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@333
PS4, Line 333:   LOG.info(
             :         String.format
> nit: weird formatting
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java@276
PS4, Line 276: Lists.<String>newArrayList
> nit: Collections.emptyList
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java@279
PS4, Line 279: Lists.newArrayList
> ImmutableList.of(e.getMessage())
Done


http://gerrit.cloudera.org:8080/#/c/11193/4/fe/src/main/java/org/apache/impala/service/JniCatalog.java@280
PS4, Line 280:       throw e;
> don't you _not_ want to re-throw, so you can serialize it back into the res
yup, this was off.


http://gerrit.cloudera.org:8080/#/c/11193/4/tests/custom_cluster/test_pull_stats.py
File tests/custom_cluster/test_pull_stats.py:

http://gerrit.cloudera.org:8080/#/c/11193/4/tests/custom_cluster/test_pull_stats.py@30
PS4, Line 30:   
@CustomClusterTestSuite.with_args("--pull_incremental_statistics=true",
> this would read better if you also use the impalad_args= kwarg even though
Done



--
To view, visit http://gerrit.cloudera.org:8080/11193
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Gerrit-Change-Number: 11193
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Tue, 28 Aug 2018 08:23:07 +0000
Gerrit-HasComments: Yes

Reply via email to