Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11193 )

Change subject: IMPALA-7425: Change incremental stats to pull from catalogd.
......................................................................


Patch Set 9:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/catalog/catalog.cc@61
PS9, Line 61:       {"getDbs", "([B)[B", &get_dbs_id_}, {"getFunctions", 
"([B)[B", &get_functions_id_},
> nit:looks like your editor still reformatted this list, might cause conflic
Done. that was clang.


http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@207
PS9, Line 207: an
> nit typo
Done


http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/common/global-flags.cc@211
PS9, Line 211: DEFINE_bool(pull_incremental_statistics, false,
> should we consider changing this to true if we're pretty confident it works
So by default, we'd get this feature and compressed stats. then to try v2, one 
would restart an impalad with --use_local_catalog and flip the hidden flag for 
this to false?
The other consideration I had was for backports. I would think the user would 
want to explicitly enable it.
I am fine either way-- let me know if you were thinking of other cases.


http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/11193/9/be/src/service/fe-support.cc@627
PS9, Line 627:       const_cast<char*>("NativeGetPartialCatalogObject"), 
const_cast<char*>("([B)[B"),
> same as elsewhere, your editor is reformat-happy
Done.


http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@382
PS9, Line 382:   // The CatalogService service ID this request came from.
> hm, earlier comment still relevant - request didn't come from a catalog ser
missed this. agreed, can remove this when fetching all. the other place I saw 
this guard used was for making multiple requests to the catalogd (multiple 
prioritized loads). however, in this case, its a single request.


http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@389
PS9, Line 389: // statistic is returned. Partitions are identified by their 
partitioning column
> Curious why you chose to identify them by list<string> instead of just thei
Which string name are you referring to? For example 
FeCatalogUtils.getPartitionName repeats the schema for each partition, so I'd 
expect that to be larger, right? Either way, the key size should be dominated 
by the stats for the common case. I've updated this to be a single string, so I 
can tune it some more. For example I can concat the values and factor out the 
schema (or just leave the schema out).


http://gerrit.cloudera.org:8080/#/c/11193/9/common/thrift/CatalogService.thrift@396
PS9, Line 396:   2: optional map<list<string>, CatalogObjects.TPartitionStats> 
partition_stats
> will be interesting to see how this interacts with Bharath's change
an option here is to send it over compressed.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@483
PS9, Line 483: NULL
> not your change, but it's odd to me that we use NULL here whereas the defau
good point. added a todo to look into this. I expect this would have come up by 
now.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@618
PS9, Line 618: partitionIds
> if you just passed in the List<FeFsPartition> here, would that work? You no
good call! simplified nicely.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@661
PS9, Line 661:           Preconditions.checkArgument(part instanceof 
FeFsPartition);
> maybe irrelevant if you make the above change to just pass through the List
Done


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@665
PS9, Line 665: remoteStats.isSetIntermediate_col_stats
> the doc on the thrift RPC says that it would return a "default stats object
yup, this was left-over from the prev. version. updated the comment and the 
catalogd side.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@670
PS9, Line 670:       } else {
> is it possible to invert this condition so you have the "return empty" case
Done


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
File fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java@95
PS9, Line 95:    *  a list of partitioning key values as strings.
> nit: extra spacing after *s
Done


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@423
PS9, Line 423:  +
> this should be a comma, no?
Done


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@428
PS9, Line 428:     FeFsTable fsTable = (FeFsTable) table;
> since this runs on the catalog, you could avoid some of the downcasts below
removed some of these downcasts.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2079
PS9, Line 2079:         Preconditions.checkState(p instanceof HdfsPartition);
> maybe this check should actually be an && above? ie if it's LocalCatalog, t
I expect this may change a little when merging, but yeah, makes sense to pull 
this to the if clause.


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@330
PS9, Line 330: GetPartitionStats
> at INFO level I think we should make this more user-consumable and not refe
Done


http://gerrit.cloudera.org:8080/#/c/11193/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@356
PS9, Line 356: %s partitions
> stats for %s partitions
Done



--
To view, visit http://gerrit.cloudera.org:8080/11193
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Gerrit-Change-Number: 11193
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 31 Aug 2018 08:50:03 +0000
Gerrit-HasComments: Yes

Reply via email to