Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11193 )

Change subject: IMPALA-7425: Change incremental stats to pull from catalogd.
......................................................................


Patch Set 2:

(10 comments)

only got partially through but have some meetings for the rest of the morning 
and early afternoon so figured i'd send along what i got so far

http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@379
PS2, Line 379: statistcs
typo


http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@382
PS2, Line 382:   1: required CatalogObjects.TTableName table_name
             :   2: required list<i64> partition_ids
it seems risky to use partition IDs without being tied to a particular version 
number of the table. If there is a concurrent refresh or reload of the table, 
we'd end up pulling stats for the wrong partitions or incorrectly deciding that 
there were no stats available when there were. Partition names are a more 
stable identifier


http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@387
PS2, Line 387: statistcs
typo


http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@388
PS2, Line 388: // If there was an error, an empty partition_stats mapping is 
returned along with
- seems like it would be easier to just make the returned map optional so youu 
dont need to return an empty one in the error case.

- can you clarify what happens when you request stats for a partition that does 
not have them? Do you get a map entry or just a "missing" entry?


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@336
PS2, Line 336:     LOG.info("analyzing compute stats");
probably didn't mean to leave this in at INFO level


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@451
PS2, Line 451: !RuntimeEnv.INSTANCE.isTestEnv()
why not isTestEnv?


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@456
PS2, Line 456:           fetchedPartitionStats = fetchPartitionStats(analyzer, 
partStatsToFetch);
it seems some of this code ends up duplicated in a few places. Did you consider 
adding a flag to 'loadPartitions' and 'loadAllPartitions' which says 
'NEED_INCREMENTAL_STATS'?

Or maybe this code could end up being a little cleaner if you always follow the 
same code path to "fetch stats" into 'fetchedPartitionStats', but in the "old" 
code path, you just "fetch" them by copying the reference from the partition 
object? I think that could reduce the branching factor of the code below and 
allow you to consolidate all the code into 'fetchPartitionStats'


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@415
PS2, Line 415:     // Table is not loaded.
             :     if (!table.isLoaded()) {
             :       throw new CatalogException("Partition statistics not 
available for not yet "
             :           + " loaded table: " + request.table_name);
can this be the case? I thought if it's FeFsTable then it's guaranteed to be 
loaded


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@428
PS2, Line 428:  tableName.getDb_name() + "." + tableName.getTable_name()
nit: fsTable.getFullName()


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524
PS2, Line 524: may not be serialized and sent from catalogd to impalads.
I find this usage of the word "may" unclear -- not sure if you mean "it's not 
permitted" or "it is possible that". Can you rephrase to something like 
"Depending on configuration, it is possible that incremental stats are not 
sent" or somesuch?



--
To view, visit http://gerrit.cloudera.org:8080/11193
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Gerrit-Change-Number: 11193
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 24 Aug 2018 16:30:33 +0000
Gerrit-HasComments: Yes

Reply via email to