Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11193 )

Change subject: IMPALA-7425: Change incremental stats to pull from catalogd.
......................................................................


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@379
PS2, Line 379: statistcs
> typo
Done


http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@382
PS2, Line 382:   1: required CatalogObjects.TTableName table_name
             :   2: required list<i64> partition_ids
> it seems risky to use partition IDs without being tied to a particular vers
Changed it to detect when the table is a differing version. That makes the 
approach similar to the one taken by TTableInfoSelector which also uses ids. If 
we can use ids, I'd prefer it since its less verbose. Any further reason to use 
the partitioning values (its name)? Or, what's the reason that you went with 
ids instead of names for TTableInfoSelector?

Also, I put the version on the request. If the catalog detects that the 
versions are out-of-sync, it throws an exception. This way, we don't go through 
the work of sending back stats that will be rejected by the client.


http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@387
PS2, Line 387: statistcs
> typo
Done


http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@388
PS2, Line 388: // If there was an error, an empty partition_stats mapping is 
returned along with
> - seems like it would be easier to just make the returned map optional so y
made the return optional. clarified the comment about the returned result.


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@336
PS2, Line 336:     LOG.info("analyzing compute stats");
> probably didn't mean to leave this in at INFO level
Done


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@451
PS2, Line 451: !RuntimeEnv.INSTANCE.isTestEnv()
> why not isTestEnv?
when in test mode, my understanding is that the impalad contains a catalogd, so 
when trying to make an rpc to the catalog, it fails. as a result, for frontend 
tests, this path is not run.


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@456
PS2, Line 456:           fetchedPartitionStats = fetchPartitionStats(analyzer, 
partStatsToFetch);
> it seems some of this code ends up duplicated in a few places. Did you cons
I considered it but since I was only using it for the incremental case, in 
order to keep things simpler, I left it out.

Yes, this method is a bit branchy already so this change didn't help that. I've 
pushed both main branches down to a common method.


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@415
PS2, Line 415:     // Table is not loaded.
             :     if (!table.isLoaded()) {
             :       throw new CatalogException("Partition statistics not 
available for not yet "
             :           + " loaded table: " + request.table_name);
> can this be the case? I thought if it's FeFsTable then it's guaranteed to b
indeed. I see the default (true) is not overridden down to that class. changed 
it to a precondition.


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@428
PS2, Line 428:  tableName.getDb_name() + "." + tableName.getTable_name()
> nit: fsTable.getFullName()
Done


http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524
PS2, Line 524: may not be serialized and sent from catalogd to impalads.
> I find this usage of the word "may" unclear -- not sure if you mean "it's n
re-worded


http://gerrit.cloudera.org:8080/#/c/11193/2/tests/conftest.py
File tests/conftest.py:

http://gerrit.cloudera.org:8080/#/c/11193/2/tests/conftest.py@125
PS2, Line 125: "
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/11193/2/tests/custom_cluster/test_pull_stats.py
File tests/custom_cluster/test_pull_stats.py:

http://gerrit.cloudera.org:8080/#/c/11193/2/tests/custom_cluster/test_pull_stats.py@21
PS2, Line 21: class TestPullStatistics(CustomClusterTestSuite):
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/11193
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Gerrit-Change-Number: 11193
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 24 Aug 2018 23:18:30 +0000
Gerrit-HasComments: Yes

Reply via email to