Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11193 )
Change subject: IMPALA-7425: Change incremental stats to pull from catalogd. ...................................................................... Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@379 PS2, Line 379: statistcs > typo Done http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@382 PS2, Line 382: 1: required CatalogObjects.TTableName table_name : 2: required list<i64> partition_ids > it seems risky to use partition IDs without being tied to a particular vers Changed it to detect when the table is a differing version. That makes the approach similar to the one taken by TTableInfoSelector which also uses ids. If we can use ids, I'd prefer it since its less verbose. Any further reason to use the partitioning values (its name)? Or, what's the reason that you went with ids instead of names for TTableInfoSelector? Also, I put the version on the request. If the catalog detects that the versions are out-of-sync, it throws an exception. This way, we don't go through the work of sending back stats that will be rejected by the client. http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@387 PS2, Line 387: statistcs > typo Done http://gerrit.cloudera.org:8080/#/c/11193/2/common/thrift/CatalogService.thrift@388 PS2, Line 388: // If there was an error, an empty partition_stats mapping is returned along with > - seems like it would be easier to just make the returned map optional so y made the return optional. clarified the comment about the returned result. http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@336 PS2, Line 336: LOG.info("analyzing compute stats"); > probably didn't mean to leave this in at INFO level Done http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@451 PS2, Line 451: !RuntimeEnv.INSTANCE.isTestEnv() > why not isTestEnv? when in test mode, my understanding is that the impalad contains a catalogd, so when trying to make an rpc to the catalog, it fails. as a result, for frontend tests, this path is not run. http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@456 PS2, Line 456: fetchedPartitionStats = fetchPartitionStats(analyzer, partStatsToFetch); > it seems some of this code ends up duplicated in a few places. Did you cons I considered it but since I was only using it for the incremental case, in order to keep things simpler, I left it out. Yes, this method is a bit branchy already so this change didn't help that. I've pushed both main branches down to a common method. http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@415 PS2, Line 415: // Table is not loaded. : if (!table.isLoaded()) { : throw new CatalogException("Partition statistics not available for not yet " : + " loaded table: " + request.table_name); > can this be the case? I thought if it's FeFsTable then it's guaranteed to b indeed. I see the default (true) is not overridden down to that class. changed it to a precondition. http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@428 PS2, Line 428: tableName.getDb_name() + "." + tableName.getTable_name() > nit: fsTable.getFullName() Done http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11193/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524 PS2, Line 524: may not be serialized and sent from catalogd to impalads. > I find this usage of the word "may" unclear -- not sure if you mean "it's n re-worded http://gerrit.cloudera.org:8080/#/c/11193/2/tests/conftest.py File tests/conftest.py: http://gerrit.cloudera.org:8080/#/c/11193/2/tests/conftest.py@125 PS2, Line 125: " > flake8: E501 line too long (91 > 90 characters) Done http://gerrit.cloudera.org:8080/#/c/11193/2/tests/custom_cluster/test_pull_stats.py File tests/custom_cluster/test_pull_stats.py: http://gerrit.cloudera.org:8080/#/c/11193/2/tests/custom_cluster/test_pull_stats.py@21 PS2, Line 21: class TestPullStatistics(CustomClusterTestSuite): > flake8: E302 expected 2 blank lines, found 1 Done -- To view, visit http://gerrit.cloudera.org:8080/11193 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e Gerrit-Change-Number: 11193 Gerrit-PatchSet: 2 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Fri, 24 Aug 2018 23:18:30 +0000 Gerrit-HasComments: Yes
