Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16308 )
Change subject: IMPALA-4364: Query option to refresh updated partitions ...................................................................... Patch Set 1: (15 comments) Have a first round look. It's nice to refactor the REFRESH code path! The solution looks good to me. Will have a more detailed look. http://gerrit.cloudera.org:8080/#/c/16308/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16308/1//COMMIT_MSG@10 PS1, Line 10: ENABLE_REFRESH_UPDATED_PARTITIONS I think we still refresh the file metadata of some kinds of updated partitions without this flag. E.g. partitions updated by INSERTs that just change the file meta. The key part is about refreshing hms metadata. Can we make its name more specifit, e.g. FORCE_REFRESH_HMS_META? http://gerrit.cloudera.org:8080/#/c/16308/1/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/16308/1/be/src/service/client-request-state.cc@a630 PS1, Line 630: nit: don't need this? http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@911 PS1, Line 911: the callers do not need to delete the incremental stats : // strings from the hmsParams_ map later. I think this comment is stale after we change hmsParameters_ to an ImmutableMap. I think the reason is that hmsParameters_ is immutable. Make a copy so that the callers can modify the parameters. http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1270 PS1, Line 1270: PartitionDeltaChecker deltaChecker = Nice abstraction! Good to see that we refactor these codes. http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1275 PS1, Line 1275: deltaChecker.apply(); nit: "Checker" sounds like it won't modify anything. What about changing its name to something like PartitionUpdater? http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1327 PS1, Line 1327: * @param partition : * @return : * @throws Exception nit: remove these if no comments http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1334 PS1, Line 1334: public abstract long loadUpdatedPartitions( Could you also add a method comment for this? I think it means update(reload) both of the hms and file metadata of the given partition builders. http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1373 PS1, Line 1373: loadPartitionsFromMetastore() nit: stale comment, should be loadNewPartitions(). http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1406 PS1, Line 1406: HDFSTable nit: HdfsTable http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1419 PS1, Line 1419: List<Partition> partitionList = : MetaStoreUtil.fetchAllPartitions( : client_, db_.getName(), name_, NUM_PARTITION_FETCH_RETRIES); This may be time-consuming. If partitionsToUpdate != null, can we just load msPartitions only for given partitions? http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1451 PS1, Line 1451: we compare the StorageDescriptor which nit: unfinished comment? http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1453 PS1, Line 1453: if(!msPartition.getSd().equals(sd)) { Can we add a test for this to verify that unchanged partitions won't go into this branch? We depends on the implementation of StorageDescriptor#equals(). One day if it changes and breaks our understandings, we can notice it in time. http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/service/Frontend.java@581 PS1, Line 581: LOG.info("VIHANG-DEBUG: Creating TResetMetadataRequest"); I think you forgot to remove this and line 1561, 1615. http://gerrit.cloudera.org:8080/#/c/16308/1/tests/metadata/test_reset_metadata.py File tests/metadata/test_reset_metadata.py: http://gerrit.cloudera.org:8080/#/c/16308/1/tests/metadata/test_reset_metadata.py@54 PS1, Line 54: result Could you comment about what the 'result' looks like? I thought the refresh statement will only return how many partitions are updated. Looks like it's returning the result like "show partitions"? http://gerrit.cloudera.org:8080/#/c/16308/1/tests/metadata/test_reset_metadata.py@83 PS1, Line 83: make sure that refresh without the query option does not update the partition missing a test for this? -- To view, visit http://gerrit.cloudera.org:8080/16308 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50e8680509f4eb0712e7bb3de44df5f2952179af Gerrit-Change-Number: 16308 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Mon, 10 Aug 2020 02:49:13 +0000 Gerrit-HasComments: Yes
