Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16308 )

Change subject: IMPALA-4364: Query option to refresh updated partitions
......................................................................


Patch Set 1:

(15 comments)

Have a first round look. It's nice to refactor the REFRESH code path! The 
solution looks good to me. Will have a more detailed look.

http://gerrit.cloudera.org:8080/#/c/16308/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16308/1//COMMIT_MSG@10
PS1, Line 10: ENABLE_REFRESH_UPDATED_PARTITIONS
I think we still refresh the file metadata of some kinds of updated partitions 
without this flag. E.g. partitions updated by INSERTs that just change the file 
meta. The key part is about refreshing hms metadata. Can we make its name more 
specifit, e.g. FORCE_REFRESH_HMS_META?


http://gerrit.cloudera.org:8080/#/c/16308/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/16308/1/be/src/service/client-request-state.cc@a630
PS1, Line 630:
nit: don't need this?


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@911
PS1, Line 911: the callers do not need to delete the incremental stats
             :     // strings from the hmsParams_ map later.
I think this comment is stale after we change hmsParameters_ to an 
ImmutableMap. I think the reason is that hmsParameters_ is immutable. Make a 
copy so that the callers can modify the parameters.


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1270
PS1, Line 1270:     PartitionDeltaChecker deltaChecker =
Nice abstraction! Good to see that we refactor these codes.


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1275
PS1, Line 1275:     deltaChecker.apply();
nit: "Checker" sounds like it won't modify anything. What about changing its 
name to something like PartitionUpdater?


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1327
PS1, Line 1327:      * @param partition
              :      * @return
              :      * @throws Exception
nit: remove these if no comments


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1334
PS1, Line 1334:     public abstract long loadUpdatedPartitions(
Could you also add a method comment for this? I think it means update(reload) 
both of the hms and file metadata of the given partition builders.


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1373
PS1, Line 1373: loadPartitionsFromMetastore()
nit: stale comment, should be loadNewPartitions().


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1406
PS1, Line 1406: HDFSTable
nit: HdfsTable


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1419
PS1, Line 1419:       List<Partition> partitionList =
              :           MetaStoreUtil.fetchAllPartitions(
              :               client_, db_.getName(), name_, 
NUM_PARTITION_FETCH_RETRIES);
This may be time-consuming. If partitionsToUpdate != null, can we just load 
msPartitions only for given partitions?


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1451
PS1, Line 1451: we compare the StorageDescriptor which
nit: unfinished comment?


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1453
PS1, Line 1453: if(!msPartition.getSd().equals(sd)) {
Can we add a test for this to verify that unchanged partitions won't go into 
this branch? We depends on the implementation of StorageDescriptor#equals(). 
One day if it changes and breaks our understandings, we can notice it in time.


http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/16308/1/fe/src/main/java/org/apache/impala/service/Frontend.java@581
PS1, Line 581:       LOG.info("VIHANG-DEBUG: Creating TResetMetadataRequest");
I think you forgot to remove this and line 1561, 1615.


http://gerrit.cloudera.org:8080/#/c/16308/1/tests/metadata/test_reset_metadata.py
File tests/metadata/test_reset_metadata.py:

http://gerrit.cloudera.org:8080/#/c/16308/1/tests/metadata/test_reset_metadata.py@54
PS1, Line 54: result
Could you comment about what the 'result' looks like? I thought the refresh 
statement will only return how many partitions are updated. Looks like it's 
returning the result like "show partitions"?


http://gerrit.cloudera.org:8080/#/c/16308/1/tests/metadata/test_reset_metadata.py@83
PS1, Line 83: make sure that refresh without the query option does not update 
the partition
missing a test for this?



--
To view, visit http://gerrit.cloudera.org:8080/16308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50e8680509f4eb0712e7bb3de44df5f2952179af
Gerrit-Change-Number: 16308
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 10 Aug 2020 02:49:13 +0000
Gerrit-HasComments: Yes

Reply via email to