Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16392 )

Change subject: IMPALA-10075: Reuse unchanged partition instances
......................................................................


Patch Set 6:

(10 comments)

Thanks for the review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@7
PS5, Line 7: IMPAL
> typo: IMPALA
Oops. Done.


http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@11
PS5, Line 11: redundan
> typo: redundant
Done


http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@13
PS5, Line 13: changed
> typo: changed
Done


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@293
PS5, Line 293:     /** Number of files skipped because they pertain to an 
uncommitted ACID transaction */
             :     public int uncommittedAcidFilesSkipped = 0;
> Are you sure that we shouldn't check other properties too? Or the 
> modification time will change for all other changes?
> An example that comes to mind is HDFS rebalancing - my guess is that if the 
> locations of some blocks change for a file, the modification time won't 
> change, so we won't propagate the change, which can lead to suboptimal plans.

This is consistent with the checks in hasFileChanged at line 278. Currently, 
table reloading (e.g. REFRESH or any reloadings triggered by DDL/DMLs) won't 
deal with HDFS rebalancing that changes file locations (it requires an 
INVALIDATE METADATA command). I think one reason is that it doesn't impact 
query correctness.

> One more thing: I would prefer to move this logic to FileDescriptor, e.g. in 
> a function like equalsTo(), or isUnchanged().

Good point. Refactored these.


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541
PS5, Line 1541: ble_.getPartitionLocationCo
> Is it possible for these two to be different?
Good point. They are always the same in current code paths. Just add the check 
since this is a public method.


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541
PS5, Line 1541: sor().new Loc
> I think that relying on filesChanged_ is counter intuitive in equalsTo. Cha
Good point! Changed the method name. Also replaced the check on filesChanged_ 
with checks on encodedFileDescriptors_ etc. Because if we detect files not 
changed in ParallelFileMetadataLoader#load(), we won't update the file 
descriptors. They should remain the same as the original ones.


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1545
PS5, Line 1545: ition.getSta
> We are using oldInstance_ here and in the last line while using oldInstance
Done


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@933
PS5, Line 933:     HdfsPartition ol
> Do you plan to change the logging below?
Oops... Forgot this. Done.


http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py
File tests/metadata/test_reuse_partitions.py:

http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@49
PS5, Line 49: test_reuse_partitions
> An idea to extend this test is to run the same queries for an insert_only t
Good point! Done.


http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@55
PS5, Line 55:   def __test_reuse_partitions_helper(self, unique_database, 
transactional=False):
            :     """Test catalogd reuses partitio
> nit: +2 indentation
Done



--
To view, visit http://gerrit.cloudera.org:8080/16392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2dd645c260d271291021e52fdac4b74924df1170
Gerrit-Change-Number: 16392
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Sun, 11 Oct 2020 03:32:25 +0000
Gerrit-HasComments: Yes

Reply via email to