Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16392 )
Change subject: IMPALA-10075: Reuse unchanged partition instances ...................................................................... Patch Set 6: (10 comments) Thanks for the review! Addressed the comments. http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@7 PS5, Line 7: IMPAL > typo: IMPALA Oops. Done. http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@11 PS5, Line 11: redundan > typo: redundant Done http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@13 PS5, Line 13: changed > typo: changed Done http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@293 PS5, Line 293: /** Number of files skipped because they pertain to an uncommitted ACID transaction */ : public int uncommittedAcidFilesSkipped = 0; > Are you sure that we shouldn't check other properties too? Or the > modification time will change for all other changes? > An example that comes to mind is HDFS rebalancing - my guess is that if the > locations of some blocks change for a file, the modification time won't > change, so we won't propagate the change, which can lead to suboptimal plans. This is consistent with the checks in hasFileChanged at line 278. Currently, table reloading (e.g. REFRESH or any reloadings triggered by DDL/DMLs) won't deal with HDFS rebalancing that changes file locations (it requires an INVALIDATE METADATA command). I think one reason is that it doesn't impact query correctness. > One more thing: I would prefer to move this logic to FileDescriptor, e.g. in > a function like equalsTo(), or isUnchanged(). Good point. Refactored these. http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541 PS5, Line 1541: ble_.getPartitionLocationCo > Is it possible for these two to be different? Good point. They are always the same in current code paths. Just add the check since this is a public method. http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541 PS5, Line 1541: sor().new Loc > I think that relying on filesChanged_ is counter intuitive in equalsTo. Cha Good point! Changed the method name. Also replaced the check on filesChanged_ with checks on encodedFileDescriptors_ etc. Because if we detect files not changed in ParallelFileMetadataLoader#load(), we won't update the file descriptors. They should remain the same as the original ones. http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1545 PS5, Line 1545: ition.getSta > We are using oldInstance_ here and in the last line while using oldInstance Done http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@933 PS5, Line 933: HdfsPartition ol > Do you plan to change the logging below? Oops... Forgot this. Done. http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py File tests/metadata/test_reuse_partitions.py: http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@49 PS5, Line 49: test_reuse_partitions > An idea to extend this test is to run the same queries for an insert_only t Good point! Done. http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@55 PS5, Line 55: def __test_reuse_partitions_helper(self, unique_database, transactional=False): : """Test catalogd reuses partitio > nit: +2 indentation Done -- To view, visit http://gerrit.cloudera.org:8080/16392 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2dd645c260d271291021e52fdac4b74924df1170 Gerrit-Change-Number: 16392 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Sun, 11 Oct 2020 03:32:25 +0000 Gerrit-HasComments: Yes
