Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )
Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API ...................................................................... Patch Set 8: (11 comments) http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@26 PS8, Line 26: import java.text.DecimalFormat; > nit: unsed import Done http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@642 PS8, Line 642: ValidTxnList validTxnList = validWriteIds_ != null ? loadValidTxns(client) : null; > This is no longer used. the latest patch refactors the code and uses this. http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@997 PS8, Line 997: loadValidWriteIdList(client); > We always update the validWriteIdList here no matter fully or incrementally Yes, I think you are right. The ValidWriteIdList should only be reloaded when file-metadata is being reloaded in my opinion. But this is tricky to do since that would mean that in case of incremental refresh we might end up doing a full file-metadata refresh and may be seen as a perf regression. I think it would require some more thought and testing. For now, I would like to keep that out of scope of this patch since this focuses only on the read-path. The patch is not operational currently since the coordinators never send any ValidWriteIdList. I created IMPALA-9841 to track those changes. http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@102 PS5, Line 102: Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(), > line too long (99 > 90) Done http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@79 PS8, Line 79: // Group the partitions by their path (multiple partitions may point to the same : // path). > nit: could you move this comment to line 91? Done http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@83 PS8, Line 83: try (MetaStoreClient client = MetaStoreClientPool.get().getClient()) { : try { > nit: can we have a single try-catch? Done http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101 PS8, Line 101: FileMetadataLoader loader = new FileMetadataLoader(e.getKey(), : Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(), : validTxnList, writeIdList, e.getValue().get(0).getFileFormat()); > IIUC, the 'writeIdList' sent from the client(coordinator) is only used to c I would like to defer opening a transaction during table load to a separate change since it is needed for a separate issue. I refactored the code so that ParallelFileMetadataLoader doesn't fetch the transaction id list here since its seems cleaner (FileMetadataLoaders deal with FileSystem not HMS). http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101 PS8, Line 101: FileMetadataLoader loader = new FileMetadataLoader(e.getKey(), : Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(), : validTxnList, writeIdList, e.getValue().get(0).getFileFormat()); > If 'validTxnList' is loaded after a while since 'writeIdList' is fetched by we are actually doing a strict check in the regular path as well now. But instead of throwing an exception we just ignore the files which have open writeIds in compacted files. See the changes in AcidUtils.java for WriteListBasedPredicate class. http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@189 PS5, Line 189: List<PartitionRef> partitionRefs) throws CatalogException, TException { > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java@372 PS8, Line 372: try > nit: trying Done http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java: http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@145 PS8, Line 145: MetaE > nit: CatalogException Done -- To view, visit http://gerrit.cloudera.org:8080/16008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a Gerrit-Change-Number: 16008 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Anurag Mantripragada <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 08 Jun 2020 21:43:31 +0000 Gerrit-HasComments: Yes
