Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )
Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API ...................................................................... Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@26 PS8, Line 26: import java.text.DecimalFormat; nit: unsed import http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@642 PS8, Line 642: ValidTxnList validTxnList = validWriteIds_ != null ? loadValidTxns(client) : null; This is no longer used. http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@997 PS8, Line 997: loadValidWriteIdList(client); We always update the validWriteIdList here no matter fully or incrementally reload the table. Most of the DDL/DML code paths invoke incremental reload. E.g. INSERT statement will only reload affected partitions or load new partitions. For other partitions, they are previously loaded with an old validWriteIdList. Should we trigger a full reload when we detect they are stale (e.g. validWriteIdList changes more than expected)? For instance, I'm not sure if we handle this scenario correctly: impala> refresh tableA; // tableA has two partitions p=1,2 hive> insert into tableA partition (p=1) values ...; impala> insert into table tableA partition (p=2) values ...; In the INSERT statement, we just reload the metadata of partition(p=2). However, the validWriteIdList is also updated. This patch use this new validWriteIdList to compare with those sent from clients(coordinators), if they match, we are returning the stale metadata for partition(p=1). Is my understanding correct? http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@79 PS8, Line 79: // Group the partitions by their path (multiple partitions may point to the same : // path). nit: could you move this comment to line 91? http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101 PS8, Line 101: FileMetadataLoader loader = new FileMetadataLoader(e.getKey(), : Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(), : validTxnList, writeIdList, e.getValue().get(0).getFileFormat()); > If 'validTxnList' is loaded after a while since 'writeIdList' is fetched by IIUC, the 'writeIdList' sent from the client(coordinator) is only used to compare with the table's 'writeIdList'. I think 'writeIdList' here comes from HdfsTable.load() instead of the client(coordinator): https://github.com/apache/impala/blob/d45e3a50b003259e4ef1023333b47781a028eb19/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1017 and it's only updated in HdfsTable.load() -> HdfsTable.loadValidWriteIdList(). Should we cache the 'validTxnList' there instead of getting it here? and also open a read transaction in HdfsTable.load()? http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java@372 PS8, Line 372: try nit: trying -- To view, visit http://gerrit.cloudera.org:8080/16008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a Gerrit-Change-Number: 16008 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Anurag Mantripragada <anu...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 08 Jun 2020 13:29:05 +0000 Gerrit-HasComments: Yes