Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@26
PS8, Line 26: import java.text.DecimalFormat;
nit: unsed import


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@642
PS8, Line 642:     ValidTxnList validTxnList = validWriteIds_ != null ? 
loadValidTxns(client) : null;
This is no longer used.


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@997
PS8, Line 997:         loadValidWriteIdList(client);
We always update the validWriteIdList here no matter fully or incrementally 
reload the table. Most of the DDL/DML code paths invoke incremental reload. 
E.g. INSERT statement will only reload affected partitions or load new 
partitions. For other partitions, they are previously loaded with an old 
validWriteIdList. Should we trigger a full reload when we detect they are stale 
(e.g. validWriteIdList changes more than expected)? For instance, I'm not sure 
if we handle this scenario correctly:

impala> refresh tableA;   // tableA has two partitions p=1,2
hive> insert into tableA partition (p=1) values ...;
impala> insert into table tableA partition (p=2) values ...;

In the INSERT statement, we just reload the metadata of partition(p=2). 
However, the validWriteIdList is also updated. This patch use this new 
validWriteIdList to compare with those sent from clients(coordinators), if they 
match, we are returning the stale metadata for partition(p=1). Is my 
understanding correct?


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@79
PS8, Line 79:     // Group the partitions by their path (multiple partitions 
may point to the same
            :     // path).
nit: could you move this comment to line 91?


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101
PS8, Line 101:       FileMetadataLoader loader = new 
FileMetadataLoader(e.getKey(),
             :           Utils.shouldRecursivelyListPartitions(table), oldFds, 
table.getHostIndex(),
             :           validTxnList, writeIdList, 
e.getValue().get(0).getFileFormat());
> If 'validTxnList' is loaded after a while since 'writeIdList' is fetched by
IIUC, the 'writeIdList' sent from the client(coordinator) is only used to 
compare with the table's 'writeIdList'. I think 'writeIdList' here comes from 
HdfsTable.load() instead of the client(coordinator):
https://github.com/apache/impala/blob/d45e3a50b003259e4ef1023333b47781a028eb19/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1017
and it's only updated in HdfsTable.load() -> HdfsTable.loadValidWriteIdList(). 
Should we cache the 'validTxnList' there instead of getting it here? and also 
open a read transaction in HdfsTable.load()?


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java@372
PS8, Line 372: try
nit: trying



--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <anu...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 13:29:05 +0000
Gerrit-HasComments: Yes

Reply via email to