Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
......................................................................


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@26
PS8, Line 26: import java.text.DecimalFormat;
> nit: unsed import
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@642
PS8, Line 642:     ValidTxnList validTxnList = validWriteIds_ != null ? 
loadValidTxns(client) : null;
> This is no longer used.
the latest patch refactors the code and uses this.


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@997
PS8, Line 997:         loadValidWriteIdList(client);
> We always update the validWriteIdList here no matter fully or incrementally
Yes, I think you are right. The ValidWriteIdList should only be reloaded when 
file-metadata is being reloaded in my opinion. But this is tricky to do since 
that would mean that in case of incremental refresh we might end up doing a 
full file-metadata refresh and may be seen as a perf regression. I think it 
would require some more thought and testing.

For now, I would like to keep that out of scope of this patch since this 
focuses only on the read-path. The patch is not operational currently since the 
coordinators never send any ValidWriteIdList. I created IMPALA-9841 to track 
those changes.


http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@102
PS5, Line 102:           Utils.shouldRecursivelyListPartitions(table), oldFds, 
table.getHostIndex(),
> line too long (99 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@79
PS8, Line 79:     // Group the partitions by their path (multiple partitions 
may point to the same
            :     // path).
> nit: could you move this comment to line 91?
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@83
PS8, Line 83:       try (MetaStoreClient client = 
MetaStoreClientPool.get().getClient()) {
            :         try {
> nit: can we have a single try-catch?
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101
PS8, Line 101:       FileMetadataLoader loader = new 
FileMetadataLoader(e.getKey(),
             :           Utils.shouldRecursivelyListPartitions(table), oldFds, 
table.getHostIndex(),
             :           validTxnList, writeIdList, 
e.getValue().get(0).getFileFormat());
> IIUC, the 'writeIdList' sent from the client(coordinator) is only used to c
I would like to defer opening a transaction during table load to a separate 
change since it is needed for a separate issue.

I refactored the code so that ParallelFileMetadataLoader doesn't fetch the 
transaction id list here since its seems cleaner (FileMetadataLoaders deal with 
FileSystem not HMS).


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101
PS8, Line 101:       FileMetadataLoader loader = new 
FileMetadataLoader(e.getKey(),
             :           Utils.shouldRecursivelyListPartitions(table), oldFds, 
table.getHostIndex(),
             :           validTxnList, writeIdList, 
e.getValue().get(0).getFileFormat());
> If 'validTxnList' is loaded after a while since 'writeIdList' is fetched by
we are actually doing a strict check in the regular path as well now. But 
instead of throwing an exception we just ignore the files which have open 
writeIds in compacted files. See the changes in AcidUtils.java for 
WriteListBasedPredicate class.


http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@189
PS5, Line 189:       List<PartitionRef> partitionRefs) throws CatalogException, 
TException {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java@372
PS8, Line 372: try
> nit: trying
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@145
PS8, Line 145: MetaE
> nit: CatalogException
Done



--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Anurag Mantripragada <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 08 Jun 2020 21:43:31 +0000
Gerrit-HasComments: Yes

Reply via email to