Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during 
IcebergTable load
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@472
PS1, Line 472:     /**
             :      * Get FileDescriptor by data file location
             :      */
             :     public static HdfsPartition.FileDescriptor 
getFileDescriptor(Path fileLoc,
             :         Path tableLoc, ListMap<TNetworkAddress> hostIndex) 
throws IOException {
             :       FileSystem fs = 
FileSystemUtil.getFileSystemForPath(tableLoc);
             :       FileStatus fileStatus = fs.getFileStatus(fileLoc);
             :       return getFileDescriptor(fs, tableLoc, fileStatus, 
hostIndex);
             :     }
             :
             :     private static HdfsPartition.FileDescriptor 
getFileDescriptor(FileSystem fs,
             :         Path tableLoc, FileStatus fileStatus, 
ListMap<TNetworkAddress> hostIndex)
             :         throws IOException {
             :       Reference<Long> numUnknownDiskIds = new 
Reference<Long>(Long.valueOf(0));
             :       String relPath = 
FileSystemUtil.relativizePath(fileStatus.getPath(), tableLoc);
             :
             :       if (!FileSystemUtil.supportsStorageIds(fs)) {
             :         return 
HdfsPartition.FileDescriptor.createWithNoBlocks(fileStatus, relPath);
             :       }
             :
             :       BlockLocation[] locations;
             :       if (fileStatus instanceof LocatedFileStatus) {
             :         locations = 
((LocatedFileStatus)fileStatus).getBlockLocations();
             :       } else {
             :         locations = fs.getFileBlockLocations(fileStatus, 0, 
fileStatus.getLen());
             :       }
             :
             :       return HdfsPartition.FileDescriptor.create(fileStatus, 
relPath, locations,
             :           hostIndex, HdfsShim.isErasureCoded(fileStatus), 
numUnknownDiskIds);
             :     }
Becomes unused? Or maybe we can keep it as a fallback? E.g. generate a warning 
when we cannot find the file descriptor in the HDFS table, but load it anyway 
with these methods?


http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@516
PS1, Line 516: fileDescMap
We should only put entries to the fileDescMap if they are actually in the 
Iceberg data file list. E.g. if the table is compacted then the 'data' 
directory potentially contain much more files than the current snapshot.

Also, later we'll need to decorate the file descriptors with information coming 
from the Iceberg data files (e.g. which partition they belong, their file 
format, etc.).

So I think we should create a temporary map of the file descriptors in the HDFS 
table (only using paths under the 'data' directory) and use it as a lookup 
table to populate another map ('fileDescMap') that only contains the snapshot 
data files.



--
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 20 Jan 2022 19:29:31 +0000
Gerrit-HasComments: Yes

Reply via email to