Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19379 )

Change subject: IMPALA-11662: Improve 'refresh iceberg_tbl_on_oss' performance
......................................................................


Patch Set 5:

(6 comments)

Left a few comments, but the change looks great!

http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@212
PS5, Line 212: fd == null
I think fd cannot be null here, see my comment in IcebergFileMetadataLoader.


http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@17
PS5, Line 17: package org.apache.impala.catalog;
nit: please add empty line above this


http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@57
PS5, Line 57:   private static final int DEFAULT_MODIFICATION_TIME = 1;
nit: please add comment why we need this


http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@89
PS5, Line 89:         LOG.warn("Unable to load Iceberg datafile '{}', "
            :             + "because it's outside of the table location", 
fileStatus.getPath().toUri());
            :         ++loadStats_.skippedFiles;
            :         return null;
skippedFiles means the following based on its comment: "Number of files skipped 
from file metadata loading because the files have not changed since the last 
load."

It is not for data files that we cannot load. Also, if a data file is listed in 
the Iceberg metadata then we must be able to create a file descriptor for it, 
otherwise we would have a partial view of the table.

So I think in this case we should rather raise an exception and fail table 
loading.


http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@124
PS5, Line 124: // Avoid the cost of directory listing on OSS service e.g. S3A, 
COS, OSS, etc.
This comment mentions "Avoid the cost of directory listing", then there is a 
FileSystemUtil.listFiles() few lines under. Please add some details to the 
comment.


http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@113
PS5, Line 113:       if (format.equals(HdfsFileFormat.ICEBERG)) {
> I want to choose whether to create the metadata ourself or use LocatedFileS
I see, sorry, I should have taken a closer look at IcebergFileMetadataLoader.



--
To view, visit http://gerrit.cloudera.org:8080/19379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2ee8b6b7559e6590698b46ef1d574e55ed52f9a
Gerrit-Change-Number: 19379
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Jan 2023 15:12:48 +0000
Gerrit-HasComments: Yes

Reply via email to