Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : [WIP] FileMetadataLoader should skip listing 
files in hidden and tmp directories
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@580
PS2, Line 580: /**
             :    * Filter interface for a FileStatus. Useful for filtering out 
disinteresting
             :    * FileStatuses from a given list
             :    */
             :   public interface FileStatusFilter {
             :
             :     /**
             :      * Given a FileStatus returns if the filter accepts it or not
             :      * @param fileStatus
             :      * @return true if the filter accepts it, else false
             :      */
             :     boolean accept(FileStatus fileStatus);
             :   }
             :
             :   /**
             :    * FileStatusFilter implementation which is
             :    * useful to filter out hidden directories and temporary 
staging directories
             :    * which tools like Hive create in the table/partition 
directories when a query is
             :    * inserting data into them.
             :    */
             :   public static final FileStatusFilter HIDDEN_DIRECTORIES_FILTER 
= fileStatus -> {
             :     if (!fileStatus.isDirectory()) return false;
             :     String filename = fileStatus.getPath().getName();
             :     return filename.startsWith(".") || 
filename.startsWith("_tmp.");
             :   };
You could also implement a Predicate<FileStatus> overriding test().


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@656
PS2, Line 656:     private final boolean isRecursive_;
I think the refactor is confusing. RecursiingIterator can also have 
isRecursive_ = false? Can we clean it up a bit?


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@716
PS2, Line 716: { return; }
nit: no braces


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@93
PS2, Line 93: hdfs://localhost:20500/
Do we need these qualifiers? I think the Configuration should resolve it to an 
appropriate file system


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@100
PS2, Line 100: dstFs.deleteOnExit(tmpTestPath);
Do you need wrap the fs in try-with-resources for this to work?


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@112
PS2, Line 112: 24
Instead, we could try loading sourcePath and substitute the values?



--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Wed, 19 Jun 2019 01:41:44 +0000
Gerrit-HasComments: Yes

Reply via email to