Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip listing files in 
hidden and tmp directories
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13665/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13665/3//COMMIT_MSG@7
PS3, Line 7: IMPALA-8663 : FileMetadataLoader should skip listing files in 
hidden and tmp directories
nit: wrap


http://gerrit.cloudera.org:8080/#/c/13665/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@549
PS3, Line 549:    if (recursive) {
             :         // The Hadoop FileSystem API doesn't provide a recursive 
listStatus call that
             :         // doesn't also fetch block locations, and fetching 
block locations is expensive.
             :         // Here, our caller specifically doesn't need block 
locations, so we don't want to
             :         // call the expensive 'listFiles' call on HDFS. Instead, 
we need to "manually"
             :         // recursively call FileSystem.listStatusIterator().
             :         //
             :         // Note that this "manual" recursion is not actually any 
slower than the recursion
             :         // provided by the HDFS 'listFiles(recursive=true)' API, 
since the HDFS wire
             :         // protocol doesn't provide any such recursive support 
anyway. In other words,
             :         // the API that looks like a single recursive call is 
just as bad as what we're
             :         // doing here.
             :         //
             :         // However, S3 actually implements 
'listFiles(recursive=true)' with a faster path
             :         // which natively recurses. In that case, it's quite 
preferable to use 'listFiles'
             :         // even though it returns LocatedFileStatus objects with 
"fake" blocks which we
             :         // will ignore.
             :         if (isS3AFileSystem(fs)) {
             :           return listFiles(fs, p, recursive);
             :         }
             :
             :         return new RemoteIteratorWithFilter(fs, p, true);
             :       }
             :
             :       return new RemoteIteratorWithFilter(fs, p, false)
I think this is equivalent to...

if (S3) return listFiles(fs, p, recursive);
return new RemoteIteratorWithFilter(fs, p, recursive);


http://gerrit.cloudera.org:8080/#/c/13665/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@581
PS3, Line 581: useful
nit: used ?


http://gerrit.cloudera.org:8080/#/c/13665/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@585
PS3, Line 585: PREDICATE
nit: FILTER? (predicate is a java construct)


http://gerrit.cloudera.org:8080/#/c/13665/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@642
PS3, Line 642:  private final Predicate<FileStatus> fileStatusPredicate_;
This is always used, no? Hard code it?


http://gerrit.cloudera.org:8080/#/c/13665/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@643
PS3, Line 643:     private final boolean useListStatus_;
Document this?


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@100
PS2, Line 100: dstFs.deleteOnExit(tmpTestPath);
> don't think so. This will clean up the directory when the JVM exits. The ot
Are you sure?

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L2505

Its done in close(), so you probably need to wrap it in try-with-resources or 
explicitly call in a finally {}


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@112
PS2, Line 112: 24
> I am vary of using load() to test load() method's behavior. Seems like a an
not sure I follow. I think what were asserting here is that the number of 
loaded files in the sourcePath would be the same with/without the filters. So 
why not load the sourcePath, get the the numFiles, add the junk folders, load 
again and compare the load numbers? I think thats better than hardcoding.



--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Jun 2019 18:44:19 +0000
Gerrit-HasComments: Yes

Reply via email to