Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 )
Change subject: IMPALA-8663 : [WIP] FileMetadataLoader should skip listing files in hidden and tmp directories ...................................................................... Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@580 PS2, Line 580: /** : * Filter interface for a FileStatus. Useful for filtering out disinteresting : * FileStatuses from a given list : */ : public interface FileStatusFilter { : : /** : * Given a FileStatus returns if the filter accepts it or not : * @param fileStatus : * @return true if the filter accepts it, else false : */ : boolean accept(FileStatus fileStatus); : } : : /** : * FileStatusFilter implementation which is : * useful to filter out hidden directories and temporary staging directories : * which tools like Hive create in the table/partition directories when a query is : * inserting data into them. : */ : public static final FileStatusFilter HIDDEN_DIRECTORIES_FILTER = fileStatus -> { : if (!fileStatus.isDirectory()) return false; : String filename = fileStatus.getPath().getName(); : return filename.startsWith(".") || filename.startsWith("_tmp."); : }; You could also implement a Predicate<FileStatus> overriding test(). http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@656 PS2, Line 656: private final boolean isRecursive_; I think the refactor is confusing. RecursiingIterator can also have isRecursive_ = false? Can we clean it up a bit? http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@716 PS2, Line 716: { return; } nit: no braces http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java: http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@93 PS2, Line 93: hdfs://localhost:20500/ Do we need these qualifiers? I think the Configuration should resolve it to an appropriate file system http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@100 PS2, Line 100: dstFs.deleteOnExit(tmpTestPath); Do you need wrap the fs in try-with-resources for this to work? http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@112 PS2, Line 112: 24 Instead, we could try loading sourcePath and substitute the values? -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Wed, 19 Jun 2019 01:41:44 +0000 Gerrit-HasComments: Yes
