Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 )
Change subject: [acid] Predicate to test if dir must be included. ...................................................................... Patch Set 2: (13 comments) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@73 PS2, Line 73: pathPredicate Can you clarify in the javadocs whether this predicate is applied to all paths or just to directories (to limit recursion)? Also, why use String here instead of Path? If using String we should also clarify what the string is (full URL? just reltaive path name to the part dir?) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@121 PS2, Line 121: fileStatuses = FileSystemUtil.listFiles(fs, partDir_, recursive_); I'd think we'd need the predicate here too. http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@485 PS2, Line 485: public void initializePartitionMetadata( was this necessary>? http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@607 PS2, Line 607: * TODO:Sudhanshu: Move this to interface and then use this implementation in BaseTableRef yea FeFsTable.Utils already has some similar stuff (it's in that Utils class because it predcates java 8 default interfaces) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@616 PS2, Line 616: "true" perhaps we should be using Boolean.parseValue here? Maybe we need a tolower as well? is 'TRUE' supported? http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1562 PS2, Line 1562: Preconditions.checkNotNull(hdfsBaseDir_, "HdfsTable base dir is null"); how did we get to this state? http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1726 PS2, Line 1726: public FileSystemUtil.FsType getFsType() { seems to have been pulled into your patch by mistake? http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@293 PS2, Line 293: //TODO:Sudhanshu: Understand how DirectMetadataProvider is used. sadly it's not used right now. Basically dead code from a half-finished attempt to excise catalogd. Perhaps we should remove it for the time being. http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@525 PS2, Line 525: return listFiles(fs, p, recursive); We need to postprocess this result to evaluate the predicate, right? http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@603 PS2, Line 603: boolean includeDir = predicate_.test(fileStatus.getPath().toString()); We should clarify throughout that the predicate only applies to limit recursion and does not actually limit the returned _files_. http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@45 PS2, Line 45: /* oldFds = */Collections.emptyList(), hostIndex, x->true); maybe add a new case to this test that passes a predicate to exclude one of the partitions or something? http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java File fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@46 PS2, Line 46: @Mock impressive mocking, but I wonder whether it isn't simpler to just change one of the other tests that actually runs against the real filesystem. It's still relatively fast to run, and we always assume that unit tests run in an envronment with FileSystem set up http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/planner/ExplainTest.java File fe/src/test/java/org/apache/impala/planner/ExplainTest.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@59 PS2, Line 59: public class ExplainTest extends FrontendTestBase { somehow this got pulled into your patch? -- To view, visit http://gerrit.cloudera.org:8080/13091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c Gerrit-Change-Number: 13091 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Yongzhi Chen <[email protected]> Gerrit-Comment-Date: Fri, 26 Apr 2019 06:04:18 +0000 Gerrit-HasComments: Yes
