[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435110#comment-13435110 ]

Daryn Sharp commented on HADOOP-8649:

I'm just generally concerned about the implications of stacking filesystems, i.e. a {{FilterFileSystem}} over a {{ChRootedFileSystem}} over a {{FilterFileSystem}}, etc. I'm not sure it's a problem, but you should make sure there are tests that prove the stacking works.

I conceptually like the approach suggested. Throw something up and let's see how it looks!

ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

Key: HADOOP-8649
URL: https://issues.apache.org/jira/browse/HADOOP-8649
Project: Hadoop Common
Issue Type: Improvement
Affects Versions: 1.0.3, 2.0.0-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Attachments: branch1-HADOOP-8649.patch, branch1-HADOOP-8649.patch, HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch_v2, HADOOP-8649_branch1.patch_v3, TestChecksumFileSystemOnDFS.java, trunk-HADOOP-8649.patch, trunk-HADOOP-8649.patch

Currently, ChecksumFileSystem implements only listStatus(Path). The other form, listStatus(Path, customFilter), results in parsing the list twice to apply each of the filters - custom and checksum filter. By using a composite filter instead, we limit the parsing to once.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433220#comment-13433220 ]

Daryn Sharp commented on HADOOP-8649:

You may want to test whether there are any incompatibilities with the chrooted filesystem. If so, I wonder if it would be better, as in more general, to push the change down into {{FilterFileSystem}} or {{FileSystem}} itself. I haven't thought it all the way through, but a compound filter could use an array, with each filesystem given the opportunity to add additional filters. If there's no problem with chroot, and you feel that's too much work, perhaps it could be something for another jira.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433541#comment-13433541 ]

Karthik Kambatla commented on HADOOP-8649:

Thanks for the review, Daryn.
- I don't think it is incompatible with ChRootedFileSystem, as it does not filter out any files.
- +1 on generalizing and pushing the change down to FileSystem itself.
-- We can add {{protected/public FileSystem#listStatus(Path f, List<PathFilter> filters)}} and use {{MultiPathFilter}} as in {{o.a.h.m.FileInputFormat}}.
-- All FileSystems can use this to build a list of {{PathFilter}}s to be evaluated.
-- {{o.a.h.m.FileInputFormat}} can use the common version of {{MultiPathFilter}}.

If we decide on this, I can go ahead and make the required changes.
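The list-of-filters idea proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: {{SketchPathFilter}} and {{SketchMultiPathFilter}} are hypothetical names, paths are plain strings so that the example compiles without Hadoop on the classpath, and the "accept only if every filter accepts" rule is modeled loosely on the {{MultiPathFilter}} mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.fs.PathFilter; paths are plain
// strings so the sketch compiles without Hadoop on the classpath.
interface SketchPathFilter {
    boolean accept(String path);
}

// Composite filter: a path is accepted only if every constituent accepts it.
final class SketchMultiPathFilter implements SketchPathFilter {
    private final List<SketchPathFilter> filters = new ArrayList<>();

    // Each file system layer can contribute its own filter to the chain.
    SketchMultiPathFilter add(SketchPathFilter filter) {
        filters.add(filter);
        return this;
    }

    @Override
    public boolean accept(String path) {
        for (SketchPathFilter f : filters) {
            if (!f.accept(path)) {
                return false; // stop at the first rejection
            }
        }
        return true;
    }

    // Single pass over a listing, evaluating the whole chain once per entry.
    static List<String> list(List<String> entries, SketchPathFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String entry : entries) {
            if (filter.accept(entry)) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```

A stacked file system would call {{add}} with its own filter before delegating, so the underlying listing is traversed only once however many layers contribute.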
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431992#comment-13431992 ]

Hadoop QA commented on HADOOP-8649:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12540032/trunk-HADOOP-8649.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:
    org.apache.hadoop.hdfs.TestFileConcurrentReader
    org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1272//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1272//console

This message is automatically generated.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432021#comment-13432021 ]

Karthik Kambatla commented on HADOOP-8649:

I don't think the patch has anything to do with the two failing tests; they fail on the latest trunk as well. A casual examination of the code shows no overlap between the patch and the failing tests.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431337#comment-13431337 ]

Hadoop QA commented on HADOOP-8649:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12539893/trunk-HADOOP-8649.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:
    org.apache.hadoop.fs.TestLocalFSFileContextMainOperations
    org.apache.hadoop.fs.TestFileContextDeleteOnExit
    org.apache.hadoop.fs.TestFSMainOperationsLocalFileSystem
    org.apache.hadoop.hdfs.web.TestWebHDFS
    org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1266//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1266//console

This message is automatically generated.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431389#comment-13431389 ]

Karthik Kambatla commented on HADOOP-8649:

Found the javadoc warning. The observed test failures seem to be caused by one of the patch's tests creating a file and not deleting it. I am running all the tests locally to make sure these issues are fixed, and will upload an updated patch soon.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13430545#comment-13430545 ]

Daryn Sharp commented on HADOOP-8649:

What I _think_ I see in trunk is:
# (A) {{ChecksumFileSystem#listStatus(Path, PathFilter)}} calls (B) {{ChecksumFileSystem#listStatus(Path)}}
# (B) {{ChecksumFileSystem#listStatus(Path)}} calls (C) {{fs.listStatus(Path, ChecksumFileSystem.DEFAULT_FILTER)}} to filter out crcs
# (A) {{ChecksumFileSystem#listStatus(Path, PathFilter)}} further filters the crc filtered results with the custom {{PathFilter}}

Do your test cases show this analysis is wrong? Or did you notice it through casual observation of the code? Perhaps a composite {{PathFilter}} is more efficient on large directory listings, but I'm curious if there's actually a bug.

ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

Key: HADOOP-8649
URL: https://issues.apache.org/jira/browse/HADOOP-8649
Project: Hadoop Common
Issue Type: Bug
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Attachments: HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch_v2, HADOOP-8649_branch1.patch_v3

Currently, ChecksumFileSystem implements only listStatus(Path). The other form of listStatus(Path, PathFilter) is implemented by parent class FileSystem, and hence doesn't filter out check-sum files. The implementation should use a composite filter of passed Filter and the Checksum filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
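To make the comparison in steps (A) to (C) concrete, here is a small sketch of the two strategies side by side. It is not Hadoop code: file names are plain strings, {{Predicate<String>}} stands in for {{PathFilter}}, and {{NOT_CHECKSUM}} is a simplified assumption about what the checksum filter rejects (names that start with "." and end with ".crc").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ListingPassesSketch {
    // Simplified stand-in for the checksum filter: hide ".name.crc" files.
    static final Predicate<String> NOT_CHECKSUM =
        name -> !(name.startsWith(".") && name.endsWith(".crc"));

    // One traversal of the given names, keeping those the predicate accepts.
    static List<String> filter(List<String> names, Predicate<String> f) {
        List<String> out = new ArrayList<>();
        for (String n : names) {
            if (f.test(n)) {
                out.add(n);
            }
        }
        return out;
    }

    // Steps (A)-(C): drop checksum files first, then filter the result again.
    static List<String> twoPass(List<String> raw, Predicate<String> user) {
        List<String> withoutCrc = filter(raw, NOT_CHECKSUM); // (B) calling (C)
        return filter(withoutCrc, user);                      // back in (A)
    }

    // Composite alternative: one traversal, both predicates tested per entry.
    static List<String> onePass(List<String> raw, Predicate<String> user) {
        return filter(raw, NOT_CHECKSUM.and(user));
    }
}
```

Both methods return the same entries in this model; the only difference is whether the listing is walked twice or once, which is consistent with the efficiency argument rather than a correctness bug.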
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13430626#comment-13430626 ]

Karthik Kambatla commented on HADOOP-8649:

Hi Daryn, thanks for your comments. I noticed it through casual observation. Let me put together a test case to test/explain this perceived bug better. Will update my patch soon with the new test case.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429184#comment-13429184 ]

Daryn Sharp commented on HADOOP-8649:

Good catch! In {{ChecksumFileSystem#listStatus(Path, PathFilter)}}:
# I question the null check, although useful, because {{FileSystem}} doesn't allow a null filter. This would make a {{FilterFileSystem}} behave differently. {{FileSystem}} itself should probably be changed to ignore a null, since it appears to cause an NPE.
# The invocation order of the filters should be flipped from (user, default) to (default, user) to prevent a user-supplied filter from ever seeing a checksum file. I've seen use cases where the filter takes action on a matching path to avoid waiting for the entire listing to return.
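The ordering point in item 2 can be illustrated with a short sketch. It assumes a short-circuiting composite (here {{Predicate.and}}, standing in for whatever composite filter the patch uses), plain string names instead of {{Path}}, and a simplified checksum rule; {{FilterOrderSketch}} is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FilterOrderSketch {
    // Simplified stand-in for the checksum filter: hide ".name.crc" files.
    static final Predicate<String> NOT_CHECKSUM =
        name -> !(name.startsWith(".") && name.endsWith(".crc"));

    // Returns the names the user filter was asked about for a given ordering.
    static List<String> seenByUserFilter(List<String> raw, boolean defaultFirst) {
        List<String> seen = new ArrayList<>();
        Predicate<String> user = name -> {
            seen.add(name); // side effect, like a filter that acts on each path
            return true;
        };
        Predicate<String> joined = defaultFirst
            ? NOT_CHECKSUM.and(user)   // (default, user)
            : user.and(NOT_CHECKSUM);  // (user, default)
        for (String n : raw) {
            joined.test(n);
        }
        return seen;
    }
}
```

With the default filter first, the short-circuit means the user filter is never invoked for a checksum file; with the user filter first, it is invoked for every entry, including the ones that are discarded afterwards.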
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429259#comment-13429259 ]

Karthik Kambatla commented on HADOOP-8649:

Thanks for the review, Daryn. Great points; I had missed them.

bq. I question the null check, although useful, because FileSystems doesn't allow a null filter. This would make a FilterFileSystem behave differently. FileSystem itself should probably be changed to ignore a null since it appears to cause a NPE.

Agreed. FileSystem should check for null. However, I believe ChecksumFileSystem should also check for null; otherwise the joinFilter would be non-null but have a null constituent filter.

Fixed the invocation order - will upload the new patch soon.
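One way to read the null-handling concern is sketched below. This is an assumption-laden illustration rather than the code from the patch: {{join}} is a hypothetical helper, {{Predicate<String>}} stands in for {{PathFilter}}, and treating a null user filter as "no extra filtering" is just one of the behaviours discussed in this thread.

```java
import java.util.function.Predicate;

final class NullSafeJoinSketch {
    // Simplified stand-in for the checksum filter: hide ".name.crc" files.
    static final Predicate<String> NOT_CHECKSUM =
        name -> !(name.startsWith(".") && name.endsWith(".crc"));

    // Hypothetical helper: builds the joined filter, treating a null user
    // filter as "no extra filtering" so the composite never holds a null.
    static Predicate<String> join(Predicate<String> user) {
        if (user == null) {
            return NOT_CHECKSUM;
        }
        return NOT_CHECKSUM.and(user); // default first, then the user filter
    }
}
```

Whichever behaviour is chosen, the point made in the comments is that the base class and the checksum layer should agree on it, so that wrapping a file system does not change how a null filter is treated.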
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429277#comment-13429277 ]

Daryn Sharp commented on HADOOP-8649:

Yes, we're in agreement. I intended to convey that a null should be handled in both {{FileSystem}} and {{ChecksumFileSystem}}, or neither.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429293#comment-13429293 ]

Karthik Kambatla commented on HADOOP-8649:

The null check in FileSystem is in the wrong place. I will fix it and the test, and update the patch.
[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)
[ https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429459#comment-13429459 ]

Karthik Kambatla commented on HADOOP-8649:

Hi Daryn,

The trunk code seems to be roughly the same as branch-1, so I believe both have the bug. Let me illustrate the issue that I see through some pseudo-code. Let me know if I am missing something here.

{code}
// Pseudo-code: ChecksumFileSystem is abstract, so a concrete subclass is implied.
DistributedFileSystem dfs = ...; // some initialization of the distributed file system
ChecksumFileSystem cfs = new ChecksumFileSystem(dfs); // cfs.fs is set to dfs
Path randomPath = new Path("random-path"); // some path with 'random' in it
PathFilter randomFilter = new PathFilter() {
  public boolean accept(Path file) {
    return !file.toString().contains("random");
  }
};

// In turn calls dfs.listStatus(randomPath, ChecksumFileSystem.DEFAULT_FILTER)
FileStatus[] listWithoutFilter = cfs.listStatus(randomPath);

// In turn calls dfs.listStatus(randomPath, randomFilter)
FileStatus[] listWithFilter = cfs.listStatus(randomPath, randomFilter);
{code}

dfs.listStatus(Path, PathFilter) calls FileSystem.listStatus(Path, PathFilter), which first calls dfs.listStatus(path) and then applies the PathFilter. Hence, while the checksum filter is used in the first cfs.listStatus call, it is not used in the second call to listStatus().