[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

2012-08-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435110#comment-13435110
 ] 

Daryn Sharp commented on HADOOP-8649:
-

I'm just generally concerned about the implications of stacking filesystems, 
i.e., a {{FilterFileSystem}} over a {{ChRootedFileSystem}} over a 
{{FilterFileSystem}}, etc.  I'm not sure it's a problem, but you should make 
sure there are tests that prove the stacking works.

I conceptually like the approach suggested.  Throw something up and let's see 
how it looks!

 ChecksumFileSystem should have an overriding implementation of 
 listStatus(Path, PathFilter) for improved performance
 

 Key: HADOOP-8649
 URL: https://issues.apache.org/jira/browse/HADOOP-8649
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 1.0.3, 2.0.0-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: branch1-HADOOP-8649.patch, branch1-HADOOP-8649.patch, 
 HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch, 
 HADOOP-8649_branch1.patch_v2, HADOOP-8649_branch1.patch_v3, 
 TestChecksumFileSystemOnDFS.java, trunk-HADOOP-8649.patch, 
 trunk-HADOOP-8649.patch


 Currently, ChecksumFileSystem implements only listStatus(Path). 
 The other form of listStatus(Path, customFilter) results in parsing the list 
 twice to apply each of the filters - custom and checksum filter.
 By using a composite filter instead, we limit the parsing to once.
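
 A minimal sketch of the composite-filter idea described above, not the 
 patch itself. It assumes a hypothetical {{SimpleFilter}} interface over plain 
 String paths as a stand-in for {{PathFilter}}, so that it compiles without 
 Hadoop on the classpath. All names in it are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class CompositeFilterSketch {
    // Hypothetical stand-in for org.apache.hadoop.fs.PathFilter, using plain
    // String paths so the sketch compiles without Hadoop on the classpath.
    interface SimpleFilter {
        boolean accept(String path);
    }

    // Rejects hidden ".name.crc" entries, in the spirit of the checksum filter.
    static final SimpleFilter CHECKSUM_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !(name.startsWith(".") && name.endsWith(".crc"));
    };

    // Combine the checksum filter and a caller-supplied filter into one filter.
    static SimpleFilter compose(SimpleFilter custom) {
        return path -> CHECKSUM_FILTER.accept(path) && custom.accept(path);
    }

    // Walk the listing exactly once, applying the combined filter to each entry.
    static List<String> list(List<String> entries, SimpleFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (String entry : entries) {
            if (filter.accept(entry)) {
                accepted.add(entry);
            }
        }
        return accepted;
    }
}
```

 Each entry is visited once, and the custom filter is only consulted for 
 entries that are not checksum files.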

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

2012-08-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433220#comment-13433220
 ] 

Daryn Sharp commented on HADOOP-8649:
-

You may want to test whether there are any incompatibilities with the chrooted 
filesystem.  If so, I wonder if it would be better, as in more general, to 
push the change down into {{FilterFileSystem}} or {{FileSystem}} itself.  
I haven't thought it all the way through, but a compound filter could hold an 
array of filters, with each filesystem given the opportunity to add its own.

If there's no problem with chroot, and you feel that's too much work, perhaps 
it could be something for another jira.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

2012-08-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433541#comment-13433541
 ] 

Karthik Kambatla commented on HADOOP-8649:
--

Thanks for the review, Daryn.

- I don't think it is incompatible with ChRootedFileSystem as it does not 
filter out any files.
- +1 on generalizing and pushing the change down to FileSystem itself.
-- We can add {{protected/public FileSystem#listStatus(Path f, List<PathFilter> 
filters)}} and use {{MultiPathFilter}} as in {{o.a.h.m.FileInputFormat}}
-- All FileSystems can use this to build a list of {{PathFilter}}s to be 
evaluated.
-- {{o.a.h.m.FileInputFormat}} can use the common version of {{MultiPathFilter}}
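
A rough sketch of how the proposal above could look, under stated assumptions: 
{{SimpleFilter}}, {{SimpleFs}}, {{RawFs}} and {{ChecksumLayer}} are 
hypothetical stand-ins (not Hadoop classes) that work on plain file names. Each 
wrapping layer adds its own filter to the list and delegates, and the innermost 
layer evaluates the whole list in one pass.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiFilterSketch {
    // Hypothetical stand-in for PathFilter, keyed on plain file names.
    interface SimpleFilter {
        boolean accept(String name);
    }

    // Hypothetical stand-in for a filesystem that lists with a list of filters.
    interface SimpleFs {
        List<String> listStatus(List<SimpleFilter> filters);
    }

    // Accepts a name only if every constituent filter accepts it.
    static boolean acceptAll(List<SimpleFilter> filters, String name) {
        for (SimpleFilter filter : filters) {
            if (!filter.accept(name)) {
                return false;
            }
        }
        return true;
    }

    // Innermost layer: owns the raw listing and evaluates all filters in one pass.
    static final class RawFs implements SimpleFs {
        private final List<String> entries;

        RawFs(List<String> entries) {
            this.entries = new ArrayList<>(entries);
        }

        @Override
        public List<String> listStatus(List<SimpleFilter> filters) {
            List<String> accepted = new ArrayList<>();
            for (String name : entries) {
                if (acceptAll(filters, name)) {
                    accepted.add(name);
                }
            }
            return accepted;
        }
    }

    // Wrapping layer: prepends its own checksum filter, then delegates downward.
    static final class ChecksumLayer implements SimpleFs {
        private final SimpleFs inner;

        ChecksumLayer(SimpleFs inner) {
            this.inner = inner;
        }

        @Override
        public List<String> listStatus(List<SimpleFilter> filters) {
            List<SimpleFilter> all = new ArrayList<>();
            all.add(name -> !(name.startsWith(".") && name.endsWith(".crc")));
            all.addAll(filters);
            return inner.listStatus(all);
        }
    }
}
```

Because every layer only appends to the list, stacking several wrappers still 
results in a single walk over the listing.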

If we decide on this, I can go ahead and make the required changes.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431992#comment-13431992
 ] 

Hadoop QA commented on HADOOP-8649:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12540032/trunk-HADOOP-8649.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.hdfs.TestFileConcurrentReader
  
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1272//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1272//console

This message is automatically generated.






[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

2012-08-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432021#comment-13432021
 ] 

Karthik Kambatla commented on HADOOP-8649:
--

I don't think the patch has anything to do with the two failing tests; these 
tests fail on the latest trunk as well. A casual examination of the code shows 
no intersection between the patch and the failing tests.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

2012-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431337#comment-13431337
 ] 

Hadoop QA commented on HADOOP-8649:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12539893/trunk-HADOOP-8649.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.fs.TestLocalFSFileContextMainOperations
  org.apache.hadoop.fs.TestFileContextDeleteOnExit
  org.apache.hadoop.fs.TestFSMainOperationsLocalFileSystem
  org.apache.hadoop.hdfs.web.TestWebHDFS
  
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1266//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1266//console

This message is automatically generated.

 ChecksumFileSystem should have an overriding implementation of 
 listStatus(Path, PathFilter) for improved performance
 

 Key: HADOOP-8649
 URL: https://issues.apache.org/jira/browse/HADOOP-8649
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 1.0.3, 2.0.0-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch, 
 HADOOP-8649_branch1.patch_v2, HADOOP-8649_branch1.patch_v3, 
 TestChecksumFileSystemOnDFS.java, branch1-HADOOP-8649.patch, 
 trunk-HADOOP-8649.patch


 Currently, ChecksumFileSystem implements only listStatus(Path). 
 The other form of listStatus(Path, customFilter) results in parsing the list 
 twice to apply each of the filters - custom and checksum filter.
 By using a composite filter instead, we limit the parsing to once.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter) for improved performance

2012-08-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431389#comment-13431389
 ] 

Karthik Kambatla commented on HADOOP-8649:
--

Found the javadoc warning. The observed test failures seem to be due to one of 
the patch's tests creating a file and not deleting it. I am running all the 
tests locally to make sure these issues are fixed, and will upload an updated 
patch soon.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

2012-08-07 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430545#comment-13430545
 ] 

Daryn Sharp commented on HADOOP-8649:
-

What I _think_ I see in trunk is:
# (A) {{ChecksumFileSystem#listStatus(Path, PathFilter)}} calls (B) 
{{ChecksumFileSystem#listStatus(Path)}}
# (B) {{ChecksumFileSystem#listStatus(Path)}} calls (C) {{fs.listStatus(Path, 
ChecksumFileSystem.DEFAULT_FILTER)}} to filter out crcs
# (A) {{ChecksumFileSystem#listStatus(Path, PathFilter)}} further filters the 
crc filtered results with the custom {{PathFilter}}

Do your test cases show this analysis is wrong?  Or did you notice it through 
casual observation of the code?  Perhaps a composite {{PathFilter}} is more 
efficient on large directory listings, but I'm curious if there's actually a 
bug.
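
To make the comparison concrete, here is a small sketch of the two-pass flow 
outlined in the list above next to a single-pass composite. It is only an 
illustration: {{SimpleFilter}} is a hypothetical stand-in for {{PathFilter}}, 
and entries are plain file names rather than {{FileStatus}} objects.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoPassSketch {
    // Hypothetical stand-in for PathFilter, keyed on plain file names.
    interface SimpleFilter {
        boolean accept(String name);
    }

    // Rejects hidden ".name.crc" entries, in the spirit of the checksum filter.
    static final SimpleFilter NOT_CHECKSUM =
        name -> !(name.startsWith(".") && name.endsWith(".crc"));

    // One walk over the given entries with a single filter.
    static List<String> filter(List<String> entries, SimpleFilter f) {
        List<String> accepted = new ArrayList<>();
        for (String name : entries) {
            if (f.accept(name)) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    // Two walks: drop checksum files first, then apply the custom filter.
    static List<String> twoPass(List<String> entries, SimpleFilter custom) {
        return filter(filter(entries, NOT_CHECKSUM), custom);
    }

    // One walk: both conditions are checked per entry via a composite filter.
    static List<String> onePass(List<String> entries, SimpleFilter custom) {
        return filter(entries, name -> NOT_CHECKSUM.accept(name) && custom.accept(name));
    }
}
```

Both methods return the same entries; they differ only in how many times the 
listing is walked and how many intermediate lists are built.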

 ChecksumFileSystem should have an overriding implementation of 
 listStatus(Path, PathFilter)
 ---

 Key: HADOOP-8649
 URL: https://issues.apache.org/jira/browse/HADOOP-8649
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch_v2, 
 HADOOP-8649_branch1.patch_v3


 Currently, ChecksumFileSystem implements only listStatus(Path). The other 
 form of listStatus(Path, PathFilter) is implemented by parent class 
 FileSystem, and hence doesn't filter out check-sum files.
 The implementation should use a composite filter of passed Filter and the 
 Checksum filter.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

2012-08-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430626#comment-13430626
 ] 

Karthik Kambatla commented on HADOOP-8649:
--

Hi Daryn, thanks for your comments. I noticed it through casual observation. 
Let me put together a test case to test/explain this perceived bug better. Will 
update my patch soon with the new test case.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

2012-08-06 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429184#comment-13429184
 ] 

Daryn Sharp commented on HADOOP-8649:
-

Good catch!  In {{ChecksumFileSystem#listStatus(Path, PathFilter)}}:
# I question the null check, although useful, because {{FileSystem}} doesn't 
allow a null filter.  This would make a {{FilterFileSystem}} behave 
differently.  {{FileSystem}} itself should probably be changed to ignore a 
null, since it appears to cause an NPE.
# The invocation order of the filters should be flipped from (user, default) to 
(default, user) to prevent a user-supplied filter from ever seeing a checksum 
file.  I've seen use cases where the filter takes action on a matching path to 
avoid waiting for the entire listing to return.
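
The effect of the invocation order can be shown with a short sketch. It 
assumes a hypothetical {{SimpleFilter}} stand-in for {{PathFilter}} over plain 
file names and a short-circuiting {{and}} helper. Neither is Hadoop code. 
The user filter records every name it is asked about.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterOrderSketch {
    // Hypothetical stand-in for PathFilter, keyed on plain file names.
    interface SimpleFilter {
        boolean accept(String name);
    }

    // Rejects hidden ".name.crc" entries, in the spirit of the checksum filter.
    static final SimpleFilter NOT_CHECKSUM =
        name -> !(name.startsWith(".") && name.endsWith(".crc"));

    // Short-circuiting conjunction: 'second' runs only if 'first' accepts.
    static SimpleFilter and(SimpleFilter first, SimpleFilter second) {
        return name -> first.accept(name) && second.accept(name);
    }

    // Returns the names that the user filter was actually invoked on.
    static List<String> seenByUser(List<String> entries, boolean defaultFirst) {
        List<String> seen = new ArrayList<>();
        SimpleFilter user = name -> {
            seen.add(name); // side effect, like a filter that acts on matches
            return true;
        };
        SimpleFilter combined =
            defaultFirst ? and(NOT_CHECKSUM, user) : and(user, NOT_CHECKSUM);
        for (String name : entries) {
            combined.accept(name);
        }
        return seen;
    }
}
```

With (default, user) ordering the user filter is never called for 
{{.data.crc}}; with (user, default) ordering it is called for it, even though 
the entry is later rejected.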


 ChecksumFileSystem should have an overriding implementation of 
 listStatus(Path, PathFilter)
 ---

 Key: HADOOP-8649
 URL: https://issues.apache.org/jira/browse/HADOOP-8649
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: HADOOP-8649_branch1.patch


 Currently, ChecksumFileSystem implements only listStatus(Path). The other 
 form of listStatus(Path, PathFilter) is implemented by parent class 
 FileSystem, and hence doesn't filter out check-sum files.
 The implementation should use a composite filter of passed Filter and the 
 Checksum filter.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

2012-08-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429259#comment-13429259
 ] 

Karthik Kambatla commented on HADOOP-8649:
--

Thanks for the review, Daryn. Great points; I had missed them.

bq. I question the null check, although useful, because FileSystem doesn't 
allow a null filter. This would make a FilterFileSystem behave differently. 
FileSystem itself should probably be changed to ignore a null, since it appears 
to cause an NPE.

Agree. FileSystem should check for null. However, I believe ChecksumFileSystem 
should also check for a null, otherwise the joinFilter would be non-null and 
have a constituent null filter.
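
One possible shape for such a null check, as a sketch only. It assumes the 
policy "a null user filter means no extra filtering", which is just one option, 
and uses a hypothetical {{SimpleFilter}} stand-in for {{PathFilter}} over plain 
file names.

```java
final class NullSafeFilterSketch {
    // Hypothetical stand-in for PathFilter, keyed on plain file names.
    interface SimpleFilter {
        boolean accept(String name);
    }

    // Rejects hidden ".name.crc" entries, in the spirit of the checksum filter.
    static final SimpleFilter NOT_CHECKSUM =
        name -> !(name.startsWith(".") && name.endsWith(".crc"));

    // Builds the joint filter without ever embedding a null constituent.
    static SimpleFilter withChecksumFilter(SimpleFilter user) {
        if (user == null) {
            // Assumed policy: a null user filter adds no further restriction.
            return NOT_CHECKSUM;
        }
        return name -> NOT_CHECKSUM.accept(name) && user.accept(name);
    }
}
```

Without the check, the composite would only fail later, when the null 
constituent is first invoked on an entry that passes the checksum filter.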

Fixed the invocation order - will upload the new patch soon.






[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

2012-08-06 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429277#comment-13429277
 ] 

Daryn Sharp commented on HADOOP-8649:
-

Yes, we're in agreement.  I intended to convey that a null should be handled in 
both {{FileSystem}} and {{ChecksumFileSystem}}, or neither.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

2012-08-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429293#comment-13429293
 ] 

Karthik Kambatla commented on HADOOP-8649:
--

The null check in FileSystem was in the wrong place. I will fix it and the 
test, and update the patch.

 ChecksumFileSystem should have an overriding implementation of 
 listStatus(Path, PathFilter)
 ---

 Key: HADOOP-8649
 URL: https://issues.apache.org/jira/browse/HADOOP-8649
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: HADOOP-8649_branch1.patch, HADOOP-8649_branch1.patch_v2


 Currently, ChecksumFileSystem implements only listStatus(Path). The other 
 form of listStatus(Path, PathFilter) is implemented by parent class 
 FileSystem, and hence doesn't filter out check-sum files.
 The implementation should use a composite filter of passed Filter and the 
 Checksum filter.





[jira] [Commented] (HADOOP-8649) ChecksumFileSystem should have an overriding implementation of listStatus(Path, PathFilter)

2012-08-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429459#comment-13429459
 ] 

Karthik Kambatla commented on HADOOP-8649:
--

Hi Daryn,

The trunk code seems to be roughly the same as branch-1, so I believe both 
have the bug.

Let me illustrate the issue that I see through some pseudo-code. Let me know if 
I am missing something here. 
{code}
DistributedFileSystem dfs = ...; // some initialization of the distributed file system
// ChecksumFileSystem is abstract, hence the empty anonymous subclass; cfs.fs is set to dfs.
ChecksumFileSystem cfs = new ChecksumFileSystem(dfs) {};

Path randomPath = new Path("random-path"); // some path with 'random' in it
PathFilter randomFilter = new PathFilter() {
  @Override
  public boolean accept(Path file) { return !file.toString().contains("random"); }
};

// in turn calls dfs.listStatus(randomPath, ChecksumFileSystem.DEFAULT_FILTER)
FileStatus[] listWithoutFilter = cfs.listStatus(randomPath);
// in turn calls dfs.listStatus(randomPath, randomFilter)
FileStatus[] listWithFilter = cfs.listStatus(randomPath, randomFilter);
{code}

dfs.listStatus(Path, PathFilter) calls FileSystem.listStatus(Path, PathFilter), 
which first calls dfs.listStatus(path) and then applies the PathFilter. Hence, 
while the checksum filter is used in the first cfs.listStatus call, it is not 
used in the second call to listStatus().
