[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7235: Attachment: HDFS-7235.007.patch

DataNode#transferBlock should report blocks that don't exist using reportBadBlock
Key: HDFS-7235
URL: https://issues.apache.org/jira/browse/HDFS-7235
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch

When decommissioning a DN, the process hangs. What happens is: when the NN chooses a replica as a source to replicate data from the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of a bad disk, the DN detects the source block to be transferred as an invalid block, via the following logic in FsDatasetImpl.java:
{code}
/** Does the block exist and have the given state? */
private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
  final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
      b.getLocalBlock());
  return replicaInfo != null
      && replicaInfo.getState() == state
      && replicaInfo.getBlockFile().exists();
}
{code}
The reason this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case. The key issue we found here is: after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know the block is corrupted and keeps sending the data-transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and the initial analysis.
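The three-clause check above can be illustrated with a self-contained simulation. The stub types below are illustrative stand-ins for this sketch, not Hadoop's actual ReplicaInfo/volumeMap classes; the point is that any one failing clause (here, a missing block file) makes the replica invalid:

```java
// Hypothetical, self-contained simulation of the isValid() logic from
// FsDatasetImpl. ReplicaStub is an illustrative stand-in, not Hadoop's type.
public class ReplicaValidityDemo {
    enum ReplicaState { FINALIZED, RBW }

    static class ReplicaStub {
        final ReplicaState state;
        final boolean blockFileExists;
        ReplicaStub(ReplicaState state, boolean blockFileExists) {
            this.state = state;
            this.blockFileExists = blockFileExists;
        }
    }

    /** Mirrors the three clauses joined by && in the original. */
    static boolean isValid(ReplicaStub replica, ReplicaState expected) {
        return replica != null
                && replica.state == expected
                && replica.blockFileExists;
    }

    public static void main(String[] args) {
        // A finalized replica whose file was lost to a bad disk: invalid.
        ReplicaStub badDisk = new ReplicaStub(ReplicaState.FINALIZED, false);
        System.out.println(isValid(badDisk, ReplicaState.FINALIZED)); // false
        // A healthy finalized replica: valid.
        ReplicaStub healthy = new ReplicaStub(ReplicaState.FINALIZED, true);
        System.out.println(isValid(healthy, ReplicaState.FINALIZED)); // true
    }
}
```

The bug described in the issue is precisely that when the first case occurs, the negative result is swallowed instead of being surfaced to the NN via reportBadBlocks.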
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186433#comment-14186433 ] Hadoop QA commented on HDFS-7235: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677534/HDFS-7235.007.patch against trunk revision 971e91c. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8562//console This message is automatically generated.
[jira] [Commented] (HDFS-7291) Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
[ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186434#comment-14186434 ] Hadoop QA commented on HDFS-7291: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677527/HDFS-7291.3.patch against trunk revision 971e91c.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8561//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8561//console
This message is automatically generated.

Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
Key: HDFS-7291
URL: https://issues.apache.org/jira/browse/HDFS-7291
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: 2.6.0
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Attachments: HDFS-7291.0.patch, HDFS-7291.1.patch, HDFS-7291.2.patch, HDFS-7291.3.patch

HDFS-7090 changed to persist in-memory replicas using unbuffered IO on Linux and Windows. On Linux, it relies on the sendfile() API between two file descriptors to achieve an unbuffered IO copy. According to the Linux man page at http://man7.org/linux/man-pages/man2/sendfile.2.html, this is only supported on Linux kernel 2.6.33+. As pointed out by Haowei in the discussion below, FileChannel#transferTo already has support for native unbuffered IO on POSIX platforms. On Windows, JDK 6/7/8 has not implemented native unbuffered IO yet. We change to use FileChannel#transferTo for POSIX and our own native wrapper of CopyFileEx on Windows for the unbuffered copy.
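The POSIX side of the approach described above can be sketched with standard JDK APIs. This is a minimal, self-contained illustration of copying a file through FileChannel#transferTo, which the JDK can back with an OS-level copy path (such as sendfile on Linux) rather than buffered user-space IO; the file names are illustrative, and this is not the patch's actual code:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToCopyDemo {
    // Copy src to dst via FileChannel#transferTo.
    static void copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may copy fewer bytes than requested, so loop.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("replica", ".src");
        Path dst = Files.createTempFile("replica", ".dst");
        Files.write(src, "block data".getBytes(StandardCharsets.UTF_8));
        copy(src, dst);
        System.out.println(
            new String(Files.readAllBytes(dst), StandardCharsets.UTF_8)); // block data
    }
}
```

The loop matters: transferTo makes no guarantee of transferring the full requested count in one call, so callers must advance the position until the whole file has been copied.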
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186437#comment-14186437 ] Yongjun Zhang commented on HDFS-7235: - There was a TestDistributedShell failure reported as YARN-2607; the symptom there looks a bit different from the one reported above. I ran the same test locally before (trunk tip 5b1dfe78b8b06335bed0bcb83f12bb936d4c021b) and after the patch; they failed the same way, but the symptom when running locally is different from both YARN-2607 and the report above. This test seems to need some more study. Just uploaded the same patch here again to see how it runs.
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7235: Attachment: (was: HDFS-7235.007.patch)
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7235: Attachment: HDFS-7235.007.patch
[jira] [Updated] (HDFS-7291) Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
[ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7291: - Attachment: HDFS-7291.4.patch
NativeIO.c has mixed usage of IOException and UnsupportedOperationException in similar cases, but I agree with [~wheat9] that UnsupportedOperationException is more appropriate; patch updated. The previous findbugs error is caused by a test-patch.sh issue which is unrelated to the patch:
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/dev-support/test-patch.sh: line 628: /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/../patchprocess/patchFindBugsOutputhadoop-hdfs.txt: No such file or directory
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186453#comment-14186453 ] Hadoop QA commented on HDFS-7235: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677536/HDFS-7235.007.patch against trunk revision 971e91c. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8564//console This message is automatically generated.
[jira] [Commented] (HDFS-7291) Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
[ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186459#comment-14186459 ] Hadoop QA commented on HDFS-7291: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677537/HDFS-7291.4.patch against trunk revision 971e91c. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8563//console This message is automatically generated.
[jira] [Commented] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186474#comment-14186474 ] Harsh J commented on HDFS-6741: --- The failed test appears unrelated. A manual run with the patch passes, so the build problem was likely intermittent:
{code}
Running org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 58.649 sec - in org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
Results :
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0
{code}
The build console appears truncated somehow for this test:
{code}
Running org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
estBlockMissingException
{code}
Thank you for the new patch Stephen, and sorry again for having duplicated the effort. I'll commit this momentarily.

Improve permission denied message when FSPermissionChecker#checkOwner fails
Key: HDFS-6741
URL: https://issues.apache.org/jira/browse/HDFS-6741
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 3.0.0, 2.5.0
Reporter: Stephen Chu
Assignee: Harsh J
Priority: Trivial
Labels: supportability
Attachments: HDFS-6741.1.patch, HDFS-6741.2.patch, HDFS-6741.2.patch

Currently, FSPermissionChecker#checkOwner throws an AccessControlException with a simple "Permission denied" message. When users try to set an ACL without ownership permissions, they'll see something like:
{code}
[schu@hdfs-vanilla-1 hadoop]$ hdfs dfs -setfacl -m user:schu:--- /tmp
setfacl: Permission denied
{code}
It'd be helpful if the message explained why the permission was denied, to avoid confusion for users who aren't familiar with permissions.
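The kind of improvement discussed above can be sketched as follows. This is a hypothetical message builder, not the wording the committed patch actually uses; the idea is simply that an ownership-check failure should say who the caller is, which path is involved, and who actually owns it:

```java
// Illustrative sketch only: the method name and message format below are
// hypothetical, not the exact text committed in HDFS-6741.
public class OwnerCheckMessageDemo {
    static String ownerDeniedMessage(String user, String path, String owner) {
        return String.format(
            "Permission denied. user=%s is not the owner of inode=%s (owner=%s)",
            user, path, owner);
    }

    public static void main(String[] args) {
        // For the setfacl scenario from the issue description:
        System.out.println(ownerDeniedMessage("schu", "/tmp", "hdfs"));
    }
}
```

A message in this shape tells the user immediately that the operation requires ownership, rather than leaving them to guess which of several permission checks failed.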
[jira] [Updated] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-6741: -- Target Version/s: (was: 3.0.0, 2.7.0) Hadoop Flags: Reviewed
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186481#comment-14186481 ] bc Wong commented on HDFS-7295: ---
bq. Given the fact that in Hadoop there is no way to revoke a DT, expiration time serves as the last defense of stolen tokens.
Not quite true. The [mechanism|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java#L514] is there, and even exposed in [WebHDFS|http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Cancel_Delegation_Token]. But I'll concede that users can't get a list of all outstanding DTs (short of using the OIV), which makes revocation difficult.
Let's separate the right security model from current feature limitations in HDFS. It's straightforward to build a revocation mechanism, along with some stats reporting on DT usage, plus auditing. So the lack of revocation today shouldn't affect the direction we choose.
The alternative, which is to put real users' keytabs on the cluster, is far worse. (Again, the use-case example is a long-running Spark Streaming app, which runs as a real user, not a service account.) First, a compromise of the keytab affects the user's corporate AD account. Second, normal users usually can't get keytabs. I think it's hard for most enterprise users to accept this alternative.

Support arbitrary max expiration times for delegation token
Key: HDFS-7295
URL: https://issues.apache.org/jira/browse/HDFS-7295
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS, such as long-running YARN apps. Users should be allowed to optionally specify a max lifetime for their tokens.
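The two timers being debated in this thread can be modeled in a few lines. This is an illustrative simulation under assumed constants, not Hadoop's actual token classes: renewals extend a token's expiry, but never beyond a hard max lifetime fixed at issue time, which is the 7-day cap the issue proposes making configurable:

```java
// Hypothetical model of DT renewal vs. max lifetime. Names and constants
// are illustrative; Hadoop's real renew interval and max lifetime are
// configuration values, with the max lifetime defaulting to 7 days.
public class TokenLifetimeDemo {
    static final long RENEW_INTERVAL_MS = 24L * 60 * 60 * 1000;     // 1 day
    static final long MAX_LIFETIME_MS   = 7L * 24 * 60 * 60 * 1000; // 7 days

    /** Returns the new expiry time, or -1 if renewal is refused. */
    static long renew(long issueTimeMs, long nowMs) {
        long maxDate = issueTimeMs + MAX_LIFETIME_MS;
        if (nowMs > maxDate) {
            // Past the max lifetime: even a stolen token cannot be
            // renewed forever, which is the "implicit revocation" point.
            return -1;
        }
        return Math.min(nowMs + RENEW_INTERVAL_MS, maxDate);
    }

    public static void main(String[] args) {
        long issued = 0;
        long day6 = 6L * 24 * 60 * 60 * 1000;
        long day8 = 8L * 24 * 60 * 60 * 1000;
        System.out.println(renew(issued, day6) > 0);   // true: still renewable
        System.out.println(renew(issued, day8) == -1); // true: refused
    }
}
```

The disagreement in the thread is about MAX_LIFETIME_MS: whether a configurable (possibly very large) cap preserves enough of the implicit-revocation property, or whether it effectively reduces to never-expiring credentials.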
[jira] [Updated] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-6741: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Committed to branch-2 and trunk, thank you again Stephen!
[jira] [Commented] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186486#comment-14186486 ] Stephen Chu commented on HDFS-6741: --- Thanks a lot, Harsh!
[jira] [Commented] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186488#comment-14186488 ] Hudson commented on HDFS-6741: -- FAILURE: Integrated in Hadoop-trunk-Commit #6365 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6365/])
HDFS-6741. Improve permission denied message when FSPermissionChecker#checkOwner fails. Contributed by Stephen Chu and Harsh J. (harsh) (harsh: rev 0398db19b2c4558a9f08ac2700a27752748896fa)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPermission.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186510#comment-14186510 ] Haohui Mai commented on HDFS-7295: --
bq. It's straightforward to build a revocation mechanism.
This is a common misconception. I should have explained the threat model upfront, and what revocation exactly means. The threat model is that (1) an attacker can steal the user's DT, (2) the system has no knowledge of which token is stolen, and (3) the system should not allow the attacker to reauthenticate indefinitely using the stolen token.
The explicit revocation mechanism (canceling the DT) you pointed out only works if the NN knows exactly which token is stolen, which is unfortunately not the case in real-world environments. That's the exact reason why most capability systems also have an implicit revocation mechanism: capabilities always have expiration dates, and the system asks the user to reauthenticate periodically to renew their capabilities.
bq. The alternative, which is to put real users' keytabs on the cluster, is far worse. (Again, the use case example is a long running Spark Streaming app, which runs as a real user, not a service account.)
I'm not sure this is a fair comparison. If the requirement is to run long-lasting jobs securely in the cluster, I'm unconvinced that the proposed approach actually runs the jobs securely w.r.t. the threat model, as it contains security flaws pointed out by [~ste...@apache.org] and [~aw]. I understand there is a usability concern, but this is an important correctness issue from a security point of view.
[jira] [Created] (HDFS-7299) Hadoop Namenode failing because of negative value in fsimage
Vishnu Ganth created HDFS-7299: -- Summary: Hadoop Namenode failing because of negative value in fsimage Key: HDFS-7299 URL: https://issues.apache.org/jira/browse/HDFS-7299 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Vishnu Ganth Hadoop Namenode is failing because of an unexpected value of block size in fsimage. Stack trace: {code} 2014-10-27 16:22:12,107 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = mastermachine-hostname/ip STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.4.0 STARTUP_MSG: classpath =
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186545#comment-14186545 ] bc Wong commented on HDFS-7295: --- Yup. The threat model is a good place to start. * Re (2) the system has no knowledge which token is stolen --- I think auditing can definitely help. The NN audit logger does tell you who from which host is accessing what file using which DT. * Re (3) the system should not allow the attacker to reauthenticate indefinitely using the stolen token --- This is where the configurable lifetime cap comes in. For example, some IT admins make people change passwords every 3 months, some every year. If the config says that user hbase's DT never expires, then that's the same as having hbase's keytab on all nodes on the cluster. I don't think that (3) is reasonable, since not even Kerberos can satisfy (3) if the attacker can steal the keytab in this case. bq. I'm not sure this is a fair comparison Could you elaborate more on that? The options brought up so far are (A) arbitrary DT lifetime and (B) deploying users' keytabs. I'd think that (B) is less secure due to its consequences. In addition, in all the situations (that I can think of) where an attacker can steal the DT, the same attack can be used to steal the keytab. Do we have other proposals for the long running user app use case (e.g. Spark Streaming)? Support arbitrary max expiration times for delegation token --- Key: HDFS-7295 URL: https://issues.apache.org/jira/browse/HDFS-7295 Project: Hadoop HDFS Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS such as long running YARN apps. Users should be allowed to optionally specify max lifetime for their tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5894) Refactor a private internal class DataTransferEncryptor.SaslParticipant
[ https://issues.apache.org/jira/browse/HDFS-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-5894: -- Status: Patch Available (was: Open) Refactor a private internal class DataTransferEncryptor.SaslParticipant --- Key: HDFS-5894 URL: https://issues.apache.org/jira/browse/HDFS-5894 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Hiroshi Ikeda Assignee: Harsh J Priority: Trivial Attachments: HDFS-5894.patch, HDFS-5894.patch It is appropriate to use polymorphism for SaslParticipant instead of scattering if-else statements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5894) Refactor a private internal class DataTransferEncryptor.SaslParticipant
[ https://issues.apache.org/jira/browse/HDFS-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-5894: -- Attachment: HDFS-5894.patch Thank you for getting back. I've attached a new patch that applies to current trunk state. Refactor a private internal class DataTransferEncryptor.SaslParticipant --- Key: HDFS-5894 URL: https://issues.apache.org/jira/browse/HDFS-5894 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Hiroshi Ikeda Assignee: Harsh J Priority: Trivial Attachments: HDFS-5894.patch, HDFS-5894.patch It is appropriate to use polymorphism for SaslParticipant instead of scattering if-else statements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
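The refactor being proposed for HDFS-5894 -- replacing scattered if-else branching on client/server state with polymorphism -- can be sketched roughly like this. The names below are hypothetical illustrations, not the actual patch; the real SaslParticipant wraps javax.security.sasl's SaslClient/SaslServer.

```java
// Hypothetical sketch of the polymorphism refactor: instead of one class that
// branches on "am I a client or a server?" at every call site, a factory
// returns a subclass that already knows which side it is on.
abstract class Participant {
    abstract String role();
    abstract byte[] evaluate(byte[] challengeOrResponse);

    static Participant createClient() { return new ClientSide(); }
    static Participant createServer() { return new ServerSide(); }
}

class ClientSide extends Participant {
    String role() { return "client"; }
    byte[] evaluate(byte[] challenge) {
        // A real client would call SaslClient.evaluateChallenge(challenge) here.
        return challenge;
    }
}

class ServerSide extends Participant {
    String role() { return "server"; }
    byte[] evaluate(byte[] response) {
        // A real server would call SaslServer.evaluateResponse(response) here.
        return response;
    }
}
```

Each call site then simply invokes evaluate() without caring which side it is on, which removes the if-else scattering the JIRA describes.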
[jira] [Commented] (HDFS-5894) Refactor a private internal class DataTransferEncryptor.SaslParticipant
[ https://issues.apache.org/jira/browse/HDFS-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186559#comment-14186559 ] Hadoop QA commented on HDFS-5894: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677557/HDFS-5894.patch against trunk revision 0398db1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8565//console This message is automatically generated. Refactor a private internal class DataTransferEncryptor.SaslParticipant --- Key: HDFS-5894 URL: https://issues.apache.org/jira/browse/HDFS-5894 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Hiroshi Ikeda Assignee: Harsh J Priority: Trivial Attachments: HDFS-5894.patch, HDFS-5894.patch It is appropriate to use polymorphism for SaslParticipant instead of scattering if-else statements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-6606: - Attachment: HDFS-6606.009.patch Chris, you are right; I've updated the patch to address it. Thank you, ATM and tucu, for the review, and thanks for volunteering to commit :) Thanks Suresh, Andy, Mike and Srikanth for the comments. Optimize HDFS Encrypted Transport performance - Key: HDFS-6606 URL: https://issues.apache.org/jira/browse/HDFS-6606 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, security Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, HDFS-6606.006.patch, HDFS-6606.007.patch, HDFS-6606.008.patch, HDFS-6606.009.patch, OptimizeHdfsEncryptedTransportperformance.pdf In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; it was great work. It utilizes the SASL {{Digest-MD5}} mechanism (using Qop: auth-conf) and supports three security strengths: * high 3des or rc4 (128bits) * medium des or rc4 (56bits) * low rc4 (40bits) 3des and rc4 are slow, only *tens of MB/s*: http://www.javamex.com/tutorials/cryptography/ciphers.shtml http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ I will give more detailed performance data in the future. It's absolutely a bottleneck and will vastly affect the end-to-end performance. AES (Advanced Encryption Standard) is recommended as a replacement for DES and is more secure; with AES-NI support, the throughput can reach nearly *2GB/s*, so it won't be the bottleneck any more. AES and CryptoCodec work is supported in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add a new mode support for AES). This JIRA will use AES with AES-NI support as the encryption algorithm for DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
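For reference, the AES/CTR transform being contemplated is available through the stock JCE. The standalone round-trip sketch below only shows the cipher transform in isolation; the actual patch wires AES through Hadoop's CryptoCodec (HADOOP-10150/10603/10693), not through raw Cipher calls like this.

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Standalone AES/CTR round trip via the stock JCE, to show the transform
// string involved. Demo only: never use an all-zero key or a reused IV in
// real code.
public class AesCtrSketch {
    static byte[] run(int mode, byte[] key16, byte[] iv16, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key16, "AES"), new IvParameterSpec(iv16));
        return c.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];  // 128-bit key; all-zero for the demo only
        byte[] iv  = new byte[16];  // CTR counter block; must never repeat per key
        byte[] pt  = "block data".getBytes(StandardCharsets.UTF_8);
        byte[] ct  = run(Cipher.ENCRYPT_MODE, key, iv, pt);
        byte[] rt  = run(Cipher.DECRYPT_MODE, key, iv, ct);
        System.out.println(new String(rt, StandardCharsets.UTF_8));
    }
}
```

Whether this path actually reaches the quoted near-2GB/s figures depends on AES-NI support in the JVM's crypto provider, which is the point of the benchmark data promised above.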
[jira] [Commented] (HDFS-7299) Hadoop Namenode failing because of negative value in fsimage
[ https://issues.apache.org/jira/browse/HDFS-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186582#comment-14186582 ] Hu Liu, commented on HDFS-7299: --- It seems that the fsimage is broken. You can use the offline image viewer to confirm. Hadoop Namenode failing because of negative value in fsimage Key: HDFS-7299 URL: https://issues.apache.org/jira/browse/HDFS-7299 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Vishnu Ganth Hadoop Namenode is failing because of an unexpected value of block size in fsimage. Stack trace: {code} 2014-10-27 16:22:12,107 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = mastermachine-hostname/ip STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.4.0 STARTUP_MSG: classpath =
[jira] [Commented] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186625#comment-14186625 ] Tony Reix commented on HDFS-6515: - Hello. Reading the console output, I see nothing related to the patch that could be the root cause of the failure. The error seems to be around: HDFS-6515 patch is being downloaded at Mon Oct 27 07:51:14 UTC 2014 from http://issues.apache.org/jira/secure/attachment/12654094/HDFS-6515-1.patch cp: cannot stat '/home/jenkins/buildSupport/lib/*': No such file or directory The patch does not appear to apply with p0 to p2 PATCH APPLICATION FAILED where the error deals with some lib stored in the Jenkins directory. On my side, I'm going to check that the patch works fine on the branch 'trunk'. testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: test Attachments: HDFS-6515-1.patch I have an issue with test : testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, test runs fine. On Linux/PowerPC, I have: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at details, I see that some Failed to cache messages appear in the traces. Only 10 on Intel, but 186 on PPC64. On PPC64, it looks like some thread is waiting for something that never happens, generating a TimeOut. I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK. 
I'm now using Hadoop latest; however, the issue appeared within Hadoop 2.4.0. I need help understanding what the test is doing and what traces are expected, in order to understand what/where the root cause is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186642#comment-14186642 ] Steve Loughran commented on HDFS-7295: -- [~aw] bq. What Steve Loughran said. I don't know whether to be pleased or scared by the fact you are agreeing with me. Maybe both. [~adhoot] bq. My concern is the damage with a stolen keytab is far greater than the HDFS token. Its universal kerberos identity versus something that works only with HDFS. In a more complex application you end up needing to authenticate IPC/REST between different services anyway. Example: pool of tomcat instances talking to HBase in YARN running against HDFS. Keytabs avoid having different solutions for different parts of the stack. For the example cited, I'd just have one single app account for the HBase and tomcat instances; {{sudo}} launch them all as that user. bq. Ops team might consider a longer delegation token to be lower risk than having a more valuable asset - users's keytab - be exposed on a wide surface area (we need all nodes to have access to the keytabs) push it out during localization; rely on the NM to set up the paths securely and to clean up afterwards. The weaknesses become # packet sniffing. Better encrypt your wires. # NM process fails, container then terminates: no cleanup # malicious processes able to gain root access to the system. But do that and you get enough other things away... bq. Using keytabs for headless accounts will work for services that do not use the user account. Spark streaming, for example, runs as the user just like Map Reduce. This would mean asking user to create and deploy keytabs for those scenarios, correct? Depends on the duration of the instance. Short-lived: no. Medium lived: no. Long-lived, you need a keytab —but it does not have to be that of the user submitting the job, merely one with access to the (persistent) data. [~bcwalrus] bq. 
perhaps we can add a whitelist/blacklist for who can set arbitrary lifetime on their DT, and whether there is a cap to the lifetime. This adds even more complexity to a security system that is already hard for some people (myself, for example) to understand. bq. It's straightforward to build a revocation mechanism, along with some stats reporting on DT usages, plus auditing. Yes —but does it scale? Is every request going to have to trigger a token revocation check, or simply a fraction? Even with that fraction, what load ends up being placed on the infrastructure - including potentially the enterprise-wide Kerberos/AD systems? We also need to think about the availability of this token revocation check infrastructure, and whether to hide it in the NN and add more overhead there (as well as more data to keep in sync), or deploy and manage some other token revocation infrastructure. I am not, personally, enthused by the idea. I don't think anyone pretends that keytabs are an ideal solution, and I know some cluster ops teams will be unhappy about this, but I also think that near-indefinite Kerberos tokens aren't going to make those people happy either. There's another option which we looked at for Slider: pushing out new tokens from the client, just as the RM does token renewal today. You've got to remember to refresh them regularly, and be able to get those tokens to the processes in the YARN containers, processes that may then want to switch over to them. I could imagine this though, with Oozie jobs scheduled to do the renewal, and something in YARN to help with token propagation. Support arbitrary max expiration times for delegation token --- Key: HDFS-7295 URL: https://issues.apache.org/jira/browse/HDFS-7295 Project: Hadoop HDFS Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS such as long running YARN apps. 
Users should be allowed to optionally specify max lifetime for their tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186650#comment-14186650 ] Tony Reix commented on HDFS-6515: - Patching the trunk of Hadoop Common from the official GitHub with the patch provided here works perfectly: $ patch -p0 < ../HDFS-6515-1.patch patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java Hunk #1 succeeded at 166 (offset 1 line). patching file hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java I've checked the 2 files and they are OK. testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: test Attachments: HDFS-6515-1.patch I have an issue with test : testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, test runs fine. On Linux/PowerPC, I have: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at details, I see that some Failed to cache messages appear in the traces. Only 10 on Intel, but 186 on PPC64. On PPC64, it looks like some thread is waiting for something that never happens, generating a TimeOut. I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK. I'm now using Hadoop latest, however, the issue appeared within Hadoop 2.4.0. 
I need help understanding what the test is doing and what traces are expected, in order to understand what/where the root cause is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186652#comment-14186652 ] Steve Loughran commented on HDFS-7295: -- I should add that in TWILL-101, twill is doing the push out new token strategy Support arbitrary max expiration times for delegation token --- Key: HDFS-7295 URL: https://issues.apache.org/jira/browse/HDFS-7295 Project: Hadoop HDFS Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS such as long running YARN apps. Users should be allowed to optionally specify max lifetime for their tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7299) Hadoop Namenode failing because of negative value in fsimage
[ https://issues.apache.org/jira/browse/HDFS-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186678#comment-14186678 ] Vishnu Ganth commented on HDFS-7299: [~huLiu] Thanks for the response, Liu. I tried the command hdfs oiv -i fsimage_file -o output. I am getting the directory structure of HDFS. But how do I confirm whether it is broken or working fine? Hadoop Namenode failing because of negative value in fsimage Key: HDFS-7299 URL: https://issues.apache.org/jira/browse/HDFS-7299 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Vishnu Ganth Hadoop Namenode is failing because of an unexpected value of block size in fsimage. Stack trace: {code} 2014-10-27 16:22:12,107 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = mastermachine-hostname/ip STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.4.0 STARTUP_MSG: classpath =
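One way to answer the question above: dump the fsimage with the offline image viewer's Delimited processor (hdfs oiv -i fsimage_file -o out.txt -p Delimited) and scan the output for negative size values, since the reported failure is a negative block size. A hedged sketch of such a scan follows; the column index is a parameter because the Delimited column layout varies across oiv versions, so verify it against your own dump before trusting the results.

```java
import java.util.ArrayList;
import java.util.List;

// Scan Delimited oiv output for rows whose size column holds a negative
// number. Column layout is version-dependent, hence the sizeColumn parameter;
// non-numeric or short rows (headers, directories) are skipped.
public class NegativeSizeScan {
    static List<String> findNegative(List<String> lines, int sizeColumn) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length <= sizeColumn) continue;  // malformed or header row
            try {
                if (Long.parseLong(cols[sizeColumn].trim()) < 0) hits.add(line);
            } catch (NumberFormatException ignored) {
                // non-numeric column (e.g. header or path-only row): skip
            }
        }
        return hits;
    }
}
```

Any hit pinpoints the corrupt inode, which confirms the fsimage is broken rather than the NameNode itself misbehaving.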
[jira] [Commented] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186699#comment-14186699 ] Tony Reix commented on HDFS-6515: - There is an issue on the Hadoop Common GitHub Web page: downloading a ZIP file from the Download ZIP button generates a set of source code that is old, probably dated August 23rd, since the latest commit displayed is said to be: Arpit Agarwal arp7 authored on 23 Aug latest commit 42a61a4fbc When getting Hadoop Common source code with git clone, I see that the patch fails. I've changed the patch, tested it on a git clone, and that works. I'm going to push a new patch now. testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: test Attachments: HDFS-6515-1.patch I have an issue with test : testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, test runs fine. On Linux/PowerPC, I have: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at details, I see that some Failed to cache messages appear in the traces. Only 10 on Intel, but 186 on PPC64. On PPC64, it looks like some thread is waiting for something that never happens, generating a TimeOut. I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK. I'm now using Hadoop latest, however, the issue appeared within Hadoop 2.4.0. 
I need help understanding what the test is doing and what traces are expected, in order to understand what/where the root cause is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Reix updated HDFS-6515: Status: Open (was: Patch Available) Source code of Hadoop has changed since the patch was produced. I'm going to provide a new version of the patch that works with today's code. testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0, 3.0.0 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: test Attachments: HDFS-6515-1.patch I have an issue with test : testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, test runs fine. On Linux/PowerPC, I have: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at details, I see that some Failed to cache messages appear in the traces. Only 10 on Intel, but 186 on PPC64. On PPC64, it looks like some thread is waiting for something that never happens, generating a TimeOut. I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK. I'm now using Hadoop latest, however, the issue appeared within Hadoop 2.4.0. I need help understanding what the test is doing and what traces are expected, in order to understand what/where the root cause is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Reix updated HDFS-6515: Labels: hadoop test (was: test) Affects Version/s: 2.4.1 Status: Patch Available (was: Open) The previous patch has been updated to work with current Hadoop code. $ patch -p0 < /tmp/HDFS-6515-2.patch patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java Hunk #1 succeeded at 171 (offset 6 lines). patching file hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java git status # On branch trunk # Changes not staged for commit: # (use git add <file>... to update what will be committed) # (use git checkout -- <file>... to discard changes in working directory) # # modified: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java # modified: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.1, 2.4.0, 3.0.0 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: test, hadoop Attachments: HDFS-6515-1.patch I have an issue with test : testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, test runs fine. On Linux/PowerPC, I have: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at details, I see that some Failed to cache messages appear in the traces. Only 10 on Intel, but 186 on PPC64. 
On PPC64, it looks like some thread is waiting for something that never happens, generating a TimeOut. I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK. I'm now using Hadoop latest, however, the issue appeared within Hadoop 2.4.0 . I need help for understanding what the test is doing, what traces are expected, in order to understand what/where is the root cause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Reix updated HDFS-6515: Attachment: HDFS-6515-2.patch testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0, 2.4.1 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: hadoop, test Attachments: HDFS-6515-1.patch, HDFS-6515-2.patch I have an issue with test : testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, test runs fine. On Linux/PowerPC, I have: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at details, I see that some Failed to cache messages appear in the traces. Only 10 on Intel, but 186 on PPC64. On PPC64, it looks like some thread is waiting for something that never happens, generating a TimeOut. I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK. I'm now using Hadoop latest, however, the issue appeared within Hadoop 2.4.0 . I need help for understanding what the test is doing, what traces are expected, in order to understand what/where is the root cause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6934) Move checksum computation off the hot path when writing to RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186727#comment-14186727 ] Hudson commented on HDFS-6934: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) HDFS-6934. Move checksum computation off the hot path when writing to RAM disk. Contributed by Chris Nauroth. (cnauroth: rev 463aec11718e47d4aabb86a7a539cb973460aae6) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestShell.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestScrLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Options.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LazyPersistTestCase.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/ReplicaOutputStreams.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskReplicaLruTracker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskAsyncLazyPersistService.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java HDFS-6934. Revert files accidentally committed. 
(cnauroth: rev 5b1dfe78b8b06335bed0bcb83f12bb936d4c021b) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestShell.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java Move checksum computation off the hot path when writing to RAM disk --- Key: HDFS-6934 URL: https://issues.apache.org/jira/browse/HDFS-6934 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Arpit Agarwal Assignee: Chris Nauroth Fix For: 2.6.0 Attachments: HDFS-6934-branch-2.6.5.patch, HDFS-6934.3.patch, HDFS-6934.4.patch, HDFS-6934.5.patch, h6934_20141003b.patch, h6934_20141005.patch Since local RAM is considered reliable we can avoid writing checksums on the hot path when replicas are being written to a local RAM disk. The checksum can be computed by the lazy writer when moving replicas to disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186729#comment-14186729 ] Hudson commented on HDFS-5928: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) HDFS-5928. Show namespace and namenode ID on NN dfshealth page. Contributed by Siqi Li. (wheat9: rev 00b4e44a2eba871b4ab47e51c52de95b12dca82e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.7.0 Attachments: HDFS-5928.007.patch, HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v6.patch, HDFS-5928.v1.patch, screenshot-1.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6538) Comment format error in ShortCircuitRegistry javadoc
[ https://issues.apache.org/jira/browse/HDFS-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186731#comment-14186731 ] Hudson commented on HDFS-6538: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) HDFS-6538. Comment format error in ShortCircuitRegistry javadoc. Contributed by David Luo. (harsh) (harsh: rev 0058eadbd3149a5dee1ffc69c2d9f21caa916fb5) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java Comment format error in ShortCircuitRegistry javadoc Key: HDFS-6538 URL: https://issues.apache.org/jira/browse/HDFS-6538 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: debugging Assignee: David Luo Priority: Trivial Labels: documentation Fix For: 2.7.0 Attachments: HDFS-6538.patch Original Estimate: 1h Remaining Estimate: 1h A Javadoc comment should start with {noformat}/**{noformat}, but for class ShortCircuitRegistry it starts with only {noformat}/*{noformat}, so an {noformat}*{noformat} appears to have been omitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
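The one-character difference is easy to see in a minimal class (hypothetical, not ShortCircuitRegistry itself):

```java
/**
 * Correct: the comment opens with two asterisks, so the javadoc tool
 * picks it up as documentation for this class.
 */
class JavadocMarkers {
    /*
     * Looks similar, but the single asterisk makes this an ordinary
     * block comment that javadoc silently ignores.
     */
    static String describe() {
        return "only the /** comment above is Javadoc";
    }
}
```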
[jira] [Commented] (HDFS-7282) Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC
[ https://issues.apache.org/jira/browse/HDFS-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186734#comment-14186734 ] Hudson commented on HDFS-7282: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) HDFS-7282. Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC (Jinghui Wang via Colin P. McCabe) (cmccabe: rev 518a7f4af3d8deeecabfa0629b16521ce09de459) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC - Key: HDFS-7282 URL: https://issues.apache.org/jira/browse/HDFS-7282 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.1 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.7.0 Attachments: HDFS-7282.patch TemporarySocketDirectory has a finalize method that deletes its directory. In TestShortCircuitCache and TestBlockReaderFactory, the TemporarySocketDirectory instances created are not referenced later in the tests, so they can be garbage collected (deleting the directory) before the DataNode starts up and accesses a path under it, causing a FileNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
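The failure mode generalizes beyond Hadoop: an object whose finalizer tears down a resource can be collected as soon as it becomes unreachable, even if the resource path is still in use. A minimal sketch with hypothetical names (`TemporarySocketDirectory` behaves like `SelfDeletingDir` here):

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical stand-in for TemporarySocketDirectory: its finalizer
// deletes the directory, so the object must stay strongly reachable for
// as long as the path is in use.
class SelfDeletingDir {
    private final File dir;

    SelfDeletingDir() {
        try {
            dir = Files.createTempDirectory("sockets").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String getPath() {
        return dir.getAbsolutePath();
    }

    @Override
    protected void finalize() {
        dir.delete(); // runs whenever the GC collects this object
    }
}

// The fix has this shape: hold the directory in a field so it cannot be
// collected (and its finalizer run) while the code still uses the path.
class UsesDir {
    private final SelfDeletingDir sockDir; // strong reference prevents GC

    UsesDir() {
        sockDir = new SelfDeletingDir();
    }

    String socketPath() {
        return sockDir.getPath();
    }
}
```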
[jira] [Commented] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186722#comment-14186722 ] Hudson commented on HDFS-6741: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) HDFS-6741. Improve permission denied message when FSPermissionChecker#checkOwner fails. Contributed by Stephen Chu and Harsh J. (harsh) (harsh: rev 0398db19b2c4558a9f08ac2700a27752748896fa) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPermission.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve permission denied message when FSPermissionChecker#checkOwner fails --- Key: HDFS-6741 URL: https://issues.apache.org/jira/browse/HDFS-6741 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Harsh J Priority: Trivial Labels: supportability Fix For: 2.7.0 Attachments: HDFS-6741.1.patch, HDFS-6741.2.patch, HDFS-6741.2.patch Currently, FSPermissionChecker#checkOwner throws an AccessControlException with a simple Permission denied message. When users try to set an ACL without ownership permissions, they'll see something like: {code} [schu@hdfs-vanilla-1 hadoop]$ hdfs dfs -setfacl -m user:schu:--- /tmp setfacl: Permission denied {code} It'd be helpful if the message had an explanation why the permission was denied to avoid confusion for users who aren't familiar with permissions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
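As a hedged sketch of the improvement (class name and message format here are hypothetical, not the actual FSPermissionChecker change), the denial can name the user and the inode instead of printing a bare "Permission denied":

```java
// Hypothetical owner check: the thrown message carries the user and the
// path, which is the kind of detail the bare message lacked.
class OwnerCheckSketch {
    static void checkOwner(String user, String owner, String path) {
        if (!user.equals(owner)) {
            // Stands in for org.apache.hadoop.security.AccessControlException.
            throw new SecurityException("Permission denied. user=" + user
                + " is not the owner of inode=" + path);
        }
    }
}
```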
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186719#comment-14186719 ] Hadoop QA commented on HDFS-6606: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677559/HDFS-6606.009.patch against trunk revision 0398db1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.viewfs.TestViewFsHdfs org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8566//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8566//console This message is automatically generated. 
Optimize HDFS Encrypted Transport performance - Key: HDFS-6606 URL: https://issues.apache.org/jira/browse/HDFS-6606 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, security Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, HDFS-6606.006.patch, HDFS-6606.007.patch, HDFS-6606.008.patch, HDFS-6606.009.patch, OptimizeHdfsEncryptedTransportperformance.pdf In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; it was great work. It uses the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports three security strengths: * high 3des or rc4 (128 bits) * medium des or rc4 (56 bits) * low rc4 (40 bits) 3des and rc4 are slow, only *tens of MB/s*: http://www.javamex.com/tutorials/cryptography/ciphers.shtml http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ I will give more detailed performance data in the future. This is clearly a bottleneck and will vastly affect end-to-end performance. AES (Advanced Encryption Standard) is recommended as a replacement for DES and is more secure; with AES-NI support, throughput can reach nearly *2GB/s*, so it will no longer be the bottleneck. AES and CryptoCodec work is covered in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add a new mode for AES). This JIRA will use AES with AES-NI support as the encryption algorithm for DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
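The cipher the JIRA proposes can be illustrated with the JDK's own crypto API. The sketch below is a plain AES/CTR round trip only; the real patch negotiates a cipher option over the SASL handshake, which is not shown here:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// AES in CTR mode: a streaming-friendly mode that benefits from AES-NI
// on modern CPUs, which is where the ~2GB/s figure comes from.
class AesCtrRoundTrip {
    static byte[] crypt(int mode, byte[] key16, byte[] iv16, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key16, "AES"),
                new IvParameterSpec(iv16));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because CTR turns AES into a stream cipher, the same method serves both directions: decryption is encryption of the ciphertext with the same key and IV.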
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186724#comment-14186724 ] Hudson commented on HDFS-7278: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) HDFS-7278. Add a command that allows sysadmins to manually trigger full block reports from a DN (cmccabe) (cmccabe: rev baf794dc404ac54f4e8332654eadfac1bebacb8f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestTriggerBlockReport.java * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/BlockReportOptions.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7278.002.patch, HDFS-7278.003.patch, HDFS-7278.004.patch, HDFS-7278.005.patch We should add a command that allows sysadmins 
to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186737#comment-14186737 ] Hadoop QA commented on HDFS-6515: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677586/HDFS-6515-2.patch against trunk revision c9bec46. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8567//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8567//console This message is automatically generated. 
testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0, 2.4.1 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: hadoop, test Attachments: HDFS-6515-1.patch, HDFS-6515-2.patch I have an issue with the test testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, the test runs fine. On Linux/PowerPC, I get: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at the details, I see that some "Failed to cache" messages appear in the traces: only 10 on Intel, but 186 on PPC64. On PPC64, it looks like some thread is waiting for something that never happens, generating a timeout. I'm using the IBM JVM, but I've just checked that the issue also appears with OpenJDK. I'm now using the latest Hadoop; however, the issue appeared with Hadoop 2.4.0. I need help understanding what the test is doing and what traces are expected, in order to find the root cause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7299) Hadoop Namenode failing because of negative value in fsimage
[ https://issues.apache.org/jira/browse/HDFS-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186756#comment-14186756 ] Hu Liu, commented on HDFS-7299: --- If you can get the correct directory structure without any error, the fsimage should be ok. Hadoop Namenode failing because of negative value in fsimage Key: HDFS-7299 URL: https://issues.apache.org/jira/browse/HDFS-7299 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Vishnu Ganth The Hadoop NameNode is failing because of an unexpected block size value in the fsimage. Stack trace: {code} 2014-10-27 16:22:12,107 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = mastermachine-hostname/ip STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.4.0 STARTUP_MSG: classpath =
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186760#comment-14186760 ] Steve Loughran commented on HDFS-7295: -- Linking to YARN-2704 and log aggregation. There's also the need of the NM's to be able to get the localised resources of the AM submission in the event of an AM restart event after the original tokens have expired Support arbitrary max expiration times for delegation token --- Key: HDFS-7295 URL: https://issues.apache.org/jira/browse/HDFS-7295 Project: Hadoop HDFS Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS such as long running YARN apps. Users should be allowed to optionally specify max lifetime for their tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
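For context, the 7-day cap is the default of a cluster-wide NameNode setting (the property name below is the standard Hadoop key; treat the exact name as an assumption in this sketch). Because it applies to every token on the cluster, it cannot be relaxed for one long-running app, which is the gap this JIRA targets. A sketch of the relevant hdfs-site.xml fragment:

```xml
<!-- hdfs-site.xml: cluster-wide maximum delegation token lifetime. -->
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>604800000</value> <!-- 7 days, in milliseconds -->
</property>
```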
[jira] [Commented] (HDFS-7299) Hadoop Namenode failing because of negative value in fsimage
[ https://issues.apache.org/jira/browse/HDFS-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186781#comment-14186781 ] Vishnu Ganth commented on HDFS-7299: Thanks, Liu. I am getting the correct directory structure using the offline image viewer. Are there any further ways to debug this issue? Hadoop Namenode failing because of negative value in fsimage Key: HDFS-7299 URL: https://issues.apache.org/jira/browse/HDFS-7299 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Vishnu Ganth The Hadoop NameNode is failing because of an unexpected block size value in the fsimage. Stack trace: {code} 2014-10-27 16:22:12,107 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = mastermachine-hostname/ip STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.4.0 STARTUP_MSG: classpath =
[jira] [Updated] (HDFS-5894) Refactor a private internal class DataTransferEncryptor.SaslParticipant
[ https://issues.apache.org/jira/browse/HDFS-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-5894: -- Attachment: HDFS-5894.patch Re-uploading patch to retry after the build-bot was fixed to properly apply patches. Refactor a private internal class DataTransferEncryptor.SaslParticipant --- Key: HDFS-5894 URL: https://issues.apache.org/jira/browse/HDFS-5894 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Hiroshi Ikeda Assignee: Harsh J Priority: Trivial Attachments: HDFS-5894.patch, HDFS-5894.patch, HDFS-5894.patch It is appropriate to use polymorphism for SaslParticipant instead of scattering if-else statements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
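The suggested refactoring can be sketched generically (names are illustrative, not the actual SaslParticipant code): fix the role at construction time behind one interface, so callers never branch on client-vs-server:

```java
// Before the refactoring, each operation would branch on the role:
//   if (saslServer != null) { ... } else { ... }
// After it, the role is chosen once and dispatch is polymorphic.
interface Participant {
    String evaluate(String challenge);
}

class ServerSide implements Participant {
    @Override
    public String evaluate(String challenge) {
        return "server:" + challenge;
    }
}

class ClientSide implements Participant {
    @Override
    public String evaluate(String challenge) {
        return "client:" + challenge;
    }
}
```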
[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37
[ https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-4836: -- Resolution: Duplicate Status: Resolved (was: Patch Available) Update Tomcat version for httpfs to 6.0.37 -- Key: HDFS-4836 URL: https://issues.apache.org/jira/browse/HDFS-4836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Trivial Attachments: HDFS-4836.patch Tomcat has released a new version with security fixes: http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37
[ https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186794#comment-14186794 ] Harsh J commented on HDFS-4836: --- Similar JIRA HADOOP-10814 has bumped it up to 6.0.41 on trunk. Update Tomcat version for httpfs to 6.0.37 -- Key: HDFS-4836 URL: https://issues.apache.org/jira/browse/HDFS-4836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Trivial Attachments: HDFS-4836.patch Tomcat has released a new version with security fixes: http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7299) Hadoop Namenode failing because of negative value in fsimage
[ https://issues.apache.org/jira/browse/HDFS-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186800#comment-14186800 ] Vishnu Ganth commented on HDFS-7299: I found one of the files in HDFS with a negative NUM_BYTES set in the fsimage. INODE INODE_PATH = /user/root/dir/out/part-m-05990 REPLICATION = 3 MODIFICATION_TIME = 2014-09-05 04:09 ACCESS_TIME = 2014-09-05 07:42 BLOCK_SIZE = 134217728 BLOCKS [NUM_BLOCKS = 1] BLOCK BLOCK_ID = 8582078737 *NUM_BYTES = -1945969516689645797* GENERATION_STAMP = 5 NS_QUOTA = -1 DS_QUOTA = -1 PERMISSIONS USER_NAME = root GROUP_NAME = supergroup PERMISSION_STRING = rw-r--r-- Is there any way to edit the fsimage file? Hadoop Namenode failing because of negative value in fsimage Key: HDFS-7299 URL: https://issues.apache.org/jira/browse/HDFS-7299 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Vishnu Ganth The Hadoop NameNode is failing because of an unexpected block size value in the fsimage. Stack trace: {code} 2014-10-27 16:22:12,107 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = mastermachine-hostname/ip STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.4.0 STARTUP_MSG: classpath =
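A value like the NUM_BYTES above can only come from corruption; a load-time sanity check of the kind sketched below (hypothetical, not the NameNode's actual fsimage loader) would surface it with context instead of failing later:

```java
// Hypothetical sanity check on block lengths read from an fsimage:
// negative lengths are impossible in a valid image, so fail fast and
// name the offending block.
class BlockLengthCheck {
    static long validate(long blockId, long numBytes) {
        if (numBytes < 0) {
            throw new IllegalStateException(
                "Corrupt NUM_BYTES " + numBytes + " for block " + blockId);
        }
        return numBytes;
    }
}
```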
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186802#comment-14186802 ] Hudson commented on HDFS-6606: -- FAILURE: Integrated in Hadoop-trunk-Commit #6367 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6367/]) HDFS-6606. Optimize HDFS Encrypted Transport performance. (yliu) (yliu: rev 58c0bb9ed9f4a2491395b63c68046562a73526c9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslParticipant.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CipherOption.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslResponseWithNegotiatedCipherOption.java * 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptedTransfer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto Optimize HDFS Encrypted Transport performance - Key: HDFS-6606 URL: https://issues.apache.org/jira/browse/HDFS-6606 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, security Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, HDFS-6606.006.patch, HDFS-6606.007.patch, HDFS-6606.008.patch, HDFS-6606.009.patch, OptimizeHdfsEncryptedTransportperformance.pdf In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; it was great work. It uses the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports three security strengths: * high 3des or rc4 (128 bits) * medium des or rc4 (56 bits) * low rc4 (40 bits) 3des and rc4 are slow, only *tens of MB/s*: http://www.javamex.com/tutorials/cryptography/ciphers.shtml http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ I will give more detailed performance data in the future. This is clearly a bottleneck and will vastly affect end-to-end performance. AES (Advanced Encryption Standard) is recommended as a replacement for DES and is more secure; with AES-NI support, throughput can reach nearly *2GB/s*, so it will no longer be the bottleneck. AES and CryptoCodec work is covered in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add a new mode for AES). This JIRA will use AES with AES-NI support as the encryption algorithm for DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186821#comment-14186821 ] Hudson commented on HDFS-6741: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) HDFS-6741. Improve permission denied message when FSPermissionChecker#checkOwner fails. Contributed by Stephen Chu and Harsh J. (harsh) (harsh: rev 0398db19b2c4558a9f08ac2700a27752748896fa) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPermission.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java Improve permission denied message when FSPermissionChecker#checkOwner fails --- Key: HDFS-6741 URL: https://issues.apache.org/jira/browse/HDFS-6741 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Harsh J Priority: Trivial Labels: supportability Fix For: 2.7.0 Attachments: HDFS-6741.1.patch, HDFS-6741.2.patch, HDFS-6741.2.patch Currently, FSPermissionChecker#checkOwner throws an AccessControlException with a simple Permission denied message. When users try to set an ACL without ownership permissions, they'll see something like: {code} [schu@hdfs-vanilla-1 hadoop]$ hdfs dfs -setfacl -m user:schu:--- /tmp setfacl: Permission denied {code} It'd be helpful if the message had an explanation why the permission was denied to avoid confusion for users who aren't familiar with permissions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6934) Move checksum computation off the hot path when writing to RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186826#comment-14186826 ] Hudson commented on HDFS-6934: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) HDFS-6934. Move checksum computation off the hot path when writing to RAM disk. Contributed by Chris Nauroth. (cnauroth: rev 463aec11718e47d4aabb86a7a539cb973460aae6) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestScrLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskAsyncLazyPersistService.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/ReplicaOutputStreams.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestShell.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LazyPersistTestCase.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskReplicaLruTracker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Options.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java HDFS-6934. Revert files accidentally committed. 
(cnauroth: rev 5b1dfe78b8b06335bed0bcb83f12bb936d4c021b) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestShell.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java Move checksum computation off the hot path when writing to RAM disk --- Key: HDFS-6934 URL: https://issues.apache.org/jira/browse/HDFS-6934 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Arpit Agarwal Assignee: Chris Nauroth Fix For: 2.6.0 Attachments: HDFS-6934-branch-2.6.5.patch, HDFS-6934.3.patch, HDFS-6934.4.patch, HDFS-6934.5.patch, h6934_20141003b.patch, h6934_20141005.patch Since local RAM is considered reliable we can avoid writing checksums on the hot path when replicas are being written to a local RAM disk. The checksum can be computed by the lazy writer when moving replicas to disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186823#comment-14186823 ] Hudson commented on HDFS-7278: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) HDFS-7278. Add a command that allows sysadmins to manually trigger full block reports from a DN (cmccabe) (cmccabe: rev baf794dc404ac54f4e8332654eadfac1bebacb8f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/BlockReportOptions.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestTriggerBlockReport.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7278.002.patch, HDFS-7278.003.patch, HDFS-7278.004.patch, HDFS-7278.005.patch We should add a command that allows 
sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186828#comment-14186828 ] Hudson commented on HDFS-5928: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) HDFS-5928. Show namespace and namenode ID on NN dfshealth page. Contributed by Siqi Li. (wheat9: rev 00b4e44a2eba871b4ab47e51c52de95b12dca82e) * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.7.0 Attachments: HDFS-5928.007.patch, HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v6.patch, HDFS-5928.v1.patch, screenshot-1.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6538) Comment format error in ShortCircuitRegistry javadoc
[ https://issues.apache.org/jira/browse/HDFS-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186830#comment-14186830 ] Hudson commented on HDFS-6538: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) HDFS-6538. Comment format error in ShortCircuitRegistry javadoc. Contributed by David Luo. (harsh) (harsh: rev 0058eadbd3149a5dee1ffc69c2d9f21caa916fb5) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java Comment format error in ShortCircuitRegistry javadoc Key: HDFS-6538 URL: https://issues.apache.org/jira/browse/HDFS-6538 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: debugging Assignee: David Luo Priority: Trivial Labels: documentation Fix For: 2.7.0 Attachments: HDFS-6538.patch Original Estimate: 1h Remaining Estimate: 1h A javadoc comment should start with {noformat}/**{noformat}, but the comment for class ShortCircuitRegistry starts with only {noformat}/*{noformat}, so a {noformat}*{noformat} appears to have been omitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186838#comment-14186838 ] Yi Liu commented on HDFS-6606: -- Chris, I committed it to avoid a rebase, since I see another JIRA doing a small refactor of _SaslParticipant_. Thanks again for your review. Optimize HDFS Encrypted Transport performance - Key: HDFS-6606 URL: https://issues.apache.org/jira/browse/HDFS-6606 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, security Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, HDFS-6606.006.patch, HDFS-6606.007.patch, HDFS-6606.008.patch, HDFS-6606.009.patch, OptimizeHdfsEncryptedTransportperformance.pdf In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; it was great work. It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports three security strengths: * high 3des or rc4 (128 bits) * medium des or rc4 (56 bits) * low rc4 (40 bits) 3des and rc4 are slow, only *tens of MB/s*: http://www.javamex.com/tutorials/cryptography/ciphers.shtml http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ I will give more detailed performance data in the future. This is clearly a bottleneck and will vastly affect end-to-end performance. AES (Advanced Encryption Standard) is recommended as a replacement for DES and is more secure; with AES-NI support, throughput can reach nearly *2GB/s*, so it won't be the bottleneck any more. AES and CryptoCodec work is supported in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add a new mode support for AES). This JIRA will use AES with AES-NI support as the encryption algorithm for DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
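The AES counter-mode encryption the JIRA proposes can be sketched with the standard JCE API. This is an illustrative demo, not the HDFS-6606 patch code; the all-zero key and IV are demo values only. On AES-NI hardware this cipher runs far faster than the 3DES/RC4 ciphers that DIGEST-MD5's auth-conf QOP offers.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCtrSketch {
    // CTR mode turns AES into a stream cipher, suitable for wrapping a
    // byte stream like DataTransferProtocol traffic. Encrypt and decrypt
    // are the same keystream XOR, so a round trip restores the plaintext.
    static byte[] crypt(int mode, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode,
               new SecretKeySpec(new byte[16], "AES"),   // 128-bit demo key
               new IvParameterSpec(new byte[16]));       // demo IV
        return c.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] ct = crypt(Cipher.ENCRYPT_MODE, "block data".getBytes("UTF-8"));
        byte[] pt = crypt(Cipher.DECRYPT_MODE, ct);
        System.out.println(new String(pt, "UTF-8")); // block data
    }
}
```

A real deployment would negotiate a fresh key and IV per connection (the patch does this inside the SASL handshake) rather than hard-coding them.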
[jira] [Updated] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-6606: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.6. Optimize HDFS Encrypted Transport performance - Key: HDFS-6606 URL: https://issues.apache.org/jira/browse/HDFS-6606 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, security Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.6.0 Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, HDFS-6606.006.patch, HDFS-6606.007.patch, HDFS-6606.008.patch, HDFS-6606.009.patch, OptimizeHdfsEncryptedTransportperformance.pdf In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; it was great work. It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports three security strengths: * high 3des or rc4 (128 bits) * medium des or rc4 (56 bits) * low rc4 (40 bits) 3des and rc4 are slow, only *tens of MB/s*: http://www.javamex.com/tutorials/cryptography/ciphers.shtml http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ I will give more detailed performance data in the future. This is clearly a bottleneck and will vastly affect end-to-end performance. AES (Advanced Encryption Standard) is recommended as a replacement for DES and is more secure; with AES-NI support, throughput can reach nearly *2GB/s*, so it won't be the bottleneck any more. AES and CryptoCodec work is supported in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add a new mode support for AES). This JIRA will use AES with AES-NI support as the encryption algorithm for DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7282) Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC
[ https://issues.apache.org/jira/browse/HDFS-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186833#comment-14186833 ] Hudson commented on HDFS-7282: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) HDFS-7282. Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC (Jinghui Wang via Colin P. McCabe) (cmccabe: rev 518a7f4af3d8deeecabfa0629b16521ce09de459) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderFactory.java Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC - Key: HDFS-7282 URL: https://issues.apache.org/jira/browse/HDFS-7282 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.1 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.7.0 Attachments: HDFS-7282.patch TemporarySocketDirectory has a finalize method that deletes the directory. In TestShortCircuitCache and TestBlockReaderFactory, the TemporarySocketDirectory instances created are not referenced later in the tests, so they can be garbage collected (deleting the directory) before the DataNode starts up and accesses the directory, causing a FileNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
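The failure mode above can be modeled in a few lines. This is an illustrative sketch, not the HDFS test code: a helper whose finalize() deletes its directory can be collected, and the directory removed, as soon as the last reference is dropped, even while the path is still in use. The fix in HDFS-7282 is to keep the TemporarySocketDirectory referenced for the test's lifetime; the explicit close() below models that deterministic cleanup.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in for TemporarySocketDirectory: cleanup is tied to
// an explicit close() held by the caller, not to garbage collection.
public class SocketDirSketch implements AutoCloseable {
    private File dir;

    public SocketDirSketch() throws IOException {
        dir = Files.createTempDirectory("sock").toFile();
    }

    public File getDir() { return dir; }

    @Override
    public void close() {
        // Deletion happens only when the caller says it is done,
        // never at the whim of the garbage collector.
        if (dir != null) { dir.delete(); dir = null; }
    }
}
```

A test would hold the SocketDirSketch (or the real TemporarySocketDirectory) in a local variable or field until the DataNode is shut down, guaranteeing the directory outlives every consumer of the path.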
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186878#comment-14186878 ] Yongjun Zhang commented on HDFS-7235: - I was able to apply the patch locally even at the latest tip of trunk {quote} commit 58c0bb9ed9f4a2491395b63c68046562a73526c9 Author: yliu y...@apache.org Date: Tue Oct 28 21:11:31 2014 +0800 {quote} DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} This method returns false (detecting an invalid block) because the block file doesn't exist, due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and the initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
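The proposed fix can be sketched as follows. This is a simplified, self-contained model of the DN-side check and the behavior the JIRA title asks for, not the actual HDFS classes; the class, enum, and method names below are illustrative stubs. The point is that when the replica fails the validity check, the DN should tell the NN instead of silently dropping the transfer request.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaCheckSketch {
    enum ReplicaState { FINALIZED, RBW }

    // Toy stand-in for ReplicaInfo: just a state plus whether the
    // on-disk block file still exists.
    static class Replica {
        ReplicaState state;
        boolean fileExists;
        Replica(ReplicaState s, boolean exists) { state = s; fileExists = exists; }
    }

    static final List<String> reportedBadBlocks = new ArrayList<>();

    // Mirrors FsDatasetImpl#isValid: all three conditions must hold.
    static boolean isValid(Replica r, ReplicaState expected) {
        return r != null && r.state == expected && r.fileExists;
    }

    static void transferBlock(String blockId, Replica r) {
        if (!isValid(r, ReplicaState.FINALIZED)) {
            // Proposed behavior: report the bad block so the NN stops
            // picking this DN as the replication source, breaking the
            // infinite retry loop described above.
            reportedBadBlocks.add(blockId);
            return;
        }
        // ... proceed with the actual transfer ...
    }

    public static void main(String[] args) {
        // Replica whose block file was lost to a bad disk.
        transferBlock("blk_1", new Replica(ReplicaState.FINALIZED, false));
        System.out.println(reportedBadBlocks); // [blk_1]
    }
}
```

Once the NN learns the replica is bad, its normal corrupt-replica handling can pick a different source DN, so decommissioning makes progress instead of hanging.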
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7235: Attachment: (was: HDFS-7235.007.patch) DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} This method returns false (detecting an invalid block) because the block file doesn't exist, due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and the initial analysis. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7235: Attachment: HDFS-7235.007.patch DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} This method returns false (detecting an invalid block) because the block file doesn't exist, due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and the initial analysis. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6741) Improve permission denied message when FSPermissionChecker#checkOwner fails
[ https://issues.apache.org/jira/browse/HDFS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186883#comment-14186883 ] Hudson commented on HDFS-6741: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) HDFS-6741. Improve permission denied message when FSPermissionChecker#checkOwner fails. Contributed by Stephen Chu and Harsh J. (harsh) (harsh: rev 0398db19b2c4558a9f08ac2700a27752748896fa) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPermission.java Improve permission denied message when FSPermissionChecker#checkOwner fails --- Key: HDFS-6741 URL: https://issues.apache.org/jira/browse/HDFS-6741 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Harsh J Priority: Trivial Labels: supportability Fix For: 2.7.0 Attachments: HDFS-6741.1.patch, HDFS-6741.2.patch, HDFS-6741.2.patch Currently, FSPermissionChecker#checkOwner throws an AccessControlException with a simple Permission denied message. When users try to set an ACL without ownership permissions, they'll see something like: {code} [schu@hdfs-vanilla-1 hadoop]$ hdfs dfs -setfacl -m user:schu:--- /tmp setfacl: Permission denied {code} It'd be helpful if the message had an explanation why the permission was denied to avoid confusion for users who aren't familiar with permissions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
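The kind of message improvement described above can be sketched simply: instead of a bare "Permission denied", include who attempted the operation, which inode it targeted, and who actually owns it. The method and parameter names below are illustrative, not the actual FSPermissionChecker API.

```java
public class DeniedMessageSketch {
    // Builds a diagnostic message for an ownership check failure.
    // Hypothetical helper; the real fix lives in FSPermissionChecker#checkOwner.
    static String deniedMessage(String user, String op, String path, String owner) {
        return String.format(
            "Permission denied: user=%s is not the owner of inode=%s (owner=%s); "
            + "operation %s requires ownership", user, path, owner, op);
    }

    public static void main(String[] args) {
        // Corresponds to the setfacl example in the issue description.
        System.out.println(deniedMessage("schu", "setfacl", "/tmp", "hdfs"));
    }
}
```

With a message like this, the user in the example immediately sees that /tmp belongs to another user, rather than having to guess why setfacl was refused.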
[jira] [Commented] (HDFS-6934) Move checksum computation off the hot path when writing to RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186888#comment-14186888 ] Hudson commented on HDFS-6934: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) HDFS-6934. Move checksum computation off the hot path when writing to RAM disk. Contributed by Chris Nauroth. (cnauroth: rev 463aec11718e47d4aabb86a7a539cb973460aae6) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskAsyncLazyPersistService.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/ReplicaOutputStreams.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Options.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LazyPersistTestCase.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java * 
hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestShell.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestScrLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskReplicaLruTracker.java HDFS-6934. Revert files accidentally committed. 
(cnauroth: rev 5b1dfe78b8b06335bed0bcb83f12bb936d4c021b) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestShell.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java Move checksum computation off the hot path when writing to RAM disk --- Key: HDFS-6934 URL: https://issues.apache.org/jira/browse/HDFS-6934 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Arpit Agarwal Assignee: Chris Nauroth Fix For: 2.6.0 Attachments: HDFS-6934-branch-2.6.5.patch, HDFS-6934.3.patch, HDFS-6934.4.patch, HDFS-6934.5.patch, h6934_20141003b.patch, h6934_20141005.patch Since local RAM is considered reliable we can avoid writing checksums on the hot path when replicas are being written to a local RAM disk. The checksum can be computed by the lazy writer when moving replicas to disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
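The core idea of the optimization above can be sketched in a few lines. This is illustrative, with hypothetical method names, not the HDFS API: when a replica is written to RAM disk, the hot write path skips checksum work entirely, and the lazy writer computes the checksum once when it persists the replica to durable storage.

```java
import java.util.zip.CRC32;

public class LazyChecksumSketch {
    // Hot path: just store the bytes in the RAM-disk replica,
    // with no per-packet checksum computation.
    static byte[] writeToRamDisk(byte[] data) {
        return data.clone();
    }

    // Cold path, run later by the lazy writer while moving the
    // replica to disk: compute the checksum exactly once.
    static long checksumOnPersist(byte[] ramReplica) {
        CRC32 crc = new CRC32();
        crc.update(ramReplica);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] replica = writeToRamDisk(new byte[]{1, 2, 3});
        long crc = checksumOnPersist(replica);
        System.out.println(crc == checksumOnPersist(replica)); // true
    }
}
```

The trade-off this relies on is stated in the issue: local RAM is considered reliable, so deferring the checksum to persist time does not weaken the durability guarantees of the on-disk copy.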
[jira] [Commented] (HDFS-7282) Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC
[ https://issues.apache.org/jira/browse/HDFS-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186896#comment-14186896 ] Hudson commented on HDFS-7282: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) HDFS-7282. Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC (Jinghui Wang via Colin P. McCabe) (cmccabe: rev 518a7f4af3d8deeecabfa0629b16521ce09de459) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderFactory.java Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC - Key: HDFS-7282 URL: https://issues.apache.org/jira/browse/HDFS-7282 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.1 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.7.0 Attachments: HDFS-7282.patch TemporarySocketDirectory has a finalize method that deletes the directory. In TestShortCircuitCache and TestBlockReaderFactory, the TemporarySocketDirectory instances created are not referenced later in the tests, so they can be garbage collected (deleting the directory) before the DataNode starts up and accesses the directory, causing a FileNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186885#comment-14186885 ] Hudson commented on HDFS-7278: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) HDFS-7278. Add a command that allows sysadmins to manually trigger full block reports from a DN (cmccabe) (cmccabe: rev baf794dc404ac54f4e8332654eadfac1bebacb8f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestTriggerBlockReport.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/BlockReportOptions.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7278.002.patch, HDFS-7278.003.patch, HDFS-7278.004.patch, HDFS-7278.005.patch We should add a command that allows sysadmins 
to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186890#comment-14186890 ] Hudson commented on HDFS-5928: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) HDFS-5928. Show namespace and namenode ID on NN dfshealth page. Contributed by Siqi Li. (wheat9: rev 00b4e44a2eba871b4ab47e51c52de95b12dca82e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.7.0 Attachments: HDFS-5928.007.patch, HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v6.patch, HDFS-5928.v1.patch, screenshot-1.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6538) Comment format error in ShortCircuitRegistry javadoc
[ https://issues.apache.org/jira/browse/HDFS-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186893#comment-14186893 ] Hudson commented on HDFS-6538: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) HDFS-6538. Comment format error in ShortCircuitRegistry javadoc. Contributed by David Luo. (harsh) (harsh: rev 0058eadbd3149a5dee1ffc69c2d9f21caa916fb5) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Comment format error in ShortCircuitRegistry javadoc Key: HDFS-6538 URL: https://issues.apache.org/jira/browse/HDFS-6538 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: debugging Assignee: David Luo Priority: Trivial Labels: documentation Fix For: 2.7.0 Attachments: HDFS-6538.patch Original Estimate: 1h Remaining Estimate: 1h A javadoc comment should start with {noformat}/**{noformat}, but the comment for class ShortCircuitRegistry starts with only {noformat}/*{noformat}, so a {noformat}*{noformat} appears to have been omitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186918#comment-14186918 ] Tony Reix commented on HDFS-6515: - Test report says: [INFO] BUILD SUCCESS but there are errors after: - Determining number of patched Findbugs warnings : /home/jenkins/j/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build@2/dev-support/test-patch.sh: line 622: 2899 Killed enkins-slave/workspace/PreCommit-HDFS-Build@2/dev-support/test-patch.sh: line 622: 2899 Killed - Running tests: /bin/grep: /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build@2/../patchprocess/patch: No such file or directory {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail. - Checking the integrity of system test framework code.: mv: cannot stat '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build@2/../patchprocess': No such file or directory I'm now running: mvn clean test findbugs:findbugs -DskipTests -DHadoopPatchProcess in my environment, with trunk patched with 6515, in order to understand what's wrong. Result: [INFO] Apache Hadoop Project POM . 
FAILURE [1:20.245s] [ERROR] Failed to execute goal org.codehaus.mojo:findbugs-maven-plugin:2.3.2:findbugs (default-cli) on project hadoop-project: Execution default-cli of goal org.codehaus.mojo:findbugs-maven-plugin:2.3.2:findbugs failed: Plugin org.codehaus.mojo:findbugs-maven-plugin:2.3.2 or one of its dependencies could not be resolved: Could not transfer artifact asm:asm-xml:jar:3.1 from/to central (http://repo.maven.apache.org/maven2): Read timed out - [Help 1] Retesting with -X and Oracle 1.7 JVM instead of IBM JVM: Result of: mvn -X test findbugs:findbugs -DskipTests -DHadoopPatchProcess -l mvn.findbugs.OpenJDK.res in my environment (Ubuntu 14.04/Intel, Maven 3.0.4) is : BUILD SUCCESS testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) - Key: HDFS-6515 URL: https://issues.apache.org/jira/browse/HDFS-6515 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0, 2.4.1 Environment: Linux on PPC64 Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test Reporter: Tony Reix Priority: Blocker Labels: hadoop, test Attachments: HDFS-6515-1.patch, HDFS-6515-2.patch I have an issue with test : testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC. On Linux/Intel, test runs fine. On Linux/PowerPC, I have: testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) Time elapsed: 64.037 sec ERROR! java.lang.Exception: test timed out after 6 milliseconds Looking at details, I see that some Failed to cache messages appear in the traces. Only 10 on Intel, but 186 on PPC64. On PPC64, it looks like some thread is waiting for something that never happens, generating a TimeOut. I'm now using IBM JVM, however I've just checked that the issue also appears with OpenJDK. I'm now using Hadoop latest, however, the issue appeared within Hadoop 2.4.0 . 
I need help understanding what the test is doing and what traces are expected, in order to pinpoint the root cause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186986#comment-14186986 ] Chris Nauroth commented on HDFS-6606: - I had forgotten that you can do your own commits now, Yi. :-) Thank you for the patch, and thank you to all code reviewers. Optimize HDFS Encrypted Transport performance - Key: HDFS-6606 URL: https://issues.apache.org/jira/browse/HDFS-6606 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, security Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.6.0 Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, HDFS-6606.006.patch, HDFS-6606.007.patch, HDFS-6606.008.patch, HDFS-6606.009.patch, OptimizeHdfsEncryptedTransportperformance.pdf In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; it was great work. It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports three security strengths: * high 3des or rc4 (128 bits) * medium des or rc4 (56 bits) * low rc4 (40 bits) 3des and rc4 are slow, only *tens of MB/s*: http://www.javamex.com/tutorials/cryptography/ciphers.shtml http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ I will give more detailed performance data in the future. This is clearly a bottleneck and will vastly affect end-to-end performance. AES (Advanced Encryption Standard) is recommended as a replacement for DES and is more secure; with AES-NI support, throughput can reach nearly *2GB/s*, so it won't be the bottleneck any more. AES and CryptoCodec work is supported in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add a new mode support for AES). This JIRA will use AES with AES-NI support as the encryption algorithm for DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7291) Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
[ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187013#comment-14187013 ] Hadoop QA commented on HDFS-7291: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677537/HDFS-7291.4.patch against trunk revision 58c0bb9. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8570//console This message is automatically generated. Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows Key: HDFS-7291 URL: https://issues.apache.org/jira/browse/HDFS-7291 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7291.0.patch, HDFS-7291.1.patch, HDFS-7291.2.patch, HDFS-7291.3.patch, HDFS-7291.4.patch HDFS-7090 changed to persist in-memory replicas using unbuffered IO on Linux and Windows. On Linux, it relies on the sendfile() API between two file descriptors to achieve an unbuffered IO copy. According to the Linux man page at http://man7.org/linux/man-pages/man2/sendfile.2.html, this is only supported on Linux kernel 2.6.33+. As pointed out by Haowei in the discussion below, FileChannel#transferTo already has support for native unbuffered IO on POSIX platforms. On Windows, JDK 6/7/8 has not implemented native unbuffered IO yet. We changed to use FileChannel#transferTo for POSIX and our own native wrapper of CopyFileEx on Windows for the unbuffered copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
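As a rough sketch of the POSIX-side approach the description settles on, the copy can be driven through {{FileChannel#transferTo}}, which lets the JDK use a native zero-copy path (such as sendfile) where the platform supports it. This is a standalone illustration under assumed names, not the HDFS-7291 patch code.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: file-to-file copy via FileChannel#transferTo.
class UnbufferedCopy {
    static void copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                 StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long pos = 0;
            // transferTo may move fewer bytes than requested; loop until done.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}
```

The loop matters: `transferTo` makes no guarantee of transferring the full requested count in one call, so a single unchecked call is a latent bug.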
[jira] [Updated] (HDFS-7291) Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
[ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7291: Hadoop Flags: Reviewed +1 for the patch. The Jenkins failure looks spurious. I triggered another run. I'll wait for that and then commit. https://builds.apache.org/job/PreCommit-HDFS-Build/8570/ Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows Key: HDFS-7291 URL: https://issues.apache.org/jira/browse/HDFS-7291 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7291.0.patch, HDFS-7291.1.patch, HDFS-7291.2.patch, HDFS-7291.3.patch, HDFS-7291.4.patch HDFS-7090 changed to persist in-memory replicas using unbuffered IO on Linux and Windows. On Linux, it relies on the sendfile() API between two file descriptors to achieve an unbuffered IO copy. According to the Linux man page at http://man7.org/linux/man-pages/man2/sendfile.2.html, this is only supported on Linux kernel 2.6.33+. As pointed out by Haowei in the discussion below, FileChannel#transferTo already has support for native unbuffered IO on POSIX platforms. On Windows, JDK 6/7/8 has not implemented native unbuffered IO yet. We changed to use FileChannel#transferTo for POSIX and our own native wrapper of CopyFileEx on Windows for the unbuffered copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5894) Refactor a private internal class DataTransferEncryptor.SaslParticipant
[ https://issues.apache.org/jira/browse/HDFS-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187012#comment-14187012 ] Hadoop QA commented on HDFS-5894: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677596/HDFS-5894.patch against trunk revision c9bec46. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8568//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8568//console This message is automatically generated. 
Refactor a private internal class DataTransferEncryptor.SaslParticipant --- Key: HDFS-5894 URL: https://issues.apache.org/jira/browse/HDFS-5894 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Hiroshi Ikeda Assignee: Harsh J Priority: Trivial Attachments: HDFS-5894.patch, HDFS-5894.patch, HDFS-5894.patch It is appropriate to use polymorphism for SaslParticipant instead of scattering if-else statements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
Kihwal Lee created HDFS-7300: Summary: The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed Key: HDFS-7300 URL: https://issues.apache.org/jira/browse/HDFS-7300 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical The {{getMaxNodesPerRack()}} method can produce an undesirable result in some cases. - Three replicas on two racks: the max is 3, so everything can go to one rack. - Two replicas on two or more racks: the max is 2, so both replicas can end up in the same rack. {{BlockManager#isNeededReplication()}} fixes this after the block/file is closed because {{blockHasEnoughRacks()}} will return false. This is not only extra work; it can also break the favored nodes feature. When there are two racks and two favored nodes are specified in the same rack, the NN may allocate the third replica on a node in the same rack, because {{maxNodesPerRack}} is 3. When closing the file, the NN moves a block to the other rack. There is a 66% chance that a favored node is moved. If {{maxNodesPerRack}} were 2, this would not happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
[ https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7300: - Target Version/s: 2.7.0 The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed Key: HDFS-7300 URL: https://issues.apache.org/jira/browse/HDFS-7300 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical The {{getMaxNodesPerRack()}} method can produce an undesirable result in some cases. - Three replicas on two racks: the max is 3, so everything can go to one rack. - Two replicas on two or more racks: the max is 2, so both replicas can end up in the same rack. {{BlockManager#isNeededReplication()}} fixes this after the block/file is closed because {{blockHasEnoughRacks()}} will return false. This is not only extra work; it can also break the favored nodes feature. When there are two racks and two favored nodes are specified in the same rack, the NN may allocate the third replica on a node in the same rack, because {{maxNodesPerRack}} is 3. When closing the file, the NN moves a block to the other rack. There is a 66% chance that a favored node is moved. If {{maxNodesPerRack}} were 2, this would not happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7291) Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
[ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187057#comment-14187057 ] Xiaoyu Yao commented on HDFS-7291: -- The Jenkins failure is related to the change from HADOOP-10926 on test-patch.sh. An earlier break was found by HADOOP-11240 and resolved. I attached the new failures to HADOOP-10926. Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows Key: HDFS-7291 URL: https://issues.apache.org/jira/browse/HDFS-7291 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7291.0.patch, HDFS-7291.1.patch, HDFS-7291.2.patch, HDFS-7291.3.patch, HDFS-7291.4.patch HDFS-7090 changed to persist in-memory replicas using unbuffered IO on Linux and Windows. On Linux, it relies on the sendfile() API between two file descriptors to achieve an unbuffered IO copy. According to the Linux man page at http://man7.org/linux/man-pages/man2/sendfile.2.html, this is only supported on Linux kernel 2.6.33+. As pointed out by Haowei in the discussion below, FileChannel#transferTo already has support for native unbuffered IO on POSIX platforms. On Windows, JDK 6/7/8 has not implemented native unbuffered IO yet. We changed to use FileChannel#transferTo for POSIX and our own native wrapper of CopyFileEx on Windows for the unbuffered copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187153#comment-14187153 ] Hadoop QA commented on HDFS-7235: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677607/HDFS-7235.007.patch against trunk revision 58c0bb9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8569//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8569//console This message is automatically generated. DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs.
What happens is: when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN detects the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} This method returns false (detecting an invalid block) because the block file doesn't exist, due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and the initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
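To make the failure mode above concrete, here is a toy model (not Hadoop code) of the {{isValid()}} check and the proposed fix direction: when the replica is invalid, report it rather than silently dropping the transfer. All types and names below are simplified stand-ins for the real DataNode classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the DataNode-side check; every name is a simplified stand-in.
class TransferBlockSketch {
    enum ReplicaState { FINALIZED, RBW }

    static class ReplicaInfo {
        final ReplicaState state;
        final boolean blockFileExists; // false models the bad-disk case
        ReplicaInfo(ReplicaState state, boolean blockFileExists) {
            this.state = state;
            this.blockFileExists = blockFileExists;
        }
    }

    final Map<Long, ReplicaInfo> volumeMap = new HashMap<>();
    final List<Long> badBlocksReportedToNN = new ArrayList<>();

    /** Mirrors the isValid() logic: non-null, expected state, block file on disk. */
    boolean isValid(long blockId, ReplicaState state) {
        ReplicaInfo replicaInfo = volumeMap.get(blockId);
        return replicaInfo != null
            && replicaInfo.state == state
            && replicaInfo.blockFileExists;
    }

    /** The fix direction: report the invalid replica to the NN instead of
     *  silently skipping the transfer, so the NN stops retrying this DN. */
    void transferBlock(long blockId) {
        if (!isValid(blockId, ReplicaState.FINALIZED)) {
            badBlocksReportedToNN.add(blockId); // stands in for reportBadBlocks()
            return;
        }
        // ... perform the actual block transfer ...
    }
}
```

With the report in place, the NN learns the replica is bad and can pick a different source, breaking the retry loop described above.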
[jira] [Commented] (HDFS-6252) Phase out the old web UI in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187163#comment-14187163 ] Zhe Zhang commented on HDFS-6252: - [~wheat9] I'm working on HDFS-7165, which requires a change to {{TestMissingBlocksAlert}}. It seems to me your changes to {{TestMissingBlocksAlert}} are compatible with and should go into branch-2. Let me know if you agree. Thanks! Phase out the old web UI in HDFS Key: HDFS-6252 URL: https://issues.apache.org/jira/browse/HDFS-6252 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.0 Reporter: Fengdong Yu Assignee: Haohui Mai Priority: Minor Fix For: 2.7.0 Attachments: HDFS-6252-branch-2.000.patch, HDFS-6252.000.patch, HDFS-6252.001.patch, HDFS-6252.002.patch, HDFS-6252.003.patch, HDFS-6252.004.patch, HDFS-6252.005.patch, HDFS-6252.006.patch We've deprecated hftp and hsftp in HDFS-5570, so if we download a file via the 'download this file' link on browseDirectory.jsp, it will throw an error: Problem accessing /streamFile/***, because the streamFile servlet was deleted in HDFS-5570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
[ https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7300: - Assignee: Kihwal Lee Status: Patch Available (was: Open) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed Key: HDFS-7300 URL: https://issues.apache.org/jira/browse/HDFS-7300 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7300.patch The {{getMaxNodesPerRack()}} method can produce an undesirable result in some cases. - Three replicas on two racks: the max is 3, so everything can go to one rack. - Two replicas on two or more racks: the max is 2, so both replicas can end up in the same rack. {{BlockManager#isNeededReplication()}} fixes this after the block/file is closed because {{blockHasEnoughRacks()}} will return false. This is not only extra work; it can also break the favored nodes feature. When there are two racks and two favored nodes are specified in the same rack, the NN may allocate the third replica on a node in the same rack, because {{maxNodesPerRack}} is 3. When closing the file, the NN moves a block to the other rack. There is a 66% chance that a favored node is moved. If {{maxNodesPerRack}} were 2, this would not happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
[ https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7300: - Attachment: HDFS-7300.patch Submitting the patch without a test case so precommit can run the tests. There is also a bug in {{chooseTarget()}} for the favored nodes case. The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed Key: HDFS-7300 URL: https://issues.apache.org/jira/browse/HDFS-7300 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical Attachments: HDFS-7300.patch The {{getMaxNodesPerRack()}} method can produce an undesirable result in some cases. - Three replicas on two racks: the max is 3, so everything can go to one rack. - Two replicas on two or more racks: the max is 2, so both replicas can end up in the same rack. {{BlockManager#isNeededReplication()}} fixes this after the block/file is closed because {{blockHasEnoughRacks()}} will return false. This is not only extra work; it can also break the favored nodes feature. When there are two racks and two favored nodes are specified in the same rack, the NN may allocate the third replica on a node in the same rack, because {{maxNodesPerRack}} is 3. When closing the file, the NN moves a block to the other rack. There is a 66% chance that a favored node is moved. If {{maxNodesPerRack}} were 2, this would not happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
[ https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187195#comment-14187195 ] Kihwal Lee commented on HDFS-7300: -- The base formula for calculating the value is {noformat} maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2 {noformat} In the patch, the single rack case and the single replica case are handled without applying this formula. It is then guaranteed that the number of racks is greater than 1 when calculating the max value. The formula is also guaranteed to give a sufficiently big max value: {noformat} maxNodesPerRack * numOfRacks >= totalNumOfReplicas <=> totalNumOfReplicas-1 + 2 * numOfRacks >= totalNumOfReplicas <=> numOfRacks >= 0.5 {noformat} Since numOfRacks is greater than 1, maxNodesPerRack is guaranteed to be large enough. In order to take care of the case of {{maxNodesPerRack == totalNumOfReplicas}}, which happens in the cases described in the description, maxNodesPerRack is decremented if necessary. This still results in a sufficiently large value: {noformat} (maxNodesPerRack - 1) * numOfRacks >= totalNumOfReplicas <=> totalNumOfReplicas-1 + numOfRacks >= totalNumOfReplicas <=> numOfRacks >= 1 {noformat} This shows the resulting max value is not only large enough, but also allows a bit of slack for unbalanced racks, as the original formula does. The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed Key: HDFS-7300 URL: https://issues.apache.org/jira/browse/HDFS-7300 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7300.patch The {{getMaxNodesPerRack()}} method can produce an undesirable result in some cases. - Three replicas on two racks: the max is 3, so everything can go to one rack. - Two replicas on two or more racks: the max is 2, so both replicas can end up in the same rack. {{BlockManager#isNeededReplication()}} fixes this after the block/file is closed because {{blockHasEnoughRacks()}} will return false.
This is not only extra work; it can also break the favored nodes feature. When there are two racks and two favored nodes are specified in the same rack, the NN may allocate the third replica on a node in the same rack, because {{maxNodesPerRack}} is 3. When closing the file, the NN moves a block to the other rack. There is a 66% chance that a favored node is moved. If {{maxNodesPerRack}} were 2, this would not happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
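The calculation walked through in the comment above can be sketched as a standalone method. This is an illustration of the math only, not the actual HDFS-7300 patch code; the class and method names are invented for the demo.

```java
// Standalone sketch of the corrected getMaxNodesPerRack() calculation:
// base formula, special cases, and the decrement for the degenerate case.
class MaxNodesPerRack {
    static int compute(int totalNumOfReplicas, int numOfRacks) {
        // Single rack or single replica: the formula is not applied.
        if (numOfRacks == 1 || totalNumOfReplicas <= 1) {
            return totalNumOfReplicas;
        }
        // Base formula; guarantees maxNodesPerRack * numOfRacks >= totalNumOfReplicas.
        int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 2;
        // Avoid the degenerate case where a single rack could hold every replica.
        if (maxNodesPerRack == totalNumOfReplicas) {
            maxNodesPerRack--;
        }
        return maxNodesPerRack;
    }

    public static void main(String[] args) {
        // Three replicas on two racks: the flawed code allowed 3 per rack; now 2.
        System.out.println(compute(3, 2)); // 2
        // Two replicas on two racks: forces the replicas onto different racks.
        System.out.println(compute(2, 2)); // 1
    }
}
```

Both problem cases from the issue description now yield a cap strictly below the total replica count, so no single rack can absorb every replica.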
[jira] [Commented] (HDFS-7291) Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows
[ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187224#comment-14187224 ] Chris Nauroth commented on HDFS-7291: - This build seems to be doing better: https://builds.apache.org/job/PreCommit-HDFS-Build/8571/ Persist in-memory replicas with appropriate unbuffered copy API on POSIX and Windows Key: HDFS-7291 URL: https://issues.apache.org/jira/browse/HDFS-7291 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7291.0.patch, HDFS-7291.1.patch, HDFS-7291.2.patch, HDFS-7291.3.patch, HDFS-7291.4.patch HDFS-7090 changed to persist in-memory replicas using unbuffered IO on Linux and Windows. On Linux, it relies on the sendfile() API between two file descriptors to achieve an unbuffered IO copy. According to the Linux man page at http://man7.org/linux/man-pages/man2/sendfile.2.html, this is only supported on Linux kernel 2.6.33+. As pointed out by Haowei in the discussion below, FileChannel#transferTo already has support for native unbuffered IO on POSIX platforms. On Windows, JDK 6/7/8 has not implemented native unbuffered IO yet. We changed to use FileChannel#transferTo for POSIX and our own native wrapper of CopyFileEx on Windows for the unbuffered copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6252) Phase out the old web UI in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187231#comment-14187231 ] Haohui Mai commented on HDFS-6252: -- Yes. Please feel free to file a jira and merge the changes to branch-2. Phase out the old web UI in HDFS Key: HDFS-6252 URL: https://issues.apache.org/jira/browse/HDFS-6252 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.0 Reporter: Fengdong Yu Assignee: Haohui Mai Priority: Minor Fix For: 2.7.0 Attachments: HDFS-6252-branch-2.000.patch, HDFS-6252.000.patch, HDFS-6252.001.patch, HDFS-6252.002.patch, HDFS-6252.003.patch, HDFS-6252.004.patch, HDFS-6252.005.patch, HDFS-6252.006.patch We've deprecated hftp and hsftp in HDFS-5570, so if we download a file via the 'download this file' link on browseDirectory.jsp, it will throw an error: Problem accessing /streamFile/***, because the streamFile servlet was deleted in HDFS-5570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7301) TestMissingBlocksAlert should use MXBeans instead of old web UI
Zhe Zhang created HDFS-7301: --- Summary: TestMissingBlocksAlert should use MXBeans instead of old web UI Key: HDFS-7301 URL: https://issues.apache.org/jira/browse/HDFS-7301 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS-6252 has phased out the old web UI in trunk. {{TestMissingBlocksAlert}} was excluded in its branch-2 patch. After revisiting the problem [~wheat9] and I agreed that it should go into branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
[ https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187195#comment-14187195 ] Kihwal Lee edited comment on HDFS-7300 at 10/28/14 7:08 PM: The base formula for calculating the value is {noformat} maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2 {noformat} In the patch, the single rack case and the single replica case are handled without applying this formula. It is then guaranteed that the number of racks is greater than 1 when calculating the max value. The formula is also guaranteed to give a sufficiently big max value: {noformat} maxNodesPerRack * numOfRacks >= totalNumOfReplicas <=> totalNumOfReplicas-1 + 2 * numOfRacks >= totalNumOfReplicas <=> numOfRacks >= 0.5 {noformat} Since numOfRacks is greater than 1, maxNodesPerRack is guaranteed to be large enough. In order to take care of the case of {{maxNodesPerRack == totalNumOfReplicas}}, which happens in the cases listed in the description, maxNodesPerRack is decremented if necessary. This still results in a sufficiently large value: {noformat} (maxNodesPerRack - 1) * numOfRacks >= totalNumOfReplicas <=> totalNumOfReplicas-1 + numOfRacks >= totalNumOfReplicas <=> numOfRacks >= 1 {noformat} This shows the resulting max value is not only large enough, but also allows a bit of slack for unbalanced racks, as the original formula does. was (Author: kihwal): The base formula for calculating the value is {noformat} maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2 {noformat} In the patch, the single rack case and the single replica case are handled without applying this formula. It is then guaranteed that the number of racks is greater than 1 when calculating the max value. The formula is also guaranteed to give a sufficiently big max value: {noformat} maxNodesPerRack * numOfRacks >= totalNumOfReplicas <=> totalNumOfReplicas-1 + 2 * numOfRacks >= totalNumOfReplicas <=> numOfRacks >= 0.5 {noformat} Since numOfRacks is greater than 1, maxNodesPerRack is guaranteed to be large enough. In order to take care of the case of {{maxNodesPerRack == totalNumOfReplicas}}, which happens in the cases described in the description, maxNodesPerRack is decremented if necessary. This still results in a sufficiently large value: {noformat} (maxNodesPerRack - 1) * numOfRacks >= totalNumOfReplicas <=> totalNumOfReplicas-1 + numOfRacks >= totalNumOfReplicas <=> numOfRacks >= 1 {noformat} This shows the resulting max value is not only large enough, but also allows a bit of slack for unbalanced racks, as the original formula does. The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed Key: HDFS-7300 URL: https://issues.apache.org/jira/browse/HDFS-7300 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7300.patch The {{getMaxNodesPerRack()}} method can produce an undesirable result in some cases. - Three replicas on two racks: the max is 3, so everything can go to one rack. - Two replicas on two or more racks: the max is 2, so both replicas can end up in the same rack. {{BlockManager#isNeededReplication()}} fixes this after the block/file is closed because {{blockHasEnoughRacks()}} will return false. This is not only extra work; it can also break the favored nodes feature. When there are two racks and two favored nodes are specified in the same rack, the NN may allocate the third replica on a node in the same rack, because {{maxNodesPerRack}} is 3. When closing the file, the NN moves a block to the other rack. There is a 66% chance that a favored node is moved. If {{maxNodesPerRack}} were 2, this would not happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7276: -- Attachment: h7276_20141028.patch h7276_20141028.patch: using timed wait. Limit the number of byte arrays used by DFSOutputStream --- Key: HDFS-7276 URL: https://issues.apache.org/jira/browse/HDFS-7276 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7276_20141021.patch, h7276_20141022.patch, h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, h7276_20141027b.patch, h7276_20141028.patch When there are a lot of DFSOutputStream's writing concurrently, the number of outstanding packets could be large. The byte arrays created by those packets could occupy a lot of memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187317#comment-14187317 ] Tsz Wo Nicholas Sze commented on HDFS-7276: --- The percentage differences are shown in the performance test {noformat} arrayLength=65536, nThreads=512, nAllocations=32768, maxArrays=1024 NewByteArrayWithoutLimit: 3439, 3394, 3497, 3459, 3442, avg= 3.446s NewByteArrayWithLimit: 3448, 3563, 3552, 3492, 3509, avg= 3.513s ( 1.93%) UsingByteArrayManager: 3357, 3369, 3327, 3345, 3324, avg= 3.344s ( -2.95%) ( -4.79%) {noformat} The time elapsed for UsingByteArrayManager is 2.95% and 4.79% less than NewByteArrayWithoutLimit and NewByteArrayWithLimit, respectively. Limit the number of byte arrays used by DFSOutputStream --- Key: HDFS-7276 URL: https://issues.apache.org/jira/browse/HDFS-7276 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7276_20141021.patch, h7276_20141022.patch, h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, h7276_20141027b.patch, h7276_20141028.patch When there are a lot of DFSOutputStream's writing concurrently, the number of outstanding packets could be large. The byte arrays created by those packets could occupy a lot of memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
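The "timed wait" mentioned in the latest patch suggests the general shape of such a limiter. Below is a hypothetical standalone sketch of the idea (cap outstanding arrays of a given length, recycle released ones, time out the wait); it is not the actual ByteArrayManager implementation, and every name in it is invented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bounded, recycling byte-array pool.
class BoundedArrayPool {
    private final int arrayLength;
    private final ArrayBlockingQueue<byte[]> free;
    private final Semaphore outstanding;

    BoundedArrayPool(int arrayLength, int maxArrays) {
        this.arrayLength = arrayLength;
        this.free = new ArrayBlockingQueue<>(maxArrays);
        this.outstanding = new Semaphore(maxArrays);
    }

    /** Timed wait for capacity, so a slow consumer cannot block writers forever. */
    byte[] allocate(long timeoutMs) throws InterruptedException {
        if (!outstanding.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("timed out waiting for array capacity");
        }
        byte[] recycled = free.poll();        // reuse a released array if available
        return recycled != null ? recycled : new byte[arrayLength];
    }

    void release(byte[] array) {
        free.offer(array);                    // keep for reuse if there is room
        outstanding.release();
    }
}
```

The semaphore bounds how many arrays are live at once, which is the memory cap the issue asks for, while the free queue avoids re-allocating (and re-zeroing) arrays on the hot path.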
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187324#comment-14187324 ] bc Wong commented on HDFS-7295: --- [~ste...@apache.org], I don't understand the scaling concern with revocation. I was thinking that we can just cancel the DT with the NN, and that's already supported. If the NN no longer knows about a DT, then requests using that DT will automatically get rejected. There is no need to keep a separate revocation list, unlike the X509 stuff. This requires showing the HDFS admin all the outstanding DTs, and logging the DT (a SHA hash) in the audit log. The latter facility is already in place today, and the SHA hash of the DT is cached (HDFS-4680). This works at a pretty large scale for us, so I'm not that concerned about perf here. bq. pushing out new tokens from the client When the Spark Streaming app is running, it's all in the cluster. It doesn't have any Kerberos credential at that point, so I don't think it can get new tokens. Right? Support arbitrary max expiration times for delegation token --- Key: HDFS-7295 URL: https://issues.apache.org/jira/browse/HDFS-7295 Project: Hadoop HDFS Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS, such as long-running YARN apps. Users should be allowed to optionally specify a max lifetime for their tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187328#comment-14187328 ] Kihwal Lee commented on HDFS-6663: -- +1 the latest patch looks good. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187338#comment-14187338 ] Hudson commented on HDFS-6663: -- FAILURE: Integrated in Hadoop-trunk-Commit #6370 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6370/]) HDFS-6663. Admin command to track file and locations from block id. (kihwal: rev 371a3b87ed346732ed58a4faab0c6c1db57c86ed) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command.
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187339#comment-14187339 ] Kihwal Lee commented on HDFS-6663: -- Committed to trunk and cherry-picked to branch-2. There was a merge conflict due to a context difference in the help message in branch-2. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command.
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6663: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Fix For: 2.7.0 Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command.
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6663: - Target Version/s: 2.7.0 (was: 2.6.0) Fix Version/s: 2.7.0 Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Fix For: 2.7.0 Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command.
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187357#comment-14187357 ] Kihwal Lee commented on HDFS-7213: -- +1 processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance.
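The cost described above comes from building the log message on every call while the write lock is held, even when debug logging is off. A minimal, self-contained sketch of the guard pattern the fix applies (the `Log` interface and counter here are illustrative stand-ins, not the patch's actual code):

```java
// Hypothetical sketch of the logging pattern HDFS-7213 fixes: guarding the
// debug call with isDebugEnabled() so the message string is never built
// (and the write lock never held longer) when debug logging is disabled.
public class DebugGuardSketch {
    // Stand-in for a commons-logging Log; only the two methods used here.
    public interface Log {
        boolean isDebugEnabled();
        void debug(Object msg);
    }

    // Counts how often the (expensive) message formatting actually runs.
    public static int formatCalls = 0;

    public static String expensiveFormat(String block) {
        formatCalls++;
        return "processIncrementalBlockReport for " + block;
    }

    // Before: the message is built on every call, even with debug off.
    public static void unguarded(Log log, String block) {
        log.debug(expensiveFormat(block));
    }

    // After: the message is built only when debug logging is enabled.
    public static void guarded(Log log, String block) {
        if (log.isDebugEnabled()) {
            log.debug(expensiveFormat(block));
        }
    }

    public static void main(String[] args) {
        Log debugOff = new Log() {
            public boolean isDebugEnabled() { return false; }
            public void debug(Object msg) { }
        };
        formatCalls = 0;
        unguarded(debugOff, "blk_1");
        System.out.println("unguarded formats: " + formatCalls); // pays the cost
        formatCalls = 0;
        guarded(debugOff, "blk_1");
        System.out.println("guarded formats: " + formatCalls);   // skips it
    }
}
```

Under a lock, skipping the string concatenation and argument boxing entirely is what recovers the lost throughput.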
[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7213: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and cherry-picked to branch-2. Thanks for fixing it, Eric. processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance.
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187362#comment-14187362 ] Hudson commented on HDFS-7213: -- FAILURE: Integrated in Hadoop-trunk-Commit #6371 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6371/]) HDFS-7213. processIncrementalBlockReport performance degradation. (kihwal: rev e226b5b40d716b6d363c43a8783766b72734e347) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance.
[jira] [Created] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade
Tsz Wo Nicholas Sze created HDFS-7302: - Summary: namenode -rollingUpgrade downgrade may finalize a rolling upgrade Key: HDFS-7302 URL: https://issues.apache.org/jira/browse/HDFS-7302 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The namenode startup option -rollingUpgrade downgrade was originally designed for downgrading a cluster. However, running namenode -rollingUpgrade downgrade with the new software could result in finalizing the ongoing rolling upgrade.
[jira] [Commented] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187373#comment-14187373 ] Tsz Wo Nicholas Sze commented on HDFS-7302: --- Downgrade can actually be done in a rolling fashion as shown in HDFS-7230. So the -rollingUpgrade downgrade startup option is indeed not very useful. I suggest removing it. namenode -rollingUpgrade downgrade may finalize a rolling upgrade - Key: HDFS-7302 URL: https://issues.apache.org/jira/browse/HDFS-7302 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The namenode startup option -rollingUpgrade downgrade was originally designed for downgrading a cluster. However, running namenode -rollingUpgrade downgrade with the new software could result in finalizing the ongoing rolling upgrade.
[jira] [Created] (HDFS-7303) If there are multiple datanodes on the same host, then only one datanode is listed on the NN UI’s datanode tab
Benoy Antony created HDFS-7303: -- Summary: If there are multiple datanodes on the same host, then only one datanode is listed on the NN UI’s datanode tab Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host.
[jira] [Updated] (HDFS-7303) If there are multiple datanodes on the same host, then only one datanode is listed on the NN UI’s datanode tab
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-7303: --- Status: Patch Available (was: Open) If there are multiple datanodes on the same host, then only one datanode is listed on the NN UI’s datanode tab --- Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Attachments: HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host.
[jira] [Updated] (HDFS-7303) If there are multiple datanodes on the same host, then only one datanode is listed on the NN UI’s datanode tab
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-7303: --- Attachment: HDFS-7303.patch Attaching a patch which does the following: 1. If there are multiple datanodes on the same host, host:port is displayed. 2. If there is a single datanode on the host, only the host is displayed. This is done for live nodes, dead nodes and decommissioned nodes. If there are multiple datanodes on the same host, then only one datanode is listed on the NN UI’s datanode tab --- Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Attachments: HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host.
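The display rule described in the comment above can be sketched in a few lines. This is a hypothetical illustration of the rule, not the patch's actual code; the method and parameter names are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the HDFS-7303 display rule: show "host:port" only
// when more than one datanode reports from the same host, otherwise just
// "host". The names below are illustrative, not taken from the patch.
public class NodeLabelSketch {
    static String label(String host, int port, Map<String, Integer> nodesPerHost) {
        // Multiple datanodes on this host: disambiguate with the port.
        return nodesPerHost.getOrDefault(host, 0) > 1 ? host + ":" + port : host;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hostA", 2);   // two datanodes on hostA, different ports
        counts.put("hostB", 1);   // a single datanode on hostB
        System.out.println(label("hostA", 50010, counts)); // hostA:50010
        System.out.println(label("hostB", 50010, counts)); // hostB
    }
}
```

Applying the same rule to the live, dead, and decommissioned lists keeps the common single-datanode-per-host display unchanged while making multi-datanode hosts distinguishable.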
[jira] [Updated] (HDFS-7301) TestMissingBlocksAlert should use MXBeans instead of old web UI
[ https://issues.apache.org/jira/browse/HDFS-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7301: Status: Patch Available (was: Open) TestMissingBlocksAlert should use MXBeans instead of old web UI --- Key: HDFS-7301 URL: https://issues.apache.org/jira/browse/HDFS-7301 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS-6252 has phased out the old web UI in trunk. {{TestMissingBlocksAlert}} was excluded in its branch-2 patch. After revisiting the problem [~wheat9] and I agreed that it should go into branch-2.
[jira] [Updated] (HDFS-7301) TestMissingBlocksAlert should use MXBeans instead of old web UI
[ https://issues.apache.org/jira/browse/HDFS-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7301: Attachment: HDFS-7301.patch This should be identical to HDFS-6252 with regard to {{TestMissingBlocksAlert}}. TestMissingBlocksAlert should use MXBeans instead of old web UI --- Key: HDFS-7301 URL: https://issues.apache.org/jira/browse/HDFS-7301 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7301.patch HDFS-6252 has phased out the old web UI in trunk. {{TestMissingBlocksAlert}} was excluded in its branch-2 patch. After revisiting the problem, [~wheat9] and I agreed that it should go into branch-2.
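The test style HDFS-7301 moves to, reading a metric through the JMX MBean server rather than scraping the old web UI, can be sketched with a self-contained example. The bean name and attribute below are stand-ins registered locally for illustration, not the NameNode's actual MXBean layout:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch of an MXBean-based check: register a bean exposing a
// metric, then read it back via getAttribute, as a test would query a
// NameNode metric instead of parsing web UI HTML. Names are illustrative.
public class MXBeanProbeSketch {
    public interface StateMXBean {
        long getNumberOfMissingBlocks();
    }

    public static class State implements StateMXBean {
        public long getNumberOfMissingBlocks() { return 3; }
    }

    public static long readMissingBlocks() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Stand-in ObjectName; a real NameNode publishes under "Hadoop:...".
        ObjectName name = new ObjectName("Example:service=FakeNameNode,name=State");
        if (!mbs.isRegistered(name)) {
            mbs.registerMBean(new State(), name);
        }
        // The MXBean getter getNumberOfMissingBlocks() is exposed as the
        // attribute "NumberOfMissingBlocks".
        return (Long) mbs.getAttribute(name, "NumberOfMissingBlocks");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readMissingBlocks());
    }
}
```

Querying the MBean server gives the test a stable, typed interface, so it no longer breaks when the web UI's HTML changes.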