[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt uses raw dir.getFileInfo() to getAuditFileInfo()
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040479#comment-14040479 ] Haohui Mai commented on HDFS-6580: -- Looks good to me. +1 FSNamesystem.mkdirsInt uses raw dir.getFileInfo() to getAuditFileInfo() --- Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
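The wrapper pattern described in the issue can be sketched in plain Java. The class and method names below are simplified stand-ins for the FSNamesystem internals, not the actual Hadoop code; the point is only that the audit accessor short-circuits when auditing is off, so callers never pay for an unused lookup.

```java
// Minimal sketch of the audit-wrapper pattern from the issue description.
// AuditingNamesystem and FileStatusInfo are hypothetical stand-ins, not
// the real FSNamesystem / HdfsFileStatus types.
public class AuditWrapperSketch {
    static class FileStatusInfo {
        final String path;
        FileStatusInfo(String path) { this.path = path; }
    }

    static class AuditingNamesystem {
        final boolean auditEnabled;
        AuditingNamesystem(boolean auditEnabled) { this.auditEnabled = auditEnabled; }

        // Raw lookup, analogous to dir.getFileInfo(): always does the work.
        FileStatusInfo getFileInfo(String path) {
            return new FileStatusInfo(path);
        }

        // Canonical audit accessor: returns null when auditing is disabled,
        // so callers that only need the status for the audit log skip the
        // lookup entirely.
        FileStatusInfo getAuditFileInfo(String path) {
            return auditEnabled ? getFileInfo(path) : null;
        }
    }

    public static void main(String[] args) {
        AuditingNamesystem on = new AuditingNamesystem(true);
        AuditingNamesystem off = new AuditingNamesystem(false);
        System.out.println(on.getAuditFileInfo("/tmp/a") != null);   // true
        System.out.println(off.getAuditFileInfo("/tmp/a") == null);  // true
    }
}
```

This also illustrates why startFileInt() is different: there the status is returned to the caller, so the raw lookup must run regardless of the audit setting.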
[jira] [Updated] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6580: - Summary: FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper (was: FSNamesystem.mkdirsInt uses raw dir.getFileInfo() to getAuditFileInfo()) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6580: - Resolution: Fixed Fix Version/s: 2.5.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~timxzl] for the contribution. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040483#comment-14040483 ] Hudson commented on HDFS-6580: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5753 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5753/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040606#comment-14040606 ] Binglin Chang commented on HDFS-6586: - TestBalancerWithNodeGroup also failed before for the same reason (HDFS-6250). We fixed TestBalancerWithNodeGroup, but it looks like TestBalancer has the same bug, and potentially also bug HDFS-6250. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040607#comment-14040607 ] Binglin Chang commented on HDFS-6586: - bq. and potentially also have bug HDFS-6250 Sorry, it was HDFS-6506. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v2.patch Updated the patch to add a fix for the bug in HDFS-6586; TestBalancer is affected by the balancer.id file. Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently (https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/). From the error log, the reason seems to be that newly moved block replicas are being invalidated and deleted, so some of the balancer's work is reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 
127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there must be a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
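The expected behavior described above can be sketched as a small selection rule. This helper is hypothetical and self-contained, not the real BlockManager.chooseExcessReplicates logic: after a balancer move from src to dest completes, the excess replica chosen for invalidation should be the stale copy on src, never the freshly written copy on dest.

```java
import java.util.Arrays;
import java.util.List;

public class ExcessReplicaSketch {
    // Pick which replica of an over-replicated block to invalidate after a
    // balancer move. Prefer the move source; never pick the move destination
    // unless it is the only replica. (Hypothetical helper for illustration.)
    static String chooseExcessReplica(List<String> replicas,
                                      String moveSrc, String moveDest) {
        for (String r : replicas) {
            if (r.equals(moveSrc)) {
                return r; // the source copy is the redundant one
            }
        }
        for (String r : replicas) {
            if (!r.equals(moveDest)) {
                return r; // fall back to any replica that is not the new copy
            }
        }
        return replicas.get(0);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("127.0.0.1:55468", "127.0.0.1:49159");
        // After moving a block 49159 -> 55468, the 49159 copy should go.
        System.out.println(chooseExcessReplica(replicas, "127.0.0.1:49159", "127.0.0.1:55468"));
    }
}
```

The bug report above is exactly the inverse of this rule: the log shows the destination's new replicas being added to the invalidated-blocks set.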
[jira] [Resolved] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HDFS-6586. - Resolution: Duplicate TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040625#comment-14040625 ] Binglin Chang commented on HDFS-6586: - I updated the patch in HDFS-6506 to fix the bug; closing this JIRA as a duplicate. Thanks for reporting this, Ted. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6382: - Attachment: HDFS-TTL-Design-3.pdf Updated the document according to the implementation. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design-2.pdf, HDFS-TTL-Design-3.pdf, HDFS-TTL-Design.pdf In a production environment, we often have scenarios like this: we want to back up files on HDFS for some time and then have them deleted automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs that are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires 4. A child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to control whether deleted files/directories go to the trash 6. A global configuration is needed to control whether a directory with a TTL is deleted when it is emptied by the TTL mechanism -- This message was sent by Atlassian JIRA (v6.2#6252)
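Points 2-4 of the proposal amount to a simple expiry rule, sketched below with hypothetical names (the attached design documents, not this sketch, define the real interface): a node's effective TTL is its own TTL if set, otherwise its parent's, and a file is deleted once its age exceeds that value.

```java
public class TtlSketch {
    // Effective TTL: a child's own TTL overrides the parent's (point 4);
    // a negative value means "no TTL set at this level".
    static long effectiveTtl(long parentTtlMs, long ownTtlMs) {
        return ownTtlMs >= 0 ? ownTtlMs : parentTtlMs;
    }

    // A file/directory entry expires when its age exceeds the effective
    // TTL (points 2 and 3); no TTL anywhere means it never expires.
    static boolean isExpired(long mtimeMs, long ttlMs, long nowMs) {
        return ttlMs >= 0 && nowMs - mtimeMs > ttlMs;
    }

    public static void main(String[] args) {
        long day = 24L * 3600 * 1000;
        // Parent directory keeps 30 days; a child file overrides with 1 day.
        long ttl = effectiveTtl(30 * day, day);
        System.out.println(isExpired(0, ttl, 2 * day));                       // true
        System.out.println(isExpired(0, effectiveTtl(30 * day, -1), 2 * day)); // false
    }
}
```

A TtlManager (HDFS-6526) would then periodically scan entries with a TTL set, apply this rule, and delete (or move to trash, per point 5) the expired ones.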
[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6525: - Attachment: HDFS-6525.1.patch Initial implementation. FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of HDFS TTL support for FsShell; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6525: - Target Version/s: 2.5.0 Status: Patch Available (was: Open) FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of HDFS TTL support for FsShell; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager
[ https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6526: - Attachment: HDFS-6526.1.patch Initial implementation. The unit test depends on HDFS-6525, so HDFS-6525 should be committed before this one. Implement HDFS TtlManager - Key: HDFS-6526 URL: https://issues.apache.org/jira/browse/HDFS-6526 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6526.1.patch This issue is used to track development of the HDFS TtlManager; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager
[ https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6526: - Target Version/s: 2.5.0 Status: Patch Available (was: Open) Implement HDFS TtlManager - Key: HDFS-6526 URL: https://issues.apache.org/jira/browse/HDFS-6526 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6526.1.patch This issue is used to track development of the HDFS TtlManager; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040644#comment-14040644 ] Hudson commented on HDFS-6507: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #592 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/592/]) HDFS-6507. Improve DFSAdmin to support HA cluster better. (Contributed by Zesheng Wu) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604692) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Fix For: 2.5.0 Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which finally calls the corresponding remote implementation on the NN side. On the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN allows only UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed there, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, and when executed from the DFSAdmin command line they are sent to an arbitrary NN, whether it is Active or Standby. This may result in two problems: a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result. b. If the operation needs to take effect on both NNs, it takes effect on only one; when there is an NN failover later, there may be problems. Here I propose the following improvements: a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. If the command cannot be classified as one of the above four operations, or if the command needs to take effect on both NNs, we should send the request to both the Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to an arbitrary NN, whether it is Active or Standby. Here I propose that we send these requests to both NNs. 3. ClientDatanodeProtocol Commands in this category are handled correctly; no improvement is needed. -- This message was sent by Atlassian JIRA (v6.2#6252)
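The "send the request to both NNs" proposal in point 2 above can be sketched as follows. NamenodeRef and refreshServiceAcl() are simplified stand-ins for the per-NN RPC proxies, not the actual DFSAdmin/NameNodeProxies code: instead of sending the refresh to whichever NN the client resolves first, the command is issued to every configured NN and the per-NN outcomes are reported.

```java
import java.util.ArrayList;
import java.util.List;

public class DualNnDispatchSketch {
    // Hypothetical stand-in for a proxy to one namenode.
    interface NamenodeRef {
        String id();
        void refreshServiceAcl() throws Exception;
    }

    // Send the refresh to every NN (Active and Standby) and collect a
    // per-NN result, so a failure on one NN is visible to the operator
    // instead of being silently skipped.
    static List<String> refreshAll(List<NamenodeRef> nns) {
        List<String> results = new ArrayList<>();
        for (NamenodeRef nn : nns) {
            try {
                nn.refreshServiceAcl();
                results.add(nn.id() + ": success");
            } catch (Exception e) {
                results.add(nn.id() + ": failed (" + e.getMessage() + ")");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<NamenodeRef> nns = new ArrayList<>();
        nns.add(new NamenodeRef() {
            public String id() { return "nn1"; }
            public void refreshServiceAcl() { /* succeeds */ }
        });
        nns.add(new NamenodeRef() {
            public String id() { return "nn2"; }
            public void refreshServiceAcl() throws Exception {
                throw new Exception("standby unreachable");
            }
        });
        System.out.println(refreshAll(nns));
    }
}
```

This mirrors the behavior TestDFSAdminWithHA exercises in the committed patch: each refresh-style command reports one status line per namenode.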
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040643#comment-14040643 ] Hudson commented on HDFS-6580: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #592 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/592/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040683#comment-14040683 ] Zesheng Wu commented on HDFS-6382: -- Hi guys, I've uploaded initial implementations on HDFS-6525 and HDFS-6526 separately; I hope you can take a look. Any comments will be appreciated. Thanks in advance. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design-2.pdf, HDFS-TTL-Design-3.pdf, HDFS-TTL-Design.pdf In a production environment, we often have scenarios like this: we want to back up files on HDFS for some time and then have them deleted automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs that are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires 4. A child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to control whether deleted files/directories go to the trash 6. A global configuration is needed to control whether a directory with a TTL is deleted when it is emptied by the TTL mechanism -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6592) Use Fluent to collect data to append to HDFS. Throw the AlreadyBeingCreatedException exception
jack created HDFS-6592: -- Summary: Use Fluent to collect data to append to HDFS. Throw the AlreadyBeingCreatedException exception Key: HDFS-6592 URL: https://issues.apache.org/jira/browse/HDFS-6592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: jack We use Fluent to collect log data; the log data is appended to files in HDFS. The cluster configuration: namenode: namenode1 (hostname); secondary namenode: namenode2; 3 datanodes: datanode1, datanode2, datanode3; 3 replicas. Every few days, we suffer from the following exceptions: Exception in nameNode1: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] Exception in DataNode1: 2014-06-22 09:54:45,771 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
2014-06-22 09:54:45,813 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2441) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) According to the log, we infer the flow of the exception: 1. 
The namenode updates the pipeline with just one datanode. namenode1 log: 2014-06-22 09:54:16,604 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline (block=BP-1611177164-datanode1-1399894698024:blk_1074496235_1935947, newGenerationStamp=1935951, newLength=98839816, newNodes=[datanode1:50010], clientName=DFSClient_NONMAPREDUCE_349196146_2027206) 2. datanode1 throws an exception during close. datanode1 log: 2014-06-22 09:54:26,569 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file_name retrying... 3. Subsequently collected data from Fluent triggers another DFSClient to append to the same file. namenode1 log: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_name] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [datanode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [datanode1] 4. The subsequent DFSClient then triggers lease recovery every LEASE_SOFTLIMIT_PERIOD. namenode1 log: 2014-06-22 09:58:34,722 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover [Lease. Holder:
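A common client-side mitigation for the flow above is to retry the append with a bounded backoff, so the namenode's soft lease limit can expire and the stale writer's lease can be reassigned (in real HDFS one can also force this with DistributedFileSystem.recoverLease(Path)). The types below are simplified stand-ins for the DFSClient append path, not the actual Hadoop API:

```java
public class AppendRetrySketch {
    // Stand-in for org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException.
    static class AlreadyBeingCreatedException extends Exception {}

    // Hypothetical appender; throws while the previous writer still
    // holds the lease on the file.
    interface Appender {
        void append(String path) throws AlreadyBeingCreatedException;
    }

    // Retry the append a bounded number of times, sleeping between
    // attempts so the stale lease can time out and be recovered.
    static boolean appendWithRetry(Appender a, String path, int maxTries,
                                   long sleepMs) throws InterruptedException {
        for (int i = 0; i < maxTries; i++) {
            try {
                a.append(path);
                return true;
            } catch (AlreadyBeingCreatedException e) {
                Thread.sleep(sleepMs);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice (lease still held), then succeeds after recovery.
        Appender flaky = path -> {
            if (calls[0]++ < 2) throw new AlreadyBeingCreatedException();
        };
        System.out.println(appendWithRetry(flaky, "/logs/app.log", 5, 10));
    }
}
```

This only masks the symptom; the underlying race between the old writer's close failure and the new writer's append still needs the namenode-side lease to expire or be explicitly recovered.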
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040751#comment-14040751 ] Hudson commented on HDFS-6580: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1783 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1783/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the returned file stat is passed back to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
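The wrapper behavior described in the issue text can be illustrated with a minimal, self-contained sketch; `AuditSketch` and its members are hypothetical stand-ins for FSNamesystem/FSDirectory, not the real classes:

```java
// Minimal sketch of the getAuditFileInfo() wrapper pattern described above.
// AuditSketch and its members are hypothetical stand-ins, not HDFS code.
class AuditSketch {
    private final boolean auditLogEnabled;
    private final java.util.Map<String, String> dir; // stands in for FSDirectory

    AuditSketch(boolean auditLogEnabled, java.util.Map<String, String> dir) {
        this.auditLogEnabled = auditLogEnabled;
        this.dir = dir;
    }

    // The canonical audit helper: skip the (possibly costly) file-info
    // lookup entirely when no audit logger is configured.
    String getAuditFileInfo(String path) {
        return auditLogEnabled ? dir.get(path) : null;
    }
}
```

The point of the fix is exactly this short-circuit: calling the raw lookup directly, as mkdirsInt() did, pays the cost even when the result is discarded.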
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040752#comment-14040752 ] Hudson commented on HDFS-6507: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1783 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1783/]) HDFS-6507. Improve DFSAdmin to support HA cluster better. (Contributed by Zesheng Wu) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604692) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Fix For: 2.5.0 Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which finally calls the corresponding remote implementation function on the NN side. 
On the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN only allows UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, so when executed from the DFSAdmin command line they are sent to a fixed NN, regardless of whether it is Active or Standby. This may result in two problems: a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result. b. If the operation needs to take effect on both NNs, it takes effect on only one NN, which may cause problems after a future NN failover. Here I propose the following improvements: a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. If the command cannot be classified as one of the above four operations, or if it needs to take effect on both NNs, we should send the request to both the Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to a fixed NN, regardless of whether it is Active or Standby. Here I propose that we send these requests to both NNs. 3. 
ClientDatanodeProtocol Commands in this category are handled correctly, no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
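Proposal (b) above — sending a request to both the Active and Standby NNs — might be sketched roughly as follows; `FanOut` and `sendToAll` are hypothetical names, and the real DFSAdmin change would iterate over the actual NN RPC proxies:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of proposal (b): fan a refresh-style request out to
// every configured NameNode rather than a single fixed one, collecting
// per-NN failures instead of stopping at the first error.
class FanOut {
    static <T> List<String> sendToAll(List<T> nnProxies, Consumer<T> op) {
        List<String> failures = new ArrayList<>();
        for (T proxy : nnProxies) {
            try {
                op.accept(proxy); // e.g. invoke refreshServiceAcl() on the proxy
            } catch (RuntimeException e) {
                failures.add(e.getMessage()); // record the failure, keep going
            }
        }
        return failures;
    }
}
```

Collecting failures rather than aborting matters here: a refresh should still reach the Active NN even if the Standby happens to be down at that moment.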
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040758#comment-14040758 ] Hadoop QA commented on HDFS-6506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651956/HDFS-6506.v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.TestRefreshCallQueue {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7210//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7210//console This message is automatically generated. 
Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ From the error log, the reason seems to be that newly moved block replicas have been invalidated and deleted, so some of the balancer's work is reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer 
(Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040790#comment-14040790 ] Daryn Sharp commented on HDFS-6475: --- bq. Your earlier suggestion indicated that we should use SecretManager#retriableRetrievePassword instead of SecretManager#retrievePassword, does that mean client code has to be modified? If I understand the question: the methods are only used server-side, so no client-side changes should be required and there are no incompatibility concerns. Did you happen to trace how/where the {{StandbyException}} is wrapped in an {{InvalidToken}}? It looks like {{DelegationTokenSecretManager#retrievePassword}} is the only place it occurs, but {{DelegationTokenSecretManager#retriableRetrievePassword}} does not wrap exceptions in {{InvalidToken}}. Is this maybe just a test case issue? Which test case is failing? WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, HDFS-6475.008.patch, HDFS-6475.009.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is initialized beforehand with the active NN. When a client tries to issue a request, the NNs it can contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts the NNs in order, so the first one it runs into is likely the Standby NN. If the Standby NN doesn't have the updated client credential, it will throw a SecurityException that wraps a StandbyException. 
The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it failed. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaCl assName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
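The retry decision the client should be making can be sketched as a walk over the exception's cause chain and message; `RetryCheck` is a hypothetical stand-in, and the real fix would inspect Hadoop's own exception types rather than matching class names as strings:

```java
// Hypothetical sketch of retry-decision logic: walk the cause chain of a
// received exception and retry on the next NN if a standby-style failure
// is buried inside. Class and method names are stand-ins, not Hadoop's.
class RetryCheck {
    static boolean isStandbyWrapped(Throwable t, String standbyClassName) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            // Match either the exception type itself or a message that
            // carries the wrapped name (as in the RemoteException above).
            if (c.getClass().getSimpleName().equals(standbyClassName)
                || String.valueOf(c.getMessage()).contains(standbyClassName)) {
                return true;
            }
        }
        return false;
    }
}
```

The stack trace above shows why this matters: by the time the client sees the failure, the StandbyException survives only as text inside an InvalidToken message wrapped in a SecurityException, so a naive `instanceof` check never fires.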
[jira] [Commented] (HDFS-6526) Implement HDFS TtlManager
[ https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040792#comment-14040792 ] Hadoop QA commented on HDFS-6526: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651960/HDFS-6526.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1264 javac compiler warnings (more than the trunk's current 1259 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.ttlmanager.TestTtlPolicy {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7212//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7212//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7212//console This message is automatically generated. Implement HDFS TtlManager - Key: HDFS-6526 URL: https://issues.apache.org/jira/browse/HDFS-6526 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6526.1.patch This issue is used to track development of HDFS TtlManager, for details see HDFS-6382. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6582) Missing null check in RpcProgramNfs3#read(XDR, SecurityHandler)
[ https://issues.apache.org/jira/browse/HDFS-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6582: - Component/s: nfs Missing null check in RpcProgramNfs3#read(XDR, SecurityHandler) --- Key: HDFS-6582 URL: https://issues.apache.org/jira/browse/HDFS-6582 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Ted Yu Priority: Minor Around line 691: {code} FSDataInputStream fis = clientCache.getDfsInputStream(userName, Nfs3Utils.getFileIdPath(handle)); try { readCount = fis.read(offset, readbuffer, 0, count); {code} fis may be null, leading to NullPointerException -- This message was sent by Atlassian JIRA (v6.2#6252)
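The shape of the missing guard can be shown with a self-contained sketch; `CacheSketch` stands in for the client cache in RpcProgramNfs3, and the real fix would return an NFS error status to the caller rather than an int:

```java
import java.util.Map;

// Hypothetical sketch of the missing null check: a cache lookup may miss,
// so guard before reading. CacheSketch is illustrative, not RpcProgramNfs3.
class CacheSketch {
    // Returns bytes read, or -1 when no stream could be obtained
    // (instead of dereferencing null and throwing NullPointerException).
    static int safeRead(Map<String, byte[]> cache, String fileIdPath, byte[] buf) {
        byte[] fis = cache.get(fileIdPath); // may be null on a cache miss
        if (fis == null) {
            return -1; // surface an error to the NFS client instead of NPE
        }
        int n = Math.min(fis.length, buf.length);
        System.arraycopy(fis, 0, buf, 0, n);
        return n;
    }
}
```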
[jira] [Commented] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040810#comment-14040810 ] Hadoop QA commented on HDFS-6525: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651959/HDFS-6525.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ipc.TestIPC {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7211//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7211//console This message is automatically generated. FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of supporting HDFS TTL for FsShell, for details see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6585) INodesInPath.resolve is called multiple times in FSNamesystem.setPermission
[ https://issues.apache.org/jira/browse/HDFS-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040823#comment-14040823 ] Daryn Sharp commented on HDFS-6585: --- I've been working on the exact same change! Just to a larger extent. I'll take a look this afternoon. INodesInPath.resolve is called multiple times in FSNamesystem.setPermission --- Key: HDFS-6585 URL: https://issues.apache.org/jira/browse/HDFS-6585 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_ab60af58e03b323dd4b18d32c4def1f008b98822.txt, patch_f15b7d505f12213f1ee9fb5ddb4bdaa64f9f623d.txt Most of the APIs (both internal and external) in FSNamesystem call INodesInPath.resolve() to get the list of INodes corresponding to a file path. Usually one API calls resolve() multiple times, which is wasted work. This issue particularly refers to FSNamesystem.setPermission, which calls resolve() twice indirectly: once from checkOwner(), and once from dir.setPermission(). We should save the result of resolve(), and use it wherever possible throughout the lifetime of an API call, instead of making new resolve() calls. -- This message was sent by Atlassian JIRA (v6.2#6252)
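The proposed resolve-once pattern can be sketched as follows; `ResolveOnce` and its helpers are hypothetical stand-ins for FSNamesystem.setPermission and INodesInPath.resolve, with a counter added to make the single resolution visible:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed fix: resolve the path once and
// thread the result through subsequent checks, rather than re-resolving
// inside each helper. Not the actual FSNamesystem code.
class ResolveOnce {
    static final AtomicInteger resolveCalls = new AtomicInteger();

    static String resolve(String path) {   // stands in for INodesInPath.resolve
        resolveCalls.incrementAndGet();
        return "inodes:" + path;
    }

    static void setPermission(String path) {
        String iip = resolve(path); // resolve exactly once...
        checkOwner(iip);            // ...then reuse the resolved result
        doSetPermission(iip);
    }

    static void checkOwner(String iip) { /* permission check on resolved inodes */ }
    static void doSetPermission(String iip) { /* mutate the resolved inodes */ }
}
```

In the unfixed code, both helpers would take the raw path and each call resolve() internally, doubling the path-walk cost per operation.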
[jira] [Commented] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040828#comment-14040828 ] Daryn Sharp commented on HDFS-6525: --- The test cases should verify that the inherited path displayed is correct. Like most other tests, it should verify that relative and scheme-absolute paths are displayed correctly. It might make sense to print path: ttl instead of the reverse, but it's up to you. A minor suggestion is to have the values for the units computed with math so a reviewer doesn't have to do the math to verify the numbers. FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of supporting HDFS TTL for FsShell, for details see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
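Daryn's last suggestion — computing test values from unit constants instead of hard-coding the product — looks like this in a tiny sketch (the names are illustrative, not from the patch):

```java
// Hypothetical sketch of the review suggestion: express TTL test values
// as arithmetic over unit constants so a reviewer can verify them at a
// glance, instead of checking a literal like 604800000 by hand.
class TtlUnits {
    static final long SECOND = 1000L;
    static final long MINUTE = 60 * SECOND;
    static final long HOUR   = 60 * MINUTE;
    static final long DAY    = 24 * HOUR;

    // A one-week TTL: obviously 7 days, no mental math needed.
    static final long ONE_WEEK_MS = 7 * DAY;
}
```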
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040835#comment-14040835 ] Hudson commented on HDFS-6580: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1810 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1810/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the returned file stat is passed back to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040836#comment-14040836 ] Hudson commented on HDFS-6507: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1810 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1810/]) HDFS-6507. Improve DFSAdmin to support HA cluster better. (Contributed by Zesheng Wu) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604692) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Fix For: 2.5.0 Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which finally calls the corresponding remote implementation function on the NN side. 
On the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN only allows UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, so when executed from the DFSAdmin command line they are sent to a fixed NN, regardless of whether it is Active or Standby. This may result in two problems: a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result. b. If the operation needs to take effect on both NNs, it takes effect on only one NN, which may cause problems after a future NN failover. Here I propose the following improvements: a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. If the command cannot be classified as one of the above four operations, or if it needs to take effect on both NNs, we should send the request to both the Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to a fixed NN, regardless of whether it is Active or Standby. Here I propose that we send these requests to both NNs. 3. 
ClientDatanodeProtocol Commands in this category are handled correctly, no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040873#comment-14040873 ] Sanjay Radia commented on HDFS-6134: Aaron said: bq. distcp... I disagree - this is exactly what one wants .. So you are saying that distcp should decrypt and re-encrypt data as it copies it ... most backup tools do not do this as they copy data - it costs extra CPU resources and adds further unneeded vulnerability. There are customer use cases where distcp is not run over an encrypted channel; hence if one of the files being copied is encrypted, one may not want the file to be transparently sent decrypted. Further, a sensitive file in a subtree may have been encrypted because the subtree is readable by a larger group, and hence the distcp user may not have access to the keys. bq. delegation tokens - KMS ... Owen and Tucu have already discussed this quite a bit above It turns out this issue came up in discussion with Owen, and he shares the concern and suggested that I post it. Besides, even if Alejandro and Owen are in agreement, my question is relevant and has not been raised so far above: encryption is used to overcome limitations of authorization and authentication in the system. It is relevant to ask whether the use of delegation tokens to obtain keys adds weakness. bq. meeting ... Aaron .. you are misunderstanding my point. I am not saying that the discussions on this jira have not been open. * See Alejandro's comments: Todd Lipcon and I had an offline discussion with Andrew Purtell, Yi Liu and Avik Dey and After some offline discussions with Yi, Tianyou, ATM, Todd, Andrew and Charles ... ** there have been such meetings and I have *no objections* to such private meetings because I know that the bandwidth helps. I am merely asking for one more meeting where I can quickly come up to speed on the context that Alejandro, Todd, Yi, Tianyou, Andrew, and ATM share. 
It will help me and others better understand the viewpoint that some of you share due to previous high-bandwidth meetings. ** There is a precedent of HDFS meetings in spite of open jira discussion - higher bandwidth to progress faster. ** Perhaps I should have worded the private meetings differently ... sorry if it came across the wrong way. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health-care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop FileSystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040966#comment-14040966 ] Steve Loughran commented on HDFS-6134: -- Maybe the issue with distcp is that sometimes you want to get at the raw data - backups and copying being examples. This lets admins work on the data without needing access to the keys, just as today I can back up the underlying native OS disks without understanding HDFS (or any future encryption). Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health-care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop FileSystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040984#comment-14040984 ] Yongjun Zhang commented on HDFS-6475: - Hi [~daryn], Thanks a lot for the comments. Calling {{DelegationTokenSecretManager#retrievePassword}} is the sole place I have seen. And, the following method in AbstractDelegationTokenSecretManager is where retrievePassword is called, {code} public synchronized void verifyToken(TokenIdent identifier, byte[] password) throws InvalidToken { byte[] storedPassword = retrievePassword(identifier); if (!Arrays.equals(password, storedPassword)) { throw new InvalidToken("token (" + identifier + ") is invalid, password doesn't match"); } } {code} I wonder whether we can just replace the above retrievePassword call with retriableRetrievePassword here. I will give it a try. The failed tests are reported in HDFS-6589, related to HDFS-5322. Hi [~jingzhao], I put a question in HDFS-6589. I wonder if the failed tests are designed to cover real user scenarios? Thanks for clarifying. Best regards. WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, HDFS-6475.008.patch, HDFS-6475.009.patch With WebHdfs clients connected to a HA HDFS service, the delegation token is previously initialized with the active NN. When clients try to issue a request, the NN it contacts is stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contacts the NN based on the order, so likely the first one it runs into is the StandbyNN. 
If the StandbyNN doesn't have the updated client credential, it will throw a SecurityException that wraps StandbyException. The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it failed. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code}
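The failure mode above, a StandbyException buried inside a SecurityException/InvalidToken chain, is what the retry logic has to recognize. As a hedged illustration in plain Java (not the actual WebHdfsFileSystem retry code; `shouldFailover` is a hypothetical helper), a client could walk the cause chain and message text to decide whether failing over to the other NameNode makes sense:

```java
// Hedged sketch, not the real WebHdfs code: recognize a StandbyException
// hidden inside the SecurityException/InvalidToken chain so the client can
// fail over to the other NameNode instead of giving up.
public class StandbyRetryCheck {
    // Hypothetical helper: true if any cause in the chain is, or mentions,
    // a StandbyException.
    static boolean shouldFailover(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t.getClass().getSimpleName().equals("StandbyException")) {
                return true;
            }
            String msg = t.getMessage();
            if (msg != null && msg.contains("StandbyException")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The message shape mirrors the RemoteException shown above.
        Throwable wrapped = new SecurityException(
            "Failed to obtain user group information: "
            + "org.apache.hadoop.security.token.SecretManager$InvalidToken: "
            + "StandbyException");
        System.out.println(shouldFailover(wrapped));   // prints: true
        System.out.println(shouldFailover(new RuntimeException("refused")));
    }
}
```

The actual HDFS-6475 patches address this in the server's exception handling rather than by message matching; the sketch only illustrates the failover decision, and also why string-level unwrapping is fragile.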
[jira] [Commented] (HDFS-4629) Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA
[ https://issues.apache.org/jira/browse/HDFS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040995#comment-14040995 ] Steve Loughran commented on HDFS-4629: -- # the declaration of the xerces lib version MUST go into {{hadoop-project/pom.xml}}; all JAR version logic goes in there to avoid inconsistencies # it is going to add yet-another-dependency. # we may need this import with java 9 anyway, as com.sun is potentially going to be inaccessible. Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA - Key: HDFS-4629 URL: https://issues.apache.org/jira/browse/HDFS-4629 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.0.3-alpha Environment: OS: fedora and RHEL (64 bit) Platform: x86, POWER, and SystemZ JVM Vendor = IBM Reporter: Amir Sanjar Attachments: HDFS-4629-1.patch, HDFS-4629.patch Porting to a non-JVM vendor solution by replacing: import com.sun.org.apache.xml.internal.serialize.OutputFormat; import com.sun.org.apache.xml.internal.serialize.XMLSerializer; with import org.apache.xml.serialize.OutputFormat; import org.apache.xml.serialize.XMLSerializer;
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041008#comment-14041008 ] Aaron T. Myers commented on HDFS-6134: -- Sanjay, Steve - regarding distcp, Alejandro has already said the following, which I think addresses what both of you are getting at. Note the second paragraph: {quote} Vanilla distcp will just work with transparent encryption. Data will be decrypted on read and encrypted on write, assuming both source and target are in encrypted zones. The proposal on changing distcp is to enable a second use case, copy data from one cluster to another without having to decrypt/encrypt the data while doing the copy. This is useful when doing copies for disaster recovery, hdfs admins could do the copy without having to have access to the encryption keys. {quote} Sanjay: bq. Turns out this issue came up in discussion with Owen, and he shares the concern and suggested that I post the concern. Besides, even if Alejandro and Owen are in agreement, my question is relevant and has not been raised so far above: Encryption is used to overcome limitations of authorization and authentication in the system. It is relevant to ask if the use of delegation tokens to obtain keys adds weakness. Transparent at-rest encryption is used to address other possible attack vectors, for example an admin removing hard drives from the cluster and looking at the data offline, or various attack vectors if network communication can be intercepted. I was under the impression that Owen's concern was mostly around performance, i.e. that he didn't want all of the many tasks/containers in an MR/YARN job to each request the same encryption key(s) from the KMS at startup. I think that's quite reasonable, but it doesn't need to be an either/or thing - YARN jobs can request the appropriate keys upfront to address performance concerns _and_ the KMS can accept DTs for authentication to enable other use cases. 
Regardless, I don't see how being able to request encryption keys via DTs adds any weakness. The DTs can only be granted via Kerberos-authenticated channels, and they expire, so they allow no more access than one can get via Kerberos. Could you perhaps elaborate on the specific concern there? bq. Aaron .. you are misunderstanding my point. I am not saying that the discussion on this jira have not been open. [snip] OK, good to hear. Sorry if I misinterpreted what you were saying. bq. I am merely asking for one more meeting where I can quickly come up to speed on the context that Alejandro, Todd, Yi, Tianyou, Andrew, Atm, share. It will help me and others better understand the viewpoint that some of you share due to previous high bandwidth meetings. I'm certainly open to another meeting in the abstract to bring folks up to speed, but I'd still like to know what questions you have that haven't been addressed so far on the JIRA. So far I think that most of the questions you've been asking have already been discussed. 
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041020#comment-14041020 ] Lei (Eddy) Xu commented on HDFS-5546: - Maybe I misunderstand this JIRA. If printing an FNF exception while printing out ls information is normal behavior, matching what {{/bin/ls}} does, then the current {{trunk}} works correctly and thus does not need to be fixed. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF
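The race in steps 1-4 above (getFileStatus succeeds, the directory disappears, then listStatus throws FNF) suggests catching the exception per directory and continuing, the way {{/bin/ls}} warns and moves on. A minimal sketch, using java.nio instead of the Hadoop FileSystem API, so `RecursiveLsSketch` and its exact behavior are illustrative assumptions rather than the HDFS-5546 patch:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hedged sketch of the behavior under discussion: recurse, and if a
// directory vanishes between being listed and being opened, print a warning
// like /bin/ls does and keep going instead of terminating the whole walk.
public class RecursiveLsSketch {
    static void ls(Path dir) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                System.out.println(p);
                if (Files.isDirectory(p)) {
                    ls(p);  // a concurrently removed subdirectory only warns
                }
            }
        } catch (NoSuchFileException e) {
            // Directory was moved/removed after we saw it: report, move on.
            System.err.println("ls: " + dir + ": No such file or directory");
        } catch (IOException e) {
            System.err.println("ls: " + dir + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        ls(Paths.get(args.length > 0 ? args[0] : "."));
    }
}
```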
[jira] [Updated] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6579: --- Attachment: HDFS-6579.patch TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC -- Key: HDFS-6579 URL: https://issues.apache.org/jira/browse/HDFS-6579 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.1.0-beta, 2.0.4-alpha, 2.2.0, 2.3.0, 2.4.0 Reporter: Jinghui Wang Attachments: HDFS-6579.patch SocketOutputStream closes its writer if it's partially written. But on PPC, after writing for some time, buf.capacity still equals buf.remaining. The reason might be that what's written on PPC is buffered, so buf.remaining will not change till a flush.
[jira] [Updated] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6579: --- Attachment: (was: HDFS-6579.patch)
[jira] [Commented] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041047#comment-14041047 ] Jinghui Wang commented on HDFS-6579: Thanks for the prompt review. Yes, this is for PPC64 Linux. I have modified the patch per your suggestion. However, rather than introduce a method that is as extensive as the getOSType method, I simply added the detection for PPC64 since there is no need for detecting other architectures yet. Please let me know if a more extensive method is necessary.
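For the narrow architecture check mentioned above, a test can branch on the standard `os.arch` system property rather than adding a general getOSType-style helper. A hedged sketch (the class and method names are made up; the actual patch may perform the detection differently):

```java
// Hedged sketch: detect PPC64 from the standard os.arch system property.
// The name ArchCheckSketch is illustrative and not from the HDFS-6579 patch.
public class ArchCheckSketch {
    static boolean isPpc64() {
        // On PPC64 Linux JVMs, os.arch typically reports "ppc64" or "ppc64le".
        return System.getProperty("os.arch", "")
                     .toLowerCase()
                     .startsWith("ppc64");
    }

    public static void main(String[] args) {
        System.out.println("os.arch = " + System.getProperty("os.arch")
            + ", ppc64: " + isPpc64());
    }
}
```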
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041062#comment-14041062 ] Jing Zhao commented on HDFS-6588: - HDFS-5322 should not be related to the failed test. In general, HDFS-5322 simply handles the same issue you're fixing in HDFS-6475, but for the RPC side. So please feel free to make any change you think necessary. Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not an InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} There are multiple test failures after making the suggested changes. Filing this jira to investigate removing the getTrueCause method. More detail will be put in the first comment.
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041065#comment-14041065 ] Jing Zhao commented on HDFS-6562: - The new patch looks pretty good to me. +1 [~szetszwo], do you also want to take a look at the patch? Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code.
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Summary: Bug in TestBPOfferService can cause test failure (was: Bug in TestBPOfferService blocks the trunk build) Bug in TestBPOfferService can cause test failure Key: HDFS-6587 URL: https://issues.apache.org/jira/browse/HDFS-6587 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_TestBPOfferService.txt Need to fix a bug in TestBPOfferService#waitForBlockReceived that fails the trunk, e.g. in Build #1781. Details: in this test, the utility function waitForBlockReceived() has a bug: the parameter mockNN is never used but the hard-coded mockNN1 is used. This bug introduces nondeterministic test failure when testBasicFunctionality() calls ret = waitForBlockReceived(FAKE_BLOCK, mockNN2); and the call finishes before the actual interaction with mockNN2 happens.
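The bug described above reduces to a parameter that is accepted but never used. A hedged sketch, with plain objects standing in for the real Mockito mocks and method bodies that are illustrative only (the names follow the issue description, not the actual test source):

```java
// Illustrative reduction of the TestBPOfferService bug: the helper takes a
// mock NameNode parameter but waits on the hard-coded first mock instead.
class WaitForBlockReceivedSketch {
    final Object mockNN1 = new Object();
    final Object mockNN2 = new Object();

    // Buggy version: 'mockNN' is ignored, so a caller passing mockNN2
    // actually ends up waiting on mockNN1. That is what makes the failure
    // nondeterministic: the wait can finish before mockNN2 is ever touched.
    Object waitForBlockReceivedBuggy(String block, Object mockNN) {
        return waitOn(mockNN1, block);
    }

    // Fixed version: use the parameter that was passed in.
    Object waitForBlockReceivedFixed(String block, Object mockNN) {
        return waitOn(mockNN, block);
    }

    // Stand-in for the Mockito verify/wait logic; returns which mock it
    // actually waited on so the difference is observable.
    private Object waitOn(Object nn, String block) {
        return nn;
    }
}
```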
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Target Version/s: 2.5.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk and branch-2. Thanks for the contribution [~timxzl].
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Component/s: test
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Affects Version/s: 2.4.0
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Labels: (was: patch)
[jira] [Updated] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6579: Assignee: Jinghui Wang Hadoop Flags: Reviewed Status: Patch Available (was: Open) Thanks [~jwang302]. +1 pending Jenkins.
[jira] [Commented] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041092#comment-14041092 ] Hudson commented on HDFS-6587: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5754 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5754/]) HDFS-6587. Bug in TestBPOfferService can cause test failure. (Contributed by Zhilei Xu) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1604899) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041112#comment-14041112 ] Yongjun Zhang commented on HDFS-6588: - Hi [~jingzhao], thanks a lot for the comments. Sorry I didn't make it clear. What I wanted to say was that the getTrueCause method is part of the HDFS-5322 work, and the reported tests failed here because of removing getTrueCause(). We could modify the tests to make them pass, but my worry was that the tests were set up to capture real user scenarios, and changing the test setup might make them no longer reflect real user scenarios. Based on your answer above, however, I guess we could just modify the tests accordingly after removing the getTrueCause() method. Please correct me if I'm wrong. Thanks.
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041119#comment-14041119 ] Jing Zhao commented on HDFS-6588: - Yeah, also the failed tests were not introduced by HDFS-5322 actually.
[jira] [Commented] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041155#comment-14041155 ] Hadoop QA commented on HDFS-6579: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652012/HDFS-6579.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7213//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7213//console This message is automatically generated.
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041159#comment-14041159 ] Colin Patrick McCabe commented on HDFS-5546: I think what Daryn is advocating is that when attempting to recurse into a directory, we should catch IOE for the {{listStatus}} operation, not just FNF. Although this makes sense to me, there is a bit of a fly in the ointment-- if we have a glob expression like {{/\*/\*}}, the Globber internally will throw an exception if there is a path error while resolving the globs. For example, if you have {{/a/b/c}} and {{/a/r/c}}, and /a/r is inaccessible to you, {{ls /\*/\*/c}} will fail with an {{AccessControlException}} before displaying anything. This behavior has existed basically forever in the globber code (it wasn't added by the globber rewrite) and unfortunately, there is no good way to fix it now. The problem is that there is no way to indicate that we got an error other than throwing an exception, and an exception terminates the whole glob operation, even if there were other valid results. So in the interest of consistency, perhaps we should keep things the way they are, and only catch FNF? {{ls /a/b/c /a/r/c}} seems similar conceptually to {{ls /\*/\*/c}}... it is tricky to explain why an exception should terminate one but not the other... Eddy, can you take a look at the internal JIRA that prompted this and see if it was user error? I'm less and less convinced we should change {{ls -R}}... 
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041171#comment-14041171 ] Yongjun Zhang commented on HDFS-6475: - Hi [~daryn], I'd like to check with you. You mentioned: {quote} If it turns out to be a lot more complicated, then perhaps a followup jira is ok {quote} Based on the information we have so far, the work involved is to remove getTrueCause, replace retrievePassword with retriableRetrievePassword (changing the interface spec of the relevant methods, because retriableRetrievePassword throws more exceptions), and fix the test failures reported in HDFS-6588. I hope you'd agree that it's appropriate to dedicate HDFS-6588 to the above-mentioned work, and to use the latest patch I posted for HDFS-6475 to handle the SecurityException that UserProvider throws. Would you please comment again? Thanks. BTW, thanks [~jingzhao] for clarifying things in HDFS-6588 (sorry, I had a typo in my last update as 6589). WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, HDFS-6475.008.patch, HDFS-6475.009.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is initially obtained from the active NN. When a client tries to issue a request, the NNs it may contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts them in order, so the first one it runs into is likely the standby NN. 
If the StandbyNN doesn't have the updated client credential, it will throw a SecurityException that wraps StandbyException. The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it fails. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
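For illustration, the retry decision the client needs can be sketched with a hypothetical helper (the class and method names below are not Hadoop's actual API): walk the exception's cause chain so a StandbyException tunneled inside a SecurityException still triggers failover to the other NameNode, while a genuine auth failure fails fast.

```java
// Sketch of the failover check, assuming a StandbyException marker type
// (stand-in for org.apache.hadoop.ipc.StandbyException).
public class RetryCheck {
    static class StandbyException extends Exception {}

    // Returns true if any cause in the chain is a StandbyException,
    // meaning the client should retry against the other NameNode.
    static boolean shouldFailover(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof StandbyException) {
                return true;
            }
        }
        return false; // genuine security failure: do not retry
    }
}
```

The bug described in this issue is essentially that the real code path loses the nested StandbyException before any such check can see it.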
[jira] [Created] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
Jing Zhao created HDFS-6593: --- Summary: Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Status: Patch Available (was: Open) Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Attachment: HDFS-6593.000.patch Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041337#comment-14041337 ] Yongjun Zhang commented on HDFS-6588: - Thanks Jing. I found that the getTrueCause() method interacts with the following code
{code}
@Override
public byte[] retrievePassword(
    DelegationTokenIdentifier identifier) throws InvalidToken {
  try {
    // this check introduces inconsistency in the authentication to a
    // HA standby NN. non-token auths are allowed into the namespace which
    // decides whether to throw a StandbyException. tokens are a bit
    // different in that a standby may be behind and thus not yet know
    // of all tokens issued by the active NN. the following check does
    // not allow ANY token auth, however it should allow known tokens in
    namesystem.checkOperation(OperationCategory.READ);
  } catch (StandbyException se) {
    // FIXME: this is a hack to get around changing method signatures by
    // tunneling a non-InvalidToken exception as the cause which the
    // RPC server will unwrap before returning to the client
    InvalidToken wrappedStandby = new InvalidToken("StandbyException");
    wrappedStandby.initCause(se);
    throw wrappedStandby;
  }
{code}
in DelegationTokenSecretManager.java, introduced by HADOOP-9880. If we remove the getTrueCause() logic, at minimum we still need to retain the logic (currently in getTrueCause) that returns the InvalidToken exception wrapped by a SaslException. Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. 
I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not a InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} There are multiple test failures after making the suggested changes. Filing this jira to dedicate to investigating the removal of the getTrueCause method. More detail will be put in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6578) add toString method to DatanodeStorage etc for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041338#comment-14041338 ] Arpit Agarwal commented on HDFS-6578: - Your original understanding was correct, i.e. 1-3 are valid. I don't want to spend more time on the exact wording of one comment, and your comment is clearer than no comment at all. I will commit your v2 patch. +1 add toString method to DatanodeStorage etc for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
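As a rough illustration of the proposal, a minimal sketch of such a toString() could look like the following. The fields here are hypothetical simplifications; the real DatanodeStorage class carries more state (e.g. a StorageType).

```java
// Simplified stand-in for o.a.h.hdfs.server.protocol.DatanodeStorage,
// showing only the toString() idea from the JIRA.
public class DatanodeStorage {
    enum State { NORMAL, READ_ONLY_SHARED }

    private final String storageID;
    private final State state;

    DatanodeStorage(String storageID, State state) {
        this.storageID = storageID;
        this.state = state;
    }

    public String getStorageID() { return storageID; }

    @Override
    public String toString() {
        // One-line summary of the identifying fields, suitable for log messages.
        return "[" + storageID + ", state=" + state + "]";
    }
}
```

With this in place, log statements like the one in BlockManager#processReport can print the storage object directly instead of concatenating individual getters.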
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041339#comment-14041339 ] Haohui Mai commented on HDFS-6593: -- The patch looks good to me. Is it possible to refactor the code of {{FSNameSystem.getSnapshotDiffReport}} in this patch as well, so that {{SnapshotDiffInfo}} can be declared as a package-local class? Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6578) add toString method to DatanodeStorage etc for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6578: Issue Type: Improvement (was: Bug) add toString method to DatanodeStorage etc for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6578: Summary: add toString method to DatanodeStorage for easier debugging (was: add toString method to DatanodeStorage etc for easier debugging) add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6578: Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Target Version/s: 2.5.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk and branch-2. Thanks for the improvement [~yzhangal]! add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-5546: Attachment: HDFS-5546.2.004.patch This patch captures {{IOException}} instead of {{FNF}} based on the first patch's logic, as [~daryn] suggested. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, HDFS-5546.2.004.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041352#comment-14041352 ] Lei (Eddy) Xu commented on HDFS-5546: - [~daryn] was right on this one; we should just replace FNF with IOException in the first patch. Two test cases to verify the expected behavior are added, though. [~cmccabe], shouldn't the {{globStatus()}} change be out of scope for this JIRA? Maybe we should open another related JIRA? race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, HDFS-5546.2.004.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Attachment: HDFS-6593.001.patch Thanks for the review, Haohui! Updated the patch to address your comments. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041354#comment-14041354 ] Hudson commented on HDFS-6578: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5755 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5755/]) HDFS-6578. add toString method to DatanodeStorage for easier debugging. (Contributed by Yongjun Zhang) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604942) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6594) Use inodes to determine membership in an encryption zone
Charles Lamb created HDFS-6594: -- Summary: Use inodes to determine membership in an encryption zone Key: HDFS-6594 URL: https://issues.apache.org/jira/browse/HDFS-6594 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb We should use inodes to determine if a path is in an ez, rather than string parsing. -- This message was sent by Atlassian JIRA (v6.2#6252)
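The idea in HDFS-6594 can be sketched with hypothetical types (the real INode API differs): encryption-zone membership is decided by walking a node's inode ancestors, rather than by string-prefix tests on the path, which are fragile under renames and lookalike prefixes ("/ez2" vs "/ez").

```java
// Hypothetical, simplified inode model for illustration only.
public class EzCheck {
    static class INode {
        final INode parent;       // null for the root
        final boolean isEzRoot;   // true if an EZ is rooted at this inode
        INode(INode parent, boolean isEzRoot) {
            this.parent = parent;
            this.isEzRoot = isEzRoot;
        }
    }

    // A path is in an encryption zone iff some ancestor inode is an EZ root.
    static boolean inEncryptionZone(INode node) {
        for (INode n = node; n != null; n = n.parent) {
            if (n.isEzRoot) {
                return true;
            }
        }
        return false;
    }
}
```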
[jira] [Updated] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6565: Attachment: HDFS-6565.patch Use jackson instead jetty json in hdfs-client - Key: HDFS-6565 URL: https://issues.apache.org/jira/browse/HDFS-6565 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Akira AJISAKA Attachments: HDFS-6565.patch hdfs-client should use Jackson instead of jetty to parse JSON. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041395#comment-14041395 ] Yongjun Zhang commented on HDFS-6578: - Thanks a lot [~arpitagarwal]! add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041409#comment-14041409 ] Akira AJISAKA commented on HDFS-6565: - Attaching a patch to remove the jetty json library from JsonUtil and WebHdfsFileSystem. The way Jackson parses JSON numbers is different from jetty json: * Jackson: number -> Integer, Long, or BigInteger (smallest applicable) * jetty json: number -> Long so I changed the code for parsing a JSON number from {code} (Long) m.get("blockId") // doesn't work if m.get("blockId") is Integer {code} to {code} ((Number) m.get("blockId")).longValue() // supports all classes extending Number {code} In addition, the way Jackson parses JSON arrays is different from jetty json: * Jackson: array -> ArrayList<Object> * jetty json: array -> Object[] so I changed the code for parsing a JSON array from {code} (Object[]) m.get("locatedBlocks") {code} to {code} (List<Object>) m.get("locatedBlocks") {code} Use jackson instead jetty json in hdfs-client - Key: HDFS-6565 URL: https://issues.apache.org/jira/browse/HDFS-6565 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Akira AJISAKA Attachments: HDFS-6565.patch hdfs-client should use Jackson instead of jetty to parse JSON. -- This message was sent by Atlassian JIRA (v6.2#6252)
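The number fix above boils down to not assuming which concrete Number subtype the JSON parser returns. A parser-agnostic sketch, where the map stands in for the parsed JSON object and "blockId" mirrors the key in the comment:

```java
import java.util.Map;

public class JsonNum {
    // Works whether the parser produced an Integer, Long, or BigInteger,
    // unlike a blind (Long) cast, which throws ClassCastException on Integer.
    static long blockId(Map<String, Object> m) {
        return ((Number) m.get("blockId")).longValue();
    }
}
```

Jackson returns the smallest applicable type, so a value like 42 comes back as an Integer while a large block id comes back as a Long; going through the Number interface handles both.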
[jira] [Updated] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6565: Status: Patch Available (was: Open) Use jackson instead jetty json in hdfs-client - Key: HDFS-6565 URL: https://issues.apache.org/jira/browse/HDFS-6565 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Akira AJISAKA Attachments: HDFS-6565.patch hdfs-client should use Jackson instead of jetty to parse JSON. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041422#comment-14041422 ] Tsz Wo Nicholas Sze commented on HDFS-6562: --- Patch looks good. Some comments: - Before the patch, the first if-statement below throws an exception, but it will return false after the patch.
{code}
-    if (srcInode.isSymlink() &&
-        dst.equals(srcInode.asSymlink().getSymlinkString())) {
-      throw new FileAlreadyExistsException(
-          "Cannot rename symlink "+src+" to its target "+dst);
-    }
-
-    // dst cannot be directory or a file under src
-    if (dst.startsWith(src) &&
-        dst.charAt(src.length()) == Path.SEPARATOR_CHAR) {
-      NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: "
-          + "failed to rename " + src + " to " + dst
-          + " because destination starts with src");
+
+    try {
+      validateRenameDestination(src, dst, srcInode);
+    } catch (IOException ignored) {
       return false;
     }
{code}
- prepare() should be combined with the RenameOperation constructor. Then all the fields except srcChild can be changed to final. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
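The throw-vs-return-false behavior change flagged in the review can be illustrated with a self-contained sketch (the names are hypothetical, modeled loosely on the diff): once the validation exception is caught and converted to a boolean, the caller can no longer tell why the rename failed.

```java
import java.io.IOException;

public class RenameSketch {
    // Simplified validation: dst may not be nested under src.
    static void validateRenameDestination(String src, String dst) throws IOException {
        if (dst.startsWith(src + "/")) {
            throw new IOException("destination " + dst + " starts with src " + src);
        }
    }

    static boolean rename(String src, String dst) {
        try {
            validateRenameDestination(src, dst);
        } catch (IOException ignored) {
            return false; // the specific failure reason is swallowed here
        }
        return true;      // (actual rename work elided)
    }
}
```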
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041420#comment-14041420 ] Hadoop QA commented on HDFS-5546: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.ha.TestZKFailoverControllerStress {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7216//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7216//console This message is automatically generated. 
race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, HDFS-5546.2.004.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041429#comment-14041429 ] Sanjay Radia commented on HDFS-6134: I believe the transparent encryption will break the HAR file system. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6562: - Attachment: HDFS-6562.005.patch Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041440#comment-14041440 ] Haohui Mai commented on HDFS-6562: -- Uploaded v6 patch to address [~szetszwo]'s comments. The changes about the symlinks are intentional. Please correct me if I'm wrong, but it looks to me that the old rename prefers returning {{false}} instead of throwing exceptions. We can change this behavior without introducing backward compatibility issues, since symlinks are only available in trunk. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6562: - Attachment: HDFS-6562.006.patch Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041449#comment-14041449 ] Sanjay Radia commented on HDFS-6134: bq. Vanilla distcp will just work with transparent encryption. Data will be decrypted on read and encrypted on write, assuming both source and target are in encrypted zones. ...The proposal on changing distcp is to enable a second use case. Alejandro, Aaron: the general practice is not to give the admins running distcp access to keys. Hence, as you suggest, we could change distcp so that it does not use transparent decryption by default; however, there may be other such backup tools and applications that customers and other vendors may have written, and we would be breaking them. This may also break the HAR filesystem. Aaron, you took a very strong position that transparent decryption/reencryption is exactly what one wants. I am missing this - what are the use cases for distcp where one wants transparent decryption/reencryption? Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop Filesystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. 
The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041456#comment-14041456 ] Siqi Li commented on HDFS-6527: --- When running unit tests in this patch v5, I get the following errors 2014-06-23 13:36:09,516 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(873)) - Failed to close file /testDeleteAddBlockRace org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /testDeleteAddBlockRace: File does not exist. Holder DFSClient_NONMAPREDUCE_1652233532_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2941) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2762) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:585) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy17.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy17.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1443) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1265) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:529) Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Fix For: 2.4.1 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. 
Because of deferred inode removal outside the FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with the FSN write lock held. This allows adding a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
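The race described above can be sketched in a few lines of plain Java. This is a toy model, not the actual FSNamesystem code (the class and method names here are illustrative): the point is the fix pattern of re-checking that the inode is still present in the inode map once the write lock is re-acquired, instead of trusting a reference resolved earlier under the read lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the HDFS-6527 race -- NOT the real FSNamesystem.
// getAdditionalBlock() resolves the file under the read lock, then mutates
// state under the write lock; a delete() can run in the gap. The guard is to
// re-check membership in the inode map once the write lock is held.
public class DeferredRemovalRace {
    private final Map<Long, StringBuilder> inodeMap = new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void create(long id) {
        lock.writeLock().lock();
        try { inodeMap.put(id, new StringBuilder()); }
        finally { lock.writeLock().unlock(); }
    }

    public void delete(long id) {
        lock.writeLock().lock();
        try { inodeMap.remove(id); }
        finally { lock.writeLock().unlock(); }
    }

    // Returns false instead of appending when the file vanished in the gap.
    public boolean addBlock(long id, String block) {
        StringBuilder file;
        lock.readLock().lock();
        try { file = inodeMap.get(id); }        // analyzeFileState() analogue
        finally { lock.readLock().unlock(); }
        if (file == null) return false;
        // ... a concurrent delete(id) may happen right here ...
        lock.writeLock().lock();
        try {
            if (!inodeMap.containsKey(id)) {    // the re-check that closes the race
                return false;
            }
            file.append(block).append(';');     // would have been OP_ADD_BLOCK
            return true;
        } finally { lock.writeLock().unlock(); }
    }

    public static void main(String[] args) {
        DeferredRemovalRace fs = new DeferredRemovalRace();
        fs.create(1L);
        fs.delete(1L);                          // delete wins the race
        System.out.println(fs.addBlock(1L, "blk_1"));  // prints false
    }
}
```

Without the `containsKey` re-check, the stale `file` reference would still be mutated after deletion, which is the in-memory analogue of writing OP_ADD_BLOCK after OP_DELETE.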
[jira] [Updated] (HDFS-6584) Support archival storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: HDFSArchivalStorageDesign20140623.pdf HDFSArchivalStorageDesign20140623.pdf: design doc. Support archival storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFSArchivalStorageDesign20140623.pdf In most Hadoop clusters, as more and more data is stored for longer periods, the demand for storage is outstripping the demand for compute. Hadoop needs a cost-effective and easy-to-manage solution to meet this demand for storage. The current solutions are: - Delete old unused data. This comes at the operational cost of identifying unnecessary data and deleting it manually. - Add more nodes to the cluster. This adds unnecessary compute capacity to the cluster along with the storage capacity. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher-density, less expensive storage and low compute power are becoming available and can be used as cold storage in the clusters. Based on policy, data can be moved from hot storage to cold storage. Adding more nodes to the cold storage can grow the storage independently of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6560) Byte array native checksumming on DN side
[ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041477#comment-14041477 ] Colin Patrick McCabe commented on HDFS-6560:
{code}
+  public static void verifyChunkedSumsByteArray(int bytesPerSum,
+      int checksumType, byte[] sums, int sumsOffset, byte[] data,
+      int dataOffset, int dataLength, String fileName, long basePos)
+      throws ChecksumException {
+    nativeVerifyChunkedSumsByteArray(bytesPerSum, checksumType,
+        sums, sumsOffset,
+        data, dataOffset, dataLength,
+        fileName, basePos);
+  }
{code}
What's the purpose of this wrapper function? It just passes all its arguments directly to the other function. Public functions can have the native annotation too.
{code}
+  sums_addr = (*env)->GetPrimitiveArrayCritical(env, j_sums, NULL);
+  data_addr = (*env)->GetPrimitiveArrayCritical(env, j_data, NULL);
+
+  if (unlikely(!sums_addr || !data_addr)) {
+    THROW(env, "java/lang/OutOfMemoryError",
+          "not enough memory for byte arrays in JNI code");
+    return;
+  }
{code}
This is going to leak memory if {{GetPrimitiveArrayCritical}} succeeds for {{sums_addr}} but not for {{data_addr}}. Byte array native checksumming on DN side - Key: HDFS-6560 URL: https://issues.apache.org/jira/browse/HDFS-6560 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-3528.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5369) Support negative caching of user-group mapping
[ https://issues.apache.org/jira/browse/HDFS-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041484#comment-14041484 ] Lei (Eddy) Xu commented on HDFS-5369: - [~andrew.wang] What should the expected behavior for negative caching be here? I am currently thinking of a solution where, if {{getGroups()}} returns an empty list, we assign a much shorter expiration period to the cached item (e.g., 30 seconds instead of 4 hours), so that a transient failure can be handled. Just wondering whether this is realistic in production? Support negative caching of user-group mapping -- Key: HDFS-5369 URL: https://issues.apache.org/jira/browse/HDFS-5369 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Andrew Wang We've seen a situation at a couple of our customers where interactions from an unknown user lead to a high rate of group mapping calls. In one case, this was happening at a rate of 450 calls per second with the shell-based group mapping, enough to severely impact overall namenode performance and also leading to large amounts of log spam (a stack trace is printed each time). Let's consider negative caching of group mapping, as well as quashing the rate of this log message. -- This message was sent by Atlassian JIRA (v6.2#6252)
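The proposal above can be sketched as follows. This is a minimal illustrative cache, not Hadoop's Groups implementation (the class name and TTL constants are made up): empty lookup results are cached like any other entry, but with a much shorter TTL, so a transient resolver failure heals quickly while the namenode is still shielded from hundreds of lookups per second for an unknown user. The clock is passed in explicitly to keep the sketch testable.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative negative-caching sketch -- not Hadoop's Groups class.
// An empty resolver result is cached with a much shorter TTL
// (NEGATIVE_TTL_MS) than a successful lookup (POSITIVE_TTL_MS).
public class GroupCache {
    public static final long POSITIVE_TTL_MS = 4L * 60 * 60 * 1000; // 4 hours
    public static final long NEGATIVE_TTL_MS = 30L * 1000;          // 30 seconds

    private static final class Entry {
        final List<String> groups;
        final long expiresAt;
        Entry(List<String> groups, long expiresAt) {
            this.groups = groups;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, List<String>> resolver; // e.g. shell-based lookup

    public GroupCache(Function<String, List<String>> resolver) {
        this.resolver = resolver;
    }

    public synchronized List<String> getGroups(String user, long nowMs) {
        Entry e = cache.get(user);
        if (e != null && nowMs < e.expiresAt) {
            return e.groups;                  // cache hit, positive or negative
        }
        List<String> groups = resolver.apply(user);
        long ttl = groups.isEmpty() ? NEGATIVE_TTL_MS : POSITIVE_TTL_MS;
        cache.put(user, new Entry(groups, nowMs + ttl));
        return groups;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        GroupCache cache = new GroupCache(user -> {
            calls[0]++;                       // would be the expensive shell call
            return java.util.Collections.emptyList();
        });
        cache.getGroups("ghost", 0L);
        cache.getGroups("ghost", 10_000L);    // within the 30s negative TTL
        System.out.println(calls[0]);         // prints 1: one lookup, not two
    }
}
```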
[jira] [Commented] (HDFS-6570) add api that enables checking if a user has certain permissions on a file
[ https://issues.apache.org/jira/browse/HDFS-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041492#comment-14041492 ] Colin Patrick McCabe commented on HDFS-6570: bq. Note that the man page for access clearly spells out the risk of time-of-check/time-of-use race conditions. This API is only going to be useful for systems implementing their own authorization enforcement on top of HDFS files, and only if those systems consider the risk acceptable. Let's make sure that we spell out the risks in the API. In fact, I wonder if we should make this {{\@LimitedPrivate}} between Hive and HDFS. The man page for the {{access}} system call is pretty blunt on my machine: the use of this system call should be avoided. add api that enables checking if a user has certain permissions on a file - Key: HDFS-6570 URL: https://issues.apache.org/jira/browse/HDFS-6570 Project: Hadoop HDFS Issue Type: Bug Reporter: Thejas M Nair Assignee: Chris Nauroth For some of the authorization modes in Hive, the servers in Hive check if a given user has permissions on a certain file or directory. For example, the storage based authorization mode allows hive table metadata to be modified only when the user has access to the corresponding table directory on hdfs. There are likely to be such use cases outside of Hive as well. HDFS does not provide an api for such checks. As a result, the logic to check whether a user has permissions on a directory gets replicated in Hive. This results in duplicate logic and introduces the possibility of inconsistencies in the interpretation of the permission model. This becomes a bigger problem with the complexity of ACL logic. HDFS should provide an api with functionality similar to the access function in unistd.h - http://linux.die.net/man/2/access . -- This message was sent by Atlassian JIRA (v6.2#6252)
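For illustration, the classic unistd.h access()-style check over an rwx owner/group/other permission triple looks like the sketch below. This is only the textbook mode-bit logic, not the eventual HDFS API: the real NameNode check would also have to cover ACLs, sticky bits and the superuser, which this omits.

```java
import java.util.Set;

// A sketch of unistd.h access() semantics over an rwx owner/group/other
// triple. The real HDFS permission check also handles ACLs and the
// superuser; this shows only the classic mode-bit logic.
public class AccessCheck {
    public static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /** mode is a 9-bit permission triple, e.g. 0750 for rwxr-x---. */
    public static boolean access(int mode, String owner, String group,
                                 String user, Set<String> userGroups,
                                 int requested) {
        final int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;            // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;            // group class
        } else {
            bits = mode & 7;                   // other class
        }
        return (bits & requested) == requested;
    }

    public static void main(String[] args) {
        // rwxr-x---: the owner may write; a group member may read+execute;
        // everyone else gets nothing.
        System.out.println(access(0750, "alice", "staff", "eve",
                                  Set.of(), READ));   // prints false
    }
}
```

Note that the answer is only valid at the instant it is computed, which is exactly the time-of-check/time-of-use caveat raised in the comment above.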
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041495#comment-14041495 ] Hadoop QA commented on HDFS-6593: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652059/HDFS-6593.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7214//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7214//console This message is automatically generated. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5369) Support negative caching of user-group mapping
[ https://issues.apache.org/jira/browse/HDFS-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041497#comment-14041497 ] Andrew Wang commented on HDFS-5369: --- Hey [~eddyxu], 30s sounds okay to me, maybe even a bit longer than that (i.e. 1 or 2 min). [~kihwal] might be able to make a quick comment about this, since he mentioned tight job SLAs in HADOOP-8088. HADOOP-8088 also mentions handling error codes indicative of a transient error differently, so let's keep that in mind here too. It would also be good to squish the stack trace if possible, since it's not very useful. Support negative caching of user-group mapping -- Key: HDFS-5369 URL: https://issues.apache.org/jira/browse/HDFS-5369 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Andrew Wang We've seen a situation at a couple of our customers where interactions from an unknown user lead to a high rate of group mapping calls. In one case, this was happening at a rate of 450 calls per second with the shell-based group mapping, enough to severely impact overall namenode performance and also leading to large amounts of log spam (a stack trace is printed each time). Let's consider negative caching of group mapping, as well as quashing the rate of this log message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041501#comment-14041501 ] Haohui Mai commented on HDFS-6593: -- +1 on the latest patch, pending jenkins. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041507#comment-14041507 ] Tsz Wo Nicholas Sze commented on HDFS-6562: --- Thanks for the explanation. Returning false sounds good. For the new patch, - There are two srcChild = srcIIP.getLastINode() statements in the RenameOperation constructor. - The field srcRefDstSnapshot can be changed to final. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6562: - Attachment: HDFS-6562.007.patch Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch, HDFS-6562.007.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6562: -- Priority: Minor (was: Major) Hadoop Flags: Reviewed +1 the new patch looks good. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch, HDFS-6562.007.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side
[ https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041522#comment-14041522 ] Andrew Wang commented on HDFS-6561: --- +1 sounds good to me; pretty sure that flushes are normally bigger than 100B. Not really a use case we're optimized for anyway. Byte array native checksumming on client side - Key: HDFS-6561 URL: https://issues.apache.org/jira/browse/HDFS-6561 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: James Thomas Assignee: James Thomas -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041524#comment-14041524 ] Alejandro Abdelnur commented on HDFS-6134: -- [~sanjay.radia], can you be a bit more specific on how HAR would break? Regarding distcp, we want to support both modes: raw copies, without decryption/encryption, for admins running distcp; and regular copies, with encryption/decryption, to copy data into/out of an encryption zone or to another encryption zone, within or across clusters. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop Filesystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041529#comment-14041529 ] Hadoop QA commented on HDFS-6593: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652071/HDFS-6593.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/7215//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSClientRetries {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7215//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7215//console This message is automatically generated. 
Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3528) Use native CRC32 in DFS write path
[ https://issues.apache.org/jira/browse/HDFS-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041545#comment-14041545 ] Tsz Wo Nicholas Sze commented on HDFS-3528: --- Is the native library faster than the Java implementation only for CRC32C but not CRC32? Use native CRC32 in DFS write path -- Key: HDFS-3528 URL: https://issues.apache.org/jira/browse/HDFS-3528 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: James Thomas HDFS-2080 improved the CPU efficiency of the read path by using native SSE-enabled code for CRC verification. Benchmarks of the write path show that it's often CPU bound by checksums as well, so we should make the same improvement there. -- This message was sent by Atlassian JIRA (v6.2#6252)
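For context on the CRC32 vs. CRC32C question, both polynomials can be exercised with the JDK's built-in classes (java.util.zip.CRC32 has been there since the beginning; CRC32C was only added in JDK 9, after this discussion). The relevant hardware angle is that SSE4.2's crc32 instruction implements the Castagnoli (CRC-32C) polynomial, not the zlib CRC-32 polynomial, which is presumably why native acceleration pays off most for CRC32C.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;   // added in JDK 9

// The two polynomials in question. "123456789" is the conventional
// check-value input: CRC-32 yields CBF43926, CRC-32C yields E3069283.
public class Crc32Demo {
    public static long crc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static long crc32c(byte[] data) {
        CRC32C c = new CRC32C();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes();
        System.out.printf("CRC32  = %08X%n", crc32(check));   // CBF43926
        System.out.printf("CRC32C = %08X%n", crc32c(check));  // E3069283
    }
}
```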
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Attachment: HDFS-6593.002.patch Fix the failed unit test and javadoc. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch, HDFS-6593.002.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6592) Using Fluent to collect data and append to HDFS throws AlreadyBeingCreatedException
[ https://issues.apache.org/jira/browse/HDFS-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jack updated HDFS-6592: --- Description: We use Fluent to collect log data. The log data are appended to files in HDFS. The cluster configuration: namenode: namenode1 (hostname); secondary namenode: namenode2; 3 datanodes: datanode1, datanode2, datanode3; replication factor 3. Every few days, we suffer the following exception: Exception in nameNode1: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] Exception in DataNode1: 2014-06-22 09:54:45,771 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
2014-06-22 09:54:45,813 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2441) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) According to the log, we infer the flow of the exception: 1. 
Namenode updates the pipeline with just one datanode namenode1 log: 2014-06-22 09:54:16,604 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline (block=BP-1611177164-datanode1-1399894698024:blk_1074496235_1935947, newGenerationStamp=1935951, newLength=98839816, newNodes=[datanode1:50010], clientName=DFSClient_NONMAPREDUCE_349196146_2027206) 2. datanode1 throws an exception during close. datanode1 log: 2014-06-22 09:54:26,569 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file_name retrying... 3. Subsequently collected data from Fluent triggers another DFSClient to append to the same file. namenode1 log: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_name] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [datanode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [datanode1] 4. The subsequent DFSClient triggers lease recovery every LEASE_SOFTLIMIT_PERIOD namenode1 log: 2014-06-22 09:58:34,722 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover [Lease. Holder: DFSClient_NONMAPREDUCE_349196146_2027206, pendingcreates: 1], src=file_name client DFSClient_NONMAPREDUCE_349196146_2027206 5. Fails to recover the lease. namenode1 log: 2014-06-22 09:58:34,722
[jira] [Commented] (HDFS-6430) HTTPFS - Implement XAttr support
[ https://issues.apache.org/jira/browse/HDFS-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041561#comment-14041561 ]

Alejandro Abdelnur commented on HDFS-6430:
--

[~hitliuyi], my question was whether we are testing the behavior of the xattr methods when they are switched off. Other than that LGTM.

HTTPFS - Implement XAttr support
Key: HDFS-6430
URL: https://issues.apache.org/jira/browse/HDFS-6430
Project: Hadoop HDFS
Issue Type: Task
Affects Versions: 3.0.0
Reporter: Yi Liu
Assignee: Yi Liu
Fix For: 3.0.0
Attachments: HDFS-6430.1.patch, HDFS-6430.2.patch, HDFS-6430.3.patch, HDFS-6430.4.patch, HDFS-6430.patch

Add xattr support to HttpFS.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6595) Configure the maximum threads allowed for balancing on datanodes
Benoy Antony created HDFS-6595:
--

Summary: Configure the maximum threads allowed for balancing on datanodes
Key: HDFS-6595
URL: https://issues.apache.org/jira/browse/HDFS-6595
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Benoy Antony
Assignee: Benoy Antony

Currently the datanode allows a max of 5 threads to be used for balancing. In some cases, it may make sense to use a different number of threads for the purpose of moving.
[jira] [Updated] (HDFS-6595) Configure the maximum threads allowed for balancing on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoy Antony updated HDFS-6595:
--

Attachment: HDFS-6595.patch

Attaching the patch, which adds a new configuration, _dfs.datanode.balance.max.concurrent.moves_. The number of threads is set based on this parameter.

Configure the maximum threads allowed for balancing on datanodes
Key: HDFS-6595
URL: https://issues.apache.org/jira/browse/HDFS-6595
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Benoy Antony
Assignee: Benoy Antony
Attachments: HDFS-6595.patch

Currently the datanode allows a max of 5 threads to be used for balancing. In some cases, it may make sense to use a different number of threads for the purpose of moving.
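With a patch like this applied, the cap would be raised via hdfs-site.xml. A sketch of such a fragment, using the property name given in the comment (the value 10 is illustrative only; per the description, the default remains 5):

```xml
<!-- Illustrative hdfs-site.xml fragment raising the datanode's cap on
     concurrent balancing threads from the default of 5. The value 10 is
     an example, not a recommendation. -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>10</value>
</property>
```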
[jira] [Updated] (HDFS-6595) Configure the maximum threads allowed for balancing on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoy Antony updated HDFS-6595:
--

Status: Patch Available (was: Open)

Configure the maximum threads allowed for balancing on datanodes
Key: HDFS-6595
URL: https://issues.apache.org/jira/browse/HDFS-6595
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Benoy Antony
Assignee: Benoy Antony
Attachments: HDFS-6595.patch

Currently the datanode allows a max of 5 threads to be used for balancing. In some cases, it may make sense to use a different number of threads for the purpose of moving.
[jira] [Commented] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041572#comment-14041572 ]

Zesheng Wu commented on HDFS-6525:
--

Thanks [~daryn], I will update the patch to address your comments immediately.

FsShell supports HDFS TTL
Key: HDFS-6525
URL: https://issues.apache.org/jira/browse/HDFS-6525
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs-client, tools
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
Attachments: HDFS-6525.1.patch

This issue is used to track development of supporting HDFS TTL for FsShell; for details see HDFS-6382.
[jira] [Commented] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041608#comment-14041608 ]

Hadoop QA commented on HDFS-6565:
--

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12652077/HDFS-6565.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7217//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7217//console

This message is automatically generated.

Use jackson instead jetty json in hdfs-client
Key: HDFS-6565
URL: https://issues.apache.org/jira/browse/HDFS-6565
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Haohui Mai
Assignee: Akira AJISAKA
Attachments: HDFS-6565.patch

hdfs-client should use Jackson instead of jetty to parse JSON.