[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt uses raw dir.getFileInfo() to getAuditFileInfo()
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040479#comment-14040479 ] Haohui Mai commented on HDFS-6580: -- Looks good to me. +1 FSNamesystem.mkdirsInt uses raw dir.getFileInfo() to getAuditFileInfo() --- Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
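The wrapper pattern described in the issue can be sketched in plain Java. The class and method names below are simplified stand-ins for the FSNamesystem internals, not the actual Hadoop code; the point is only that the audit accessor short-circuits when auditing is off, so callers never pay for an unused lookup.

```java
// Minimal sketch of the audit-wrapper pattern from the issue description.
// AuditingNamesystem and FileStatusInfo are hypothetical stand-ins, not
// the real FSNamesystem / HdfsFileStatus types.
public class AuditWrapperSketch {
    static class FileStatusInfo {
        final String path;
        FileStatusInfo(String path) { this.path = path; }
    }

    static class AuditingNamesystem {
        final boolean auditEnabled;
        AuditingNamesystem(boolean auditEnabled) { this.auditEnabled = auditEnabled; }

        // Raw lookup, analogous to dir.getFileInfo(): always does the work.
        FileStatusInfo getFileInfo(String path) {
            return new FileStatusInfo(path);
        }

        // Canonical audit accessor: returns null when auditing is disabled,
        // so callers that only need the status for the audit log skip the
        // lookup entirely.
        FileStatusInfo getAuditFileInfo(String path) {
            return auditEnabled ? getFileInfo(path) : null;
        }
    }

    public static void main(String[] args) {
        AuditingNamesystem on = new AuditingNamesystem(true);
        AuditingNamesystem off = new AuditingNamesystem(false);
        System.out.println(on.getAuditFileInfo("/tmp/a") != null);   // true
        System.out.println(off.getAuditFileInfo("/tmp/a") == null);  // true
    }
}
```

This also illustrates why startFileInt() is different: there the status is returned to the caller, so the raw lookup must run regardless of the audit setting.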
[jira] [Updated] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6580: - Summary: FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper (was: FSNamesystem.mkdirsInt uses raw dir.getFileInfo() to getAuditFileInfo()) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6580: - Resolution: Fixed Fix Version/s: 2.5.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~timxzl] for the contribution. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040483#comment-14040483 ] Hudson commented on HDFS-6580: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5753 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5753/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040606#comment-14040606 ] Binglin Chang commented on HDFS-6586: - TestBalancerWithNodeGroup also failed before for the same reason (HDFS-6250). We fixed TestBalancerWithNodeGroup, but it looks like TestBalancer has the same bug, and potentially also bug HDFS-6250. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040607#comment-14040607 ] Binglin Chang commented on HDFS-6586: - bq. and potentially also have bug HDFS-6250 Sorry, it was HDFS-6506. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v2.patch Updated the patch to add a fix for the bug in HDFS-6586; TestBalancer is affected by the balancer.id file. Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently (https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/). From the error log, the reason seems to be that newly moved block replicas are being invalidated and deleted, so some of the balancer's work is reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 
127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there must be a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
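The expected behavior described above can be sketched as a small selection rule. This helper is hypothetical and self-contained, not the real BlockManager.chooseExcessReplicates logic: after a balancer move from src to dest completes, the excess replica chosen for invalidation should be the stale copy on src, never the freshly written copy on dest.

```java
import java.util.Arrays;
import java.util.List;

public class ExcessReplicaSketch {
    // Pick which replica of an over-replicated block to invalidate after a
    // balancer move. Prefer the move source; never pick the move destination
    // unless it is the only replica. (Hypothetical helper for illustration.)
    static String chooseExcessReplica(List<String> replicas,
                                      String moveSrc, String moveDest) {
        for (String r : replicas) {
            if (r.equals(moveSrc)) {
                return r; // the source copy is the redundant one
            }
        }
        for (String r : replicas) {
            if (!r.equals(moveDest)) {
                return r; // fall back to any replica that is not the new copy
            }
        }
        return replicas.get(0);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("127.0.0.1:55468", "127.0.0.1:49159");
        // After moving a block 49159 -> 55468, the 49159 copy should go.
        System.out.println(chooseExcessReplica(replicas, "127.0.0.1:49159", "127.0.0.1:55468"));
    }
}
```

The bug report above is exactly the inverse of this rule: the log shows the destination's new replicas being added to the invalidated-blocks set.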
[jira] [Resolved] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HDFS-6586. - Resolution: Duplicate TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040625#comment-14040625 ] Binglin Chang commented on HDFS-6586: - I updated the patch in HDFS-6506 to fix the bug; closing this JIRA as a duplicate. Thanks for reporting this, Ted. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6382: - Attachment: HDFS-TTL-Design-3.pdf Updated the document according to the implementation. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design-2.pdf, HDFS-TTL-Design-3.pdf, HDFS-TTL-Design.pdf In a production environment, we often have scenarios like this: we want to back up files on HDFS for some time and then have them deleted automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs that are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires 4. A child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to control whether deleted files/directories go to the trash 6. A global configuration is needed to control whether a directory with a TTL is deleted when it is emptied by the TTL mechanism -- This message was sent by Atlassian JIRA (v6.2#6252)
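Points 2-4 of the proposal amount to a simple expiry rule, sketched below with hypothetical names (the attached design documents, not this sketch, define the real interface): a node's effective TTL is its own TTL if set, otherwise its parent's, and a file is deleted once its age exceeds that value.

```java
public class TtlSketch {
    // Effective TTL: a child's own TTL overrides the parent's (point 4);
    // a negative value means "no TTL set at this level".
    static long effectiveTtl(long parentTtlMs, long ownTtlMs) {
        return ownTtlMs >= 0 ? ownTtlMs : parentTtlMs;
    }

    // A file/directory entry expires when its age exceeds the effective
    // TTL (points 2 and 3); no TTL anywhere means it never expires.
    static boolean isExpired(long mtimeMs, long ttlMs, long nowMs) {
        return ttlMs >= 0 && nowMs - mtimeMs > ttlMs;
    }

    public static void main(String[] args) {
        long day = 24L * 3600 * 1000;
        // Parent directory keeps 30 days; a child file overrides with 1 day.
        long ttl = effectiveTtl(30 * day, day);
        System.out.println(isExpired(0, ttl, 2 * day));                       // true
        System.out.println(isExpired(0, effectiveTtl(30 * day, -1), 2 * day)); // false
    }
}
```

A TtlManager (HDFS-6526) would then periodically scan entries with a TTL set, apply this rule, and delete (or move to trash, per point 5) the expired ones.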
[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6525: - Attachment: HDFS-6525.1.patch Initial implementation. FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of HDFS TTL support for FsShell; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6525: - Target Version/s: 2.5.0 Status: Patch Available (was: Open) FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of HDFS TTL support for FsShell; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager
[ https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6526: - Attachment: HDFS-6526.1.patch Initial implementation. The unit test depends on HDFS-6525, so HDFS-6525 should be committed before this one. Implement HDFS TtlManager - Key: HDFS-6526 URL: https://issues.apache.org/jira/browse/HDFS-6526 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6526.1.patch This issue is used to track development of the HDFS TtlManager; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager
[ https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6526: - Target Version/s: 2.5.0 Status: Patch Available (was: Open) Implement HDFS TtlManager - Key: HDFS-6526 URL: https://issues.apache.org/jira/browse/HDFS-6526 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6526.1.patch This issue is used to track development of the HDFS TtlManager; for details, see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040644#comment-14040644 ] Hudson commented on HDFS-6507: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #592 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/592/]) HDFS-6507. Improve DFSAdmin to support HA cluster better. (Contributed by Zesheng Wu) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604692) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Fix For: 2.5.0 Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which finally calls the corresponding remote implementation on the NN side. On the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN allows only UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed there, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, and when executed from the DFSAdmin command line they are sent to an arbitrary NN, whether it is Active or Standby. This may result in two problems: a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result. b. If the operation needs to take effect on both NNs, it takes effect on only one; when there is an NN failover later, there may be problems. Here I propose the following improvements: a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. If the command cannot be classified as one of the above four operations, or if the command needs to take effect on both NNs, we should send the request to both the Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to an arbitrary NN, whether it is Active or Standby. Here I propose that we send these requests to both NNs. 3. ClientDatanodeProtocol Commands in this category are handled correctly; no improvement is needed. -- This message was sent by Atlassian JIRA (v6.2#6252)
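The "send the request to both NNs" proposal in point 2 above can be sketched as follows. NamenodeRef and refreshServiceAcl() are simplified stand-ins for the per-NN RPC proxies, not the actual DFSAdmin/NameNodeProxies code: instead of sending the refresh to whichever NN the client resolves first, the command is issued to every configured NN and the per-NN outcomes are reported.

```java
import java.util.ArrayList;
import java.util.List;

public class DualNnDispatchSketch {
    // Hypothetical stand-in for a proxy to one namenode.
    interface NamenodeRef {
        String id();
        void refreshServiceAcl() throws Exception;
    }

    // Send the refresh to every NN (Active and Standby) and collect a
    // per-NN result, so a failure on one NN is visible to the operator
    // instead of being silently skipped.
    static List<String> refreshAll(List<NamenodeRef> nns) {
        List<String> results = new ArrayList<>();
        for (NamenodeRef nn : nns) {
            try {
                nn.refreshServiceAcl();
                results.add(nn.id() + ": success");
            } catch (Exception e) {
                results.add(nn.id() + ": failed (" + e.getMessage() + ")");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<NamenodeRef> nns = new ArrayList<>();
        nns.add(new NamenodeRef() {
            public String id() { return "nn1"; }
            public void refreshServiceAcl() { /* succeeds */ }
        });
        nns.add(new NamenodeRef() {
            public String id() { return "nn2"; }
            public void refreshServiceAcl() throws Exception {
                throw new Exception("standby unreachable");
            }
        });
        System.out.println(refreshAll(nns));
    }
}
```

This mirrors the behavior TestDFSAdminWithHA exercises in the committed patch: each refresh-style command reports one status line per namenode.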
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040643#comment-14040643 ] Hudson commented on HDFS-6580: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #592 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/592/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the file status it retrieves is returned to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040683#comment-14040683 ] Zesheng Wu commented on HDFS-6382: -- Hi guys, I've uploaded initial implementations on HDFS-6525 and HDFS-6526 separately; I hope you can take a look. Any comments will be appreciated. Thanks in advance. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design-2.pdf, HDFS-TTL-Design-3.pdf, HDFS-TTL-Design.pdf In a production environment, we often have scenarios like this: we want to back up files on HDFS for some time and then have them deleted automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs that are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires 4. A child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to control whether deleted files/directories go to the trash 6. A global configuration is needed to control whether a directory with a TTL is deleted when it is emptied by the TTL mechanism -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6592) Use Fluent to collect data to append to HDFS. Throw the AlreadyBeingCreatedException exception
jack created HDFS-6592: -- Summary: Use Fluent to collect data to append to HDFS. Throw the AlreadyBeingCreatedException exception Key: HDFS-6592 URL: https://issues.apache.org/jira/browse/HDFS-6592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: jack We use Fluent to collect log data; the log data is appended to files in HDFS. The cluster configuration: namenode: namenode1 (hostname); secondary namenode: namenode2; 3 datanodes: datanode1, datanode2, datanode3; 3 replicas. Every few days, we suffer from the following exceptions: Exception in nameNode1: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] Exception in DataNode1: 2014-06-22 09:54:45,771 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
2014-06-22 09:54:45,813 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2441) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) According to the log, we infer the flow of the exception: 1. 
The namenode updates the pipeline with just one datanode. namenode1 log: 2014-06-22 09:54:16,604 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline (block=BP-1611177164-datanode1-1399894698024:blk_1074496235_1935947, newGenerationStamp=1935951, newLength=98839816, newNodes=[datanode1:50010], clientName=DFSClient_NONMAPREDUCE_349196146_2027206) 2. datanode1 throws an exception during close. datanode1 log: 2014-06-22 09:54:26,569 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file_name retrying... 3. Subsequently collected data from Fluent triggers another DFSClient to append to the same file. namenode1 log: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_name] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [datanode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [datanode1] 4. The subsequent DFSClient then triggers lease recovery every LEASE_SOFTLIMIT_PERIOD. namenode1 log: 2014-06-22 09:58:34,722 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover [Lease. Holder:
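A common client-side mitigation for the flow above is to retry the append with a bounded backoff, so the namenode's soft lease limit can expire and the stale writer's lease can be reassigned (in real HDFS one can also force this with DistributedFileSystem.recoverLease(Path)). The types below are simplified stand-ins for the DFSClient append path, not the actual Hadoop API:

```java
public class AppendRetrySketch {
    // Stand-in for org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException.
    static class AlreadyBeingCreatedException extends Exception {}

    // Hypothetical appender; throws while the previous writer still
    // holds the lease on the file.
    interface Appender {
        void append(String path) throws AlreadyBeingCreatedException;
    }

    // Retry the append a bounded number of times, sleeping between
    // attempts so the stale lease can time out and be recovered.
    static boolean appendWithRetry(Appender a, String path, int maxTries,
                                   long sleepMs) throws InterruptedException {
        for (int i = 0; i < maxTries; i++) {
            try {
                a.append(path);
                return true;
            } catch (AlreadyBeingCreatedException e) {
                Thread.sleep(sleepMs);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice (lease still held), then succeeds after recovery.
        Appender flaky = path -> {
            if (calls[0]++ < 2) throw new AlreadyBeingCreatedException();
        };
        System.out.println(appendWithRetry(flaky, "/logs/app.log", 5, 10));
    }
}
```

This only masks the symptom; the underlying race between the old writer's close failure and the new writer's append still needs the namenode-side lease to expire or be explicitly recovered.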
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040751#comment-14040751 ] Hudson commented on HDFS-6580: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1783 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1783/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the returned file stat is passed back to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
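The wrapper behavior described in the issue text can be illustrated with a minimal, self-contained sketch; `AuditSketch` and its members are hypothetical stand-ins for FSNamesystem/FSDirectory, not the real classes:

```java
// Minimal sketch of the getAuditFileInfo() wrapper pattern described above.
// AuditSketch and its members are hypothetical stand-ins, not HDFS code.
class AuditSketch {
    private final boolean auditLogEnabled;
    private final java.util.Map<String, String> dir; // stands in for FSDirectory

    AuditSketch(boolean auditLogEnabled, java.util.Map<String, String> dir) {
        this.auditLogEnabled = auditLogEnabled;
        this.dir = dir;
    }

    // The canonical audit helper: skip the (possibly costly) file-info
    // lookup entirely when no audit logger is configured.
    String getAuditFileInfo(String path) {
        return auditLogEnabled ? dir.get(path) : null;
    }
}
```

The point of the fix is exactly this short-circuit: calling the raw lookup directly, as mkdirsInt() did, pays the cost even when the result is discarded.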
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040752#comment-14040752 ] Hudson commented on HDFS-6507: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1783 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1783/]) HDFS-6507. Improve DFSAdmin to support HA cluster better. (Contributed by Zesheng Wu) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604692) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Fix For: 2.5.0 Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which finally calls the corresponding remote implementation function on the NN side. 
On the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN only allows UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, so when executed from the DFSAdmin command line they are sent to a fixed NN, regardless of whether it is Active or Standby. This may result in two problems: a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result. b. If the operation needs to take effect on both NNs, it takes effect on only one NN, which may cause problems after a future NN failover. Here I propose the following improvements: a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. If the command cannot be classified as one of the above four operations, or if it needs to take effect on both NNs, we should send the request to both the Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to a fixed NN, regardless of whether it is Active or Standby. Here I propose that we send these requests to both NNs. 3. 
ClientDatanodeProtocol Commands in this category are handled correctly, no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
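Proposal (b) above — sending a request to both the Active and Standby NNs — might be sketched roughly as follows; `FanOut` and `sendToAll` are hypothetical names, and the real DFSAdmin change would iterate over the actual NN RPC proxies:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of proposal (b): fan a refresh-style request out to
// every configured NameNode rather than a single fixed one, collecting
// per-NN failures instead of stopping at the first error.
class FanOut {
    static <T> List<String> sendToAll(List<T> nnProxies, Consumer<T> op) {
        List<String> failures = new ArrayList<>();
        for (T proxy : nnProxies) {
            try {
                op.accept(proxy); // e.g. invoke refreshServiceAcl() on the proxy
            } catch (RuntimeException e) {
                failures.add(e.getMessage()); // record the failure, keep going
            }
        }
        return failures;
    }
}
```

Collecting failures rather than aborting matters here: a refresh should still reach the Active NN even if the Standby happens to be down at that moment.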
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040758#comment-14040758 ] Hadoop QA commented on HDFS-6506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651956/HDFS-6506.v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.TestRefreshCallQueue {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7210//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7210//console This message is automatically generated. 
Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ From the error log, the reason seems to be that newly moved block replicas have been invalidated and deleted, so some of the balancer's work is reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer 
(Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040790#comment-14040790 ] Daryn Sharp commented on HDFS-6475: --- bq. Your earlier suggestion indicated that we should use SecretManager#retriableRetrievePassword instead of SecretManager#retrievePassword, does that mean client code has to be modified? If I understand the question: the methods are only used server-side, so no client-side changes should be required and there are no incompatibility concerns. Did you happen to trace how/where the {{StandbyException}} is wrapped in an {{InvalidToken}}? It looks like {{DelegationTokenSecretManager#retrievePassword}} is the only place it occurs, but {{DelegationTokenSecretManager#retriableRetrievePassword}} does not wrap exceptions in {{InvalidToken}}. Is this maybe just a test case issue? Which test case is failing? WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, HDFS-6475.008.patch, HDFS-6475.009.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is initialized beforehand with the active NN. When a client tries to issue a request, the NNs it can contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts the NNs in order, so the first one it runs into is likely the Standby NN. If the Standby NN doesn't have the updated client credential, it will throw a SecurityException that wraps a StandbyException. 
The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it failed. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaCl assName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
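The retry decision the client should be making can be sketched as a walk over the exception's cause chain and message; `RetryCheck` is a hypothetical stand-in, and the real fix would inspect Hadoop's own exception types rather than matching class names as strings:

```java
// Hypothetical sketch of retry-decision logic: walk the cause chain of a
// received exception and retry on the next NN if a standby-style failure
// is buried inside. Class and method names are stand-ins, not Hadoop's.
class RetryCheck {
    static boolean isStandbyWrapped(Throwable t, String standbyClassName) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            // Match either the exception type itself or a message that
            // carries the wrapped name (as in the RemoteException above).
            if (c.getClass().getSimpleName().equals(standbyClassName)
                || String.valueOf(c.getMessage()).contains(standbyClassName)) {
                return true;
            }
        }
        return false;
    }
}
```

The stack trace above shows why this matters: by the time the client sees the failure, the StandbyException survives only as text inside an InvalidToken message wrapped in a SecurityException, so a naive `instanceof` check never fires.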
[jira] [Commented] (HDFS-6526) Implement HDFS TtlManager
[ https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040792#comment-14040792 ] Hadoop QA commented on HDFS-6526: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651960/HDFS-6526.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1264 javac compiler warnings (more than the trunk's current 1259 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.ttlmanager.TestTtlPolicy {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7212//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7212//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7212//console This message is automatically generated. Implement HDFS TtlManager - Key: HDFS-6526 URL: https://issues.apache.org/jira/browse/HDFS-6526 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6526.1.patch This issue is used to track development of HDFS TtlManager, for details see HDFS-6382. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6582) Missing null check in RpcProgramNfs3#read(XDR, SecurityHandler)
[ https://issues.apache.org/jira/browse/HDFS-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6582: - Component/s: nfs Missing null check in RpcProgramNfs3#read(XDR, SecurityHandler) --- Key: HDFS-6582 URL: https://issues.apache.org/jira/browse/HDFS-6582 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Ted Yu Priority: Minor Around line 691: {code} FSDataInputStream fis = clientCache.getDfsInputStream(userName, Nfs3Utils.getFileIdPath(handle)); try { readCount = fis.read(offset, readbuffer, 0, count); {code} fis may be null, leading to NullPointerException -- This message was sent by Atlassian JIRA (v6.2#6252)
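The shape of the missing guard can be shown with a self-contained sketch; `CacheSketch` stands in for the client cache in RpcProgramNfs3, and the real fix would return an NFS error status to the caller rather than an int:

```java
import java.util.Map;

// Hypothetical sketch of the missing null check: a cache lookup may miss,
// so guard before reading. CacheSketch is illustrative, not RpcProgramNfs3.
class CacheSketch {
    // Returns bytes read, or -1 when no stream could be obtained
    // (instead of dereferencing null and throwing NullPointerException).
    static int safeRead(Map<String, byte[]> cache, String fileIdPath, byte[] buf) {
        byte[] fis = cache.get(fileIdPath); // may be null on a cache miss
        if (fis == null) {
            return -1; // surface an error to the NFS client instead of NPE
        }
        int n = Math.min(fis.length, buf.length);
        System.arraycopy(fis, 0, buf, 0, n);
        return n;
    }
}
```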
[jira] [Commented] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040810#comment-14040810 ] Hadoop QA commented on HDFS-6525: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651959/HDFS-6525.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ipc.TestIPC {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7211//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7211//console This message is automatically generated. FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of supporting HDFS TTL for FsShell, for details see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6585) INodesInPath.resolve is called multiple times in FSNamesystem.setPermission
[ https://issues.apache.org/jira/browse/HDFS-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040823#comment-14040823 ] Daryn Sharp commented on HDFS-6585: --- I've been working on the exact same change! Just to a larger extent. I'll take a look this afternoon. INodesInPath.resolve is called multiple times in FSNamesystem.setPermission --- Key: HDFS-6585 URL: https://issues.apache.org/jira/browse/HDFS-6585 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_ab60af58e03b323dd4b18d32c4def1f008b98822.txt, patch_f15b7d505f12213f1ee9fb5ddb4bdaa64f9f623d.txt Most of the APIs (both internal and external) in FSNamesystem call INodesInPath.resolve() to get the list of INodes corresponding to a file path. Usually one API calls resolve() multiple times, which is wasted work. This issue particularly refers to FSNamesystem.setPermission, which calls resolve() twice indirectly: once from checkOwner(), and once from dir.setPermission(). We should save the result of resolve(), and use it wherever possible throughout the lifetime of an API call, instead of making new resolve() calls. -- This message was sent by Atlassian JIRA (v6.2#6252)
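The proposed resolve-once pattern can be sketched as follows; `ResolveOnce` and its helpers are hypothetical stand-ins for FSNamesystem.setPermission and INodesInPath.resolve, with a counter added to make the single resolution visible:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed fix: resolve the path once and
// thread the result through subsequent checks, rather than re-resolving
// inside each helper. Not the actual FSNamesystem code.
class ResolveOnce {
    static final AtomicInteger resolveCalls = new AtomicInteger();

    static String resolve(String path) {   // stands in for INodesInPath.resolve
        resolveCalls.incrementAndGet();
        return "inodes:" + path;
    }

    static void setPermission(String path) {
        String iip = resolve(path); // resolve exactly once...
        checkOwner(iip);            // ...then reuse the resolved result
        doSetPermission(iip);
    }

    static void checkOwner(String iip) { /* permission check on resolved inodes */ }
    static void doSetPermission(String iip) { /* mutate the resolved inodes */ }
}
```

In the unfixed code, both helpers would take the raw path and each call resolve() internally, doubling the path-walk cost per operation.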
[jira] [Commented] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040828#comment-14040828 ] Daryn Sharp commented on HDFS-6525: --- The test cases should verify that the inherited path displayed is correct. Like most other tests, it should verify that relative and scheme-absolute paths are displayed correctly. It might make sense to print path: ttl instead of the reverse, but it's up to you. A minor suggestion is to have the values for the units computed with math so a reviewer doesn't have to do the math to verify the numbers. FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6525.1.patch This issue is used to track development of supporting HDFS TTL for FsShell, for details see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.2#6252)
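Daryn's last suggestion — computing test values from unit constants instead of hard-coding the product — looks like this in a tiny sketch (the names are illustrative, not from the patch):

```java
// Hypothetical sketch of the review suggestion: express TTL test values
// as arithmetic over unit constants so a reviewer can verify them at a
// glance, instead of checking a literal like 604800000 by hand.
class TtlUnits {
    static final long SECOND = 1000L;
    static final long MINUTE = 60 * SECOND;
    static final long HOUR   = 60 * MINUTE;
    static final long DAY    = 24 * HOUR;

    // A one-week TTL: obviously 7 days, no mental math needed.
    static final long ONE_WEEK_MS = 7 * DAY;
}
```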
[jira] [Commented] (HDFS-6580) FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper
[ https://issues.apache.org/jira/browse/HDFS-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040835#comment-14040835 ] Hudson commented on HDFS-6580: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1810 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1810/]) HDFS-6580. FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper. Contributed by Zhilei Xu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem.mkdirsInt should call the getAuditFileInfo() wrapper - Key: HDFS-6580 URL: https://issues.apache.org/jira/browse/HDFS-6580 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Fix For: 2.5.0 Attachments: patch_c89bff2bb7a06bb2b0c66a85acbd5113db6b0526.txt In FSNamesystem.java, getAuditFileInfo() is the canonical way to get file info for auditing purposes. getAuditFileInfo() returns null when auditing is disabled, and calls dir.getFileInfo() when auditing is enabled. One internal API, mkdirsInt(), mistakenly uses the raw dir.getFileInfo() to get file info for auditing; it should be changed to getAuditFileInfo(). Note that another internal API, startFileInt(), uses dir.getFileInfo() correctly, because the returned file stat is passed back to the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040836#comment-14040836 ] Hudson commented on HDFS-6507: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1810 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1810/]) HDFS-6507. Improve DFSAdmin to support HA cluster better. (Contributed by Zesheng Wu) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604692) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Fix For: 2.5.0 Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which finally calls the corresponding remote implementation function on the NN side. 
On the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN only allows UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, so when executed from the DFSAdmin command line they are sent to a fixed NN, regardless of whether it is Active or Standby. This may result in two problems: a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result. b. If the operation needs to take effect on both NNs, it takes effect on only one NN, which may cause problems after a future NN failover. Here I propose the following improvements: a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. If the command cannot be classified as one of the above four operations, or if it needs to take effect on both NNs, we should send the request to both the Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to a fixed NN, regardless of whether it is Active or Standby. Here I propose that we send these requests to both NNs. 3. 
ClientDatanodeProtocol Commands in this category are handled correctly, no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040873#comment-14040873 ] Sanjay Radia commented on HDFS-6134: Aaron said: bq. distcp... I disagree - this is exactly what one wants .. So you are saying that distcp should decrypt and re-encrypt data as it copies it ... most backup tools do not do this as they copy data - it costs extra CPU resources and adds further unneeded vulnerability. There are customer use cases where distcp is not run over an encrypted channel; hence if one of the files being copied is encrypted, one may not want the file to be transparently sent decrypted. Further, a sensitive file in a subtree may have been encrypted because the subtree is readable by a larger group, and hence the distcp user may not have access to the keys. bq. delegation tokens - KMS ... Owen and Tucu have already discussed this quite a bit above It turns out this issue came up in discussion with Owen, and he shares the concern and suggested that I post it. Besides, even if Alejandro and Owen are in agreement, my question is relevant and has not been raised so far above: encryption is used to overcome limitations of authorization and authentication in the system. It is relevant to ask whether the use of delegation tokens to obtain keys adds weakness. bq. meeting ... Aaron .. you are misunderstanding my point. I am not saying that the discussions on this jira have not been open. * See Alejandro's comments: Todd Lipcon and I had an offline discussion with Andrew Purtell, Yi Liu and Avik Dey and After some offline discussions with Yi, Tianyou, ATM, Todd, Andrew and Charles ... ** there have been such meetings and I have *no objections* to such private meetings because I know that the bandwidth helps. I am merely asking for one more meeting where I can quickly come up to speed on the context that Alejandro, Todd, Yi, Tianyou, Andrew, and ATM share. 
It will help me and others better understand the viewpoint that some of you share due to previous high-bandwidth meetings. ** There is a precedent of HDFS meetings in spite of open jira discussion - higher bandwidth to progress faster. ** Perhaps I should have worded the private meetings differently ... sorry if it came across the wrong way. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health-care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop FileSystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040966#comment-14040966 ] Steve Loughran commented on HDFS-6134: -- Maybe the issue with distcp is that sometimes you want to get at the raw data - backups and copying being examples. This lets admins work on the data without needing access to the keys, just as today I can back up the underlying native OS disks without understanding HDFS (or any future encryption). Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health-care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop FileSystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040984#comment-14040984 ] Yongjun Zhang commented on HDFS-6475: - Hi [~daryn], Thanks a lot for the comments. Calling {{DelegationTokenSecretManager#retrievePassword}} is the sole place I have seen. And, the following method in AbstractDelegationTokenSecretManager is where retrievePassword is called, {code} public synchronized void verifyToken(TokenIdent identifier, byte[] password) throws InvalidToken { byte[] storedPassword = retrievePassword(identifier); if (!Arrays.equals(password, storedPassword)) { throw new InvalidToken("token (" + identifier + ") is invalid, password doesn't match"); } } {code} I wonder whether we can just replace the above retrievePassword call with retriableRetrievePassword here. I will give it a try. The failed tests are reported in HDFS-6589, related to HDFS-5322. Hi [~jingzhao], I put a question in HDFS-6589. I wonder if the failed tests are designed to cover real user scenarios? Thanks for clarifying. Best regards. WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, HDFS-6475.008.patch, HDFS-6475.009.patch With WebHdfs clients connected to a HA HDFS service, the delegation token is previously initialized with the active NN. When clients try to issue a request, the NN it contacts is stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contacts the NN based on the order, so likely the first one it runs into is the StandbyNN. 
If the StandbyNN doesn't have the updated client credential, it will throw a SecurityException that wraps StandbyException. The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it failed. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code}
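The failure mode above, a StandbyException buried inside a SecurityException/InvalidToken chain, is what the retry logic has to recognize. As a hedged illustration in plain Java (not the actual WebHdfsFileSystem retry code; `shouldFailover` is a hypothetical helper), a client could walk the cause chain and message text to decide whether failing over to the other NameNode makes sense:

```java
// Hedged sketch, not the real WebHdfs code: recognize a StandbyException
// hidden inside the SecurityException/InvalidToken chain so the client can
// fail over to the other NameNode instead of giving up.
public class StandbyRetryCheck {
    // Hypothetical helper: true if any cause in the chain is, or mentions,
    // a StandbyException.
    static boolean shouldFailover(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t.getClass().getSimpleName().equals("StandbyException")) {
                return true;
            }
            String msg = t.getMessage();
            if (msg != null && msg.contains("StandbyException")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The message shape mirrors the RemoteException shown above.
        Throwable wrapped = new SecurityException(
            "Failed to obtain user group information: "
            + "org.apache.hadoop.security.token.SecretManager$InvalidToken: "
            + "StandbyException");
        System.out.println(shouldFailover(wrapped));   // prints: true
        System.out.println(shouldFailover(new RuntimeException("refused")));
    }
}
```

The actual HDFS-6475 patches address this in the server's exception handling rather than by message matching; the sketch only illustrates the failover decision, and also why string-level unwrapping is fragile.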
[jira] [Commented] (HDFS-4629) Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA
[ https://issues.apache.org/jira/browse/HDFS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040995#comment-14040995 ] Steve Loughran commented on HDFS-4629: -- # the declaration of the xerces lib version MUST go into {{hadoop-project/pom.xml}}; all JAR version logic goes in there to avoid inconsistencies # it is going to add yet-another-dependency. # we may need this import with java 9 anyway, as com.sun is potentially going to be inaccessible. Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA - Key: HDFS-4629 URL: https://issues.apache.org/jira/browse/HDFS-4629 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.0.3-alpha Environment: OS: fedora and RHEL (64 bit) Platform: x86, POWER, and SystemZ JVM Vendor = IBM Reporter: Amir Sanjar Attachments: HDFS-4629-1.patch, HDFS-4629.patch Porting to a non-JVM vendor solution by replacing: import com.sun.org.apache.xml.internal.serialize.OutputFormat; import com.sun.org.apache.xml.internal.serialize.XMLSerializer; with import org.apache.xml.serialize.OutputFormat; import org.apache.xml.serialize.XMLSerializer;
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041008#comment-14041008 ] Aaron T. Myers commented on HDFS-6134: -- Sanjay, Steve - regarding distcp, Alejandro has already said the following, which I think addresses what both of you are getting at. Note the second paragraph: {quote} Vanilla distcp will just work with transparent encryption. Data will be decrypted on read and encrypted on write, assuming both source and target are in encrypted zones. The proposal on changing distcp is to enable a second use case, copy data from one cluster to another without having to decrypt/encrypt the data while doing the copy. This is useful when doing copies for disaster recovery, hdfs admins could do the copy without having to have access to the encryption keys. {quote} Sanjay: bq. Turns out this issue came up in discussion with Owen, and he shares the concern and suggested that I post the concern. Besides, even if Alejandro and Owen are in agreement, my question is relevant and has not been raised so far above: Encryption is used to overcome limitations of authorization and authentication in the system. It is relevant to ask if the use of delegation tokens to obtain keys adds weakness. Transparent at-rest encryption is used to address other possible attack vectors, for example an admin removing hard drives from the cluster and looking at the data offline, or various attack vectors if network communication can be intercepted. I was under the impression that Owen's concern was mostly around performance, i.e. that he didn't want all of the many tasks/containers in an MR/YARN job to each request the same encryption key(s) from the KMS at startup. I think that's quite reasonable, but it doesn't need to be an either/or thing - YARN jobs can request the appropriate keys upfront to address performance concerns _and_ the KMS can accept DTs for authentication to enable other use cases. 
Regardless, I don't see how being able to request encryption keys via DTs adds any weakness. The DTs can only be granted via Kerberos-authenticated channels, and they expire, so they allow no more access than one can get via Kerberos. Could you perhaps elaborate on the specific concern there? bq. Aaron .. you are misunderstanding my point. I am not saying that the discussion on this jira have not been open. [snip] OK, good to hear. Sorry if I misinterpreted what you were saying. bq. I am merely asking for one more meeting where I can quickly come up to speed on the context that Alejandro, Todd, Yi, Tianyou, Andrew, Atm, share. It will help me and others better understand the viewpoint that some of you share due to previous high bandwidth meetings. I'm certainly open to another meeting in the abstract to bring folks up to speed, but I'd still like to know what questions you have that haven't been addressed so far on the JIRA. So far I think that most of the questions you've been asking have already been discussed. 
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041020#comment-14041020 ] Lei (Eddy) Xu commented on HDFS-5546: - Maybe I misunderstand this JIRA. If printing an FNF exception while printing out ls information is normal behavior, matching what {{/bin/ls}} does, then the current {{trunk}} works correctly and thus does not need to be fixed. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF
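The race in steps 1-4 above (getFileStatus succeeds, the directory disappears, then listStatus throws FNF) suggests catching the exception per directory and continuing, the way {{/bin/ls}} warns and moves on. A minimal sketch, using java.nio instead of the Hadoop FileSystem API, so `RecursiveLsSketch` and its exact behavior are illustrative assumptions rather than the HDFS-5546 patch:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hedged sketch of the behavior under discussion: recurse, and if a
// directory vanishes between being listed and being opened, print a warning
// like /bin/ls does and keep going instead of terminating the whole walk.
public class RecursiveLsSketch {
    static void ls(Path dir) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                System.out.println(p);
                if (Files.isDirectory(p)) {
                    ls(p);  // a concurrently removed subdirectory only warns
                }
            }
        } catch (NoSuchFileException e) {
            // Directory was moved/removed after we saw it: report, move on.
            System.err.println("ls: " + dir + ": No such file or directory");
        } catch (IOException e) {
            System.err.println("ls: " + dir + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        ls(Paths.get(args.length > 0 ? args[0] : "."));
    }
}
```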
[jira] [Updated] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6579: --- Attachment: HDFS-6579.patch TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC -- Key: HDFS-6579 URL: https://issues.apache.org/jira/browse/HDFS-6579 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.1.0-beta, 2.0.4-alpha, 2.2.0, 2.3.0, 2.4.0 Reporter: Jinghui Wang Attachments: HDFS-6579.patch SocketOutputStream closes its writer if it's partially written. But on PPC, after writing for some time, buf.capacity still equals buf.remaining. The reason might be that what's written on PPC is buffered, so buf.remaining will not change till a flush.
[jira] [Updated] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6579: --- Attachment: (was: HDFS-6579.patch)
[jira] [Commented] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041047#comment-14041047 ] Jinghui Wang commented on HDFS-6579: Thanks for the prompt review. Yes, this is for PPC64 Linux. I have modified the patch per your suggestion. However, rather than introduce a method that is as extensive as the getOSType method, I simply added the detection for PPC64 since there is no need for detecting other architectures yet. Please let me know if a more extensive method is necessary.
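For the narrow architecture check mentioned above, a test can branch on the standard `os.arch` system property rather than adding a general getOSType-style helper. A hedged sketch (the class and method names are made up; the actual patch may perform the detection differently):

```java
// Hedged sketch: detect PPC64 from the standard os.arch system property.
// The name ArchCheckSketch is illustrative and not from the HDFS-6579 patch.
public class ArchCheckSketch {
    static boolean isPpc64() {
        // On PPC64 Linux JVMs, os.arch typically reports "ppc64" or "ppc64le".
        return System.getProperty("os.arch", "")
                     .toLowerCase()
                     .startsWith("ppc64");
    }

    public static void main(String[] args) {
        System.out.println("os.arch = " + System.getProperty("os.arch")
            + ", ppc64: " + isPpc64());
    }
}
```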
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041062#comment-14041062 ] Jing Zhao commented on HDFS-6588: - HDFS-5322 should not be related to the failed test. In general, HDFS-5322 simply handles the same issue you're fixing in HDFS-6475, but for the RPC side. So please feel free to make any change you think necessary. Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not an InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} There are multiple test failures after making the suggested changes. Filing this jira to investigate removing the getTrueCause method. More detail will be put in the first comment.
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041065#comment-14041065 ] Jing Zhao commented on HDFS-6562: - The new patch looks pretty good to me. +1 [~szetszwo], do you also want to take a look at the patch? Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code.
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Summary: Bug in TestBPOfferService can cause test failure (was: Bug in TestBPOfferService blocks the trunk build) Bug in TestBPOfferService can cause test failure Key: HDFS-6587 URL: https://issues.apache.org/jira/browse/HDFS-6587 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhilei Xu Assignee: Zhilei Xu Labels: patch Attachments: patch_TestBPOfferService.txt Need to fix a bug in TestBPOfferService#waitForBlockReceived that fails the trunk, e.g. in Build #1781. Details: in this test, the utility function waitForBlockReceived() has a bug: the parameter mockNN is never used but the hard-coded mockNN1 is used. This bug introduces nondeterministic test failure when testBasicFunctionality() calls ret = waitForBlockReceived(FAKE_BLOCK, mockNN2); and the call finishes before the actual interaction with mockNN2 happens.
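The bug described above reduces to a parameter that is accepted but never used. A hedged sketch, with plain objects standing in for the real Mockito mocks and method bodies that are illustrative only (the names follow the issue description, not the actual test source):

```java
// Illustrative reduction of the TestBPOfferService bug: the helper takes a
// mock NameNode parameter but waits on the hard-coded first mock instead.
class WaitForBlockReceivedSketch {
    final Object mockNN1 = new Object();
    final Object mockNN2 = new Object();

    // Buggy version: 'mockNN' is ignored, so a caller passing mockNN2
    // actually ends up waiting on mockNN1. That is what makes the failure
    // nondeterministic: the wait can finish before mockNN2 is ever touched.
    Object waitForBlockReceivedBuggy(String block, Object mockNN) {
        return waitOn(mockNN1, block);
    }

    // Fixed version: use the parameter that was passed in.
    Object waitForBlockReceivedFixed(String block, Object mockNN) {
        return waitOn(mockNN, block);
    }

    // Stand-in for the Mockito verify/wait logic; returns which mock it
    // actually waited on so the difference is observable.
    private Object waitOn(Object nn, String block) {
        return nn;
    }
}
```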
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Target Version/s: 2.5.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk and branch-2. Thanks for the contribution [~timxzl].
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Component/s: test
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Affects Version/s: 2.4.0
[jira] [Updated] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6587: Labels: (was: patch)
[jira] [Updated] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6579: Assignee: Jinghui Wang Hadoop Flags: Reviewed Status: Patch Available (was: Open) Thanks [~jwang302]. +1 pending Jenkins.
[jira] [Commented] (HDFS-6587) Bug in TestBPOfferService can cause test failure
[ https://issues.apache.org/jira/browse/HDFS-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041092#comment-14041092 ] Hudson commented on HDFS-6587: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5754 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5754/]) HDFS-6587. Bug in TestBPOfferService can cause test failure. (Contributed by Zhilei Xu) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1604899) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041112#comment-14041112 ] Yongjun Zhang commented on HDFS-6588: - Hi [~jingzhao], thanks a lot for the comments. Sorry I didn't make it clear. What I wanted to say was that the getTrueCause method is part of the HDFS-5322 work, and the reported tests failed here because of removing getTrueCause(). We could modify the tests to make them pass, but my worry was that the tests were set up to capture real user scenarios, and changing the test setup might make them no longer reflect real user scenarios. Based on your answer above, however, I guess we could just modify the tests accordingly after removing the getTrueCause() method. Please correct me if I'm wrong. Thanks.
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041119#comment-14041119 ] Jing Zhao commented on HDFS-6588: - Yeah, also the failed tests were not introduced by HDFS-5322 actually.
[jira] [Commented] (HDFS-6579) TestSocketIOWithTimeout#testSocketIOWithTimeout fails on Power PC
[ https://issues.apache.org/jira/browse/HDFS-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041155#comment-14041155 ] Hadoop QA commented on HDFS-6579: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652012/HDFS-6579.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7213//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7213//console This message is automatically generated.
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041159#comment-14041159 ] Colin Patrick McCabe commented on HDFS-5546: I think what Daryn is advocating is that when attempting to recurse into a directory, we should catch IOE for the {{listStatus}} operation, not just FNF. Although this makes sense to me, there is a bit of a fly in the ointment-- if we have a glob expression like {{/\*/\*}}, the Globber internally will throw an exception if there is a path error while resolving the globs. For example, if you have {{/a/b/c}} and {{/a/r/c}}, and /a/r is inaccessible to you, {{ls /\*/\*/c}} will fail with an {{AccessControlException}} before displaying anything. This behavior has existed basically forever in the globber code (it wasn't added by the globber rewrite) and unfortunately, there is no good way to fix it now. The problem is that there is no way to indicate that we got an error other than throwing an exception, and an exception terminates the whole glob operation, even if there were other valid results. So in the interest of consistency, perhaps we should keep things the way they are, and only catch FNF? {{ls /a/b/c /a/r/c}} seems similar conceptually to {{ls /\*/\*/c}}... it is tricky to explain why an exception should terminate one but not the other... Eddy, can you take a look at the internal JIRA that prompted this and see if it was user error? I'm less and less convinced we should change {{ls -R}}... 
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041171#comment-14041171 ] Yongjun Zhang commented on HDFS-6475: - Hi [~daryn], I'd like to check with you. You mentioned: {quote} If it turns out to be a lot more complicated, then perhaps a followup jira is ok {quote} Based on the information we have so far, the work involved is to remove getTrueCause, replace retrievePassword with retriableRetrievePassword (changing the interface spec of the relevant methods, because retriableRetrievePassword throws more exceptions), and fix the test failures reported in HDFS-6588. I hope you'd agree that it's appropriate to dedicate HDFS-6588 to the above-mentioned work, and to use the latest patch I posted for HDFS-6475 to handle the SecurityException that UserProvider throws. Would you please comment again? Thanks. BTW, thanks [~jingzhao] for clarifying things in HDFS-6588 (sorry, I had a typo in my last update as 6589). WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, HDFS-6475.008.patch, HDFS-6475.009.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is initially obtained from the active NN. When a client tries to issue a request, the NNs it may contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts them in order, so the first one it runs into is likely the standby NN. 
If the StandbyNN doesn't have the updated client credential, it will throw a SecurityException that wraps StandbyException. The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it fails. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
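For illustration, the retry decision the client needs can be sketched with a hypothetical helper (the class and method names below are not Hadoop's actual API): walk the exception's cause chain so a StandbyException tunneled inside a SecurityException still triggers failover to the other NameNode, while a genuine auth failure fails fast.

```java
// Sketch of the failover check, assuming a StandbyException marker type
// (stand-in for org.apache.hadoop.ipc.StandbyException).
public class RetryCheck {
    static class StandbyException extends Exception {}

    // Returns true if any cause in the chain is a StandbyException,
    // meaning the client should retry against the other NameNode.
    static boolean shouldFailover(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof StandbyException) {
                return true;
            }
        }
        return false; // genuine security failure: do not retry
    }
}
```

The bug described in this issue is essentially that the real code path loses the nested StandbyException before any such check can see it.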
[jira] [Created] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
Jing Zhao created HDFS-6593: --- Summary: Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Status: Patch Available (was: Open) Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Attachment: HDFS-6593.000.patch Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041337#comment-14041337 ] Yongjun Zhang commented on HDFS-6588: - Thanks Jing. I found that the getTrueCause() method interacts with the following code
{code}
@Override
public byte[] retrievePassword(
    DelegationTokenIdentifier identifier) throws InvalidToken {
  try {
    // this check introduces inconsistency in the authentication to a
    // HA standby NN. non-token auths are allowed into the namespace which
    // decides whether to throw a StandbyException. tokens are a bit
    // different in that a standby may be behind and thus not yet know
    // of all tokens issued by the active NN. the following check does
    // not allow ANY token auth, however it should allow known tokens in
    namesystem.checkOperation(OperationCategory.READ);
  } catch (StandbyException se) {
    // FIXME: this is a hack to get around changing method signatures by
    // tunneling a non-InvalidToken exception as the cause which the
    // RPC server will unwrap before returning to the client
    InvalidToken wrappedStandby = new InvalidToken("StandbyException");
    wrappedStandby.initCause(se);
    throw wrappedStandby;
  }
{code}
in DelegationTokenSecretManager.java, introduced by HADOOP-9880. If we remove the getTrueCause() logic, at minimum we still need to retain the logic (currently in getTrueCause) that returns the InvalidToken exception wrapped by a SaslException. Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. 
I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not a InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} There are multiple test failures after making the suggested changes. Filing this jira to dedicate to investigating the removal of the getTrueCause method. More detail will be put in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6578) add toString method to DatanodeStorage etc for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041338#comment-14041338 ] Arpit Agarwal commented on HDFS-6578: - Your original understanding was correct, i.e. 1-3 are valid. I don't want to spend more time on the exact wording of one comment, and your comment is clearer than no comment at all. I will commit your v2 patch. +1 add toString method to DatanodeStorage etc for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
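As a rough illustration of the proposal, a minimal sketch of such a toString() could look like the following. The fields here are hypothetical simplifications; the real DatanodeStorage class carries more state (e.g. a StorageType).

```java
// Simplified stand-in for o.a.h.hdfs.server.protocol.DatanodeStorage,
// showing only the toString() idea from the JIRA.
public class DatanodeStorage {
    enum State { NORMAL, READ_ONLY_SHARED }

    private final String storageID;
    private final State state;

    DatanodeStorage(String storageID, State state) {
        this.storageID = storageID;
        this.state = state;
    }

    public String getStorageID() { return storageID; }

    @Override
    public String toString() {
        // One-line summary of the identifying fields, suitable for log messages.
        return "[" + storageID + ", state=" + state + "]";
    }
}
```

With this in place, log statements like the one in BlockManager#processReport can print the storage object directly instead of concatenating individual getters.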
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041339#comment-14041339 ] Haohui Mai commented on HDFS-6593: -- The patch looks good to me. Is it possible to refactor the code of {{FSNameSystem.getSnapshotDiffReport}} in this patch as well, so that {{SnapshotDiffInfo}} can be declared as a package-local class? Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6578) add toString method to DatanodeStorage etc for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6578: Issue Type: Improvement (was: Bug) add toString method to DatanodeStorage etc for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6578: Summary: add toString method to DatanodeStorage for easier debugging (was: add toString method to DatanodeStorage etc for easier debugging) add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6578: Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Target Version/s: 2.5.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk and branch-2. Thanks for the improvement [~yzhangal]! add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-5546: Attachment: HDFS-5546.2.004.patch This patch captures {{IOException}} instead of {{FNF}} based on the first patch's logic, as [~daryn] suggested. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, HDFS-5546.2.004.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041352#comment-14041352 ] Lei (Eddy) Xu commented on HDFS-5546: - [~daryn] was right on this one; we should just replace FNF with IOException in the first patch. Two test cases to verify the expected behavior are added, though. [~cmccabe], shouldn't the {{globStatus()}} change be out of scope for this JIRA? Maybe we should open another related JIRA? race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, HDFS-5546.2.004.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Attachment: HDFS-6593.001.patch Thanks for the review, Haohui! Updated the patch to address your comments. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041354#comment-14041354 ] Hudson commented on HDFS-6578: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5755 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5755/]) HDFS-6578. add toString method to DatanodeStorage for easier debugging. (Contributed by Yongjun Zhang) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604942) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6594) Use inodes to determine membership in an encryption zone
Charles Lamb created HDFS-6594: -- Summary: Use inodes to determine membership in an encryption zone Key: HDFS-6594 URL: https://issues.apache.org/jira/browse/HDFS-6594 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb We should use inodes to determine if a path is in an ez, rather than string parsing. -- This message was sent by Atlassian JIRA (v6.2#6252)
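The idea in HDFS-6594 can be sketched with hypothetical types (the real INode API differs): encryption-zone membership is decided by walking a node's inode ancestors, rather than by string-prefix tests on the path, which are fragile under renames and lookalike prefixes ("/ez2" vs "/ez").

```java
// Hypothetical, simplified inode model for illustration only.
public class EzCheck {
    static class INode {
        final INode parent;       // null for the root
        final boolean isEzRoot;   // true if an EZ is rooted at this inode
        INode(INode parent, boolean isEzRoot) {
            this.parent = parent;
            this.isEzRoot = isEzRoot;
        }
    }

    // A path is in an encryption zone iff some ancestor inode is an EZ root.
    static boolean inEncryptionZone(INode node) {
        for (INode n = node; n != null; n = n.parent) {
            if (n.isEzRoot) {
                return true;
            }
        }
        return false;
    }
}
```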
[jira] [Updated] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6565: Attachment: HDFS-6565.patch Use jackson instead jetty json in hdfs-client - Key: HDFS-6565 URL: https://issues.apache.org/jira/browse/HDFS-6565 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Akira AJISAKA Attachments: HDFS-6565.patch hdfs-client should use Jackson instead of jetty to parse JSON. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6578) add toString method to DatanodeStorage for easier debugging
[ https://issues.apache.org/jira/browse/HDFS-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041395#comment-14041395 ] Yongjun Zhang commented on HDFS-6578: - Thanks a lot [~arpitagarwal]! add toString method to DatanodeStorage for easier debugging --- Key: HDFS-6578 URL: https://issues.apache.org/jira/browse/HDFS-6578 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6578.001.patch, HDFS-6578.002.patch It would be nice to add a toString() method to the DatanodeStorage class, so we can print out its key info more easily while debugging. Another thing is, at the end of BlockManager#processReport, there is the following message, {code} blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID() + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks() + ", processing time: " + (endTime - startTime) + " msecs"); return !node.hasStaleStorages(); {code} We could add node.hasStaleStorages() to the log, and possibly replace storage.getStorageID() with the suggested storage.toString(). Any comments? thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041409#comment-14041409 ] Akira AJISAKA commented on HDFS-6565: - Attaching a patch to remove the jetty json library from JsonUtil and WebHdfsFileSystem. The way Jackson parses JSON numbers is different from jetty json: * Jackson: number -> Integer, Long, or BigInteger (smallest applicable) * jetty json: number -> Long so I changed the code for parsing a JSON number from {code} (Long) m.get("blockId") // doesn't work if m.get("blockId") is Integer {code} to {code} ((Number) m.get("blockId")).longValue() // supports all classes extending Number {code} In addition, the way Jackson parses JSON arrays is different from jetty json: * Jackson: array -> ArrayList<Object> * jetty json: array -> Object[] so I changed the code for parsing a JSON array from {code} (Object[]) m.get("locatedBlocks") {code} to {code} (List<Object>) m.get("locatedBlocks") {code} Use jackson instead jetty json in hdfs-client - Key: HDFS-6565 URL: https://issues.apache.org/jira/browse/HDFS-6565 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Akira AJISAKA Attachments: HDFS-6565.patch hdfs-client should use Jackson instead of jetty to parse JSON. -- This message was sent by Atlassian JIRA (v6.2#6252)
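The number fix above boils down to not assuming which concrete Number subtype the JSON parser returns. A parser-agnostic sketch, where the map stands in for the parsed JSON object and "blockId" mirrors the key in the comment:

```java
import java.util.Map;

public class JsonNum {
    // Works whether the parser produced an Integer, Long, or BigInteger,
    // unlike a blind (Long) cast, which throws ClassCastException on Integer.
    static long blockId(Map<String, Object> m) {
        return ((Number) m.get("blockId")).longValue();
    }
}
```

Jackson returns the smallest applicable type, so a value like 42 comes back as an Integer while a large block id comes back as a Long; going through the Number interface handles both.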
[jira] [Updated] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6565: Status: Patch Available (was: Open) Use jackson instead jetty json in hdfs-client - Key: HDFS-6565 URL: https://issues.apache.org/jira/browse/HDFS-6565 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Akira AJISAKA Attachments: HDFS-6565.patch hdfs-client should use Jackson instead of jetty to parse JSON. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041422#comment-14041422 ] Tsz Wo Nicholas Sze commented on HDFS-6562: --- Patch looks good. Some comments: - Before the patch, the first if-statement below throws an exception, but it will return false after the patch.
{code}
-    if (srcInode.isSymlink() &&
-        dst.equals(srcInode.asSymlink().getSymlinkString())) {
-      throw new FileAlreadyExistsException(
-          "Cannot rename symlink "+src+" to its target "+dst);
-    }
-
-    // dst cannot be directory or a file under src
-    if (dst.startsWith(src) &&
-        dst.charAt(src.length()) == Path.SEPARATOR_CHAR) {
-      NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: "
-          + "failed to rename " + src + " to " + dst
-          + " because destination starts with src");
+
+    try {
+      validateRenameDestination(src, dst, srcInode);
+    } catch (IOException ignored) {
       return false;
     }
{code}
- prepare() should be combined with the RenameOperation constructor. Then all the fields except srcChild can be changed to final. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
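The throw-vs-return-false behavior change flagged in the review can be illustrated with a self-contained sketch (the names are hypothetical, modeled loosely on the diff): once the validation exception is caught and converted to a boolean, the caller can no longer tell why the rename failed.

```java
import java.io.IOException;

public class RenameSketch {
    // Simplified validation: dst may not be nested under src.
    static void validateRenameDestination(String src, String dst) throws IOException {
        if (dst.startsWith(src + "/")) {
            throw new IOException("destination " + dst + " starts with src " + src);
        }
    }

    static boolean rename(String src, String dst) {
        try {
            validateRenameDestination(src, dst);
        } catch (IOException ignored) {
            return false; // the specific failure reason is swallowed here
        }
        return true;      // (actual rename work elided)
    }
}
```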
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041420#comment-14041420 ] Hadoop QA commented on HDFS-5546: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.ha.TestZKFailoverControllerStress {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7216//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7216//console This message is automatically generated. 
race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Lei (Eddy) Xu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, HDFS-5546.2.004.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041429#comment-14041429 ] Sanjay Radia commented on HDFS-6134: I believe the transparent encryption will break the HAR file system. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6562: - Attachment: HDFS-6562.005.patch Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041440#comment-14041440 ] Haohui Mai commented on HDFS-6562: -- Uploaded v6 patch to address [~szetszwo]'s comments. The changes about the symlinks are intentional. Please correct me if I'm wrong, but it looks to me that the old rename prefers returning {{false}} instead of throwing exceptions. We can change this behavior without introducing backward compatibility issues, since symlinks are only available in trunk. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This jira proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6562: - Attachment: HDFS-6562.006.patch Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041449#comment-14041449 ] Sanjay Radia commented on HDFS-6134: bq. Vanilla distcp will just work with transparent encryption. Data will be decrypted on read and encrypted on write, assuming both source and target are in encrypted zones. ...The proposal on changing distcp is to enable a second use case. Alejandro, Aaron: the general practice is not to give the admins running distcp access to keys. Hence, as you suggest, we could change distcp so that it does not use transparent decryption by default; however, there may be other such backup tools and applications that customers and other vendors may have written, and we would be breaking them. This may also break the HAR filesystem. Aaron, you took a very strong position that transparent decryption/reencryption is exactly what one wants. I am missing this - what are the use cases for distcp where one wants transparent decryption/reencryption? Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop Filesystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. 
The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041456#comment-14041456 ] Siqi Li commented on HDFS-6527: --- When running unit tests in this patch v5, I get the following errors 2014-06-23 13:36:09,516 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(873)) - Failed to close file /testDeleteAddBlockRace org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /testDeleteAddBlockRace: File does not exist. Holder DFSClient_NONMAPREDUCE_1652233532_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2941) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2762) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:585) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy17.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy17.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1443) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1265) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:529) Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Fix For: 2.4.1 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. 
Because of deferred inode removal outside the FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with the FSN write lock held. This allows adding a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
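The race described above can be sketched in a few lines of plain Java. This is a toy model, not the actual FSNamesystem code (the class and method names here are illustrative): the point is the fix pattern of re-checking that the inode is still present in the inode map once the write lock is re-acquired, instead of trusting a reference resolved earlier under the read lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the HDFS-6527 race -- NOT the real FSNamesystem.
// getAdditionalBlock() resolves the file under the read lock, then mutates
// state under the write lock; a delete() can run in the gap. The guard is to
// re-check membership in the inode map once the write lock is held.
public class DeferredRemovalRace {
    private final Map<Long, StringBuilder> inodeMap = new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void create(long id) {
        lock.writeLock().lock();
        try { inodeMap.put(id, new StringBuilder()); }
        finally { lock.writeLock().unlock(); }
    }

    public void delete(long id) {
        lock.writeLock().lock();
        try { inodeMap.remove(id); }
        finally { lock.writeLock().unlock(); }
    }

    // Returns false instead of appending when the file vanished in the gap.
    public boolean addBlock(long id, String block) {
        StringBuilder file;
        lock.readLock().lock();
        try { file = inodeMap.get(id); }        // analyzeFileState() analogue
        finally { lock.readLock().unlock(); }
        if (file == null) return false;
        // ... a concurrent delete(id) may happen right here ...
        lock.writeLock().lock();
        try {
            if (!inodeMap.containsKey(id)) {    // the re-check that closes the race
                return false;
            }
            file.append(block).append(';');     // would have been OP_ADD_BLOCK
            return true;
        } finally { lock.writeLock().unlock(); }
    }

    public static void main(String[] args) {
        DeferredRemovalRace fs = new DeferredRemovalRace();
        fs.create(1L);
        fs.delete(1L);                          // delete wins the race
        System.out.println(fs.addBlock(1L, "blk_1"));  // prints false
    }
}
```

Without the `containsKey` re-check, the stale `file` reference would still be mutated after deletion, which is the in-memory analogue of writing OP_ADD_BLOCK after OP_DELETE.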
[jira] [Updated] (HDFS-6584) Support archival storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: HDFSArchivalStorageDesign20140623.pdf HDFSArchivalStorageDesign20140623.pdf: design doc. Support archival storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFSArchivalStorageDesign20140623.pdf In most Hadoop clusters, as more and more data is stored for longer periods, the demand for storage is outstripping the demand for compute. Hadoop needs a cost-effective and easy-to-manage solution to meet this demand for storage. The current solutions are: - Delete old unused data. This comes at the operational cost of identifying unnecessary data and deleting it manually. - Add more nodes to the cluster. This adds unnecessary compute capacity to the cluster along with the storage capacity. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher-density, less expensive storage and low compute power are becoming available and can be used as cold storage in the clusters. Based on policy, data can be moved from hot storage to cold storage. Adding more nodes to the cold storage can grow the storage independently of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6560) Byte array native checksumming on DN side
[ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041477#comment-14041477 ] Colin Patrick McCabe commented on HDFS-6560:
{code}
+  public static void verifyChunkedSumsByteArray(int bytesPerSum,
+      int checksumType, byte[] sums, int sumsOffset, byte[] data,
+      int dataOffset, int dataLength, String fileName, long basePos)
+      throws ChecksumException {
+    nativeVerifyChunkedSumsByteArray(bytesPerSum, checksumType,
+        sums, sumsOffset,
+        data, dataOffset, dataLength,
+        fileName, basePos);
+  }
{code}
What's the purpose of this wrapper function? It just passes all its arguments directly to the other function. Public functions can have the native annotation too.
{code}
+  sums_addr = (*env)->GetPrimitiveArrayCritical(env, j_sums, NULL);
+  data_addr = (*env)->GetPrimitiveArrayCritical(env, j_data, NULL);
+
+  if (unlikely(!sums_addr || !data_addr)) {
+    THROW(env, "java/lang/OutOfMemoryError",
+          "not enough memory for byte arrays in JNI code");
+    return;
+  }
{code}
This is going to leak memory if {{GetPrimitiveArrayCritical}} succeeds for {{sums_addr}} but not for {{data_addr}}. Byte array native checksumming on DN side - Key: HDFS-6560 URL: https://issues.apache.org/jira/browse/HDFS-6560 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-3528.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5369) Support negative caching of user-group mapping
[ https://issues.apache.org/jira/browse/HDFS-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041484#comment-14041484 ] Lei (Eddy) Xu commented on HDFS-5369: - [~andrew.wang] What should the expected behavior for negative caching be here? I am currently thinking of a solution where, if {{getGroups()}} returns an empty list, we assign a much shorter expiration period to the cached item (e.g., 30 seconds instead of 4 hours), so that a transient failure can be handled. Just wondering whether this is realistic in production? Support negative caching of user-group mapping -- Key: HDFS-5369 URL: https://issues.apache.org/jira/browse/HDFS-5369 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Andrew Wang We've seen a situation at a couple of our customers where interactions from an unknown user lead to a high rate of group mapping calls. In one case, this was happening at a rate of 450 calls per second with the shell-based group mapping, enough to severely impact overall namenode performance and also leading to large amounts of log spam (a stack trace is printed each time). Let's consider negative caching of group mapping, as well as quashing the rate of this log message. -- This message was sent by Atlassian JIRA (v6.2#6252)
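The proposal above can be sketched as follows. This is a minimal illustrative cache, not Hadoop's Groups implementation (the class name and TTL constants are made up): empty lookup results are cached like any other entry, but with a much shorter TTL, so a transient resolver failure heals quickly while the namenode is still shielded from hundreds of lookups per second for an unknown user. The clock is passed in explicitly to keep the sketch testable.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative negative-caching sketch -- not Hadoop's Groups class.
// An empty resolver result is cached with a much shorter TTL
// (NEGATIVE_TTL_MS) than a successful lookup (POSITIVE_TTL_MS).
public class GroupCache {
    public static final long POSITIVE_TTL_MS = 4L * 60 * 60 * 1000; // 4 hours
    public static final long NEGATIVE_TTL_MS = 30L * 1000;          // 30 seconds

    private static final class Entry {
        final List<String> groups;
        final long expiresAt;
        Entry(List<String> groups, long expiresAt) {
            this.groups = groups;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, List<String>> resolver; // e.g. shell-based lookup

    public GroupCache(Function<String, List<String>> resolver) {
        this.resolver = resolver;
    }

    public synchronized List<String> getGroups(String user, long nowMs) {
        Entry e = cache.get(user);
        if (e != null && nowMs < e.expiresAt) {
            return e.groups;                  // cache hit, positive or negative
        }
        List<String> groups = resolver.apply(user);
        long ttl = groups.isEmpty() ? NEGATIVE_TTL_MS : POSITIVE_TTL_MS;
        cache.put(user, new Entry(groups, nowMs + ttl));
        return groups;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        GroupCache cache = new GroupCache(user -> {
            calls[0]++;                       // would be the expensive shell call
            return java.util.Collections.emptyList();
        });
        cache.getGroups("ghost", 0L);
        cache.getGroups("ghost", 10_000L);    // within the 30s negative TTL
        System.out.println(calls[0]);         // prints 1: one lookup, not two
    }
}
```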
[jira] [Commented] (HDFS-6570) add api that enables checking if a user has certain permissions on a file
[ https://issues.apache.org/jira/browse/HDFS-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041492#comment-14041492 ] Colin Patrick McCabe commented on HDFS-6570: bq. Note that the man page for access clearly spells out the risk of time-of-check/time-of-use race conditions. This API is only going to be useful for systems implementing their own authorization enforcement on top of HDFS files, and only if those systems consider the risk acceptable. Let's make sure that we spell out the risks in the API. In fact, I wonder if we should make this {{\@LimitedPrivate}} between Hive and HDFS. The man page for the {{access}} system call is pretty blunt on my machine: the use of this system call should be avoided. add api that enables checking if a user has certain permissions on a file - Key: HDFS-6570 URL: https://issues.apache.org/jira/browse/HDFS-6570 Project: Hadoop HDFS Issue Type: Bug Reporter: Thejas M Nair Assignee: Chris Nauroth For some of the authorization modes in Hive, the servers in Hive check if a given user has permissions on a certain file or directory. For example, the storage based authorization mode allows hive table metadata to be modified only when the user has access to the corresponding table directory on hdfs. There are likely to be such use cases outside of Hive as well. HDFS does not provide an api for such checks. As a result, the logic to check whether a user has permissions on a directory gets replicated in Hive. This results in duplicate logic and introduces the possibility of inconsistencies in the interpretation of the permission model. This becomes a bigger problem with the complexity of ACL logic. HDFS should provide an api with functionality similar to the access function in unistd.h - http://linux.die.net/man/2/access . -- This message was sent by Atlassian JIRA (v6.2#6252)
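For illustration, the classic unistd.h access()-style check over an rwx owner/group/other permission triple looks like the sketch below. This is only the textbook mode-bit logic, not the eventual HDFS API: the real NameNode check would also have to cover ACLs, sticky bits and the superuser, which this omits.

```java
import java.util.Set;

// A sketch of unistd.h access() semantics over an rwx owner/group/other
// triple. The real HDFS permission check also handles ACLs and the
// superuser; this shows only the classic mode-bit logic.
public class AccessCheck {
    public static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /** mode is a 9-bit permission triple, e.g. 0750 for rwxr-x---. */
    public static boolean access(int mode, String owner, String group,
                                 String user, Set<String> userGroups,
                                 int requested) {
        final int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;            // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;            // group class
        } else {
            bits = mode & 7;                   // other class
        }
        return (bits & requested) == requested;
    }

    public static void main(String[] args) {
        // rwxr-x---: the owner may write; a group member may read+execute;
        // everyone else gets nothing.
        System.out.println(access(0750, "alice", "staff", "eve",
                                  Set.of(), READ));   // prints false
    }
}
```

Note that the answer is only valid at the instant it is computed, which is exactly the time-of-check/time-of-use caveat raised in the comment above.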
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041495#comment-14041495 ] Hadoop QA commented on HDFS-6593: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652059/HDFS-6593.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7214//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7214//console This message is automatically generated. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5369) Support negative caching of user-group mapping
[ https://issues.apache.org/jira/browse/HDFS-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041497#comment-14041497 ] Andrew Wang commented on HDFS-5369: --- Hey [~eddyxu], 30s sounds okay to me, maybe even a bit longer than that (i.e. 1 or 2 min). [~kihwal] might be able to make a quick comment about this, since he mentioned tight job SLAs in HADOOP-8088. HADOOP-8088 also mentions handling error codes indicative of a transient error differently, so let's keep that in mind here too. It would also be good to squish the stack trace if possible, since it's not very useful. Support negative caching of user-group mapping -- Key: HDFS-5369 URL: https://issues.apache.org/jira/browse/HDFS-5369 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Andrew Wang We've seen a situation at a couple of our customers where interactions from an unknown user lead to a high rate of group mapping calls. In one case, this was happening at a rate of 450 calls per second with the shell-based group mapping, enough to severely impact overall namenode performance and also leading to large amounts of log spam (a stack trace is printed each time). Let's consider negative caching of group mapping, as well as quashing the rate of this log message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041501#comment-14041501 ] Haohui Mai commented on HDFS-6593: -- +1 on the latest patch, pending jenkins. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041507#comment-14041507 ] Tsz Wo Nicholas Sze commented on HDFS-6562: --- Thanks for the explanation. Returning false sounds good. For the new patch, - There are two srcChild = srcIIP.getLastINode() statements in the RenameOperation constructor. - The field srcRefDstSnapshot can be changed to final. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6562: - Attachment: HDFS-6562.007.patch Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch, HDFS-6562.007.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6562) Refactor rename() in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6562: -- Priority: Minor (was: Major) Hadoop Flags: Reviewed +1 the new patch looks good. Refactor rename() in FSDirectory Key: HDFS-6562 URL: https://issues.apache.org/jira/browse/HDFS-6562 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-6562.000.patch, HDFS-6562.001.patch, HDFS-6562.002.patch, HDFS-6562.003.patch, HDFS-6562.004.patch, HDFS-6562.005.patch, HDFS-6562.006.patch, HDFS-6562.007.patch Currently there are two variants of {{rename()}} sitting in {{FSDirectory}}. Both implementations share quite a bit of common code. This JIRA proposes to clean up these two variants and extract the common code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side
[ https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041522#comment-14041522 ] Andrew Wang commented on HDFS-6561: --- +1 sounds good to me; pretty sure that flushes are normally bigger than 100B. Not really a use case we're optimized for anyway. Byte array native checksumming on client side - Key: HDFS-6561 URL: https://issues.apache.org/jira/browse/HDFS-6561 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: James Thomas Assignee: James Thomas -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041524#comment-14041524 ] Alejandro Abdelnur commented on HDFS-6134: -- [~sanjay.radia], can you be a bit more specific on how HAR would break? Regarding distcp, we want to support both modes: raw copies, without decryption/encryption, for admins running distcp; and regular copies, with encryption/decryption, to copy data into/out of an encryption zone or to another encryption zone, within or across clusters. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the health care industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop Filesystem Java API, the Hadoop libhdfs C library, or the WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041529#comment-14041529 ] Hadoop QA commented on HDFS-6593: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652071/HDFS-6593.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/7215//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSClientRetries {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7215//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7215//console This message is automatically generated. 
Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3528) Use native CRC32 in DFS write path
[ https://issues.apache.org/jira/browse/HDFS-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041545#comment-14041545 ] Tsz Wo Nicholas Sze commented on HDFS-3528: --- Is the native library faster than the Java implementation only for CRC32C but not CRC32? Use native CRC32 in DFS write path -- Key: HDFS-3528 URL: https://issues.apache.org/jira/browse/HDFS-3528 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: James Thomas HDFS-2080 improved the CPU efficiency of the read path by using native SSE-enabled code for CRC verification. Benchmarks of the write path show that it's often CPU bound by checksums as well, so we should make the same improvement there. -- This message was sent by Atlassian JIRA (v6.2#6252)
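For context on the CRC32 vs. CRC32C question, both polynomials can be exercised with the JDK's built-in classes (java.util.zip.CRC32 has been there since the beginning; CRC32C was only added in JDK 9, after this discussion). The relevant hardware angle is that SSE4.2's crc32 instruction implements the Castagnoli (CRC-32C) polynomial, not the zlib CRC-32 polynomial, which is presumably why native acceleration pays off most for CRC32C.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;   // added in JDK 9

// The two polynomials in question. "123456789" is the conventional
// check-value input: CRC-32 yields CBF43926, CRC-32C yields E3069283.
public class Crc32Demo {
    public static long crc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static long crc32c(byte[] data) {
        CRC32C c = new CRC32C();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes();
        System.out.printf("CRC32  = %08X%n", crc32(check));   // CBF43926
        System.out.printf("CRC32C = %08X%n", crc32c(check));  // E3069283
    }
}
```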
[jira] [Updated] (HDFS-6593) Move SnapshotDiffInfo out of INodeDirectorySnapshottable
[ https://issues.apache.org/jira/browse/HDFS-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6593: Attachment: HDFS-6593.002.patch Fix the failed unit test and javadoc. Move SnapshotDiffInfo out of INodeDirectorySnapshottable Key: HDFS-6593 URL: https://issues.apache.org/jira/browse/HDFS-6593 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6593.000.patch, HDFS-6593.001.patch, HDFS-6593.002.patch Per discussion in HDFS-4667, we can move SnapshotDiffInfo out of INodeDirectorySnapshottable as an individual class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6592) Using Fluent to collect data and append to HDFS throws AlreadyBeingCreatedException
[ https://issues.apache.org/jira/browse/HDFS-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jack updated HDFS-6592: --- Description: We use Fluent to collect log data. The log data are appended to files in HDFS. The cluster configuration: namenode: namenode1 (hostname); secondary namenode: namenode2; 3 datanodes: datanode1, datanode2, datanode3; replication factor 3. Every few days, we suffer the following exception: Exception in nameNode1: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] Exception in DataNode1: 2014-06-22 09:54:45,771 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
2014-06-22 09:54:45,813 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [file_nameX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2441) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) According to the log, we infer the flow of the exception: 1. 
Namenode updates the pipeline with just one datanode namenode1 log: 2014-06-22 09:54:16,604 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline (block=BP-1611177164-datanode1-1399894698024:blk_1074496235_1935947, newGenerationStamp=1935951, newLength=98839816, newNodes=[datanode1:50010], clientName=DFSClient_NONMAPREDUCE_349196146_2027206) 2. datanode1 throws an exception during close. datanode1 log: 2014-06-22 09:54:26,569 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file_name retrying... 3. Subsequently collected data from Fluent triggers another DFSClient to append to the same file. namenode1 log: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_name] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [datanode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [datanode1] 4. The subsequent DFSClient triggers lease recovery every LEASE_SOFTLIMIT_PERIOD namenode1 log: 2014-06-22 09:58:34,722 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover [Lease. Holder: DFSClient_NONMAPREDUCE_349196146_2027206, pendingcreates: 1], src=file_name client DFSClient_NONMAPREDUCE_349196146_2027206 5. Fails to recover the lease. namenode1 log: 2014-06-22 09:58:34,722
[jira] [Commented] (HDFS-6430) HTTPFS - Implement XAttr support
[ https://issues.apache.org/jira/browse/HDFS-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041561#comment-14041561 ]

Alejandro Abdelnur commented on HDFS-6430:
--

[~hitliuyi], my question was whether we are testing the behavior of the xattr methods when they are switched off. Other than that LGTM.

HTTPFS - Implement XAttr support
Key: HDFS-6430
URL: https://issues.apache.org/jira/browse/HDFS-6430
Project: Hadoop HDFS
Issue Type: Task
Affects Versions: 3.0.0
Reporter: Yi Liu
Assignee: Yi Liu
Fix For: 3.0.0
Attachments: HDFS-6430.1.patch, HDFS-6430.2.patch, HDFS-6430.3.patch, HDFS-6430.4.patch, HDFS-6430.patch

Add xattr support to HttpFS.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6595) Configure the maximum threads allowed for balancing on datanodes
Benoy Antony created HDFS-6595:
--

Summary: Configure the maximum threads allowed for balancing on datanodes
Key: HDFS-6595
URL: https://issues.apache.org/jira/browse/HDFS-6595
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Benoy Antony
Assignee: Benoy Antony

Currently the datanode allows a max of 5 threads to be used for balancing. In some cases, it may make sense to use a different number of threads for the purpose of moving.
[jira] [Updated] (HDFS-6595) Configure the maximum threads allowed for balancing on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoy Antony updated HDFS-6595:
--

Attachment: HDFS-6595.patch

Attaching the patch, which adds a new configuration, _dfs.datanode.balance.max.concurrent.moves_. The number of threads is set based on this parameter.

Configure the maximum threads allowed for balancing on datanodes
Key: HDFS-6595
URL: https://issues.apache.org/jira/browse/HDFS-6595
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Benoy Antony
Assignee: Benoy Antony
Attachments: HDFS-6595.patch

Currently the datanode allows a max of 5 threads to be used for balancing. In some cases, it may make sense to use a different number of threads for the purpose of moving.
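With a patch like this applied, the cap would be raised via hdfs-site.xml. A sketch of such a fragment, using the property name given in the comment (the value 10 is illustrative only; per the description, the default remains 5):

```xml
<!-- Illustrative hdfs-site.xml fragment raising the datanode's cap on
     concurrent balancing threads from the default of 5. The value 10 is
     an example, not a recommendation. -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>10</value>
</property>
```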
[jira] [Updated] (HDFS-6595) Configure the maximum threads allowed for balancing on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoy Antony updated HDFS-6595:
--

Status: Patch Available (was: Open)

Configure the maximum threads allowed for balancing on datanodes
Key: HDFS-6595
URL: https://issues.apache.org/jira/browse/HDFS-6595
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Benoy Antony
Assignee: Benoy Antony
Attachments: HDFS-6595.patch

Currently the datanode allows a max of 5 threads to be used for balancing. In some cases, it may make sense to use a different number of threads for the purpose of moving.
[jira] [Commented] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041572#comment-14041572 ]

Zesheng Wu commented on HDFS-6525:
--

Thanks [~daryn], I will update the patch to address your comments immediately.

FsShell supports HDFS TTL
Key: HDFS-6525
URL: https://issues.apache.org/jira/browse/HDFS-6525
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs-client, tools
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
Attachments: HDFS-6525.1.patch

This issue is used to track development of supporting HDFS TTL for FsShell; for details see HDFS-6382.
[jira] [Commented] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041608#comment-14041608 ]

Hadoop QA commented on HDFS-6565:
--

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12652077/HDFS-6565.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7217//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7217//console

This message is automatically generated.

Use jackson instead jetty json in hdfs-client
Key: HDFS-6565
URL: https://issues.apache.org/jira/browse/HDFS-6565
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Haohui Mai
Assignee: Akira AJISAKA
Attachments: HDFS-6565.patch

hdfs-client should use Jackson instead of jetty to parse JSON.