[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033472#comment-14033472 ] Vinayakumar B commented on HDFS-6507: - Thanks [~wuzesheng] for working on this. Here are some comments on your latest patch.

1. {code}
+ inSafeMode = nn.setSafeMode(SafeModeAction.SAFEMODE_GET, false);
{code}
The {{inSafeMode}} value set inside {{waitExitSafeMode(..)}} will not be reflected in the print statement below:
{code}
+System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF"));
{code}
It will always print the initial state. I feel {{waitExitSafeMode(..)}} can return the latest value, and the same can be assigned to {{inSafeMode}} in the {{setSafeMode(..)}} method. ex:
{code}
+boolean inSafeMode = haNn.setSafeMode(action, false);
+if (waitExitSafe) {
+  inSafeMode = waitExitSafeMode(haNn, inSafeMode);
+}
-inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
+System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF"));
{code}
2. In the HA case, it would be better to print the NameNode address along with the safemode status. ex:
{code}
+System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF") + " in " + <namenode address>);
{code}
3. {code}
+System.out.println("Save namespace successfully");
{code}
The message could be "Saved namespace successfully in <namenode address>".
4. Again, same as #3: all messages in the HA case can include the namenode address.
5. In {{metaSave(..)}}, the following {{dfs.getUri()}} should be replaced with the actual namenode address in the HA case:
{code}
+for (ClientProtocol haNn : namenodes) {
+  haNn.metaSave(pathname);
+  System.out.println("Created metasave file " + pathname + " in the log " +
+      "directory of namenode " + dfs.getUri());
+}
{code}
6. The message could be changed as below:
{code}
+System.out.println("Refresh service acl successful for " + <namenode address>);
{code}
7. The message could be changed as below:
{code}
+System.out.println("Refresh user to groups mapping successful for " + <namenode address>);
{code}

Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch

Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used:

1. ClientProtocol
Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which eventually invokes the corresponding remote implementation on the NN side. At the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN allows only UNCHECKED operations. In the current implementation of DFSClient, it connects to one NN first; if that NN is not Active and the operation is not allowed, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, and when these commands are executed from the DFSAdmin command line, they are sent to a particular NN, regardless of whether it is Active or Standby. This may result in two problems:
a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NNs, it takes effect on only one of them.
Later, when an NN failover occurs, this may cause problems. Here I propose the following improvements:
a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly.
b. If the command cannot be classified as one of the above four operations, or if the command needs to take effect on both NNs, we should send the request to both the Active and the Standby NN.

2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to a particular NN, regardless of whether it is Active or Standby. Here I propose that we send these requests to both NNs.

3. ClientDatanodeProtocol
Commands in this category are handled correctly; no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
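To make suggestion #1 above concrete, here is a minimal sketch of how {{waitExitSafeMode(..)}} could return the latest state so the caller prints the up-to-date value; the exact signature and the 5-second poll interval are assumptions for illustration, not the committed implementation:
{code:java}
// Hypothetical sketch: poll until the NameNode leaves safe mode and hand the
// final state back to the caller, instead of mutating a local copy that the
// caller never sees.
private boolean waitExitSafeMode(ClientProtocol nn, boolean inSafeMode)
    throws IOException {
  while (inSafeMode) {
    try {
      Thread.sleep(5000);
    } catch (InterruptedException e) {
      throw new IOException("Wait interrupted", e);
    }
    // SAFEMODE_GET re-queries the current state; assign it back so the most
    // recent value is what gets returned.
    inSafeMode = nn.setSafeMode(SafeModeAction.SAFEMODE_GET, false);
  }
  return inSafeMode;
}
{code}
The caller would then do {{inSafeMode = waitExitSafeMode(haNn, inSafeMode);}} before printing, as shown in the review comment above.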
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033480#comment-14033480 ] Zesheng Wu commented on HDFS-6507: -- Thanks [~vinayrpet] for reviewing the patch; all comments are reasonable to me, and I will generate a new patch soon to address them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
Amir Langer created HDFS-6552: - Summary: add DN storage to a BlockInfo will not replace the different storage from same DN Key: HDFS-6552 URL: https://issues.apache.org/jira/browse/HDFS-6552 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0, 2.3.0 Reporter: Amir Langer Priority: Trivial Fix For: Heterogeneous Storage (HDFS-2832)

In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been: removeStorage(getStorageInfo(idx));

method code:
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }

-- This message was sent by Atlassian JIRA (v6.2#6252)
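To make the suggested fix concrete, here is the relevant branch with the reporter's proposed change applied; only line 10 differs from the listing above, and this is a sketch of the suggestion rather than a committed patch:
{code:java}
if (idx >= 0) {
  if (getStorageInfo(idx) == storage) { // the storage is already there
    return false;
  } else {
    // The block is on the DN but belongs to a different storage.
    // Remove the existing storage at idx (not the one being added), then
    // fall through and add the new storage in its place.
    removeStorage(getStorageInfo(idx));
    added = false; // Just updating storage. Return false.
  }
}
{code}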
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6551) Rename with OVERWRITE option may throw NPE when the target file/directory is a reference INode
[ https://issues.apache.org/jira/browse/HDFS-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033532#comment-14033532 ] Hadoop QA commented on HDFS-6551: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650728/HDFS-6551.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7143//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7143//console This message is automatically generated. Rename with OVERWRITE option may throw NPE when the target file/directory is a reference INode -- Key: HDFS-6551 URL: https://issues.apache.org/jira/browse/HDFS-6551 Project: Hadoop HDFS Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6551.000.patch The following steps can reproduce the NPE: 1. create a snapshot on / 2. move /foo/file1 to /bar/ 3. rename /foo/file2 to /bar/file1 with the OVERWRITE option After step 2, /bar/file1 is a DstReference inode. In step 3, FSDirectory#unprotectedRename first detaches the DstReference inode from the WithCount inode, then it still calls the cleanSubtree method of the corresponding INodeFile instance, which triggers the NPE. We should follow the same logic in FSDirectory#unprotectedDelete which skips the cleanSubtree call in this scenario. -- This message was sent by Atlassian JIRA (v6.2#6252)
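For readers trying to reproduce this, a minimal sketch of the three steps from the description, assuming a {{DistributedFileSystem}} handle against a test cluster where {{/foo/file1}}, {{/foo/file2}} and an empty {{/bar}} already exist; the class name, snapshot name and paths are illustrative:
{code:java}
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ReproduceHdfs6551 {
  // Illustrative reproduction of the NPE scenario described above.
  static void reproduce(DistributedFileSystem dfs) throws Exception {
    // 1. create a snapshot on /
    dfs.allowSnapshot(new Path("/"));
    dfs.createSnapshot(new Path("/"), "s1");
    // 2. move /foo/file1 to /bar/ -- /bar/file1 becomes a DstReference inode
    dfs.rename(new Path("/foo/file1"), new Path("/bar/file1"));
    // 3. rename /foo/file2 to /bar/file1 with OVERWRITE -- triggers the NPE before the fix
    dfs.rename(new Path("/foo/file2"), new Path("/bar/file1"), Options.Rename.OVERWRITE);
  }
}
{code}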
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v2.patch Fix a minor typo that caused the Linux build to fail. Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch, HDFS-6534.v2.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors which gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6507: - Attachment: HDFS-6507.3.patch New patch addressed Vinay's review comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6505) file is corrupt due to last block is marked as corrupt by mistake
[ https://issues.apache.org/jira/browse/HDFS-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gordon Wang updated HDFS-6505: -- Summary: file is corrupt due to last block is marked as corrupt by mistake (was: Can not close file and file is corrupt due to last block is marked as corrupt) file is corrupt due to last block is marked as corrupt by mistake - Key: HDFS-6505 URL: https://issues.apache.org/jira/browse/HDFS-6505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Gordon Wang

After appending to a file, the client could not close it, because the namenode could not complete the last block of the file. The UC status of the last block remained COMMITTED and never changed. The namenode log looked like this:
{code}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* checkFileProgress: blk_1073741920_13948{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.28.1.2:50010|RBW]]} has not reached minimal replication 1
{code}
Going through the namenode log, I found an entry like this:
{code}
INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741920 added as corrupt on 172.28.1.2:50010 by sdw3/172.28.1.3 because client machine reported it
{code}
But actually, the last block was finished successfully on the datanode, because I could find these entries in the datanode log:
{code}
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13808 (numBytes=50120352) to /172.28.1.3:50010
INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.2:36860, dest: /172.28.1.2:50010, bytes: 51686616, op: HDFS_WRITE, cliID: libhdfs3_client_random_741511239_count_1_pid_215802_tid_140085714196576, offset: 0, srvID: DS-2074102060-172.28.1.2-50010-1401432768690, blockid: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, duration: 189226453336
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033657#comment-14033657 ] Vinayakumar B commented on HDFS-6507: - Instead of getting Proxy instances and addresses in separate lists and matching them based on the indexes, we can combine them in a single list itself. {{HAUtil#getProxiesForAllNameNodesInNameservice(..)}} could return a list of {{ProxyAndInfo}}. {{ProxyAndInfo}} can have one more field to store the address. In all places where we need to loop over all namenodes in the HA case, we can loop over the list of {{ProxyAndInfo}} and use {{getProxy()}} to get the Proxy instance and {{getAddress()}} to get the corresponding address. Any thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
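To illustrate the proposal, a hypothetical sketch of what a caller could look like if {{ProxyAndInfo}} carried the address and {{HAUtil#getProxiesForAllNameNodesInNameservice(..)}} returned a list of them; {{getAddress()}} and the exact helper signature are part of the proposal, not existing API:
{code:java}
// Hypothetical: one ProxyAndInfo per NameNode in the nameservice, each carrying
// both the RPC proxy and the address it points to.
List<ProxyAndInfo<ClientProtocol>> proxies =
    HAUtil.getProxiesForAllNameNodesInNameservice(conf, nsId, ClientProtocol.class);
for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
  proxy.getProxy().saveNamespace();
  System.out.println("Save namespace successful for " + proxy.getAddress());
}
{code}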
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033680#comment-14033680 ] Hudson commented on HDFS-6539: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/586/]) HDFS-6539. test_native_mini_dfs is skipped in hadoop-hdfs pom.xml (decstery via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602998) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.5.0 Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033682#comment-14033682 ] Hudson commented on HDFS-6518: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/586/]) HDFS-6518. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code is successful, https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally is also successful. HDFS-6257 mentioned about possible race, maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
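For context, a sketch of the kind of change described in the commit message: wrap the check of the per-datanode pendingCached lists in the FSNamesystem read lock so it does not race with the cache replication monitor. The helper shape below is an assumption for illustration, not the exact committed test code:
{code:java}
// Hold the FSN read lock while reading pendingCached, so the monitor cannot
// mutate the lists mid-check.
FSNamesystem fsn = cluster.getNamesystem();
fsn.readLock();
try {
  DatanodeManager dnm = fsn.getBlockManager().getDatanodeManager();
  for (DataNode dn : cluster.getDataNodes()) {
    DatanodeDescriptor descriptor = dnm.getDatanode(dn.getDatanodeId());
    Assert.assertTrue("Pending cached list of " + dn + " is not empty",
        descriptor.getPendingCached().isEmpty());
  }
} finally {
  fsn.readUnlock();
}
{code}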
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033681#comment-14033681 ] Hudson commented on HDFS-6528: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/586/]) HDFS-6528. Add XAttrs to TestOfflineImageViewer. Contributed by Stephen Chu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033688#comment-14033688 ] Zesheng Wu commented on HDFS-6507: -- I checked the related code. ProxyAndInfo is used in 3 places: {{NameNodeProxies#createProxyWithLossyRetryHandler}}, {{NameNodeProxies#createProxy}}, {{NameNodeProxies#createProxyWithLossyRetryHandler}}. In the first place we can obtain the NN address directly, but in the last two places we cannot; we only have the NN's URI. In the non-HA case, we can get the NN address by {{NameNode.getAddress(nameNodeUri)}}, but in the HA case it seems not easy. What do you think? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033693#comment-14033693 ] Zesheng Wu commented on HDFS-6507: -- Oh, it seems that NameNode also has a {{getAddress(URI filesystemURI)}}; will this work? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6507: Attachment: HDFS-6507.4-inprogress.patch Please check this patch. In this one, check the finalizeUpgrade() changes; similar changes need to be made for all the other commands. -- This message was sent by Atlassian JIRA (v6.2#6252)
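As a rough illustration of the pattern being applied to {{finalizeUpgrade()}} (and, per the comment, to the other commands as well), a hedged sketch; the {{isHaEnabled}}/{{nsId}} variables and the address-carrying {{ProxyAndInfo}} follow the proposal above and are assumptions, not the committed code:
{code:java}
// In the HA case, send the request to every NameNode of the nameservice and
// report per-NameNode success; otherwise keep the existing single-NN path.
if (isHaEnabled) {
  for (ProxyAndInfo<ClientProtocol> proxy :
      HAUtil.getProxiesForAllNameNodesInNameservice(conf, nsId, ClientProtocol.class)) {
    proxy.getProxy().finalizeUpgrade();
    System.out.println("Finalize upgrade successful for " + proxy.getAddress());
  }
} else {
  dfs.finalizeUpgrade();
  System.out.println("Finalize upgrade successful");
}
{code}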
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033731#comment-14033731 ] Hadoop QA commented on HDFS-6507: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650769/HDFS-6507.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7144//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7144//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033733#comment-14033733 ] Vinayakumar B commented on HDFS-6507: - I did not check your patch #4 while uploading mine. Your latest patch has the changes I wanted. A few comments: 1. Everywhere, change "successfully" to just "successful", to be grammatically correct. :) 2. In finalizeUpgrade we also need to print a message to the user about the operation in the HA case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033735#comment-14033735 ] Vinayakumar B commented on HDFS-6507: - One more query I have: with this implementation we can execute commands successfully when all namenodes of a nameservice are up and running. But what if the standby nodes are down for maintenance and they come first in the configuration...? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033737#comment-14033737 ] Zesheng Wu commented on HDFS-6507: -- OK, let me figure these two out :)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033738#comment-14033738 ] Zesheng Wu commented on HDFS-6507: -- bq. One more query I have: using this implementation we can execute commands successfully when all namenodes of a nameservice are up and running, but what if the standby nodes are down for maintenance and they come first in the configuration...?
From my understanding, in this case we can just fail the operation, and users can retry after the standby nodes are up.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033748#comment-14033748 ] Vinayakumar B commented on HDFS-6507: - That's not acceptable. The user should be able to perform operations even when a standby node is down; client/admin commands should not have any dependency on the standby node.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033754#comment-14033754 ] Zesheng Wu commented on HDFS-6507: -- Mmm, maybe in this case the user can use the {{-fs}} generic option to specify which NN to operate on?
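For illustration only: the {{-fs}} generic option is already understood by {{ToolRunner}}/{{GenericOptionsParser}}, so an admin could point dfsadmin at a single namenode when the other one is down. The sketch below drives {{DFSAdmin}} programmatically with a made-up namenode address; on the command line the equivalent would be passing {{-fs}} with the desired namenode URI before the dfsadmin sub-command.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class TargetSingleNameNode {
  public static void main(String[] args) throws Exception {
    // ToolRunner parses the -fs generic option and points DFSAdmin at the
    // given namenode only, so the command is not fanned out to both NNs.
    // hdfs://nn1.example.com:8020 is an example address, not a real cluster.
    int rc = ToolRunner.run(new Configuration(), new DFSAdmin(),
        new String[] {"-fs", "hdfs://nn1.example.com:8020", "-safemode", "get"});
    System.exit(rc);
  }
}
{code}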
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033760#comment-14033760 ] Vinayakumar B commented on HDFS-6507: - Oh, yes, that's also possible. Better to get others' opinions as well. Any thoughts, folks?
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6507: - Attachment: HDFS-6507.5.patch Some minor polishes.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033725#comment-14033725 ] Zesheng Wu commented on HDFS-6507: -- It seems that finalizeUpgrade() doesn't output any prompt messages; do you mean that we should remove the printed messages?
[jira] [Updated] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6545: - Attachment: HDFS-6545.v2.patch I made the logsync conditional in the new patch. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-6545: Assignee: Kihwal Lee
[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033814#comment-14033814 ] Kihwal Lee commented on HDFS-6527: -- +1 for the test improvement. Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. Because of deferred inode removal outside FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with FSN write lock held. This allow addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by NN, so NN doesn't start up or SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
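As a generic, self-contained illustration of the check-then-act race described in the issue (plain Java, not HDFS code): a lookup done under the read lock can be stale by the time the write lock is taken, so the mutation must re-validate the entry under the write lock before applying it. The actual HDFS-6527 fix is to the namenode code and may differ in detail.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckThenActRace {
  private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private static final Map<String, StringBuilder> inodeMap = new ConcurrentHashMap<>();

  static void addBlock(String path) {
    lock.readLock().lock();
    try {
      if (!inodeMap.containsKey(path)) {            // analysis under the read lock
        throw new IllegalStateException("File does not exist: " + path);
      }
    } finally {
      lock.readLock().unlock();
    }
    // Window: another thread may delete 'path' here, which is the race above.
    lock.writeLock().lock();
    try {
      StringBuilder inode = inodeMap.get(path);
      if (inode == null) {                          // re-validate under the write lock
        throw new IllegalStateException("File does not exist: " + path);
      }
      inode.append("+block");                       // mutate only after the re-check
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    inodeMap.put("/xxx", new StringBuilder());
    addBlock("/xxx");
    System.out.println("blocks for /xxx: " + inodeMap.get("/xxx"));
  }
}
{code}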
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033832#comment-14033832 ] Hudson commented on HDFS-6539: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1777/]) HDFS-6539. test_native_mini_dfs is skipped in hadoop-hdfs pom.xml (decstery via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602998) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.5.0 Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033834#comment-14033834 ] Hudson commented on HDFS-6518: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1777/]) HDFS-6518. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code is successful, https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally is also successful. HDFS-6257 mentioned about possible race, maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033833#comment-14033833 ] Hudson commented on HDFS-6528: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1777/]) HDFS-6528. Add XAttrs to TestOfflineImageViewer. Contributed by Stephen Chu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033884#comment-14033884 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650788/HDFS-6507.4-inprogress.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/7145//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7145//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7145//console This message is automatically generated.
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033912#comment-14033912 ] Hudson commented on HDFS-6539: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1804 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1804/]) HDFS-6539. test_native_mini_dfs is skipped in hadoop-hdfs pom.xml (decstery via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602998) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033914#comment-14033914 ] Hudson commented on HDFS-6518: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1804 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1804/]) HDFS-6518. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033913#comment-14033913 ] Hudson commented on HDFS-6528: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1804 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1804/]) HDFS-6528. Add XAttrs to TestOfflineImageViewer. Contributed by Stephen Chu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033925#comment-14033925 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650788/HDFS-6507.4-inprogress.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/7146//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestCrcCorruption {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7146//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7146//console This message is automatically generated.
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6507: - Attachment: HDFS-6507.6.patch The failed test {{TestCrcCorruption}} isn't related to this issue; I ran it on my local machine and it passed. The javadoc warning is also weird; I ran the mvn command on my local machine and there is no such warning. Just resubmitting the patch to trigger Jenkins.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033963#comment-14033963 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650792/HDFS-6507.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService org.apache.hadoop.hdfs.tools.TestDFSAdmin org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7147//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7147//console This message is automatically generated.
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033975#comment-14033975 ] Owen O'Malley commented on HDFS-6134: - The right way to do this is to have the Yarn job submission get the appropriate keys from the KMS, like it currently gets delegation tokens. Both the delegation tokens and the keys should be put into the job's credential object. That way you don't have all 100,000 containers hitting the KMS at once. It does mean we need a new interface for filesystems that, given a list of paths, ensures the keys are in a credential object. FileInputFormat and FileOutputFormat should check to see if the FileSystem implements that interface and pass in the job's credential object. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
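Purely as an illustration of the interface being proposed here (the name and method below are made up for this sketch and are not an existing Hadoop API), the shape might be something like:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

/**
 * Hypothetical interface sketch: a FileSystem implementing this could, at job
 * submission time, fetch whatever encryption keys the given paths need and add
 * them to the job's Credentials, so individual containers never hit the KMS.
 */
public interface KeyProvidingFileSystem {
  /** Ensure the keys needed to read or write the given paths are in creds. */
  void addKeysToCredentials(Credentials creds, Path... paths) throws IOException;
}
{code}
FileInputFormat and FileOutputFormat would then test whether the FileSystem implements this interface (whatever its real name ends up being) and pass the job's credentials through.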
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033981#comment-14033981 ] Owen O'Malley commented on HDFS-6134: - A follow-up on that: of course the KMS will need proxy users so that Oozie will be able to get keys for the users (if that is desired).
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033990#comment-14033990 ] Daryn Sharp commented on HDFS-6475: --- Is a static method in InvalidToken really the right place to check and handle other non-InvalidToken exceptions? WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch With WebHdfs clients connected to a HA HDFS service, the delegation token is previously initialized with the active NN. When clients try to issue request, the NN it contacts is stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact the NN based on the order, so likely the first one it runs into is StandbyNN. If the StandbyNN doesn't have the updated client crediential, it will throw a s SecurityException that wraps StandbyException. The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it failed. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaCl assName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034002#comment-14034002 ] Alejandro Abdelnur commented on HDFS-6134: -- [~owen.omalley], Yarn job submission does not necessary know what files the yarn app will access, and which one of those are encrypted and what keys to fetch. that is the whole point of transparent encryption. the KMS caches keys and easily scales horizontally behind a VIP so it will be able to handle very large number of requests. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034016#comment-14034016 ] Owen O'Malley commented on HDFS-6134: - Alejandro, which use cases don't know their inputs or outputs? Clearly the main ones do know their input and output: * MapReduce * Hive * Pig It is important for the standard cases that we get the encryption keys up front instead of letting the horde of containers do it. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034028#comment-14034028 ] Larry McCay commented on HDFS-6134: --- Hmmm, I agree with Owen. For usecases where these are not inherently known, metadata or some other packaging mechanism will need to identify the keys or file for which keys are required. Additionally, adding getDelegationToken to KeyProvider API is leaking specific provider implementations through the KeyProvider abstraction and should be avoided. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034031#comment-14034031 ] Alejandro Abdelnur commented on HDFS-6134: -- i.e.: if a M/R task opens a side file from HDFS that is not part of the input or output of the MR job. I've seen this quite often. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6536) FileSystem.Cache.closeAll() throws authentication exception at the end of a webhdfs client
[ https://issues.apache.org/jira/browse/HDFS-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034039#comment-14034039 ] Daryn Sharp commented on HDFS-6536: --- The problem should manifest with something as simple as this in the doAs: {code} FileSystem fs = FileSystem.get(conf); fs.getFileStatus(new Path(/)); {code} Webhdfs internally acquires a token and tries to cancel upon close of the fs. The tokens you explicitly acquired should be inconsequential. The issue is TokenAspect doesn't know the ugi context that acquired the token. See if HDFS-6222 solves the problem. Webhdfs will cancel its own token directly instead of via TokenAspect. If it solves the issue, let's dup this jira. FileSystem.Cache.closeAll() throws authentication exception at the end of a webhdfs client -- Key: HDFS-6536 URL: https://issues.apache.org/jira/browse/HDFS-6536 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Priority: Minor With a small client program below, when running as user root which doesn't have kerberos credential, exception is thrown at the end of the client run. The config is HA security enabled, with client config setting {code} property namefs.defaultFS/name valuewebhdfs://ns1/value /property {code} The client program: {code} public class kclient1 { public static void main(String[] args) throws IOException { final Configuration conf = new Configuration(); //a non-root user final UserGroupInformation ugi = UserGroupInformation.getUGIFromTicketCache(/tmp/krb5cc_496, h...@xyz.com); System.out.println(Starting); ugi.doAs(new PrivilegedActionObject() { @Override public Object run() { try { FileSystem fs = FileSystem.get(conf); String renewer = abcdefg; fs.addDelegationTokens( renewer, ugi.getCredentials()); // Just to prove that we connected with right credentials. fs.getFileStatus(new Path(/)); return fs.getDelegationToken(renewer); } catch (Exception e) { e.printStackTrace(); return null; } } }); System.out.println(THE END); } } {code} Output: {code} [root@yjzc5w-1 tmp2]# hadoop --config /tmp2/conf jar kclient1.jar kclient1.kclient1 Starting 14/06/14 20:38:51 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:52 INFO web.WebHdfsFileSystem: Retrying connect to namenode: yjzc5w-2.xyz.com/172.26.3.87:20101. Already tried 0 time(s); retry policy is org.apache.hadoop.io.retry.RetryPolicies$FailoverOnNetworkExceptionRetry@1a92210, delay 0ms. To prove that connection with right credentials to get file status updated updated 7 THE END 14/06/14 20:38:53 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:53 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 14/06/14 20:38:53 INFO fs.FileSystem: FileSystem.Cache.closeAll() threw an exception: java.io.IOException: Authentication failed, url=http://yjzc5w-2.xyz.com:20101/webhdfs/v1/?op=CANCELDELEGATIONTOKENuser.name=roottoken=HAAEaGRmcwRoZGZzAIoBRp2bNByKAUbBp7gcbBQUD6vWmRYJRv03XZj7Jajf8PU8CB8SV0VCSERGUyBkZWxlZ2F0aW9uC2hhLWhkZnM6bnMx [root@yjzc5w-1 tmp2]# {code} We can see the the exception is thrown in the end of the client run. 
I found that the problem is that at the end of the client run, the FileSystem$Cache$ClientFinalizer is run, during which the tokens stored in the filesystem cache get cancelled with the following call: {code} final class TokenAspect<T extends FileSystem & Renewable> { @InterfaceAudience.Private public static class TokenManager extends TokenRenewer { @Override public void cancel(Token<?> token, Configuration conf) throws IOException { getInstance(token, conf).cancelDelegationToken(token); <== } {code} where getInstance(token, conf) creates a FileSystem as user root, then calls cancelDelegationToken against the server side. However, the server doesn't have a root kerberos credential, so
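Daryn's pointer to HDFS-6222 is the likely fix; purely to illustrate the UGI-context mismatch he describes (the method and parameter names here are assumptions, not the patch), the cancel would have to run inside the UGI that originally obtained the token rather than whichever user triggers ClientFinalizer:
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Sketch only: cancel the token under the credentials that acquired it,
// instead of the root user that happens to trigger FileSystem.Cache.closeAll().
void cancelInOwnerContext(final UserGroupInformation ownerUgi,
                          final Token<?> token,
                          final Configuration conf) throws Exception {
  ownerUgi.doAs(new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws Exception {
      token.cancel(conf); // authenticates as ownerUgi, not as root
      return null;
    }
  });
}
{code}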
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034040#comment-14034040 ] Larry McCay commented on HDFS-6134: --- [~tucu00] - that is a good example of where additional metadata would have to indicate that a resource that requires a key is required by this deployed application. The idea is to avoid KMS having to deal with hadoop runtime level scale when it can be accommodated at submit time. It is also much better to fail at submit time if the key is not available than at runtime. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034042#comment-14034042 ] Arpit Agarwal commented on HDFS-6552: - Thanks for reporting this [~langera]. It looks like a bug. add DN storage to a BlockInfo will not replace the different storage from same DN - Key: HDFS-6552 URL: https://issues.apache.org/jira/browse/HDFS-6552 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0, 2.4.0 Reporter: Amir Langer Priority: Trivial Fix For: Heterogeneous Storage (HDFS-2832) In BlockInfo - addStorage code looks wrong. At line 10 (below) - we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there the line should have been: {code:java} removeStorage(getStorageInfo(idx)); {code} method code: {code:java} 1 boolean addStorage(DatanodeStorageInfo storage) { 2 boolean added = true; 3int idx = findDatanode(storage.getDatanodeDescriptor()); 4 if(idx = 0) { 5 if (getStorageInfo(idx) == storage) { // the storage is already there 6return false; 7 } else { 8// The block is on the DN but belongs to a different storage. 9// Update our state. 10removeStorage(storage); 11added = false; // Just updating storage. Return false. 12 } 13 } 14 // find the last null node 15 int lastNode = ensureCapacity(1); 16 setStorageInfo(lastNode, storage); 17 setNext(lastNode, null); 18 setPrevious(lastNode, null); 19 return added; 20} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
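JIRA's formatting has stripped the comparison operator on line 4 of the quoted method (it reads {{if (idx >= 0)}} in the source). With that restored and the reporter's suggested one-line fix applied, the branch would read roughly:
{code:java}
int idx = findDatanode(storage.getDatanodeDescriptor());
if (idx >= 0) {
  if (getStorageInfo(idx) == storage) { // the storage is already there
    return false;
  } else {
    // The block is on the DN but belongs to a different storage: remove the
    // storage that is actually in the list, not the one we are about to add.
    removeStorage(getStorageInfo(idx));
    added = false; // just updating storage; return false
  }
}
{code}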
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034076#comment-14034076 ] Jing Zhao commented on HDFS-6545: - The patch looks good to me. +1. Thanks for working on this, [~kihwal]. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034097#comment-14034097 ] Jing Zhao commented on HDFS-6527: - Thanks for reviewing the test change, [~kihwal]! I will commit this late today if no further comments. Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. Because of deferred inode removal outside FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with FSN write lock held. This allow addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by NN, so NN doesn't start up or SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
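To make the race concrete: because the read lock is dropped before the write lock is taken, the file must be re-validated once the write lock is held. A minimal sketch of that shape (the helper name and exact check are assumptions; the attached patches are the authoritative change):
{code:java}
// Inside getAdditionalBlock(), after re-acquiring the FSN write lock.
writeLock();
try {
  // Re-resolve under the write lock: a concurrent delete may have removed the
  // file even though the stale INode can still be found via the inode map.
  final INodeFile pendingFile = checkLease(src, clientName); // assumed helper
  if (pendingFile == null) {
    throw new FileNotFoundException("File does not exist: " + src);
  }
  // ... only now allocate the block and log OP_ADD_BLOCK for pendingFile ...
} finally {
  writeUnlock();
}
{code}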
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034115#comment-14034115 ] Aaron T. Myers commented on HDFS-6549: -- I'm quite confident the test failure is unrelated. It failed with this error: {code} Problem binding to [0.0.0.0:50020] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException {code} Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
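Both workarounds can sit behind a single opt-in switch on the gateway side. A rough sketch of the two checks, with placeholder property and variable names (not necessarily what the attached patch uses):
{code:java}
// Placeholder key; the committed patch may name the property differently.
boolean aixCompatMode = conf.getBoolean("nfs.aix.compatibility.mode.enabled", false);

// COMMIT workaround: AIX sends offsets rounded up to a page multiple, so treat
// any offset past EOF as "commit everything written so far".
long offsetToCommit = aixCompatMode ? Math.min(requestedOffset, fileSize)
                                    : requestedOffset;

// READDIR/READDIRPLUS workaround: AIX keeps reusing one cookie verifier after
// the directory's mtime changes, so only enforce the verifier check when the
// compatibility mode is off.
boolean cookieVerifierOk = aixCompatMode || (cookieVerifier == expectedVerifier);
{code}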
[jira] [Updated] (HDFS-6527) Edit log corruption due to defered INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6527: - Target Version/s: 2.5.0 (was: 2.4.1) Changing the target version from 2.4.1 to 2.5.0 since 2.4.1 is already cut. Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. Because of deferred inode removal outside FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with FSN write lock held. This allow addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by NN, so NN doesn't start up or SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034136#comment-14034136 ] Kihwal Lee commented on HDFS-6545: -- Thanks for the review [~jingzhao]. I've committed this to trunk and branch-2. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6545: - Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034149#comment-14034149 ] Hudson commented on HDFS-6545: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5719 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5719/]) HDFS-6545. Finalizing rolling upgrade can make NN unavailable for a long duration. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603239) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034155#comment-14034155 ] Owen O'Malley commented on HDFS-6134: - Alejandro, this is *exactly* equivalent of the delegation token. If a job is opening side files, it needs to make sure it has the right delegation tokens and keys. For delegation tokens, we added an extra config option for listing the extra file systems. The same solution (or listing the extra key versions) would make sense. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
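For readers who want the precedent spelled out: the delegation-token analogue is a submit-time job property listing extra filesystems (mapreduce.job.hdfs-servers, if memory serves), and the suggestion is an equivalent list of key names or key versions. A hypothetical sketch, with the second property name invented purely for illustration:
{code:java}
import org.apache.hadoop.conf.Configuration;

Configuration jobConf = new Configuration();

// Existing precedent: extra filesystems whose delegation tokens are fetched at
// submit time (property name quoted from memory; verify against MRJobConfig).
jobConf.set("mapreduce.job.hdfs-servers", "hdfs://other-nn:8020");

// Hypothetical analogue for encryption: extra key names the containers will need
// for side files, so the submitter can resolve them up front rather than having
// the horde of containers hit the KMS at run time.
jobConf.set("mapreduce.job.encryption.key.names", "sales-key,audit-log-key");
{code}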
[jira] [Commented] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034164#comment-14034164 ] Todd Lipcon commented on HDFS-5851: --- Yep, the native checksumming that James is working on is one big part of it. The other half is the work that Trevor Robinson was doing on using DirectByteBuffers on the DN side to avoid some copies to/from byte arrays. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034204#comment-14034204 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650821/HDFS-6507.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.TestDFSAdmin org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7149//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7149//console This message is automatically generated. Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category generally implement by calling the corresponding function of the DFSClient class, and will call the corresponding remote implementation function at the NN side finally. At the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only allows UNCHECKED operations. In the current implementation of DFSClient, it will connect one NN first, if the first NN is not Active and the operation is not allowed, it will failover to the second NN. So here comes the problem, some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, and when executing these commands in the DFSAdmin command line, they will be sent to a definite NN, no matter it is Active or Standby. This may result in two problems: a. If the first tried NN is standby, and the operation takes effect only on Standby NN, which is not the expected result. b. If the operation needs to take effect on both NN, but it takes effect on only one NN. In the future, when there is a NN failover, there may have problems. Here I propose the following improvements: a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. 
If the command can not be classified as one of the above four operations, or if the command needs to take effect on both NN, we should send the request to both Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to remote NN. In the current implementation, these requests will be sent to a definite NN, no matter it is Active or Standby. Here I propose that we sent these requests to both NNs. 3. ClientDatanodeProtocol Commands in this category are handled correctly, no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
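Concretely, the UNCHECKED and refresh-protocol commands would loop over every NN address of every nameservice and issue the call to each, printing the address so the operator can see which NameNodes acknowledged. A rough sketch (the proxy-creation helper is assumed, roughly along the lines of NameNodeProxies):
{code:java}
import java.net.InetSocketAddress;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSUtil;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;

// Sketch only: run an UNCHECKED admin operation against every configured NN,
// instead of letting the client proxy pick a single (possibly standby) one.
void saveNamespaceOnAllNamenodes(Configuration conf) throws Exception {
  Map<String, Map<String, InetSocketAddress>> byNameservice =
      DFSUtil.getNNServiceRpcAddresses(conf);
  for (Map<String, InetSocketAddress> namenodes : byNameservice.values()) {
    for (InetSocketAddress addr : namenodes.values()) {
      ClientProtocol nn = createNonHAProxy(conf, addr); // assumed helper
      nn.saveNamespace();
      System.out.println("Save namespace successful for " + addr);
    }
  }
}
{code}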
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034218#comment-14034218 ] Brandon Li commented on HDFS-6439: -- Thank you, Aaron, for the review! {quote}add a deprecation delta for the old one so that this change will be backward compatible in that respect.{quote} Are you referring nfs.port.monitoring.enabled? It's pretty new and I don't think it's in any release yet. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
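The ordering implied by the RFC text above is simply to answer procedure 0 before the privileged-port check. A sketch of that ordering, with placeholder helper names:
{code:java}
// Sketch only: NULL (procedure 0) is answered even from a non-privileged port,
// so port monitoring never blocks the client's initial mount-time probes.
void handleCall(int procedure, int remotePort, boolean portMonitoringEnabled) {
  final int NULL_PROC = 0;
  if (procedure == NULL_PROC) {
    sendNullReply();                  // always accepted, per RFC 2623, 2.3.1
    return;
  }
  if (portMonitoringEnabled && remotePort >= 1024) {
    rejectNonPrivilegedPort();        // existing port-monitoring behavior
    return;
  }
  dispatch(procedure);                // normal MOUNT / NFS3 handling
}
{code}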
[jira] [Commented] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034229#comment-14034229 ] Charles Lamb commented on HDFS-6492: LGTM. I only have a few picky little nits. The javadoc you added needs an @returns. final decls. I see a few in args, but generally none in method bodies. I'm whiny about this as you know. I'd be happy if you did a query-replace s/ListXAttr/final ListXAttr. s/verifing/verifying/ I prefer 'for (int i = 0; i numToAdd; i++)' to 'for (int i=0; inumToAdd; i++)'. I think the Java coding standards support that. We discussed (offline) the lack of mkdir support in fseditlog. I just wanted to add it to the Jira for the record. I thought import foo.* was not in the Java coding standard. Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch, HDFS-6492.002.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034233#comment-14034233 ] Aaron T. Myers commented on HDFS-6439: -- bq. Are you referring nfs.port.monitoring.enabled? It's pretty new and I don't think it's in any release yet. Yep, that's the one I meant. I'd put it in regardless - can't cause any harm. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3848) A Bug in recoverLeaseInternal method of FSNameSystem class
[ https://issues.apache.org/jira/browse/HDFS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034258#comment-14034258 ] Kihwal Lee commented on HDFS-3848: -- I kicked the precommit again. It looks like the patch is still good, but it will be nice if a test case is added. A Bug in recoverLeaseInternal method of FSNameSystem class -- Key: HDFS-3848 URL: https://issues.apache.org/jira/browse/HDFS-3848 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.1 Reporter: Hooman Peiro Sajjad Labels: patch Attachments: HDFS-3848-1.patch Original Estimate: 1h Remaining Estimate: 1h This is a bug in logic of the method recoverLeaseInternal. In line 1322 it checks if the owner of the file is trying to recreate the file. The condition of the if statement is (leaseFile != null leaseFile.equals(lease)) || lease.getHolder().equals(holder) As it can be seen, there are two operands (conditions) connected with an or operator. The first operand is straight and will be true only if the holder of the file is the new holder. But the problem is the second operand which will be always true since the lease object is the one found by the holder by calling Lease lease = leaseManager.getLease(holder); in line 1315. To fix this I think the if statement only should contain the following the condition: (leaseFile != null leaseFile.getHolder().equals(holder)) -- This message was sent by Atlassian JIRA (v6.2#6252)
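JIRA's formatting has dropped the {{&&}} operators from the conditions quoted above; with them restored, the current check and the proposed fix read:
{code:java}
// Current condition in recoverLeaseInternal(): the second operand is always
// true, because 'lease' was itself looked up via leaseManager.getLease(holder).
if ((leaseFile != null && leaseFile.equals(lease))
    || lease.getHolder().equals(holder)) {
  // treated as the owner re-creating the file
}

// Proposed fix from the description: key only off the lease held on the file.
if (leaseFile != null && leaseFile.getHolder().equals(holder)) {
  // treated as the owner re-creating the file
}
{code}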
[jira] [Created] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
Stephen Chu created HDFS-6553: - Summary: Add missing DeprecationDeltas for NFS Kerberos configurations Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6553: -- Attachment: HDFS-6553.patch.001 Attaching patch. This adds two DeprecationDeltas for dfs.nfs.keytab.file and dfs.nfs.kerberos.principal. No unit tests added. I manually verified by starting a secure NFS gateway successfully using the older nfs kerberos configs. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
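For readers unfamiliar with the mechanism, the deltas are registered through Configuration.addDeprecations; the two new entries would look roughly like this (the surrounding class and the exact new key constants may differ in the attached patch):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configuration.DeprecationDelta;

// Sketch of the two added mappings: old dfs.nfs.* keys forward to the new nfs.* keys.
static void addNfsKerberosDeprecations() {
  Configuration.addDeprecations(new DeprecationDelta[] {
      new DeprecationDelta("dfs.nfs.keytab.file", "nfs.keytab.file"),
      new DeprecationDelta("dfs.nfs.kerberos.principal", "nfs.kerberos.principal")
  });
}
{code}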
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6553: - Target Version/s: 2.5.0 Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3848) A Bug in recoverLeaseInternal method of FSNameSystem class
[ https://issues.apache.org/jira/browse/HDFS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034282#comment-14034282 ] Chen He commented on HDFS-3848: --- I will add the test case. A Bug in recoverLeaseInternal method of FSNameSystem class -- Key: HDFS-3848 URL: https://issues.apache.org/jira/browse/HDFS-3848 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.1 Reporter: Hooman Peiro Sajjad Labels: patch Attachments: HDFS-3848-1.patch Original Estimate: 1h Remaining Estimate: 1h This is a bug in logic of the method recoverLeaseInternal. In line 1322 it checks if the owner of the file is trying to recreate the file. The condition of the if statement is (leaseFile != null leaseFile.equals(lease)) || lease.getHolder().equals(holder) As it can be seen, there are two operands (conditions) connected with an or operator. The first operand is straight and will be true only if the holder of the file is the new holder. But the problem is the second operand which will be always true since the lease object is the one found by the holder by calling Lease lease = leaseManager.getLease(holder); in line 1315. To fix this I think the if statement only should contain the following the condition: (leaseFile != null leaseFile.getHolder().equals(holder)) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034283#comment-14034283 ] Aaron T. Myers commented on HDFS-6553: -- Good stuff, Stephen. Thanks a lot for catching this and providing a patch. While you're looking at this code, would you mind also ensuring that we didn't goof and miss any other deprecations in HDFS-6056? Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034286#comment-14034286 ] Stephen Chu commented on HDFS-6553: --- That's a good idea. Let me check for any other deprecations. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034370#comment-14034370 ] Stephen Chu commented on HDFS-6553: --- Hi ATM, I went through HDFS-6056, which was fixed in 2.5.0. The following configurations seem not to have DeprecationDeltas: dfs.nfs.exports.cache.size - nfs.exports.cache.size dfs.nfs.exports.cache.size was added in HDFS-5136 (2.3.0) The following were added in 2.4.0 in HDFS-6080: dfs.nfs.rtmax - nfs.rtmax dfs.nfs.wtmax - nfs.wtmax dfs.nfs.dtmax - nfs.dtmax Shall I add them to this patch? Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6439: - Attachment: HDFS-6439.005.patch NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034384#comment-14034384 ] Brandon Li commented on HDFS-6439: -- OK. The new patch addressed the comments. Thanks, Aaron. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6439: - Attachment: (was: HDFS-6439.005.patch) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6439: - Attachment: HDFS-6439.005.patch NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-5546: Attachment: HDFS-5546.2.001.patch Hey [~cmccabe] This patch includes a unit test that deletes a fraction of the sub-directories in the middle of listing the parent directory. In the end, the patch verifies that the rest of the directory listing finishes even when there are one or more FNFs in the process. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-5546: Fix Version/s: 3.0.0 Status: Patch Available (was: Open) This patch catches the FileNotFoundException thrown during {{ls}} execution and ignores it, to handle the case where a deletion happens in the sub-namespace. Unit tests are included. It is a _best effort_ to finish the `ls` execution, so it cannot discover new changes to the directory that is currently being iterated. E.g., the case of renaming {{/foo/bar}} to {{/foo/zoo}} while running {{ls /foo}} is not handled. That is, in such a case, {{/foo/bar}} is considered _deleted_, but {{/foo/zoo}} is not visible to the current execution. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
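The behavior being tested is easy to picture: a child that disappears between the parent listing and its own listing is skipped instead of aborting the whole {{ls -R}}. A minimal sketch of that best-effort recursion (not the shell code from the patch itself):
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: best-effort recursive listing; a FileNotFoundException on a child
// that was deleted or renamed mid-walk is ignored and the walk continues.
void listRecursively(FileSystem fs, Path dir) throws IOException {
  final FileStatus[] children;
  try {
    children = fs.listStatus(dir);
  } catch (FileNotFoundException fnf) {
    return; // the directory vanished after its parent listed it
  }
  for (FileStatus child : children) {
    System.out.println(child.getPath());
    if (child.isDirectory()) {
      listRecursively(fs, child.getPath());
    }
  }
}
{code}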
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034459#comment-14034459 ] Aaron T. Myers commented on HDFS-6439: -- Thanks a lot for addressing my comments, Brandon. Latest patch looks good to me. +1 pending Jenkins. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6554) Create unencrypted streams interface
Charles Lamb created HDFS-6554: -- Summary: Create unencrypted streams interface Key: HDFS-6554 URL: https://issues.apache.org/jira/browse/HDFS-6554 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb There needs to be an interface to encrypted files that streams the unencrypted data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6555) Create a crypto blob
Charles Lamb created HDFS-6555: -- Summary: Create a crypto blob Key: HDFS-6555 URL: https://issues.apache.org/jira/browse/HDFS-6555 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb We need to create a Crypto Blob for passing around crypto info. -- This message was sent by Atlassian JIRA (v6.2#6252)
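A rough sketch of the kind of information such a blob might carry (the class and field names are hypothetical; the committed class may differ):
{code}
public class CryptoInfo {
  private final String cipherSuite;    // e.g. AES/CTR/NoPadding
  private final byte[] iv;             // initialization vector
  private final String keyVersionName; // key version used to encrypt the file

  public CryptoInfo(String cipherSuite, byte[] iv, String keyVersionName) {
    this.cipherSuite = cipherSuite;
    this.iv = iv.clone();
    this.keyVersionName = keyVersionName;
  }

  public String getCipherSuite() { return cipherSuite; }
  public byte[] getIv() { return iv.clone(); }
  public String getKeyVersionName() { return keyVersionName; }
}
{code}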
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034476#comment-14034476 ] Aaron T. Myers commented on HDFS-6553: -- Yes please, that sounds great. When you do, make sure to hit submit patch so that test-patch will run on the patch. Thanks a lot, Stephen. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
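For reference, deprecation mappings in Hadoop are registered through Configuration.addDeprecations with DeprecationDelta entries. A minimal sketch of the approach described above (the new property names shown here are assumptions for illustration and may differ from the committed keys):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configuration.DeprecationDelta;

public class NfsConfigDeprecations {
  static void addDeprecatedKeys() {
    Configuration.addDeprecations(new DeprecationDelta[] {
        // Old (pre-HDFS-6056) name -> new name.
        new DeprecationDelta("dfs.nfs.keytab.file", "nfs.keytab.file"),
        new DeprecationDelta("dfs.nfs.kerberos.principal", "nfs.kerberos.principal")
    });
  }
}
{code}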
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034486#comment-14034486 ] Aaron T. Myers commented on HDFS-6549: -- [~brandonli] - would you mind taking a look at this patch? Thanks a lot. Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
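For illustration, a minimal sketch of how an optional compatibility flag could relax the COMMIT offset check described in item 1 (the flag, class, and method names are hypothetical, not the actual gateway code):
{code}
public class CommitCheck {
  private final boolean aixCompatMode;  // hypothetical gateway config flag

  public CommitCheck(boolean aixCompatMode) {
    this.aixCompatMode = aixCompatMode;
  }

  /** Decide whether a COMMIT up to commitOffset can be acknowledged. */
  public boolean canCommit(long commitOffset, long flushedBytes) {
    if (aixCompatMode) {
      // AIX may ask to commit a multiple of 4096 even if fewer bytes were
      // ever written, so acknowledge once everything actually written is flushed.
      return true;
    }
    // Default behaviour: the requested range must already be flushed.
    return commitOffset <= flushedBytes;
  }
}
{code}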
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034494#comment-14034494 ] Brandon Li commented on HDFS-6549: -- Sure. I will review it shortly. Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034497#comment-14034497 ] Aaron T. Myers commented on HDFS-6549: -- Thanks much. Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6403: Attachment: HDFS-6403.003.patch Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
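For illustration, a minimal sketch of exposing such pause-warning counts through the metrics2 framework (the class and metric names are assumptions, not necessarily what the patch uses):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "JVM pause metrics", context = "dfs")
public class JvmPauseMetrics {
  @Metric("Number of GC pauses that exceeded the warn threshold")
  MutableCounterLong warnThresholdExceeded;

  @Metric("Number of GC pauses that exceeded the info threshold")
  MutableCounterLong infoThresholdExceeded;

  /** Registers this source so the annotated counters are instantiated. */
  public static JvmPauseMetrics create() {
    return DefaultMetricsSystem.instance()
        .register("JvmPauseMetrics", null, new JvmPauseMetrics());
  }

  /** Called by the pause monitor when a long pause is detected. */
  void recordPause(boolean isWarn) {
    if (isWarn) {
      warnThresholdExceeded.incr();
    } else {
      infoThresholdExceeded.incr();
    }
  }
}
{code}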
[jira] [Commented] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034503#comment-14034503 ] Jing Zhao commented on HDFS-6480: - The patch looks good to me. Some comments: # I think the change will make more sense after we move the fsimage reference from FSDirectory to FSNamesystem, because only after that can the FSDirectory become a pure in-memory data structure for namespace. But feel free to do it in a separate jira. # We may want to update the javadoc for FSDirectory. # It will be better to rename ready and waitReady etc. to something like fsdirLoaded etc. # TestFSDirectory#testReset should be moved to TestFSNamesystem and renamed to testResetFSDirectory. Move waitForReady() from FSDirectory to FSNamesystem Key: HDFS-6480 URL: https://issues.apache.org/jira/browse/HDFS-6480 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch, HDFS-6480.002.patch Currently FSDirectory implements a barrier in {{waitForReady()}} / {{setReady()}} so that it only serve requests once the FSImage is fully loaded. As a part of the effort to evolve {{FSDirectory}} to a class which focuses on implementing the data structure of the namespace, this jira proposes to move the barrier one level higher to {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
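For illustration, a minimal sketch of the ready barrier described above as it might look after the move (names are illustrative, not the exact FSNamesystem code):
{code}
public class ImageLoadedBarrier {
  private volatile boolean imageLoaded = false;

  /** Called once the FSImage has been fully loaded. */
  synchronized void setImageLoaded() {
    imageLoaded = true;
    notifyAll();
  }

  /** Blocks request-handling threads until the image is loaded. */
  synchronized void waitForLoaded() throws InterruptedException {
    while (!imageLoaded) {
      wait();
    }
  }
}
{code}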
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034511#comment-14034511 ] Brandon Li commented on HDFS-6553: -- Thank you, Stephen, for finding the issue and providing the patch! Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034514#comment-14034514 ] Yongjun Zhang commented on HDFS-6403: - Hi [~atm], Thanks a lot for the review and the good comments. I just uploaded version 004 to address your suggestions. Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034515#comment-14034515 ] Yongjun Zhang commented on HDFS-6403: - I meant version 003. Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6553: -- Status: Patch Available (was: Open) Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001, HDFS-6553.patch.002 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6553: -- Attachment: HDFS-6553.patch.002 Thanks, ATM and Brandon. I updated the patch to include the other properties. I built the patch successfully and brought up a secure NFS gateway using the old nfs property names. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001, HDFS-6553.patch.002 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034535#comment-14034535 ] Hadoop QA commented on HDFS-5546: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650888/HDFS-5546.2.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.fs.TestPath org.apache.hadoop.fs.shell.TestPathData The following test timeouts occurred in hadoop-common-project/hadoop-common: org.apache.hadoop.http.TestHttpServer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7151//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7151//console This message is automatically generated. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034538#comment-14034538 ] Aaron T. Myers commented on HDFS-6553: -- Latest patch looks good to me. +1 pending Jenkins. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001, HDFS-6553.patch.002 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034547#comment-14034547 ] Akira AJISAKA commented on HDFS-6403: - Thanks [~yzhangal] for the nice work! Would you please document the new metrics in Metrics.apt.vm? Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6390) chown/chgrp users/groups blacklist for encrypted files
[ https://issues.apache.org/jira/browse/HDFS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb resolved HDFS-6390. Resolution: Not a Problem This is no longer a problem now that the NN will never handle key material. All access to key material is handled through the KMS access control mechanisms. chown/chgrp users/groups blacklist for encrypted files --- Key: HDFS-6390 URL: https://issues.apache.org/jira/browse/HDFS-6390 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb A blacklist of users and groups that stops an admin from changing the owner/group of encrypted files and directories. This blacklist would typically contain the regular users used by admins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6387) HDFS CLI admin tool for creating/deleting an encryption zone
[ https://issues.apache.org/jira/browse/HDFS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6387: --- Attachment: HDFS-6387.002.patch Rebased patch w/ unit tests. The unit tests will have to be re-run and modified accordingly once we get HDFS-6386 committed. HDFS CLI admin tool for creating/deleting an encryption zone -- Key: HDFS-6387 URL: https://issues.apache.org/jira/browse/HDFS-6387 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6387.002.patch, HDFS-6387.1.patch CLI admin tool to create/delete an encryption zone in HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6393) User settable xAttr to stop HDFS admins from reading/chowning a file
[ https://issues.apache.org/jira/browse/HDFS-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb resolved HDFS-6393. Resolution: Not a Problem This is no longer a problem now that the NN will never handle key material. All access to key material is handled through the KMS access control mechanisms. User settable xAttr to stop HDFS admins from reading/chowning a file Key: HDFS-6393 URL: https://issues.apache.org/jira/browse/HDFS-6393 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb A user should be able to set an xAttr on any file in HDFS to stop an HDFS admin user from reading the file. The blacklist for chown/chgrp would also be enforced. This will stop an HDFS admin from gaining access to job token files and getting HDFS DelegationTokens that would allow him/her to read an encrypted file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6516) Implement List Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034567#comment-14034567 ] Charles Lamb commented on HDFS-6516: Most of the backend of listEncryptionZones will be implemented under a different Jira. What remains to be done under this one is to make the list of EZs persistent. Implement List Encryption Zones --- Key: HDFS-6516 URL: https://issues.apache.org/jira/browse/HDFS-6516 Project: Hadoop HDFS Issue Type: Sub-task Components: security Reporter: Charles Lamb Assignee: Charles Lamb The list Encryption Zones command (CLI) and backend implementation (FSNamesystem) needs to be implemented. As part of this, the tests in TestEncryptionZonesAPI should be updated to use that to validate the results of the various CreateEZ and DeleteEZ tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HDFS-6386) HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6386 started by Charles Lamb. HDFS Encryption Zones - Key: HDFS-6386 URL: https://issues.apache.org/jira/browse/HDFS-6386 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HDFS-6386.012.patch, HDFS-6386.013.patch, HDFS-6386.4.patch, HDFS-6386.5.patch, HDFS-6386.6.patch, HDFS-6386.8.patch Define the required security xAttributes for directories and files within an encryption zone and how they propagate to children. Implement the logic to create/delete encryption zones. -- This message was sent by Atlassian JIRA (v6.2#6252)