[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033472#comment-14033472 ] Vinayakumar B commented on HDFS-6507: - Thanks [~wuzesheng] for working on this. Here are some comments on your latest patch.

1. {code}
+ inSafeMode = nn.setSafeMode(SafeModeAction.SAFEMODE_GET, false);
{code}
The {{inSafeMode}} value set inside {{waitExitSafeMode(..)}} will not be reflected in the print statement below:
{code}
+System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF"));
{code}
It will always print the initial state. I feel {{waitExitSafeMode(..)}} can return the latest value, and the same can be assigned to {{inSafeMode}} in the {{setSafeMode(..)}} method. ex:
{code}
+boolean inSafeMode = haNn.setSafeMode(action, false);
+if (waitExitSafe) {
+  inSafeMode = waitExitSafeMode(haNn, inSafeMode);
+}
-inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
+System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF"));
{code}
2. In the HA case, it would be better to print the NameNode address along with the safemode status. ex:
{code}
+System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF") + " in " + <namenode address>);
{code}
3. {code}
+System.out.println("Save namespace successfully");
{code}
The message could be "Saved namespace successfully in <namenode address>".
4. Again, same as #3: all messages in the HA case can include the namenode address.
5. In {{metaSave(..)}}, the following {{dfs.getUri()}} should be replaced with the actual namenode address in the HA case:
{code}
+for (ClientProtocol haNn : namenodes) {
+  haNn.metaSave(pathname);
+  System.out.println("Created metasave file " + pathname + " in the log " +
+      "directory of namenode " + dfs.getUri());
+}
{code}
6. The message could be changed as below:
{code}
+System.out.println("Refresh service acl successful for " + <namenode address>);
{code}
7. The message could be changed as below:
{code}
+System.out.println("Refresh user to groups mapping successful for " + <namenode address>);
{code}

Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch

Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used:

1. ClientProtocol
Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which eventually invokes the corresponding remote implementation on the NN side. At the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN allows only UNCHECKED operations. In the current implementation of DFSClient, it connects to one NN first; if that NN is not Active and the operation is not allowed, it fails over to the second NN. So here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, and when these commands are executed from the DFSAdmin command line, they are sent to a particular NN, regardless of whether it is Active or Standby. This may result in two problems:
a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NNs, it takes effect on only one of them.
Later, when an NN failover occurs, this may cause problems. Here I propose the following improvements:
a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly.
b. If the command cannot be classified as one of the above four operations, or if the command needs to take effect on both NNs, we should send the request to both the Active and the Standby NN.

2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to a particular NN, regardless of whether it is Active or Standby. Here I propose that we send these requests to both NNs.

3. ClientDatanodeProtocol
Commands in this category are handled correctly; no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
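To make suggestion #1 above concrete, here is a minimal sketch of how {{waitExitSafeMode(..)}} could return the latest state so the caller prints the up-to-date value; the exact signature and the 5-second poll interval are assumptions for illustration, not the committed implementation:
{code:java}
// Hypothetical sketch: poll until the NameNode leaves safe mode and hand the
// final state back to the caller, instead of mutating a local copy that the
// caller never sees.
private boolean waitExitSafeMode(ClientProtocol nn, boolean inSafeMode)
    throws IOException {
  while (inSafeMode) {
    try {
      Thread.sleep(5000);
    } catch (InterruptedException e) {
      throw new IOException("Wait interrupted", e);
    }
    // SAFEMODE_GET re-queries the current state; assign it back so the most
    // recent value is what gets returned.
    inSafeMode = nn.setSafeMode(SafeModeAction.SAFEMODE_GET, false);
  }
  return inSafeMode;
}
{code}
The caller would then do {{inSafeMode = waitExitSafeMode(haNn, inSafeMode);}} before printing, as shown in the review comment above.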
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033480#comment-14033480 ] Zesheng Wu commented on HDFS-6507: -- Thanks [~vinayrpet] for reviewing the patch; all comments are reasonable to me, and I will generate a new patch soon to address them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
Amir Langer created HDFS-6552: - Summary: add DN storage to a BlockInfo will not replace the different storage from same DN Key: HDFS-6552 URL: https://issues.apache.org/jira/browse/HDFS-6552 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0, 2.3.0 Reporter: Amir Langer Priority: Trivial Fix For: Heterogeneous Storage (HDFS-2832)

In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been: removeStorage(getStorageInfo(idx));

method code:
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }

-- This message was sent by Atlassian JIRA (v6.2#6252)
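To make the suggested fix concrete, here is the relevant branch with the reporter's proposed change applied; only line 10 differs from the listing above, and this is a sketch of the suggestion rather than a committed patch:
{code:java}
if (idx >= 0) {
  if (getStorageInfo(idx) == storage) { // the storage is already there
    return false;
  } else {
    // The block is on the DN but belongs to a different storage.
    // Remove the existing storage at idx (not the one being added), then
    // fall through and add the new storage in its place.
    removeStorage(getStorageInfo(idx));
    added = false; // Just updating storage. Return false.
  }
}
{code}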
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6551) Rename with OVERWRITE option may throw NPE when the target file/directory is a reference INode
[ https://issues.apache.org/jira/browse/HDFS-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033532#comment-14033532 ] Hadoop QA commented on HDFS-6551: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650728/HDFS-6551.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7143//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7143//console This message is automatically generated. Rename with OVERWRITE option may throw NPE when the target file/directory is a reference INode -- Key: HDFS-6551 URL: https://issues.apache.org/jira/browse/HDFS-6551 Project: Hadoop HDFS Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6551.000.patch The following steps can reproduce the NPE: 1. create a snapshot on / 2. move /foo/file1 to /bar/ 3. rename /foo/file2 to /bar/file1 with the OVERWRITE option After step 2, /bar/file1 is a DstReference inode. In step 3, FSDirectory#unprotectedRename first detaches the DstReference inode from the WithCount inode, then it still calls the cleanSubtree method of the corresponding INodeFile instance, which triggers the NPE. We should follow the same logic in FSDirectory#unprotectedDelete which skips the cleanSubtree call in this scenario. -- This message was sent by Atlassian JIRA (v6.2#6252)
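For readers trying to reproduce this, a minimal sketch of the three steps from the description, assuming a {{DistributedFileSystem}} handle against a test cluster where {{/foo/file1}}, {{/foo/file2}} and an empty {{/bar}} already exist; the class name, snapshot name and paths are illustrative:
{code:java}
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ReproduceHdfs6551 {
  // Illustrative reproduction of the NPE scenario described above.
  static void reproduce(DistributedFileSystem dfs) throws Exception {
    // 1. create a snapshot on /
    dfs.allowSnapshot(new Path("/"));
    dfs.createSnapshot(new Path("/"), "s1");
    // 2. move /foo/file1 to /bar/ -- /bar/file1 becomes a DstReference inode
    dfs.rename(new Path("/foo/file1"), new Path("/bar/file1"));
    // 3. rename /foo/file2 to /bar/file1 with OVERWRITE -- triggers the NPE before the fix
    dfs.rename(new Path("/foo/file2"), new Path("/bar/file1"), Options.Rename.OVERWRITE);
  }
}
{code}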
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-6552: -- Description: In BlockInfo, the addStorage code looks wrong. At line 10 (below) we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there, the line should have been:
{code:java}
removeStorage(getStorageInfo(idx));
{code}
method code:
{code:java}
 1  boolean addStorage(DatanodeStorageInfo storage) {
 2    boolean added = true;
 3    int idx = findDatanode(storage.getDatanodeDescriptor());
 4    if (idx >= 0) {
 5      if (getStorageInfo(idx) == storage) { // the storage is already there
 6        return false;
 7      } else {
 8        // The block is on the DN but belongs to a different storage.
 9        // Update our state.
10        removeStorage(storage);
11        added = false; // Just updating storage. Return false.
12      }
13    }
14    // find the last null node
15    int lastNode = ensureCapacity(1);
16    setStorageInfo(lastNode, storage);
17    setNext(lastNode, null);
18    setPrevious(lastNode, null);
19    return added;
20  }
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v2.patch Fix a minor typo that caused the Linux build to fail. Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch, HDFS-6534.v2.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors which gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6507: - Attachment: HDFS-6507.3.patch New patch addressed Vinay's review comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6505) file is corrupt due to last block is marked as corrupt by mistake
[ https://issues.apache.org/jira/browse/HDFS-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gordon Wang updated HDFS-6505: -- Summary: file is corrupt due to last block is marked as corrupt by mistake (was: Can not close file and file is corrupt due to last block is marked as corrupt) file is corrupt due to last block is marked as corrupt by mistake - Key: HDFS-6505 URL: https://issues.apache.org/jira/browse/HDFS-6505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Gordon Wang

After appending to a file, the client could not close it, because the namenode could not complete the last block of the file. The UC status of the last block remained COMMITTED and never changed. The namenode log looked like this:
{code}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* checkFileProgress: blk_1073741920_13948{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.28.1.2:50010|RBW]]} has not reached minimal replication 1
{code}
Going through the namenode log, I found an entry like this:
{code}
INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741920 added as corrupt on 172.28.1.2:50010 by sdw3/172.28.1.3 because client machine reported it
{code}
But actually, the last block was finished successfully on the datanode, because I could find these entries in the datanode log:
{code}
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13808 (numBytes=50120352) to /172.28.1.3:50010
INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.2:36860, dest: /172.28.1.2:50010, bytes: 51686616, op: HDFS_WRITE, cliID: libhdfs3_client_random_741511239_count_1_pid_215802_tid_140085714196576, offset: 0, srvID: DS-2074102060-172.28.1.2-50010-1401432768690, blockid: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, duration: 189226453336
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033657#comment-14033657 ] Vinayakumar B commented on HDFS-6507: - Instead of getting Proxy instances and addresses in separate lists and matching them based on the indexes, we can combine them in a single list itself. {{HAUtil#getProxiesForAllNameNodesInNameservice(..)}} could return a list of {{ProxyAndInfo}}. {{ProxyAndInfo}} can have one more field to store the address. In all places where we need to loop over all namenodes in the HA case, we can loop over the list of {{ProxyAndInfo}} and use {{getProxy()}} to get the Proxy instance and {{getAddress()}} to get the corresponding address. Any thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
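To illustrate the proposal, a hypothetical sketch of what a caller could look like if {{ProxyAndInfo}} carried the address and {{HAUtil#getProxiesForAllNameNodesInNameservice(..)}} returned a list of them; {{getAddress()}} and the exact helper signature are part of the proposal, not existing API:
{code:java}
// Hypothetical: one ProxyAndInfo per NameNode in the nameservice, each carrying
// both the RPC proxy and the address it points to.
List<ProxyAndInfo<ClientProtocol>> proxies =
    HAUtil.getProxiesForAllNameNodesInNameservice(conf, nsId, ClientProtocol.class);
for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
  proxy.getProxy().saveNamespace();
  System.out.println("Save namespace successful for " + proxy.getAddress());
}
{code}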
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033680#comment-14033680 ] Hudson commented on HDFS-6539: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/586/]) HDFS-6539. test_native_mini_dfs is skipped in hadoop-hdfs pom.xml (decstery via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602998) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.5.0 Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033682#comment-14033682 ] Hudson commented on HDFS-6518: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/586/]) HDFS-6518. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code is successful, https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally is also successful. HDFS-6257 mentioned about possible race, maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
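For context, a sketch of the kind of change described in the commit message: wrap the check of the per-datanode pendingCached lists in the FSNamesystem read lock so it does not race with the cache replication monitor. The helper shape below is an assumption for illustration, not the exact committed test code:
{code:java}
// Hold the FSN read lock while reading pendingCached, so the monitor cannot
// mutate the lists mid-check.
FSNamesystem fsn = cluster.getNamesystem();
fsn.readLock();
try {
  DatanodeManager dnm = fsn.getBlockManager().getDatanodeManager();
  for (DataNode dn : cluster.getDataNodes()) {
    DatanodeDescriptor descriptor = dnm.getDatanode(dn.getDatanodeId());
    Assert.assertTrue("Pending cached list of " + dn + " is not empty",
        descriptor.getPendingCached().isEmpty());
  }
} finally {
  fsn.readUnlock();
}
{code}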
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033681#comment-14033681 ] Hudson commented on HDFS-6528: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/586/]) HDFS-6528. Add XAttrs to TestOfflineImageViewer. Contributed by Stephen Chu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033688#comment-14033688 ] Zesheng Wu commented on HDFS-6507: -- I checked the related code. ProxyAndInfo is used in 3 places: {{NameNodeProxies#createProxyWithLossyRetryHandler}}, {{NameNodeProxies#createProxy}}, {{NameNodeProxies#createProxyWithLossyRetryHandler}}. In the first place we can obtain the NN address directly, but in the last two places we cannot; we only have the NN's URI. In the non-HA case, we can get the NN address by {{NameNode.getAddress(nameNodeUri)}}, but in the HA case it seems not easy. What do you think? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033693#comment-14033693 ] Zesheng Wu commented on HDFS-6507: -- Oh, it seems that NameNode also has a {{getAddress(URI filesystemURI)}}; will this work? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6507: Attachment: HDFS-6507.4-inprogress.patch Please check this patch. In this one, check the finalizeUpgrade() changes; similar changes need to be made for all the other commands. -- This message was sent by Atlassian JIRA (v6.2#6252)
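As a rough illustration of the pattern being applied to {{finalizeUpgrade()}} (and, per the comment, to the other commands as well), a hedged sketch; the {{isHaEnabled}}/{{nsId}} variables and the address-carrying {{ProxyAndInfo}} follow the proposal above and are assumptions, not the committed code:
{code:java}
// In the HA case, send the request to every NameNode of the nameservice and
// report per-NameNode success; otherwise keep the existing single-NN path.
if (isHaEnabled) {
  for (ProxyAndInfo<ClientProtocol> proxy :
      HAUtil.getProxiesForAllNameNodesInNameservice(conf, nsId, ClientProtocol.class)) {
    proxy.getProxy().finalizeUpgrade();
    System.out.println("Finalize upgrade successful for " + proxy.getAddress());
  }
} else {
  dfs.finalizeUpgrade();
  System.out.println("Finalize upgrade successful");
}
{code}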
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033731#comment-14033731 ] Hadoop QA commented on HDFS-6507: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650769/HDFS-6507.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7144//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7144//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033733#comment-14033733 ] Vinayakumar B commented on HDFS-6507: - I did not check your patch #4 while uploading mine. Your latest patch has the changes I wanted. A few comments: 1. Everywhere, change "successfully" to just "successful", to be grammatically correct. :) 2. In finalizeUpgrade we also need to print a message to the user about the operation in the HA case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033735#comment-14033735 ] Vinayakumar B commented on HDFS-6507: - One more query I have: with this implementation we can execute commands successfully when all namenodes of a nameservice are up and running. But what if the standby nodes are down for maintenance and they come first in the configuration...? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033737#comment-14033737 ] Zesheng Wu commented on HDFS-6507: -- OK, let me figure these two out :)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033738#comment-14033738 ] Zesheng Wu commented on HDFS-6507: -- bq. One more query I have: using this implementation we can execute commands successfully when all namenodes of a nameservice are up and running, but what if the standby nodes are down for maintenance and they come first in the configuration...?
From my understanding, in this case we can just fail the operation, and users can retry after the standby nodes are up.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033748#comment-14033748 ] Vinayakumar B commented on HDFS-6507: - That's not acceptable. The user should be able to perform operations even when a standby node is down; client/admin commands should not have any dependency on the standby node.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033754#comment-14033754 ] Zesheng Wu commented on HDFS-6507: -- Mmm, maybe in this case the user can use the {{-fs}} generic option to specify which NN to operate on?
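For illustration only: the {{-fs}} generic option is already understood by {{ToolRunner}}/{{GenericOptionsParser}}, so an admin could point dfsadmin at a single namenode when the other one is down. The sketch below drives {{DFSAdmin}} programmatically with a made-up namenode address; on the command line the equivalent would be passing {{-fs}} with the desired namenode URI before the dfsadmin sub-command.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class TargetSingleNameNode {
  public static void main(String[] args) throws Exception {
    // ToolRunner parses the -fs generic option and points DFSAdmin at the
    // given namenode only, so the command is not fanned out to both NNs.
    // hdfs://nn1.example.com:8020 is an example address, not a real cluster.
    int rc = ToolRunner.run(new Configuration(), new DFSAdmin(),
        new String[] {"-fs", "hdfs://nn1.example.com:8020", "-safemode", "get"});
    System.exit(rc);
  }
}
{code}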
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033760#comment-14033760 ] Vinayakumar B commented on HDFS-6507: - Oh, yes, that's also possible. Better to get others' opinions as well. Any thoughts, folks?
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6507: - Attachment: HDFS-6507.5.patch Some minor polishes.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033725#comment-14033725 ] Zesheng Wu commented on HDFS-6507: -- It seems that finalizeUpgrade() doesn't output any prompt messages; do you mean that we should remove the printed messages?
[jira] [Updated] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6545: - Attachment: HDFS-6545.v2.patch I made the logsync conditional in the new patch. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-6545: Assignee: Kihwal Lee
[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033814#comment-14033814 ] Kihwal Lee commented on HDFS-6527: -- +1 for the test improvement. Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. Because of deferred inode removal outside FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with FSN write lock held. This allow addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by NN, so NN doesn't start up or SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
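As a generic, self-contained illustration of the check-then-act race described in the issue (plain Java, not HDFS code): a lookup done under the read lock can be stale by the time the write lock is taken, so the mutation must re-validate the entry under the write lock before applying it. The actual HDFS-6527 fix is to the namenode code and may differ in detail.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckThenActRace {
  private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private static final Map<String, StringBuilder> inodeMap = new ConcurrentHashMap<>();

  static void addBlock(String path) {
    lock.readLock().lock();
    try {
      if (!inodeMap.containsKey(path)) {            // analysis under the read lock
        throw new IllegalStateException("File does not exist: " + path);
      }
    } finally {
      lock.readLock().unlock();
    }
    // Window: another thread may delete 'path' here, which is the race above.
    lock.writeLock().lock();
    try {
      StringBuilder inode = inodeMap.get(path);
      if (inode == null) {                          // re-validate under the write lock
        throw new IllegalStateException("File does not exist: " + path);
      }
      inode.append("+block");                       // mutate only after the re-check
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    inodeMap.put("/xxx", new StringBuilder());
    addBlock("/xxx");
    System.out.println("blocks for /xxx: " + inodeMap.get("/xxx"));
  }
}
{code}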
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033832#comment-14033832 ] Hudson commented on HDFS-6539: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1777/]) HDFS-6539. test_native_mini_dfs is skipped in hadoop-hdfs pom.xml (decstery via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602998) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.5.0 Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033834#comment-14033834 ] Hudson commented on HDFS-6518: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1777/]) HDFS-6518. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code is successful, https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally is also successful. HDFS-6257 mentioned about possible race, maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033833#comment-14033833 ] Hudson commented on HDFS-6528: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1777/]) HDFS-6528. Add XAttrs to TestOfflineImageViewer. Contributed by Stephen Chu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033884#comment-14033884 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650788/HDFS-6507.4-inprogress.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/7145//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7145//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7145//console This message is automatically generated.
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033912#comment-14033912 ] Hudson commented on HDFS-6539: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1804 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1804/]) HDFS-6539. test_native_mini_dfs is skipped in hadoop-hdfs pom.xml (decstery via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602998) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033914#comment-14033914 ] Hudson commented on HDFS-6518: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1804 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1804/]) HDFS-6518. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033913#comment-14033913 ] Hudson commented on HDFS-6528: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1804 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1804/]) HDFS-6528. Add XAttrs to TestOfflineImageViewer. Contributed by Stephen Chu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033925#comment-14033925 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650788/HDFS-6507.4-inprogress.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/7146//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestCrcCorruption {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7146//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7146//console This message is automatically generated.
[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6507: - Attachment: HDFS-6507.6.patch The failed test {{TestCrcCorruption}} isn't related to this issue; I ran it on my local machine and it passed. The javadoc warning is also weird; I ran the mvn command on my local machine and there is no such warning. Just resubmitting the patch to trigger Jenkins.
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033963#comment-14033963 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650792/HDFS-6507.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService org.apache.hadoop.hdfs.tools.TestDFSAdmin org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7147//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7147//console This message is automatically generated.
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033975#comment-14033975 ] Owen O'Malley commented on HDFS-6134: - The right way to do this is to have the Yarn job submission get the appropriate keys from the KMS, like it currently gets delegation tokens. Both the delegation tokens and the keys should be put into the job's credential object. That way you don't have all 100,000 containers hitting the KMS at once. It does mean we need a new interface for filesystems that, given a list of paths, ensures the keys are in a credential object. FileInputFormat and FileOutputFormat should check to see if the FileSystem implements that interface and pass in the job's credential object. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
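Purely as an illustration of the interface being proposed here (the name and method below are made up for this sketch and are not an existing Hadoop API), the shape might be something like:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

/**
 * Hypothetical interface sketch: a FileSystem implementing this could, at job
 * submission time, fetch whatever encryption keys the given paths need and add
 * them to the job's Credentials, so individual containers never hit the KMS.
 */
public interface KeyProvidingFileSystem {
  /** Ensure the keys needed to read or write the given paths are in creds. */
  void addKeysToCredentials(Credentials creds, Path... paths) throws IOException;
}
{code}
FileInputFormat and FileOutputFormat would then test whether the FileSystem implements this interface (whatever its real name ends up being) and pass the job's credentials through.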
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033981#comment-14033981 ] Owen O'Malley commented on HDFS-6134: - A follow-up on that: of course the KMS will need proxy users so that Oozie will be able to get keys for the users (if that is desired).
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033990#comment-14033990 ] Daryn Sharp commented on HDFS-6475: --- Is a static method in InvalidToken really the right place to check and handle other non-InvalidToken exceptions? WebHdfs clients fail without retry because incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch With WebHdfs clients connected to a HA HDFS service, the delegation token is previously initialized with the active NN. When clients try to issue request, the NN it contacts is stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact the NN based on the order, so likely the first one it runs into is StandbyNN. If the StandbyNN doesn't have the updated client crediential, it will throw a s SecurityException that wraps StandbyException. The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it failed. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaCl assName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034002#comment-14034002 ] Alejandro Abdelnur commented on HDFS-6134: -- [~owen.omalley], Yarn job submission does not necessary know what files the yarn app will access, and which one of those are encrypted and what keys to fetch. that is the whole point of transparent encryption. the KMS caches keys and easily scales horizontally behind a VIP so it will be able to handle very large number of requests. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034016#comment-14034016 ] Owen O'Malley commented on HDFS-6134: - Alejandro, which use cases don't know their inputs or outputs? Clearly the main ones do know their input and output: * MapReduce * Hive * Pig It is important for the standard cases that we get the encryption keys up front instead of letting the horde of containers do it. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034028#comment-14034028 ] Larry McCay commented on HDFS-6134: --- Hmmm, I agree with Owen. For usecases where these are not inherently known, metadata or some other packaging mechanism will need to identify the keys or file for which keys are required. Additionally, adding getDelegationToken to KeyProvider API is leaking specific provider implementations through the KeyProvider abstraction and should be avoided. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034031#comment-14034031 ] Alejandro Abdelnur commented on HDFS-6134: -- i.e.: if a M/R task opens a side file from HDFS that is not part of the input or output of the MR job. I've seen this quite often. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6536) FileSystem.Cache.closeAll() throws authentication exception at the end of a webhdfs client
[ https://issues.apache.org/jira/browse/HDFS-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034039#comment-14034039 ] Daryn Sharp commented on HDFS-6536: --- The problem should manifest with something as simple as this in the doAs: {code} FileSystem fs = FileSystem.get(conf); fs.getFileStatus(new Path(/)); {code} Webhdfs internally acquires a token and tries to cancel upon close of the fs. The tokens you explicitly acquired should be inconsequential. The issue is TokenAspect doesn't know the ugi context that acquired the token. See if HDFS-6222 solves the problem. Webhdfs will cancel its own token directly instead of via TokenAspect. If it solves the issue, let's dup this jira. FileSystem.Cache.closeAll() throws authentication exception at the end of a webhdfs client -- Key: HDFS-6536 URL: https://issues.apache.org/jira/browse/HDFS-6536 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Priority: Minor With a small client program below, when running as user root which doesn't have kerberos credential, exception is thrown at the end of the client run. The config is HA security enabled, with client config setting {code} property namefs.defaultFS/name valuewebhdfs://ns1/value /property {code} The client program: {code} public class kclient1 { public static void main(String[] args) throws IOException { final Configuration conf = new Configuration(); //a non-root user final UserGroupInformation ugi = UserGroupInformation.getUGIFromTicketCache(/tmp/krb5cc_496, h...@xyz.com); System.out.println(Starting); ugi.doAs(new PrivilegedActionObject() { @Override public Object run() { try { FileSystem fs = FileSystem.get(conf); String renewer = abcdefg; fs.addDelegationTokens( renewer, ugi.getCredentials()); // Just to prove that we connected with right credentials. fs.getFileStatus(new Path(/)); return fs.getDelegationToken(renewer); } catch (Exception e) { e.printStackTrace(); return null; } } }); System.out.println(THE END); } } {code} Output: {code} [root@yjzc5w-1 tmp2]# hadoop --config /tmp2/conf jar kclient1.jar kclient1.kclient1 Starting 14/06/14 20:38:51 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:52 INFO web.WebHdfsFileSystem: Retrying connect to namenode: yjzc5w-2.xyz.com/172.26.3.87:20101. Already tried 0 time(s); retry policy is org.apache.hadoop.io.retry.RetryPolicies$FailoverOnNetworkExceptionRetry@1a92210, delay 0ms. To prove that connection with right credentials to get file status updated updated 7 THE END 14/06/14 20:38:53 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:53 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 14/06/14 20:38:53 INFO fs.FileSystem: FileSystem.Cache.closeAll() threw an exception: java.io.IOException: Authentication failed, url=http://yjzc5w-2.xyz.com:20101/webhdfs/v1/?op=CANCELDELEGATIONTOKENuser.name=roottoken=HAAEaGRmcwRoZGZzAIoBRp2bNByKAUbBp7gcbBQUD6vWmRYJRv03XZj7Jajf8PU8CB8SV0VCSERGUyBkZWxlZ2F0aW9uC2hhLWhkZnM6bnMx [root@yjzc5w-1 tmp2]# {code} We can see the the exception is thrown in the end of the client run. 
I found that the problem is that at the end of the client run, the FileSystem$Cache$ClientFinalizer is run, during which the tokens stored in the filesystem cache get cancelled with the following call: {code} final class TokenAspect<T extends FileSystem & Renewable> { @InterfaceAudience.Private public static class TokenManager extends TokenRenewer { @Override public void cancel(Token<?> token, Configuration conf) throws IOException { getInstance(token, conf).cancelDelegationToken(token); <== } {code} where getInstance(token, conf) creates a FileSystem as user root, then calls cancelDelegationToken against the server side. However, the server doesn't have a root kerberos credential, so
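Daryn's pointer to HDFS-6222 is the likely fix; purely to illustrate the UGI-context mismatch he describes (the method and parameter names here are assumptions, not the patch), the cancel would have to run inside the UGI that originally obtained the token rather than whichever user triggers ClientFinalizer:
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Sketch only: cancel the token under the credentials that acquired it,
// instead of the root user that happens to trigger FileSystem.Cache.closeAll().
void cancelInOwnerContext(final UserGroupInformation ownerUgi,
                          final Token<?> token,
                          final Configuration conf) throws Exception {
  ownerUgi.doAs(new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws Exception {
      token.cancel(conf); // authenticates as ownerUgi, not as root
      return null;
    }
  });
}
{code}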
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034040#comment-14034040 ] Larry McCay commented on HDFS-6134: --- [~tucu00] - that is a good example of where additional metadata would have to indicate that a resource that requires a key is required by this deployed application. The idea is to avoid KMS having to deal with hadoop runtime level scale when it can be accommodated at submit time. It is also much better to fail at submit time if the key is not available than at runtime. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6552) add DN storage to a BlockInfo will not replace the different storage from same DN
[ https://issues.apache.org/jira/browse/HDFS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034042#comment-14034042 ] Arpit Agarwal commented on HDFS-6552: - Thanks for reporting this [~langera]. It looks like a bug. add DN storage to a BlockInfo will not replace the different storage from same DN - Key: HDFS-6552 URL: https://issues.apache.org/jira/browse/HDFS-6552 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0, 2.4.0 Reporter: Amir Langer Priority: Trivial Fix For: Heterogeneous Storage (HDFS-2832) In BlockInfo - addStorage code looks wrong. At line 10 (below) - we remove the storage we're about to add from the list of storages, then add it. If the aim was to replace the different storage that was there the line should have been: {code:java} removeStorage(getStorageInfo(idx)); {code} method code: {code:java} 1 boolean addStorage(DatanodeStorageInfo storage) { 2 boolean added = true; 3int idx = findDatanode(storage.getDatanodeDescriptor()); 4 if(idx = 0) { 5 if (getStorageInfo(idx) == storage) { // the storage is already there 6return false; 7 } else { 8// The block is on the DN but belongs to a different storage. 9// Update our state. 10removeStorage(storage); 11added = false; // Just updating storage. Return false. 12 } 13 } 14 // find the last null node 15 int lastNode = ensureCapacity(1); 16 setStorageInfo(lastNode, storage); 17 setNext(lastNode, null); 18 setPrevious(lastNode, null); 19 return added; 20} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
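JIRA's formatting has stripped the comparison operator on line 4 of the quoted method (it reads {{if (idx >= 0)}} in the source). With that restored and the reporter's suggested one-line fix applied, the branch would read roughly:
{code:java}
int idx = findDatanode(storage.getDatanodeDescriptor());
if (idx >= 0) {
  if (getStorageInfo(idx) == storage) { // the storage is already there
    return false;
  } else {
    // The block is on the DN but belongs to a different storage: remove the
    // storage that is actually in the list, not the one we are about to add.
    removeStorage(getStorageInfo(idx));
    added = false; // just updating storage; return false
  }
}
{code}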
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034076#comment-14034076 ] Jing Zhao commented on HDFS-6545: - The patch looks good to me. +1. Thanks for working on this, [~kihwal]. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034097#comment-14034097 ] Jing Zhao commented on HDFS-6527: - Thanks for reviewing the test change, [~kihwal]! I will commit this late today if no further comments. Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. Because of deferred inode removal outside FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with FSN write lock held. This allow addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by NN, so NN doesn't start up or SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
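To make the race concrete: because the read lock is dropped before the write lock is taken, the file must be re-validated once the write lock is held. A minimal sketch of that shape (the helper name and exact check are assumptions; the attached patches are the authoritative change):
{code:java}
// Inside getAdditionalBlock(), after re-acquiring the FSN write lock.
writeLock();
try {
  // Re-resolve under the write lock: a concurrent delete may have removed the
  // file even though the stale INode can still be found via the inode map.
  final INodeFile pendingFile = checkLease(src, clientName); // assumed helper
  if (pendingFile == null) {
    throw new FileNotFoundException("File does not exist: " + src);
  }
  // ... only now allocate the block and log OP_ADD_BLOCK for pendingFile ...
} finally {
  writeUnlock();
}
{code}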
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034115#comment-14034115 ] Aaron T. Myers commented on HDFS-6549: -- I'm quite confident the test failure is unrelated. It failed with this error: {code} Problem binding to [0.0.0.0:50020] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException {code} Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
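Both workarounds can sit behind a single opt-in switch on the gateway side. A rough sketch of the two checks, with placeholder property and variable names (not necessarily what the attached patch uses):
{code:java}
// Placeholder key; the committed patch may name the property differently.
boolean aixCompatMode = conf.getBoolean("nfs.aix.compatibility.mode.enabled", false);

// COMMIT workaround: AIX sends offsets rounded up to a page multiple, so treat
// any offset past EOF as "commit everything written so far".
long offsetToCommit = aixCompatMode ? Math.min(requestedOffset, fileSize)
                                    : requestedOffset;

// READDIR/READDIRPLUS workaround: AIX keeps reusing one cookie verifier after
// the directory's mtime changes, so only enforce the verifier check when the
// compatibility mode is off.
boolean cookieVerifierOk = aixCompatMode || (cookieVerifier == expectedVerifier);
{code}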
[jira] [Updated] (HDFS-6527) Edit log corruption due to defered INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6527: - Target Version/s: 2.5.0 (was: 2.4.1) Changing the target version from 2.4.1 to 2.5.0 since 2.4.1 is already cut. Edit log corruption due to defered INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires FSN read lock and then write lock, a deletion can happen in between. Because of deferred inode removal outside FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with FSN write lock held. This allow addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by NN, so NN doesn't start up or SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034136#comment-14034136 ] Kihwal Lee commented on HDFS-6545: -- Thanks for the review [~jingzhao]. I've committed this to trunk and branch-2. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6545: - Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034149#comment-14034149 ] Hudson commented on HDFS-6545: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5719 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5719/]) HDFS-6545. Finalizing rolling upgrade can make NN unavailable for a long duration. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603239) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6545.patch, HDFS-6545.v2.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034155#comment-14034155 ] Owen O'Malley commented on HDFS-6134: - Alejandro, this is *exactly* equivalent of the delegation token. If a job is opening side files, it needs to make sure it has the right delegation tokens and keys. For delegation tokens, we added an extra config option for listing the extra file systems. The same solution (or listing the extra key versions) would make sense. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthÂcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
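For readers who want the precedent spelled out: the delegation-token analogue is a submit-time job property listing extra filesystems (mapreduce.job.hdfs-servers, if memory serves), and the suggestion is an equivalent list of key names or key versions. A hypothetical sketch, with the second property name invented purely for illustration:
{code:java}
import org.apache.hadoop.conf.Configuration;

Configuration jobConf = new Configuration();

// Existing precedent: extra filesystems whose delegation tokens are fetched at
// submit time (property name quoted from memory; verify against MRJobConfig).
jobConf.set("mapreduce.job.hdfs-servers", "hdfs://other-nn:8020");

// Hypothetical analogue for encryption: extra key names the containers will need
// for side files, so the submitter can resolve them up front rather than having
// the horde of containers hit the KMS at run time.
jobConf.set("mapreduce.job.encryption.key.names", "sales-key,audit-log-key");
{code}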
[jira] [Commented] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034164#comment-14034164 ] Todd Lipcon commented on HDFS-5851: --- Yep, the native checksumming that James is working on is one big part of it. The other half is the work that Trevor Robinson was doing on using DirectByteBuffers on the DN side to avoid some copies to/from byte arrays. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034204#comment-14034204 ] Hadoop QA commented on HDFS-6507: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650821/HDFS-6507.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.TestDFSAdmin org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7149//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7149//console This message is automatically generated. Improve DFSAdmin to support HA cluster better - Key: HDFS-6507 URL: https://issues.apache.org/jira/browse/HDFS-6507 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, HDFS-6507.6.patch Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used: 1. ClientProtocol Commands in this category generally implement by calling the corresponding function of the DFSClient class, and will call the corresponding remote implementation function at the NN side finally. At the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only allows UNCHECKED operations. In the current implementation of DFSClient, it will connect one NN first, if the first NN is not Active and the operation is not allowed, it will failover to the second NN. So here comes the problem, some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, and when executing these commands in the DFSAdmin command line, they will be sent to a definite NN, no matter it is Active or Standby. This may result in two problems: a. If the first tried NN is standby, and the operation takes effect only on Standby NN, which is not the expected result. b. If the operation needs to take effect on both NN, but it takes effect on only one NN. In the future, when there is a NN failover, there may have problems. Here I propose the following improvements: a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly. b. 
If the command can not be classified as one of the above four operations, or if the command needs to take effect on both NN, we should send the request to both Active and Standby NNs. 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to remote NN. In the current implementation, these requests will be sent to a definite NN, no matter it is Active or Standby. Here I propose that we sent these requests to both NNs. 3. ClientDatanodeProtocol Commands in this category are handled correctly, no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)
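Concretely, the UNCHECKED and refresh-protocol commands would loop over every NN address of every nameservice and issue the call to each, printing the address so the operator can see which NameNodes acknowledged. A rough sketch (the proxy-creation helper is assumed, roughly along the lines of NameNodeProxies):
{code:java}
import java.net.InetSocketAddress;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSUtil;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;

// Sketch only: run an UNCHECKED admin operation against every configured NN,
// instead of letting the client proxy pick a single (possibly standby) one.
void saveNamespaceOnAllNamenodes(Configuration conf) throws Exception {
  Map<String, Map<String, InetSocketAddress>> byNameservice =
      DFSUtil.getNNServiceRpcAddresses(conf);
  for (Map<String, InetSocketAddress> namenodes : byNameservice.values()) {
    for (InetSocketAddress addr : namenodes.values()) {
      ClientProtocol nn = createNonHAProxy(conf, addr); // assumed helper
      nn.saveNamespace();
      System.out.println("Save namespace successful for " + addr);
    }
  }
}
{code}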
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034218#comment-14034218 ] Brandon Li commented on HDFS-6439: -- Thank you, Aaron, for the review! {quote}add a deprecation delta for the old one so that this change will be backward compatible in that respect.{quote} Are you referring nfs.port.monitoring.enabled? It's pretty new and I don't think it's in any release yet. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
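The ordering implied by the RFC text above is simply to answer procedure 0 before the privileged-port check. A sketch of that ordering, with placeholder helper names:
{code:java}
// Sketch only: NULL (procedure 0) is answered even from a non-privileged port,
// so port monitoring never blocks the client's initial mount-time probes.
void handleCall(int procedure, int remotePort, boolean portMonitoringEnabled) {
  final int NULL_PROC = 0;
  if (procedure == NULL_PROC) {
    sendNullReply();                  // always accepted, per RFC 2623, 2.3.1
    return;
  }
  if (portMonitoringEnabled && remotePort >= 1024) {
    rejectNonPrivilegedPort();        // existing port-monitoring behavior
    return;
  }
  dispatch(procedure);                // normal MOUNT / NFS3 handling
}
{code}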
[jira] [Commented] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034229#comment-14034229 ] Charles Lamb commented on HDFS-6492: LGTM. I only have a few picky little nits. The javadoc you added needs an @returns. final decls. I see a few in args, but generally none in method bodies. I'm whiny about this as you know. I'd be happy if you did a query-replace s/ListXAttr/final ListXAttr. s/verifing/verifying/ I prefer 'for (int i = 0; i numToAdd; i++)' to 'for (int i=0; inumToAdd; i++)'. I think the Java coding standards support that. We discussed (offline) the lack of mkdir support in fseditlog. I just wanted to add it to the Jira for the record. I thought import foo.* was not in the Java coding standard. Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch, HDFS-6492.002.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034233#comment-14034233 ] Aaron T. Myers commented on HDFS-6439: -- bq. Are you referring nfs.port.monitoring.enabled? It's pretty new and I don't think it's in any release yet. Yep, that's the one I meant. I'd put it in regardless - can't cause any harm. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3848) A Bug in recoverLeaseInternal method of FSNameSystem class
[ https://issues.apache.org/jira/browse/HDFS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034258#comment-14034258 ] Kihwal Lee commented on HDFS-3848: -- I kicked the precommit again. It looks like the patch is still good, but it will be nice if a test case is added. A Bug in recoverLeaseInternal method of FSNameSystem class -- Key: HDFS-3848 URL: https://issues.apache.org/jira/browse/HDFS-3848 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.1 Reporter: Hooman Peiro Sajjad Labels: patch Attachments: HDFS-3848-1.patch Original Estimate: 1h Remaining Estimate: 1h This is a bug in logic of the method recoverLeaseInternal. In line 1322 it checks if the owner of the file is trying to recreate the file. The condition of the if statement is (leaseFile != null leaseFile.equals(lease)) || lease.getHolder().equals(holder) As it can be seen, there are two operands (conditions) connected with an or operator. The first operand is straight and will be true only if the holder of the file is the new holder. But the problem is the second operand which will be always true since the lease object is the one found by the holder by calling Lease lease = leaseManager.getLease(holder); in line 1315. To fix this I think the if statement only should contain the following the condition: (leaseFile != null leaseFile.getHolder().equals(holder)) -- This message was sent by Atlassian JIRA (v6.2#6252)
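JIRA's formatting has dropped the {{&&}} operators from the conditions quoted above; with them restored, the current check and the proposed fix read:
{code:java}
// Current condition in recoverLeaseInternal(): the second operand is always
// true, because 'lease' was itself looked up via leaseManager.getLease(holder).
if ((leaseFile != null && leaseFile.equals(lease))
    || lease.getHolder().equals(holder)) {
  // treated as the owner re-creating the file
}

// Proposed fix from the description: key only off the lease held on the file.
if (leaseFile != null && leaseFile.getHolder().equals(holder)) {
  // treated as the owner re-creating the file
}
{code}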
[jira] [Created] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
Stephen Chu created HDFS-6553: - Summary: Add missing DeprecationDeltas for NFS Kerberos configurations Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6553: -- Attachment: HDFS-6553.patch.001 Attaching patch. This adds two DeprecationDeltas for dfs.nfs.keytab.file and dfs.nfs.kerberos.principal. No unit tests added. I manually verified by starting a secure NFS gateway successfully using the older nfs kerberos configs. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
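For readers unfamiliar with the mechanism, the deltas are registered through Configuration.addDeprecations; the two new entries would look roughly like this (the surrounding class and the exact new key constants may differ in the attached patch):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configuration.DeprecationDelta;

// Sketch of the two added mappings: old dfs.nfs.* keys forward to the new nfs.* keys.
static void addNfsKerberosDeprecations() {
  Configuration.addDeprecations(new DeprecationDelta[] {
      new DeprecationDelta("dfs.nfs.keytab.file", "nfs.keytab.file"),
      new DeprecationDelta("dfs.nfs.kerberos.principal", "nfs.kerberos.principal")
  });
}
{code}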
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6553: - Target Version/s: 2.5.0 Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3848) A Bug in recoverLeaseInternal method of FSNameSystem class
[ https://issues.apache.org/jira/browse/HDFS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034282#comment-14034282 ] Chen He commented on HDFS-3848: --- I will add the test case. A Bug in recoverLeaseInternal method of FSNameSystem class -- Key: HDFS-3848 URL: https://issues.apache.org/jira/browse/HDFS-3848 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.1 Reporter: Hooman Peiro Sajjad Labels: patch Attachments: HDFS-3848-1.patch Original Estimate: 1h Remaining Estimate: 1h This is a bug in logic of the method recoverLeaseInternal. In line 1322 it checks if the owner of the file is trying to recreate the file. The condition of the if statement is (leaseFile != null leaseFile.equals(lease)) || lease.getHolder().equals(holder) As it can be seen, there are two operands (conditions) connected with an or operator. The first operand is straight and will be true only if the holder of the file is the new holder. But the problem is the second operand which will be always true since the lease object is the one found by the holder by calling Lease lease = leaseManager.getLease(holder); in line 1315. To fix this I think the if statement only should contain the following the condition: (leaseFile != null leaseFile.getHolder().equals(holder)) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034283#comment-14034283 ] Aaron T. Myers commented on HDFS-6553: -- Good stuff, Stephen. Thanks a lot for catching this and providing a patch. While you're looking at this code, would you mind also ensuring that we didn't goof and miss any other deprecations in HDFS-6056? Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034286#comment-14034286 ] Stephen Chu commented on HDFS-6553: --- That's a good idea. Let me check for any other deprecations. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034370#comment-14034370 ] Stephen Chu commented on HDFS-6553: --- Hi ATM, I went through HDFS-6056, which was fixed in 2.5.0. The following configurations seem not to have DeprecationDeltas: dfs.nfs.exports.cache.size - nfs.exports.cache.size dfs.nfs.exports.cache.size was added in HDFS-5136 (2.3.0) The following were added in 2.4.0 in HDFS-6080: dfs.nfs.rtmax - nfs.rtmax dfs.nfs.wtmax - nfs.wtmax dfs.nfs.dtmax - nfs.dtmax Shall I add them to this patch? Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6439: - Attachment: HDFS-6439.005.patch NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034384#comment-14034384 ] Brandon Li commented on HDFS-6439: -- OK. The new patch addressed the comments. Thanks, Aaron. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6439: - Attachment: (was: HDFS-6439.005.patch) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6439: - Attachment: HDFS-6439.005.patch NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-5546: Attachment: HDFS-5546.2.001.patch Hey [~cmccabe] This patch includes a unit test that deletes a fraction of the sub-directories in the middle of listing the parent directory. In the end, the patch verifies that the rest of the directory listing finishes even when there are one or more FNFs in the process. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-5546: Fix Version/s: 3.0.0 Status: Patch Available (was: Open) This patch catches the FileNotFoundException thrown during {{ls}} execution and ignores it, to handle the case where a deletion happens in the sub-namespace. Unit tests are included. It is a _best effort_ to finish the `ls` execution, so it cannot discover new changes to the directory that is currently being iterated. E.g., the case of renaming {{/foo/bar}} to {{/foo/zoo}} while running {{ls /foo}} is not handled. That is, in such a case, {{/foo/bar}} is considered _deleted_, but {{/foo/zoo}} is not visible to the current execution. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
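The behavior being tested is easy to picture: a child that disappears between the parent listing and its own listing is skipped instead of aborting the whole {{ls -R}}. A minimal sketch of that best-effort recursion (not the shell code from the patch itself):
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: best-effort recursive listing; a FileNotFoundException on a child
// that was deleted or renamed mid-walk is ignored and the walk continues.
void listRecursively(FileSystem fs, Path dir) throws IOException {
  final FileStatus[] children;
  try {
    children = fs.listStatus(dir);
  } catch (FileNotFoundException fnf) {
    return; // the directory vanished after its parent listed it
  }
  for (FileStatus child : children) {
    System.out.println(child.getPath());
    if (child.isDirectory()) {
      listRecursively(fs, child.getPath());
    }
  }
}
{code}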
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034459#comment-14034459 ] Aaron T. Myers commented on HDFS-6439: -- Thanks a lot for addressing my comments, Brandon. Latest patch looks good to me. +1 pending Jenkins. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.005.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the follow update: 1. Port monitoring is the feature name with traditional NFS server and we may want to make the config property (along with related variable allowInsecurePorts) something as dfs.nfs.port.monitoring. 2 . According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most time) send mount NULL and nfs NULL from non-privileged port. If we deny NULL call in mountd or nfs server, the client can't mount the export even as user root. 3. it would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6554) Create unencrypted streams interface
Charles Lamb created HDFS-6554: -- Summary: Create unencrypted streams interface Key: HDFS-6554 URL: https://issues.apache.org/jira/browse/HDFS-6554 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb There needs to be an interface to encrypted files that streams the unencrypted data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6555) Create a crypto blob
Charles Lamb created HDFS-6555: -- Summary: Create a crypto blob Key: HDFS-6555 URL: https://issues.apache.org/jira/browse/HDFS-6555 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb We need to create a Crypto Blob for passing around crypto info. -- This message was sent by Atlassian JIRA (v6.2#6252)
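A rough sketch of the kind of information such a blob might carry (the class and field names are hypothetical; the committed class may differ):
{code}
public class CryptoInfo {
  private final String cipherSuite;    // e.g. AES/CTR/NoPadding
  private final byte[] iv;             // initialization vector
  private final String keyVersionName; // key version used to encrypt the file

  public CryptoInfo(String cipherSuite, byte[] iv, String keyVersionName) {
    this.cipherSuite = cipherSuite;
    this.iv = iv.clone();
    this.keyVersionName = keyVersionName;
  }

  public String getCipherSuite() { return cipherSuite; }
  public byte[] getIv() { return iv.clone(); }
  public String getKeyVersionName() { return keyVersionName; }
}
{code}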
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034476#comment-14034476 ] Aaron T. Myers commented on HDFS-6553: -- Yes please, that sounds great. When you do, make sure to hit submit patch so that test-patch will run on the patch. Thanks a lot, Stephen. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
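For reference, deprecation mappings in Hadoop are registered through Configuration.addDeprecations with DeprecationDelta entries. A minimal sketch of the approach described above (the new property names shown here are assumptions for illustration and may differ from the committed keys):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configuration.DeprecationDelta;

public class NfsConfigDeprecations {
  static void addDeprecatedKeys() {
    Configuration.addDeprecations(new DeprecationDelta[] {
        // Old (pre-HDFS-6056) name -> new name.
        new DeprecationDelta("dfs.nfs.keytab.file", "nfs.keytab.file"),
        new DeprecationDelta("dfs.nfs.kerberos.principal", "nfs.kerberos.principal")
    });
  }
}
{code}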
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034486#comment-14034486 ] Aaron T. Myers commented on HDFS-6549: -- [~brandonli] - would you mind taking a look at this patch? Thanks a lot. Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
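For illustration, a minimal sketch of how an optional compatibility flag could relax the COMMIT offset check described in item 1 (the flag, class, and method names are hypothetical, not the actual gateway code):
{code}
public class CommitCheck {
  private final boolean aixCompatMode;  // hypothetical gateway config flag

  public CommitCheck(boolean aixCompatMode) {
    this.aixCompatMode = aixCompatMode;
  }

  /** Decide whether a COMMIT up to commitOffset can be acknowledged. */
  public boolean canCommit(long commitOffset, long flushedBytes) {
    if (aixCompatMode) {
      // AIX may ask to commit a multiple of 4096 even if fewer bytes were
      // ever written, so acknowledge once everything actually written is flushed.
      return true;
    }
    // Default behaviour: the requested range must already be flushed.
    return commitOffset <= flushedBytes;
  }
}
{code}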
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034494#comment-14034494 ] Brandon Li commented on HDFS-6549: -- Sure. I will review it shortly. Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034497#comment-14034497 ] Aaron T. Myers commented on HDFS-6549: -- Thanks much. Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6403: Attachment: HDFS-6403.003.patch Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
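For illustration, a minimal sketch of exposing such pause-warning counts through the metrics2 framework (the class and metric names are assumptions, not necessarily what the patch uses):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "JVM pause metrics", context = "dfs")
public class JvmPauseMetrics {
  @Metric("Number of GC pauses that exceeded the warn threshold")
  MutableCounterLong warnThresholdExceeded;

  @Metric("Number of GC pauses that exceeded the info threshold")
  MutableCounterLong infoThresholdExceeded;

  /** Registers this source so the annotated counters are instantiated. */
  public static JvmPauseMetrics create() {
    return DefaultMetricsSystem.instance()
        .register("JvmPauseMetrics", null, new JvmPauseMetrics());
  }

  /** Called by the pause monitor when a long pause is detected. */
  void recordPause(boolean isWarn) {
    if (isWarn) {
      warnThresholdExceeded.incr();
    } else {
      infoThresholdExceeded.incr();
    }
  }
}
{code}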
[jira] [Commented] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034503#comment-14034503 ] Jing Zhao commented on HDFS-6480: - The patch looks good to me. Some comments: # I think the change will make more sense after we move the fsimage reference from FSDirectory to FSNamesystem, because only after that can the FSDirectory become a pure in-memory data structure for namespace. But feel free to do it in a separate jira. # We may want to update the javadoc for FSDirectory. # It will be better to rename ready and waitReady etc. to something like fsdirLoaded etc. # TestFSDirectory#testReset should be moved to TestFSNamesystem and renamed to testResetFSDirectory. Move waitForReady() from FSDirectory to FSNamesystem Key: HDFS-6480 URL: https://issues.apache.org/jira/browse/HDFS-6480 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch, HDFS-6480.002.patch Currently FSDirectory implements a barrier in {{waitForReady()}} / {{setReady()}} so that it only serve requests once the FSImage is fully loaded. As a part of the effort to evolve {{FSDirectory}} to a class which focuses on implementing the data structure of the namespace, this jira proposes to move the barrier one level higher to {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
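For illustration, a minimal sketch of the ready barrier described above as it might look after the move (names are illustrative, not the exact FSNamesystem code):
{code}
public class ImageLoadedBarrier {
  private volatile boolean imageLoaded = false;

  /** Called once the FSImage has been fully loaded. */
  synchronized void setImageLoaded() {
    imageLoaded = true;
    notifyAll();
  }

  /** Blocks request-handling threads until the image is loaded. */
  synchronized void waitForLoaded() throws InterruptedException {
    while (!imageLoaded) {
      wait();
    }
  }
}
{code}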
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034511#comment-14034511 ] Brandon Li commented on HDFS-6553: -- Thank you, Stephen, for finding the issue and providing the patch! Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034514#comment-14034514 ] Yongjun Zhang commented on HDFS-6403: - Hi [~atm], Thanks a lot for the review and the good comments. I just uploaded version 004 to address your suggestions. Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034515#comment-14034515 ] Yongjun Zhang commented on HDFS-6403: - I meant version 003. Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6553: -- Status: Patch Available (was: Open) Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001, HDFS-6553.patch.002 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6553: -- Attachment: HDFS-6553.patch.002 Thanks, ATM and Brandon. I updated the patch to include the other properties. I built the patch successfully and brought up a secure NFS gateway using the old nfs property names. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001, HDFS-6553.patch.002 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034535#comment-14034535 ] Hadoop QA commented on HDFS-5546: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650888/HDFS-5546.2.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.fs.TestPath org.apache.hadoop.fs.shell.TestPathData The following test timeouts occurred in hadoop-common-project/hadoop-common: org.apache.hadoop.http.TestHttpServer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7151//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7151//console This message is automatically generated. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6553) Add missing DeprecationDeltas for NFS Kerberos configurations
[ https://issues.apache.org/jira/browse/HDFS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034538#comment-14034538 ] Aaron T. Myers commented on HDFS-6553: -- Latest patch looks good to me. +1 pending Jenkins. Add missing DeprecationDeltas for NFS Kerberos configurations - Key: HDFS-6553 URL: https://issues.apache.org/jira/browse/HDFS-6553 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-6553.patch.001, HDFS-6553.patch.002 HDFS-6056 (Clean up NFS config settings) improved NFS configuration naming and added DeprecationDeltas to ensure that the old NFS configuration properties could still be used. It's currently missing DeprecationDeltas for _dfs.nfs.keytab.file_ and _dfs.nfs.kerberos.principal_. This patch adds those deltas so older configs for secure NFS can still work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034547#comment-14034547 ] Akira AJISAKA commented on HDFS-6403: - Thanks [~yzhangal] for the nice work! Would you please document the new metrics in Metrics.apt.vm? Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, HDFS-6403.003.patch HADOOP-9618 logs warnings when there are long GC pauses. If these warnings are exposed as metrics, they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6390) chown/chgrp users/groups blacklist for encrypted files
[ https://issues.apache.org/jira/browse/HDFS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb resolved HDFS-6390. Resolution: Not a Problem This is no longer a problem now that the NN will never handle key material. All access to key material is handled through the KMS access control mechanisms. chown/chgrp users/groups blacklist for encrypted files --- Key: HDFS-6390 URL: https://issues.apache.org/jira/browse/HDFS-6390 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb A blacklist of users and groups that stops an admin from changing the owner/group of encrypted files and directories. This blacklist would typically contain the regular users used by admins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6387) HDFS CLI admin tool for creating/deleting an encryption zone
[ https://issues.apache.org/jira/browse/HDFS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6387: --- Attachment: HDFS-6387.002.patch Rebased patch w/ unit tests. The unit tests will have to be re-run and modified accordingly once we get HDFS-6386 committed. HDFS CLI admin tool for creating/deleting an encryption zone -- Key: HDFS-6387 URL: https://issues.apache.org/jira/browse/HDFS-6387 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6387.002.patch, HDFS-6387.1.patch CLI admin tool to create/delete an encryption zone in HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6393) User settable xAttr to stop HDFS admins from reading/chowning a file
[ https://issues.apache.org/jira/browse/HDFS-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb resolved HDFS-6393. Resolution: Not a Problem This is no longer a problem now that the NN will never handle key material. All access to key material is handled through the KMS access control mechanisms. User settable xAttr to stop HDFS admins from reading/chowning a file Key: HDFS-6393 URL: https://issues.apache.org/jira/browse/HDFS-6393 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb A user should be able to set an xAttr on any file in HDFS to stop an HDFS admin user from reading the file. The blacklist for chown/chgrp would also be enforced. This will stop an HDFS admin from gaining access to job token files and getting HDFS DelegationTokens that would allow him/her to read an encrypted file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6516) Implement List Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034567#comment-14034567 ] Charles Lamb commented on HDFS-6516: Most of the backend of listEncryptionZones will be implemented under a different Jira. What remains to be done under this one is to make the list of EZs persistent. Implement List Encryption Zones --- Key: HDFS-6516 URL: https://issues.apache.org/jira/browse/HDFS-6516 Project: Hadoop HDFS Issue Type: Sub-task Components: security Reporter: Charles Lamb Assignee: Charles Lamb The list Encryption Zones command (CLI) and backend implementation (FSNamesystem) needs to be implemented. As part of this, the tests in TestEncryptionZonesAPI should be updated to use that to validate the results of the various CreateEZ and DeleteEZ tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HDFS-6386) HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6386 started by Charles Lamb. HDFS Encryption Zones - Key: HDFS-6386 URL: https://issues.apache.org/jira/browse/HDFS-6386 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HDFS-6386.012.patch, HDFS-6386.013.patch, HDFS-6386.4.patch, HDFS-6386.5.patch, HDFS-6386.6.patch, HDFS-6386.8.patch Define the required security xAttributes for directories and files within an encryption zone and how they propagate to children. Implement the logic to create/delete encryption zones. -- This message was sent by Atlassian JIRA (v6.2#6252)