[jira] [Updated] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7235:

Attachment: HDFS-7235.004.patch

Uploading the same patch 004 again since the test failure appears to be irrelevant.


> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data on 
> the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java).
> However, because of the bad disk, the DN detects the source block to be 
> transferred as an invalid block, with the following logic in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo =
>       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case.
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN. Thus the 
> NN doesn't know the block is corrupted and keeps sending the data transfer 
> request to the same to-be-decommissioned DN, again and again. This causes an 
> infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.
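
For context, here is a minimal sketch of the reporting step under discussion (hypothetical shape, not the committed HDFS-7235 patch): when the DN finds the block file missing, it could notify the NN through the existing DatanodeProtocol#reportBadBlocks call, so the NN marks the replica corrupt and stops re-selecting this DN as the transfer source.

{code}
// Hedged sketch, not the actual patch: report the replica whose block file
// is missing back to the NN instead of silently dropping the request.
// This is a fragment of a DataNode-side class; "data" stands for the DN's
// FsDatasetSpi.
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol;

void transferBlock(DatanodeProtocol namenode, ExtendedBlock b)
    throws IOException {
  // isValidBlock() returns false here because the block file is gone
  // from the bad disk.
  if (!data.isValidBlock(b)) {
    LocatedBlock lb = new LocatedBlock(b, new DatanodeInfo[0]);
    namenode.reportBadBlocks(new LocatedBlock[] { lb });  // NN learns the replica is bad
    return;  // skip the transfer; the NN will pick another source
  }
  // ... proceed with the normal replication transfer ...
}
{code}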



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7235:

Attachment: (was: HDFS-7235.004.patch)

> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data on 
> the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java).
> However, because of the bad disk, the DN detects the source block to be 
> transferred as an invalid block, with the following logic in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo =
>       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case.
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN. Thus the 
> NN doesn't know the block is corrupted and keeps sending the data transfer 
> request to the same to-be-decommissioned DN, again and again. This causes an 
> infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3342) SocketTimeoutException in BlockSender.sendChunks could have a better error message

2014-10-22 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-3342:

Attachment: HDFS-3342.002.patch

A couple of the failed tests appear to be flaky; the other two were fixed recently. 
Uploading the same patch again to trigger a new run.


> SocketTimeoutException in BlockSender.sendChunks could have a better error 
> message
> --
>
> Key: HDFS-3342
> URL: https://issues.apache.org/jira/browse/HDFS-3342
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Yongjun Zhang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-3342.001.patch, HDFS-3342.002.patch, 
> HDFS-3342.002.patch
>
>
> Currently, if a client connects to a DN and begins to read a block, but then 
> stops calling read() for a long period of time, the DN will log a 
> SocketTimeoutException "48 millis timeout while waiting for channel to be 
> ready for write." This is because there is no "keepalive" functionality of 
> any kind. At a minimum, we should improve this error message to be an 
> INFO-level log which just says that the client likely stopped reading, so it 
> is being disconnected.
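
As an illustration of the proposed message improvement (a sketch with assumed variable names, not the attached patch), the timeout could be caught in BlockSender.sendChunks and logged at INFO with an explanation:

{code}
// Hedged sketch, not the attached patch: translate the opaque timeout into
// an INFO-level explanation while still aborting the transfer.
try {
  // The write that times out when the client stops reading; "sockOut",
  // "buf", and "len" stand in for BlockSender's real stream and buffer.
  sockOut.write(buf, 0, len);
} catch (java.net.SocketTimeoutException ste) {
  LOG.info("Client " + clientName + " likely stopped reading from block "
      + block + "; disconnecting it.", ste);
  throw ste;  // propagate so the sender still tears down the connection
}
{code}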



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3342) SocketTimeoutException in BlockSender.sendChunks could have a better error message

2014-10-22 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-3342:

Attachment: (was: HDFS-3342.002.patch)

> SocketTimeoutException in BlockSender.sendChunks could have a better error 
> message
> --
>
> Key: HDFS-3342
> URL: https://issues.apache.org/jira/browse/HDFS-3342
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Yongjun Zhang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-3342.001.patch, HDFS-3342.002.patch
>
>
> Currently, if a client connects to a DN and begins to read a block, but then 
> stops calling read() for a long period of time, the DN will log a 
> SocketTimeoutException "48 millis timeout while waiting for channel to be 
> ready for write." This is because there is no "keepalive" functionality of 
> any kind. At a minimum, we should improve this error message to be an 
> INFO-level log which just says that the client likely stopped reading, so it 
> is being disconnected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck due to GC

2014-10-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7180:
-
Fix Version/s: 2.6.0

> NFSv3 gateway frequently gets stuck due to GC
> -
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon seems to frequently get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. works just fine). The latest hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node, and the NFS is mounted on the same 
> node.
> From the node where the NFS is mounted, dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
> java.io.IOException: Bad respon

[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181012#comment-14181012
 ] 

Hadoop QA commented on HDFS-7276:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676524/h7276_20141022.patch
  against trunk revision d71d40a.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8492//console

This message is automatically generated.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch
>
>
> When there are a lot of DFSOutputStream's writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.
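
As background on the technique (a sketch under assumed names, not the attached patch), one way to bound the arrays is to gate allocation behind a semaphore so that writers block once the cap on outstanding buffers is reached:

{code}
import java.util.concurrent.Semaphore;

/** Hedged sketch, not the attached patch: cap outstanding packet buffers. */
class BoundedByteArrayPool {
  private final Semaphore slots;

  BoundedByteArrayPool(int maxOutstandingArrays) {
    slots = new Semaphore(maxOutstandingArrays);
  }

  /** Blocks the calling writer while too many arrays are outstanding. */
  byte[] newByteArray(int size) throws InterruptedException {
    slots.acquire();
    return new byte[size];
  }

  /** Must be called once a packet's buffer is no longer needed. */
  void release(byte[] array) {
    slots.release();
  }
}
{code}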



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181011#comment-14181011
 ] 

Hadoop QA commented on HDFS-7235:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676522/HDFS-7235.004.patch
  against trunk revision b94b8b3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8491//console

This message is automatically generated.

> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data on 
> the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java).
> However, because of the bad disk, the DN detects the source block to be 
> transferred as an invalid block, with the following logic in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo =
>       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case.
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN. Thus the 
> NN doesn't know the block is corrupted and keeps sending the data transfer 
> request to the same to-be-decommissioned DN, again and again. This causes an 
> infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck due to GC

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181010#comment-14181010
 ] 

Hudson commented on HDFS-7180:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6321 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6321/])
HDFS-7180. NFSv3 gateway frequently gets stuck due to GC. Contributed by 
Brandon Li (brandonli: rev d71d40a63d198991077d5babd70be5e9787a53f1)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtxCache.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java


> NFSv3 gateway frequently gets stuck due to GC
> -
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon seems to frequently get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. works just fine). The latest hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node, and the NFS is mounted on the same 
> node.
> From the node where the NFS is mounted, dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-1

[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck due to GC

2014-10-22 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181009#comment-14181009
 ] 

Brandon Li commented on HDFS-7180:
--

Thank you, Jing, for the review. I've committed the patch.

> NFSv3 gateway frequently gets stuck due to GC
> -
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon seems to frequently get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. works just fine). The latest hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node, and the NFS is mounted on the same 
> node.
> From the node where the NFS is mounted, dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-14104305

[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181007#comment-14181007
 ] 

Aaron T. Myers commented on HDFS-7278:
--

bq. Very interesting. I have not encountered such an issue. If you have details 
it would be good to share.

I don't really have any firm details, but I do have a suspicion that we may 
have a bug which results in a block being considered under-replicated (possibly 
even entirely missing, if all replicas were affected) after a failover, when in 
fact all of the replicas of the block are just fine on the DNs in the cluster. 
In the case I will of course share all the details when I figure them out. :)

The latest patch looks pretty good to me. Just a few small comments:

# Seems like we should restrict this command to require super user privileges. 
As it stands I believe any user could connect to the DN to trigger a full BR, 
which though not super harmful doesn't seem right, either.
# I think there may be a small race condition in the test case. Since you 
create a file and then immediately create a spy object to examine calls between 
the DN and NN, and then assert that no calls of blockReceivedAndDeleted were 
made, I think it's possible that the DN RPC to send an immediate incremental BR 
for that file creation might be delayed until after you've created the spy, 
which would cause the test to unnecessarily fail. I think more reliable would 
be to create the spy object before creating the file, and then assert that 
exactly one IBR was sent.
# I suspect that the most common use of this command will be to trigger full 
block reports, not incremental block reports, given that those are sent rather 
frequently in a busy cluster anyway. Perhaps we should change the default 
behavior of the command to send a full BR, and change the optional flag to be 
"-incremental" instead?

+1 once these are addressed. Thanks, Colin.
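
On point 2, here is a minimal sketch of the suggested reordering (helper names and signatures are assumptions modeled on Hadoop's test utilities, not the actual test): create the spy before writing the file, then assert that exactly one incremental block report (IBR) was sent.

{code}
// Hedged sketch of the suggested test ordering: spy first, then write the
// file, then verify exactly one IBR.
DatanodeProtocolClientSideTranslatorPB nnSpy =
    DataNodeTestUtils.spyOnBposToNN(datanode, namenode);  // assumed test helper

DFSTestUtil.createFile(fs, new Path("/triggerIBR"), 1024L, (short) 1, 0L);

Mockito.verify(nnSpy, Mockito.timeout(60000).times(1)).blockReceivedAndDeleted(
    Mockito.any(DatanodeRegistration.class),
    Mockito.anyString(),
    Mockito.any(StorageReceivedDeletedBlocks[].class));
{code}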

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181007#comment-14181007
 ] 

Aaron T. Myers edited comment on HDFS-7278 at 10/23/14 4:27 AM:


bq. Very interesting. I have not encountered such an issue. If you have details 
it would be good to share.

I don't really have any firm details, but I do have a suspicion that we may 
have a bug which results in a block being considered under-replicated (possibly 
even entirely missing, if all replicas were affected) after a failover, when in 
fact all of the replicas of the block are just fine on the DNs in the cluster. 
I will of course share all the details when I figure them out. :)

The latest patch looks pretty good to me. Just a few small comments:

# Seems like we should restrict this command to require super user privileges. 
As it stands I believe any user could connect to the DN to trigger a full BR, 
which though not super harmful doesn't seem right, either.
# I think there may be a small race condition in the test case. Since you 
create a file and then immediately create a spy object to examine calls between 
the DN and NN, and then assert that no calls of blockReceivedAndDeleted were 
made, I think it's possible that the DN RPC to send an immediate incremental BR 
for that file creation might be delayed until after you've created the spy, 
which would cause the test to unnecessarily fail. I think more reliable would 
be to create the spy object before creating the file, and then assert that 
exactly one IBR was sent.
# I suspect that the most common use of this command will be to trigger full 
block reports, not incremental block reports, given that those are sent rather 
frequently in a busy cluster anyway. Perhaps we should change the default 
behavior of the command to send a full BR, and change the optional flag to be 
"-incremental" instead?

+1 once these are addressed. Thanks, Colin.


was (Author: atm):
bq. Very interesting. I have not encountered such an issue. If you have details 
it would be good to share.

I don't really have any firm details, but I do have a suspicion that we may 
have a bug which results in a block being considered under-replicated (possibly 
even entirely missing, if all replicas were affected) after a failover, when in 
fact all of the replicas of the block are just fine on the DNs in the cluster. 
In the case I will of course share all the details when I figure them out. :)

The latest patch looks pretty good to me. Just a few small comments:

# Seems like we should restrict this command to require super user privileges. 
As it stands I believe any user could connect to the DN to trigger a full BR, 
which though not super harmful doesn't seem right, either.
# I think there may be a small race condition in the test case. Since you 
create a file and then immediately create a spy object to examine calls between 
the DN and NN, and then assert that no calls of blockReceivedAndDeleted were 
made, I think it's possible that the DN RPC to send an immediate incremental BR 
for that file creation might be delayed until after you've created the spy, 
which would cause the test to unnecessarily fail. I think more reliable would 
be to create the spy object before creating the file, and then assert that 
exactly one IBR was sent.
# I suspect that the most common use of this command will be to trigger full 
block reports, not incremental block reports, given that those are sent rather 
frequently in a busy cluster anyway. Perhaps we should change the default 
behavior of the command to send a full BR, and change the optional flag to be 
"-incremental" instead?

+1 once these are addressed. Thanks, Colin.

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6742) Support sorting datanode list on the new NN webUI

2014-10-22 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181002#comment-14181002
 ] 

Chen He commented on HDFS-6742:
---

Hi [~l201514], I did not have time to work on this; please take it if you want. 

> Support sorting datanode list on the new NN webUI
> -
>
> Key: HDFS-6742
> URL: https://issues.apache.org/jira/browse/HDFS-6742
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Chen He
>
> The legacy webUI allows sorting the datanode list based on a specific column 
> such as hostname. It is handy so admins can find patterns more quickly, 
> especially for big clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck due to GC

2014-10-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7180:
-
Summary: NFSv3 gateway frequently gets stuck due to GC  (was: NFSv3 gateway 
frequently gets stuck)

> NFSv3 gateway frequently gets stuck due to GC
> -
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon seems to frequently get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. works just fine). The latest hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node, and the NFS is mounted on the same 
> node.
> From the node where the NFS is mounted, dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-1410430543652:blk_107

[jira] [Updated] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-22 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7276:
--
Attachment: h7276_20141022.patch

h7276_20141022.patch: fixes a bug and adds some tests.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch
>
>
> When there are a lot of DFSOutputStream's writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180991#comment-14180991
 ] 

Yongjun Zhang commented on HDFS-7235:
-

Hi [~cmccabe], I just uploaded a new rev (004) to address your comments. Thanks.


> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data on 
> the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java).
> However, because of the bad disk, the DN detects the source block to be 
> transferred as an invalid block, with the following logic in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo =
>       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case.
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN. Thus the 
> NN doesn't know the block is corrupted and keeps sending the data transfer 
> request to the same to-be-decommissioned DN, again and again. This causes an 
> infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180990#comment-14180990
 ] 

Brandon Li commented on HDFS-7180:
--

The javac warning and unit test failures are not introduced by this patch.

> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon seems to frequently get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. works just fine). The latest hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node, and the NFS is mounted on the same 
> node.
> From the node where the NFS is mounted, dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-141043054365

[jira] [Updated] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7235:

Attachment: HDFS-7235.004.patch

> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data on 
> the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java).
> However, because of the bad disk, the DN detects the source block to be 
> transferred as an invalid block, with the following logic in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo =
>       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case.
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN. Thus the 
> NN doesn't know the block is corrupted and keeps sending the data transfer 
> request to the same to-be-decommissioned DN, again and again. This causes an 
> infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7223) Tracing span description of IPC client is too long

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180975#comment-14180975
 ] 

Hadoop QA commented on HDFS-7223:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676483/HDFS-7223-1.patch
  against trunk revision 3b12fd6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8489//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8489//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8489//console

This message is automatically generated.

> Tracing span description of IPC client is too long
> --
>
> Key: HDFS-7223
> URL: https://issues.apache.org/jira/browse/HDFS-7223
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7223-0.patch, HDFS-7223-1.patch
>
>
> Current span description for IPC call is too long.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id

2014-10-22 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180971#comment-14180971
 ] 

Chen He commented on HDFS-6663:
---

TestDNFencingWithReplication, TestNameEditsConfigs, and TestStandbyCheckpoints 
passed on my machine. The latest QA run does not show any test failure. Not 
sure why it gave me -1.

> Admin command to track file and locations from block id
> ---
>
> Key: HDFS-6663
> URL: https://issues.apache.org/jira/browse/HDFS-6663
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.5.0
>Reporter: Kihwal Lee
>Assignee: Chen He
> Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, 
> HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch
>
>
> A dfsadmin command that allows finding out the file and the locations given a 
> block number will be very useful in debugging production issues.   It may be 
> possible to add this feature to Fsck, instead of creating a new command.
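
For context, this is the kind of invocation such a feature could enable (option name illustrative; as the description notes, the JIRA had not settled on extending fsck versus adding a new dfsadmin command):

{code}
# Hypothetical usage: given a block id, print the owning file and the
# datanode locations of its replicas.
hdfs fsck / -blockId blk_1073741825
{code}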



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7231) rollingupgrade needs some guard rails

2014-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180312#comment-14180312
 ] 

Suresh Srinivas edited comment on HDFS-7231 at 10/23/14 3:29 AM:
-

Allen, I just rewrote the steps with additional details to clarify:
# Upgrade 2.0.5 cluster to 2.2
# Do not -finalizeUpgrade
# Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1.
# Start the namenode with the -upgrade option.
# Namenode start fails because the 2.0.5-to-2.2 upgrade is still in progress
# Leave the 2.4.1 DNs running
# Install 2.2 binaries on the NN
# Start the NN on 2.2 with no upgrade-related options

So far things are clear. Then you go on to say, the following:
bq. DNs now do a partial roll-forward, rendering them unable to continue
What do you mean by this?

bq. admins manually repair version files on those broken directories
This as you know is a recipe for disaster :)

Let me ask you a question. Before you go on to 2.4.1, if you finalize the 
upgrade, what happens?


was (Author: sureshms):
Allen, I just rewrote the steps with additional details to clarify:
# Upgrade 2.0.5 cluster to 2.2
# Do not -finalizeUpgrade
# Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1.
# Start the namenode with the -upgrade option.
# Namenode start fails because the 2.0.5-to-2.2 upgrade is still in progress
# Leave the 2.4.1 DNs running
# Install 2.2 binaries on the NN
# Start the NN on 2.2 with no upgrade-related options

So far things are clear. Then you go on to say, the following:
bq. DNs now do a partial roll-forward, rendering them unable to continue
What do you mean by this?

bq. admins manually repair version files on those broken directories
This is as you know is a recipe for disaster.

Let me ask you a question. Before you go on to 2.4.1, if you finalize the 
upgrade, what happens?

> rollingupgrade needs some guard rails
> -
>
> Key: HDFS-7231
> URL: https://issues.apache.org/jira/browse/HDFS-7231
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> See first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180962#comment-14180962
 ] 

Suresh Srinivas commented on HDFS-7278:
---

bq. I think it's a good tool to have in our toolbox to work around possible 
bugs in NN replica accounting.
Very interesting. I have not encountered such an issue. If you have details it 
would be good to share.

This command must be okay to add.

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180958#comment-14180958
 ] 

Hadoop QA commented on HDFS-7278:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676481/HDFS-7278.002.patch
  against trunk revision 3b12fd6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8488//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8488//console

This message is automatically generated.

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180908#comment-14180908
 ] 

Hadoop QA commented on HDFS-6988:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676462/HDFS-6988.03.patch
  against trunk revision a36399e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8486//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8486//console

This message is automatically generated.

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 3.0.0
>
> Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, 
> HDFS-6988.03.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block

2014-10-22 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180898#comment-14180898
 ] 

Yongjun Zhang commented on HDFS-7281:
-

Thanks for reporting this issue [~mingma]. I happened to notice the same in an 
fsck report today. It's indeed confusing.



> Missing block is marked as corrupted block
> --
>
> Key: HDFS-7281
> URL: https://issues.apache.org/jira/browse/HDFS-7281
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>
> In the situation where a block has lost all its replicas, fsck shows the 
> block as missing as well as corrupted. Perhaps it is better not to mark the 
> block corrupted in this case. The reason it is marked as corrupted is that 
> numCorruptNodes == numNodes == 0 in the following code.
> {noformat}
> BlockManager
> final boolean isCorrupt = numCorruptNodes == numNodes;
> {noformat}
> Would like to clarify whether it is intentional to mark a missing block as 
> corrupted or whether it is just a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7281) Missing block is marked as corrupted block

2014-10-22 Thread Ming Ma (JIRA)
Ming Ma created HDFS-7281:
-

 Summary: Missing block is marked as corrupted block
 Key: HDFS-7281
 URL: https://issues.apache.org/jira/browse/HDFS-7281
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma


In the situation where a block has lost all its replicas, fsck shows the block 
as missing as well as corrupted. Perhaps it is better not to mark the block 
corrupted in this case. The reason it is marked as corrupted is that 
numCorruptNodes == numNodes == 0 in the following code.

{noformat}
BlockManager
final boolean isCorrupt = numCorruptNodes == numNodes;
{noformat}

Would like to clarify whether it is intentional to mark a missing block as 
corrupted or whether it is just a bug.
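
To make the boundary case concrete, here is a minimal self-contained sketch; 
the guarded variant is illustrative only, not a committed fix:

{code}
// Minimal illustration of the case described above: with zero replicas,
// numCorruptNodes == numNodes (0 == 0) holds, so a purely missing block
// is reported as corrupt. The "numNodes > 0" guard is illustrative only.
public class MissingVsCorruptSketch {
  public static void main(String[] args) {
    int numCorruptNodes = 0;
    int numNodes = 0; // the block has lost all its replicas

    boolean isCorrupt = numCorruptNodes == numNodes;  // true: looks corrupt
    boolean guarded = numNodes > 0
        && numCorruptNodes == numNodes;               // false: just missing

    System.out.println("current=" + isCorrupt + ", guarded=" + guarded);
  }
}
{code}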



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7280) Use netty 4 in WebImageViewer

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180869#comment-14180869
 ] 

Hadoop QA commented on HDFS-7280:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676498/HDFS-7280.000.patch
  against trunk revision f729ecf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8490//console

This message is automatically generated.

> Use netty 4 in WebImageViewer
> -
>
> Key: HDFS-7280
> URL: https://issues.apache.org/jira/browse/HDFS-7280
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7280.000.patch
>
>
> This jira changes WebImageViewer to use netty 4 instead of netty 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7280) Use netty 4 in WebImageViewer

2014-10-22 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7280:
-
Attachment: HDFS-7280.000.patch

> Use netty 4 in WebImageViewer
> -
>
> Key: HDFS-7280
> URL: https://issues.apache.org/jira/browse/HDFS-7280
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7280.000.patch
>
>
> This jira changes WebImageViewer to use netty 4 instead of netty 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7280) Use netty 4 in WebImageViewer

2014-10-22 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7280:
-
Status: Patch Available  (was: Open)

> Use netty 4 in WebImageViewer
> -
>
> Key: HDFS-7280
> URL: https://issues.apache.org/jira/browse/HDFS-7280
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7280.000.patch
>
>
> This jira changes WebImageViewer to use netty 4 instead of netty 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180853#comment-14180853
 ] 

Hadoop QA commented on HDFS-7180:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676474/HDFS-7180.003.patch
  against trunk revision 3b12fd6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1269 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8487//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8487//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8487//console

This message is automatically generated.

> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon frequently gets stuck while HDFS itself 
> keeps working well (hdfs dfs -ls etc. work just fine). The most recent hang 
> we found occurred after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node and on the same node the NFS is 
> mounted.
> From the node where the NFS is mounted:
> dmesg shows entries like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02

[jira] [Created] (HDFS-7280) Use netty 4 in WebImageViewer

2014-10-22 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-7280:


 Summary: Use netty 4 in WebImageViewer
 Key: HDFS-7280
 URL: https://issues.apache.org/jira/browse/HDFS-7280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


This jira changes WebImageViewer to use netty 4 instead of netty 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180836#comment-14180836
 ] 

Haohui Mai commented on HDFS-5928:
--

The patch looks good. I tested it on a non-HA cluster and it works as expected.

{code}
+{#HAInfo}
+{Namespace} {NamenodeID}
+{/HAInfo}
{code}

Can you move the information into the table below? For example:

{code}
{#HAInfo}
  Namespace:{Namespace}
  Namenode ID:{NamenodeID}
{/HAInfo}
{code}

Can you post a screenshot of an HA cluster setup as well?



> show namespace and namenode ID on NN dfshealth page
> ---
>
> Key: HDFS-5928
> URL: https://issues.apache.org/jira/browse/HDFS-5928
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
> HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2014-10-22 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7279:
--
Component/s: webhdfs
 datanode

> Use netty to implement DatanodeWebHdfsMethods
> -
>
> Key: HDFS-7279
> URL: https://issues.apache.org/jira/browse/HDFS-7279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, webhdfs
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> Currently the DN implements all webhdfs-related functionality using jetty. As 
> the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer 
> and connection management, the DN often suffers from long latency and OOM 
> when its webhdfs component is under sustained heavy load.
> This jira proposes to implement the webhdfs component in the DN using netty, 
> which can be more efficient and allows finer-grained control over webhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180824#comment-14180824
 ] 

Hudson commented on HDFS-7277:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6319 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6319/])
HDFS-7277. Remove explicit dependency on netty 3.2 in BKJournal. Contributed by 
Haohui Mai. (wheat9: rev f729ecf9d2b858e9ee97419e788f1a2ac38b15bb)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml


> Remove explicit dependency on netty 3.2 in BKJournal
> 
>
> Key: HDFS-7277
> URL: https://issues.apache.org/jira/browse/HDFS-7277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-7277.000.patch
>
>
> The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the 
> code does not use it. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6888) Remove audit logging of getFIleInfo()

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180823#comment-14180823
 ] 

Hadoop QA commented on HDFS-6888:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676436/HDFS-6888-6.patch
  against trunk revision a36399e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8483//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8483//console

This message is automatically generated.

> Remove audit logging of getFIleInfo()
> -
>
> Key: HDFS-6888
> URL: https://issues.apache.org/jira/browse/HDFS-6888
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Kihwal Lee
>Assignee: Chen He
>  Labels: log
> Attachments: HDFS-6888-2.patch, HDFS-6888-3.patch, HDFS-6888-4.patch, 
> HDFS-6888-5.patch, HDFS-6888-6.patch, HDFS-6888.patch
>
>
> The audit logging of getFileInfo() was added in HDFS-3733.  Since this is a 
> one of the most called method, users have noticed that audit log is now 
> filled with this.  Since we now have HTTP request logging, this seems 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal

2014-10-22 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7277:
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the 
reviews.

> Remove explicit dependency on netty 3.2 in BKJournal
> 
>
> Key: HDFS-7277
> URL: https://issues.apache.org/jira/browse/HDFS-7277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-7277.000.patch
>
>
> The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the 
> code does not use it. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180814#comment-14180814
 ] 

Hadoop QA commented on HDFS-5928:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676430/HDFS-5928.v5.patch
  against trunk revision 70719e5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHDFSAcl

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8482//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8482//console

This message is automatically generated.

> show namespace and namenode ID on NN dfshealth page
> ---
>
> Key: HDFS-5928
> URL: https://issues.apache.org/jira/browse/HDFS-5928
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
> HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7223) Tracing span description of IPC client is too long

2014-10-22 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-7223:
---
Attachment: HDFS-7223-1.patch

Thanks for the comment [~cmccabe]! I updated the patch based on your suggestion.

> Tracing span description of IPC client is too long
> --
>
> Key: HDFS-7223
> URL: https://issues.apache.org/jira/browse/HDFS-7223
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7223-0.patch, HDFS-7223-1.patch
>
>
> Current span description for IPC call is too long.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2014-10-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180784#comment-14180784
 ] 

Haohui Mai commented on HDFS-7279:
--

An alternative option is to upgrade jetty and the servlet API. The new APIs 
from both, such as asynchronous servlets, can address some of the issues. 
Webhdfs on the DN side, however, is data intensive, which does not fit the 
servlet API very well. The servlet / jetty APIs also do not give the 
fine-grained control over resources that netty is able to provide. These 
controls are critical if webhdfs needs to survive under heavy workload.

The strategy is proven by the mapreduce client, which already uses netty to 
implement the shuffle functionality. For other URLs on the DNs, I plan to keep 
jetty listening on a local address, but to have a reverse proxy in netty 
continue to serve these URLs.
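
As background, a minimal netty 4 HTTP server sketch (not the HDFS-7279 patch 
itself) showing the kind of knobs jetty 6 does not expose: explicitly sized 
event loops, write-buffer water marks for back-pressure, and chunked streaming. 
The port and pool sizes are illustrative.

{code}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.handler.stream.ChunkedWriteHandler;

public class NettyHttpSketch {
  public static void main(String[] args) throws InterruptedException {
    // Explicitly sized thread pools: one acceptor, four workers.
    EventLoopGroup boss = new NioEventLoopGroup(1);
    EventLoopGroup workers = new NioEventLoopGroup(4);
    try {
      ServerBootstrap b = new ServerBootstrap()
          .group(boss, workers)
          .channel(NioServerSocketChannel.class)
          // Back-pressure knobs: stop writing when the outbound buffer
          // exceeds the high water mark, resume at the low water mark.
          .childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 64 * 1024)
          .childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 32 * 1024)
          .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
              ch.pipeline().addLast(new HttpServerCodec());
              // Streams large payloads in chunks instead of buffering
              // whole responses in memory.
              ch.pipeline().addLast(new ChunkedWriteHandler());
              // The application handler would be added here.
            }
          });
      b.bind(8080).sync().channel().closeFuture().sync();
    } finally {
      boss.shutdownGracefully();
      workers.shutdownGracefully();
    }
  }
}
{code}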

> Use netty to implement DatanodeWebHdfsMethods
> -
>
> Key: HDFS-7279
> URL: https://issues.apache.org/jira/browse/HDFS-7279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> Currently the DN implements all webhdfs-related functionality using jetty. As 
> the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer 
> and connection management, the DN often suffers from long latency and OOM 
> when its webhdfs component is under sustained heavy load.
> This jira proposes to implement the webhdfs component in the DN using netty, 
> which can be more efficient and allows finer-grained control over webhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180775#comment-14180775
 ] 

Aaron T. Myers commented on HDFS-7278:
--

I think it's a good tool to have in our toolbox to work around possible bugs in 
NN replica accounting. If an operator suspects such an issue, they might be 
tempted to restart a DN, or all of the DNs in a cluster, in order to trigger 
full block reports. It'd be much lighter weight if the operator could instead 
just manually trigger a full BR, rather than having to restart the DN and 
therefore rescan all the DN data dirs, etc.
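
To illustrate the shape of such a tool, a minimal sketch; the interface and 
method names below are hypothetical, not necessarily what the patch adds:

{code}
import java.io.IOException;

/** Hypothetical DN-side IPC hook an admin command could call. */
interface DatanodeAdminProtocol {
  /** Ask the DN to send a block report on its next heartbeat instead of
   *  waiting for the regular report interval. */
  void triggerBlockReport(boolean incremental) throws IOException;
}

public class TriggerBlockReportSketch {
  public static void main(String[] args) throws IOException {
    // e.g. "dn1.example.com:50020" (hypothetical host:ipc_port)
    DatanodeAdminProtocol dn = connect(args[0]);
    dn.triggerBlockReport(false); // full report, no DN restart required
  }

  // Placeholder: a real tool would obtain this proxy via the cluster's
  // RPC framework; the wiring is omitted in this sketch.
  static DatanodeAdminProtocol connect(String hostPort) {
    throw new UnsupportedOperationException("RPC wiring omitted");
  }
}
{code}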

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2014-10-22 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-7279:


 Summary: Use netty to implement DatanodeWebHdfsMethods
 Key: HDFS-7279
 URL: https://issues.apache.org/jira/browse/HDFS-7279
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


Currently the DN implements all webhdfs-related functionality using jetty. As 
the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and 
connection management, the DN often suffers from long latency and OOM when its 
webhdfs component is under sustained heavy load.

This jira proposes to implement the webhdfs component in the DN using netty, 
which can be more efficient and allows finer-grained control over webhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180770#comment-14180770
 ] 

Suresh Srinivas commented on HDFS-7278:
---

[~cmccabe], can you describe why this is needed so that others have context?

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7278:
---
Status: Patch Available  (was: Open)

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7278:
---
Attachment: HDFS-7278.002.patch

> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> 
>
> Key: HDFS-7278
> URL: https://issues.apache.org/jira/browse/HDFS-7278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN

2014-10-22 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-7278:
--

 Summary: Add a command that allows sysadmins to manually trigger 
full block reports from a DN
 Key: HDFS-7278
 URL: https://issues.apache.org/jira/browse/HDFS-7278
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


We should add a command that allows sysadmins to manually trigger full block 
reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180759#comment-14180759
 ] 

Jing Zhao commented on HDFS-7180:
-

+1 pending Jenkins

> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon frequently gets stuck while HDFS itself 
> keeps working well (hdfs dfs -ls etc. work just fine). The most recent hang 
> we found occurred after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node and on the same node the NFS is 
> mounted.
> From the node where the NFS is mounted:
> dmesg shows entries like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
> java.io.IOException: Bad respons

[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7180:
-
Attachment: HDFS-7180.003.patch

Uploaded a new patch to fix the findbugs warning.

> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, 
> HDFS-7180.003.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon frequently gets stuck while HDFS itself 
> keeps working well (hdfs dfs -ls etc. work just fine). The most recent hang 
> we found occurred after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node and on the same node the NFS is 
> mounted.
> From the node where the NFS is mounted:
> dmesg shows entries like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
> java.io.IOException:

[jira] [Commented] (HDFS-6742) Support sorting datanode list on the new NN webUI

2014-10-22 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180743#comment-14180743
 ] 

Siqi Li commented on HDFS-6742:
---

[~airbots] Hi Chen, any updates on this jira? It would be extremely helpful 
when dealing with clusters with thousands of nodes.

> Support sorting datanode list on the new NN webUI
> -
>
> Key: HDFS-6742
> URL: https://issues.apache.org/jira/browse/HDFS-6742
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Chen He
>
> The legacy webUI allows sorting datanode list based on specific column such 
> as hostname. It is handy for admins can find pattern more quickly, especially 
> for big clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180725#comment-14180725
 ] 

Hadoop QA commented on HDFS-6663:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676423/HDFS-6663-5.patch
  against trunk revision 7b0f9bb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8481//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8481//console

This message is automatically generated.

> Admin command to track file and locations from block id
> ---
>
> Key: HDFS-6663
> URL: https://issues.apache.org/jira/browse/HDFS-6663
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.5.0
>Reporter: Kihwal Lee
>Assignee: Chen He
> Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, 
> HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch
>
>
> A dfsadmin command that allows finding out the file and the locations given a 
> block number will be very useful in debugging production issues.   It may be 
> possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-10-22 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180709#comment-14180709
 ] 

Xiaoyu Yao commented on HDFS-6988:
--

Thanks [~cmccabe] for the confirmation. I just submitted a patch for it.
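
As context, the configurability pattern under discussion looks roughly like 
this; the key name and default below are illustrative, not necessarily what 
the patch uses:

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: read an eviction threshold from configuration instead of
// hard-coding it. Key name and default value are illustrative.
public class EvictionThresholdSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    float lowWatermarkPercent = conf.getFloat(
        "dfs.datanode.ram.disk.low.watermark.percent", 10.0f);
    System.out.println("Evict replicas when free RAM disk space drops below "
        + lowWatermarkPercent + "% of capacity");
  }
}
{code}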

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 3.0.0
>
> Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, 
> HDFS-6988.03.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7258) CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level

2014-10-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDFS-7258:


Assignee: Xiaoyu Yao

> CacheReplicationMonitor rescan schedule log should use DEBUG level instead of 
> INFO level
> 
>
> Key: HDFS-7258
> URL: https://issues.apache.org/jira/browse/HDFS-7258
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>
> The CacheReplicationMonitor rescan scheduler adds two INFO log entries every 
> 30 seconds to the HDFS NN log, as shown below. These should be DEBUG level 
> logs to avoid flooding the namenode log.
> 2014-10-17 07:52:30,265 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 3 milliseconds
> 2014-10-17 07:52:30,265 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-10-17 07:53:00,265 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 30001 milliseconds
> 2014-10-17 07:53:00,266 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-10-17 07:53:30,267 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 30001 milliseconds
> 2014-10-17 07:53:30,267 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-10-17 07:54:00,267 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 30001 milliseconds
> 2014-10-17 07:54:00,268 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-10-17 07:54:30,268 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 30001 milliseconds
> 2014-10-17 07:54:30,269 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-10-17 07:55:00,269 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 3 milliseconds
> 2014-10-17 07:55:00,269 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-10-17 07:55:30,268 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 3 milliseconds
> 2014-10-17 07:55:30,269 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-10-17 07:56:00,269 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 30001 milliseconds
> 2014-10-17 07:56:00,270 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-10-17 07:56:30,270 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 30001 milliseconds
> 2014-10-17 07:56:30,271 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-10-17 07:57:00,271 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 3 milliseconds
> 2014-10-17 07:57:00,272 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-10-17 07:57:30,271 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 3 milliseconds
> 2014-10-17 07:57:30,272 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-10-17 07:58:00,271 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 3 milliseconds
> 2014-10-17 07:58:00,271 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-10-17 07:58:30,271 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: 
> Rescanning after 3 milliseconds
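
For concreteness, the requested change amounts to something like this sketch 
(an slf4j-style logger is assumed; the actual class may differ):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: demote the two per-rescan messages from INFO to DEBUG so
// they disappear at the default log level.
class RescanLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(RescanLoggingSketch.class);

  void logRescan(long waitedMs, long directives, long blocks, long tookMs) {
    LOG.debug("Rescanning after {} milliseconds", waitedMs); // was LOG.info
    LOG.debug("Scanned {} directive(s) and {} block(s) in {} millisecond(s).",
        directives, blocks, tookMs);                         // was LOG.info
  }
}
{code}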



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-10-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-6988:
-
Fix Version/s: (was: HDFS-6581)
   3.0.0
Affects Version/s: (was: HDFS-6581)
   2.6.0
   Status: Patch Available  (was: In Progress)

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 3.0.0
>
> Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, 
> HDFS-6988.03.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal

2014-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180703#comment-14180703
 ] 

Jing Zhao commented on HDFS-7277:
-

+1

> Remove explicit dependency on netty 3.2 in BKJournal
> 
>
> Key: HDFS-7277
> URL: https://issues.apache.org/jira/browse/HDFS-7277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-7277.000.patch
>
>
> The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the 
> code does not use it. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-10-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-6988:
-
Attachment: HDFS-6988.03.patch

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: HDFS-6581
>
> Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, 
> HDFS-6988.03.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2014-10-22 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180682#comment-14180682
 ] 

Konstantin Shvachko commented on HDFS-6658:
---

I agree: usually people remove data in order to make space to put more, and the 
freed space usually fills up again in a couple of weeks or months.
I don't know if this answer is good enough. It is for me, but in the end you 
end up with a bigger cluster.
It would be nice to find a way to detect fully empty arrays of the BlockList 
and release them once the last reference is removed. That should be good enough 
to avoid a stand-alone thread for garbage collection, or compaction in your 
terms.
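
A minimal sketch of that idea, assuming the int-index-based replica lists 
proposed in this jira (all names and the growth policy are illustrative):

{code}
// Sketch only: an index-based replica list that drops its backing array
// the moment it becomes empty, so no background compaction/GC thread is
// needed. Names and growth policy are illustrative.
class BlockListSketch {
  private int[] blockIndexes = null; // null when empty
  private int size = 0;

  void add(int blockIndex) {
    if (blockIndexes == null) {
      blockIndexes = new int[4];
    } else if (size == blockIndexes.length) {
      blockIndexes = java.util.Arrays.copyOf(blockIndexes, size * 2);
    }
    blockIndexes[size++] = blockIndex;
  }

  void remove(int blockIndex) {
    for (int i = 0; i < size; i++) {
      if (blockIndexes[i] == blockIndex) {
        blockIndexes[i] = blockIndexes[--size]; // swap-remove; order not kept
        break;
      }
    }
    if (size == 0) {
      blockIndexes = null; // last reference removed: release the array
    }
  }
}
{code}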

> Namenode memory optimization - Block replicas list 
> ---
>
> Key: HDFS-6658
> URL: https://issues.apache.org/jira/browse/HDFS-6658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.1
>Reporter: Amir Langer
>Assignee: Amir Langer
> Attachments: BlockListOptimizationComparison.xlsx, HDFS-6658.patch, 
> Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled) and 
> in our new design the list overhead will be per DatanodeStorageInfo and not 
> per block replica.
> see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180667#comment-14180667
 ] 

Hadoop QA commented on HDFS-7277:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676439/HDFS-7277.000.patch
  against trunk revision a36399e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8485//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8485//console

This message is automatically generated.

> Remove explicit dependency on netty 3.2 in BKJournal
> 
>
> Key: HDFS-7277
> URL: https://issues.apache.org/jira/browse/HDFS-7277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-7277.000.patch
>
>
> The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the 
> code does not use it. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7232) Populate hostname in httpfs audit log

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180670#comment-14180670
 ] 

Hadoop QA commented on HDFS-7232:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675587/HDFS-7232.patch
  against trunk revision a36399e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-httpfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8484//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8484//console

This message is automatically generated.

> Populate hostname in httpfs audit log
> -
>
> Key: HDFS-7232
> URL: https://issues.apache.org/jira/browse/HDFS-7232
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Zoran Dimitrijevic
>Assignee: Zoran Dimitrijevic
>Priority: Trivial
> Attachments: HDFS-7232.patch
>
>
> Currently httpfs audit logs do not log the request's IP address. Since they 
> use 
> hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties 
> which already contains a hostname field, it would be nice to add code to 
> populate it.
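For illustration, one hedged way to do this (an assumption about the approach, 
not the attached patch) is a servlet filter that puts the remote address into 
the log4j MDC, so a %X{hostname} pattern in httpfs-log4j.properties can 
resolve it:

{code}
import java.io.IOException;
import javax.servlet.*;
import org.apache.log4j.MDC;

// Hypothetical sketch: populate the "hostname" MDC key per request so the
// audit log pattern can include the caller's address.
public class HostnameMdcFilter implements Filter {
  @Override public void init(FilterConfig conf) {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    try {
      MDC.put("hostname", req.getRemoteAddr());  // caller's IP/host
      chain.doFilter(req, res);
    } finally {
      MDC.remove("hostname");  // avoid leaking the value across requests
    }
  }

  @Override public void destroy() {}
}
{code}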



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180626#comment-14180626
 ] 

Yongjun Zhang commented on HDFS-7235:
-

Hi [~cmccabe], thanks a lot for the side discussion and comment. I will look 
into it.


> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch
>
>
> When trying to decommission a DN, the process hangs. 
> What happens is, when the NN chooses a replica as a source to replicate data 
> on the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see 
> BlockManager.java). However, because of the bad disk, the DN would detect the 
> source block to be transferred as an invalid block, with the following logic 
> in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
>   private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
> final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), 
> b.getLocalBlock());
> return replicaInfo != null
> && replicaInfo.getState() == state
> && replicaInfo.getBlockFile().exists();
>   }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case. 
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN; thus the 
> NN doesn't know that the block is corrupted and keeps sending the data 
> transfer request to the same to-be-decommissioned DN, again and again. This 
> causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal

2014-10-22 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7277:
-
Status: Patch Available  (was: Open)

> Remove explicit dependency on netty 3.2 in BKJournal
> 
>
> Key: HDFS-7277
> URL: https://issues.apache.org/jira/browse/HDFS-7277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-7277.000.patch
>
>
> The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the 
> code does not use it. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal

2014-10-22 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7277:
-
Attachment: HDFS-7277.000.patch

> Remove explicit dependency on netty 3.2 in BKJournal
> 
>
> Key: HDFS-7277
> URL: https://issues.apache.org/jira/browse/HDFS-7277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-7277.000.patch
>
>
> The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the 
> code does not use it. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal

2014-10-22 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-7277:


 Summary: Remove explicit dependency on netty 3.2 in BKJournal
 Key: HDFS-7277
 URL: https://issues.apache.org/jira/browse/HDFS-7277
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Minor


The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the 
code does not use it. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2486) Review issues with UnderReplicatedBlocks

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180587#comment-14180587
 ] 

Hudson commented on HDFS-2486:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6317 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6317/])
Move HDFS-2486 down to 2.7.0 in CHANGES.txt (wang: rev 
08457e9e57e4fa3c83217fd0a092e926ba7eb135)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Review issues with UnderReplicatedBlocks
> 
>
> Key: HDFS-2486
> URL: https://issues.apache.org/jira/browse/HDFS-2486
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Steve Loughran
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-2486.patch
>
>
> Here are some things I've noted in the UnderReplicatedBlocks class that 
> someone else should review and consider if the code is correct. If not, they 
> are easy to fix.
> remove(Block block, int priLevel) is not synchronized, and as the inner 
> classes are not, there is a risk of race conditions there.
> Some of the code assumes that getPriority can return the value LEVEL and, if 
> so, does not attempt to queue the blocks. As this return value is not 
> currently possible, those checks can be removed. 
> The queue gives priority to blocks whose replication count is less than a 
> third of its expected count over those that are "normally under replicated". 
> While this is good for ensuring that files scheduled for large replication 
> are replicated fast, it may not be the best strategy for maintaining data 
> integrity. For that it may be better to give whichever blocks have only two 
> replicas priority over blocks that may, for example, already have 3 out of 10 
> copies in the filesystem.
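On the synchronization point, the straightforward fix is simply to mark the 
method synchronized like the class's other accessors. A minimal sketch (LEVEL 
and priorityQueues stand in for the class's existing fields; treat this as 
illustrative, not the attached patch):

{code}
  /** Synchronized so concurrent callers cannot race on the queues. */
  synchronized boolean remove(Block block, int priLevel) {
    if (priLevel >= 0 && priLevel < LEVEL
        && priorityQueues.get(priLevel).remove(block)) {
      return true;
    }
    // The priority may have changed since insertion; scan every level.
    for (int level = 0; level < LEVEL; level++) {
      if (level != priLevel && priorityQueues.get(level).remove(block)) {
        return true;
      }
    }
    return false;
  }
{code}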



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6824) Additional user documentation for HDFS encryption.

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180588#comment-14180588
 ] 

Hudson commented on HDFS-6824:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6317 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6317/])
HDFS-6824. Additional user documentation for HDFS encryption. (wang: rev 
a36399e09c8c92911df08f78a4b88528b6dd513f)
* hadoop-hdfs-project/hadoop-hdfs/src/site/apt/TransparentEncryption.apt.vm
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Additional user documentation for HDFS encryption.
> --
>
> Key: HDFS-6824
> URL: https://issues.apache.org/jira/browse/HDFS-6824
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: TransparentEncryption.html, hdfs-6824.001.patch, 
> hdfs-6824.002.patch
>
>
> We'd like to better document additional things about HDFS encryption: setup 
> and configuration, using alternate access methods (namely WebHDFS and 
> HttpFS), and other misc improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6824) Additional user documentation for HDFS encryption.

2014-10-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6824:
--
   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

Thanks Yi, I committed this to branch-2 and trunk.

> Additional user documentation for HDFS encryption.
> --
>
> Key: HDFS-6824
> URL: https://issues.apache.org/jira/browse/HDFS-6824
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: TransparentEncryption.html, hdfs-6824.001.patch, 
> hdfs-6824.002.patch
>
>
> We'd like to better document additional things about HDFS encryption: setup 
> and configuration, using alternate access methods (namely WebHDFS and 
> HttpFS), and other misc improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6888) Remove audit logging of getFIleInfo()

2014-10-22 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6888:
--
Attachment: HDFS-6888-6.patch

Updated the patch against trunk. TestBalancer and TestFailureToReadEdits work 
fine on my machine. The TestPipelinesFailover failure is caused by HDFS-6694.

> Remove audit logging of getFIleInfo()
> -
>
> Key: HDFS-6888
> URL: https://issues.apache.org/jira/browse/HDFS-6888
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Kihwal Lee
>Assignee: Chen He
>  Labels: log
> Attachments: HDFS-6888-2.patch, HDFS-6888-3.patch, HDFS-6888-4.patch, 
> HDFS-6888-5.patch, HDFS-6888-6.patch, HDFS-6888.patch
>
>
> The audit logging of getFileInfo() was added in HDFS-3733.  Since this is 
> one of the most frequently called methods, users have noticed that the audit 
> log is now filled with these entries.  Since we now have HTTP request 
> logging, this seems unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6694) TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms

2014-10-22 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180574#comment-14180574
 ] 

Yongjun Zhang commented on HDFS-6694:
-

Hi [~airbots],

Thanks for reporting the issue you ran into. Would you please look into your 
log to see if there are any "Too many open files" messages?


> TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently 
> with various symptoms
> 
>
> Key: HDFS-6694
> URL: https://issues.apache.org/jira/browse/HDFS-6694
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, 
> HDFS-6694.001.dbg.patch, HDFS-6694.002.dbg.patch, 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover-output.txt, 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.txt
>
>
> TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently 
> with various symptoms. Typical failures are described in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-2486) Review issues with UnderReplicatedBlocks

2014-10-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-2486:
--
Fix Version/s: (was: 3.0.0)
   2.7.0

I merged this down to branch-2 to make a cherry-pick cleaner.

> Review issues with UnderReplicatedBlocks
> 
>
> Key: HDFS-2486
> URL: https://issues.apache.org/jira/browse/HDFS-2486
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Steve Loughran
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-2486.patch
>
>
> Here are some things I've noted in the UnderReplicatedBlocks class that 
> someone else should review and consider if the code is correct. If not, they 
> are easy to fix.
> remove(Block block, int priLevel) is not synchronized, and as the inner 
> classes are not, there is a risk of race conditions there.
> Some of the code assumes that getPriority can return the value LEVEL and, if 
> so, does not attempt to queue the blocks. As this return value is not 
> currently possible, those checks can be removed. 
> The queue gives priority to blocks whose replication count is less than a 
> third of its expected count over those that are "normally under replicated". 
> While this is good for ensuring that files scheduled for large replication 
> are replicated fast, it may not be the best strategy for maintaining data 
> integrity. For that it may be better to give whichever blocks have only two 
> replicas priority over blocks that may, for example, already have 3 out of 10 
> copies in the filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-5928:
--
Attachment: HDFS-5928.v5.patch

> show namespace and namenode ID on NN dfshealth page
> ---
>
> Key: HDFS-5928
> URL: https://issues.apache.org/jira/browse/HDFS-5928
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
> HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6694) TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms

2014-10-22 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180563#comment-14180563
 ] 

Chen He commented on HDFS-6694:
---

the one I got is: 

java.lang.RuntimeException: Deferred
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:130)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestContext.waitFor(MultithreadedTestUtil.java:121)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testPipelineRecoveryStress(TestPipelinesFailover.java:485)
Caused by: java.lang.AssertionError: expected:<100> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:123)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover$PipelineTestThread.doAnAction(TestPipelinesFailover.java:522)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)


Results :

Tests in error: 
  TestPipelinesFailover.testPipelineRecoveryStress:485 » Runtime Deferred

> TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently 
> with various symptoms
> 
>
> Key: HDFS-6694
> URL: https://issues.apache.org/jira/browse/HDFS-6694
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, 
> HDFS-6694.001.dbg.patch, HDFS-6694.002.dbg.patch, 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover-output.txt, 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.txt
>
>
> TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently 
> with various symptoms. Typical failures are described in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180541#comment-14180541
 ] 

Haohui Mai commented on HDFS-5928:
--

The key idea is to ensure {{HAInfo}} is null in non-HA clusters. You might need 
some slight tweaks to make it work in all cases, but I think you get the idea.

> show namespace and namenode ID on NN dfshealth page
> ---
>
> Key: HDFS-5928
> URL: https://issues.apache.org/jira/browse/HDFS-5928
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
> HDFS-5928.v4.patch, HDFS-5928.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.

2014-10-22 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180536#comment-14180536
 ] 

Lei (Eddy) Xu commented on HDFS-6877:
-

Thank you for checking this in, [~cmccabe]!

> Avoid calling checkDisk when an HDFS volume is removed during a write.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, 
> HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, 
> HDFS-6877.007.patch
>
>
> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS 
> volume is removed during a write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id

2014-10-22 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6663:
--
Attachment: HDFS-6663-5.patch

The decommission status of a block now contains more details: it will show 
whether a block is decommissioning or decommissioned.

> Admin command to track file and locations from block id
> ---
>
> Key: HDFS-6663
> URL: https://issues.apache.org/jira/browse/HDFS-6663
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.5.0
>Reporter: Kihwal Lee
>Assignee: Chen He
> Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, 
> HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch
>
>
> A dfsadmin command that allows finding out the file and the locations given a 
> block number will be very useful in debugging production issues.   It may be 
> possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-22 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180524#comment-14180524
 ] 

Siqi Li commented on HDFS-5928:
---

I don't think this is going to work if the cluster doesn't have HA or 
federation. Also, it's good to let people know what the namespace and the 
namenode ID are.

> show namespace and namenode ID on NN dfshealth page
> ---
>
> Key: HDFS-5928
> URL: https://issues.apache.org/jira/browse/HDFS-5928
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
> HDFS-5928.v4.patch, HDFS-5928.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180505#comment-14180505
 ] 

Hudson commented on HDFS-6877:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6315 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6315/])
HDFS-6877. Avoid calling checkDisk when an HDFS volume is removed during a 
write. (Lei Xu via Colin P. McCabe) (cmccabe: rev 
7b0f9bb2583cd9b7274f1e31c173c1c6a7ce467b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java


> Avoid calling checkDisk when an HDFS volume is removed during a write.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, 
> HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, 
> HDFS-6877.007.patch
>
>
> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS 
> volume is removed during a write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7257) Add the time of last HA state transition to NN's /jmx page

2014-10-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180499#comment-14180499
 ] 

Andrew Wang commented on HDFS-7257:
---

I don't think there are any timezone concerns, considering that the timezone is 
shown as part of the string. However, if you'd prefer that it's not included, 
I'm okay with that. I agree that it can just be converted for usage on the 
webUI.

A final note, it'd also be better to use a standardized date format like ISO 
8601 rather than creating a new one: http://en.wikipedia.org/wiki/ISO_8601
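For example, a small Java sketch of emitting such a timestamp in ISO 8601 
(assuming Java 7's XXX offset pattern; illustrative only, not the patch under 
review):

{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class Iso8601Example {
  public static void main(String[] args) {
    // Produces e.g. 2014-10-22T19:15:00-07:00, which carries the UTC offset
    // explicitly instead of a locale-dependent timezone name.
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX");
    System.out.println(fmt.format(new Date()));
  }
}
{code}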

> Add the time of last HA state transition to NN's /jmx page
> --
>
> Key: HDFS-7257
> URL: https://issues.apache.org/jira/browse/HDFS-7257
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7257.001.patch, HDFS-7257.002.patch, 
> HDFS-7257.003.patch
>
>
> It would be useful to some monitoring apps to expose the last HA transition 
> time in the NN's /jmx page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.

2014-10-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6877:
---
  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0  (was: 3.0.0)
  Status: Resolved  (was: Patch Available)

> Avoid calling checkDisk when an HDFS volume is removed during a write.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, 
> HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, 
> HDFS-6877.007.patch
>
>
> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS 
> volume is removed during a write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.

2014-10-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180497#comment-14180497
 ] 

Colin Patrick McCabe commented on HDFS-6877:


+1.  Thanks, Eddy.

The TestDNFencing failure is HDFS-7226, not related.

> Avoid calling checkDisk when an HDFS volume is removed during a write.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, 
> HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, 
> HDFS-6877.007.patch
>
>
> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS 
> volume is removed during a write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180490#comment-14180490
 ] 

Haohui Mai commented on HDFS-5928:
--

The code can be simplified by putting the relevant information in an object. 
For example:

{code}
{#HAInfo}
{namespace}-{nnid}
{/HAInfo}
{code}

On the JavaScript side:

{code}
var namespace = null, nnid = null;
// parse XML and set namespace and nnid
if (namespace && nnid) {
  HAInfo = {"namespace": namespace, "nnid": nnid}
}
{code}


> show namespace and namenode ID on NN dfshealth page
> ---
>
> Key: HDFS-5928
> URL: https://issues.apache.org/jira/browse/HDFS-5928
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
> HDFS-5928.v4.patch, HDFS-5928.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7254) Add documentation for hot swaping DataNode drives

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180486#comment-14180486
 ] 

Hudson commented on HDFS-7254:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6314 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6314/])
HDFS-7254. Add documentation for hot swaping DataNode drives (Lei Xu via Colin 
P. McCabe) (cmccabe: rev 66e8187ea1dbc6230ab2c633e4f609a7068b75db)
* hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm
* hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Add documentation for hot swaping DataNode drives
> -
>
> Key: HDFS-7254
> URL: https://issues.apache.org/jira/browse/HDFS-7254
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, 
> HDFS-7254.002.patch, HDFS-7254.003.patch
>
>
> Add documents for the hot swap drive functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7254) Add documentation for hot swaping DataNode drives

2014-10-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7254:
---
  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0  (was: 2.6.0)
  Status: Resolved  (was: Patch Available)

> Add documentation for hot swaping DataNode drives
> -
>
> Key: HDFS-7254
> URL: https://issues.apache.org/jira/browse/HDFS-7254
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, 
> HDFS-7254.002.patch, HDFS-7254.003.patch
>
>
> Add documents for the hot swap drive functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7254) Add documentation for hot swaping DataNode drives

2014-10-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7254:
---
Summary: Add documentation for hot swaping DataNode drives  (was: Add 
documents for hot swap drive)

> Add documentation for hot swaping DataNode drives
> -
>
> Key: HDFS-7254
> URL: https://issues.apache.org/jira/browse/HDFS-7254
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, 
> HDFS-7254.002.patch, HDFS-7254.003.patch
>
>
> Add documents for the hot swap drive functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7254) Add documents for hot swap drive

2014-10-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180476#comment-14180476
 ] 

Colin Patrick McCabe commented on HDFS-7254:


+1.  Thanks, Eddy.

The test failure is not related because this is only a docs change.

> Add documents for hot swap drive
> 
>
> Key: HDFS-7254
> URL: https://issues.apache.org/jira/browse/HDFS-7254
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, 
> HDFS-7254.002.patch, HDFS-7254.003.patch
>
>
> Add documents for the hot swap drive functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-22 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180474#comment-14180474
 ] 

Siqi Li commented on HDFS-5928:
---

[~wheat9] I have added the check for both the namespace and the namenode ID.

> show namespace and namenode ID on NN dfshealth page
> ---
>
> Key: HDFS-5928
> URL: https://issues.apache.org/jira/browse/HDFS-5928
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
> HDFS-5928.v4.patch, HDFS-5928.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180466#comment-14180466
 ] 

Colin Patrick McCabe commented on HDFS-7235:


Hi Yongjun,

Thanks for your patience here.  I don't think the current patch is quite ready. 
I could point to a few things, like this: {{ReplicaInfo replicaInfo = 
(ReplicaInfo) data.getReplica(}}.  We shouldn't be downcasting here.

I think the bigger issue is that the interface in FsDatasetSpi is just not very 
suitable for what we're trying to do.  Rather than trying to hack it, I think we 
should come up with a better interface.

I think we should replace {{FsDatasetSpi#isValid}} with this function:

{code}
  /**
   * Check if a block is valid.
   *
   * @param b          The block to check.
   * @param minLength  The minimum length that the block must have.  May be 0.
   * @param state      If this is null, it is ignored.  If it is non-null, we
   *                   will check that the replica has this state.
   *
   * @throws FileNotFoundException            If the replica is not found or
   *                                          there was an error locating it.
   * @throws EOFException                     If the replica length is too short.
   * @throws UnexpectedReplicaStateException  If the replica is not in the
   *                                          expected state.
   */
  public void checkBlock(ExtendedBlock b, long minLength, ReplicaState state);
{code}

Since this function will throw a clearly marked exception detailing which case 
we're in, we won't have to call multiple functions.  This will be better for 
performance, since we're only taking the lock once, and also better for 
clarity, since the current APIs lead to some rather complex code.

We could also get rid of {{FsDatasetSpi#isValidRbw}}, since the new function 
can do everything that it does.
Also, UnexpectedReplicaStateException could be a new exception under 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/UnexpectedReplicaStateException.java

I think it's fine to change FsDatasetSpi for this (we did it when adding 
caching stuff, and again when adding "trash").

Let me know what you think.  I think it would make things a lot more clear.
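For illustration, a caller of the proposed API might look like the 
method-level sketch below ({{data}}, {{block}}, and {{reportBadBlock}} stand 
in for the surrounding DataNode context and are assumptions, not code from 
the patch):

{code}
  void transferIfValid() throws IOException {
    try {
      data.checkBlock(block, block.getNumBytes(), ReplicaState.FINALIZED);
      // Replica exists, is long enough, and is FINALIZED: safe to transfer.
    } catch (FileNotFoundException e) {
      reportBadBlock(block);  // missing block file (e.g. bad disk): tell the NN
    } catch (EOFException e) {
      // Replica is shorter than expected; do not use it as a source.
    } catch (UnexpectedReplicaStateException e) {
      // Replica exists but is not FINALIZED.
    }
  }
{code}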

> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch
>
>
> When trying to decommission a DN, the process hangs. 
> What happens is, when the NN chooses a replica as a source to replicate data 
> on the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see 
> BlockManager.java). However, because of the bad disk, the DN would detect the 
> source block to be transferred as an invalid block, with the following logic 
> in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
>   private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
> final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), 
> b.getLocalBlock());
> return replicaInfo != null
> && replicaInfo.getState() == state
> && replicaInfo.getBlockFile().exists();
>   }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case. 
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN; thus the 
> NN doesn't know that the block is corrupted and keeps sending the data 
> transfer request to the same to-be-decommissioned DN, again and again. This 
> causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180382#comment-14180382
 ] 

Hadoop QA commented on HDFS-7180:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676390/HDFS-7180.002.patch
  against trunk revision d67214f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8480//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8480//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//console

This message is automatically generated.

> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only), and we start and mount the NFSv3 
> gateway on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon frequently seems to get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. work just fine). The last hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node and on the same node the NFS is 
> mounted.
> From the node where the NFS is mounted:
> dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC

[jira] [Comment Edited] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk

2014-10-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177518#comment-14177518
 ] 

Colin Patrick McCabe edited comment on HDFS-7235 at 10/22/14 7:15 PM:
--

{code}
1787  ReplicaInfo replicaInfo = null;
1788  synchronized(data) {
1789replicaInfo = (ReplicaInfo) data.getReplica( 
block.getBlockPoolId(),
1790block.getBlockId());
1791  }
1792  if (replicaInfo != null 
1793  && replicaInfo.getState() == ReplicaState.FINALIZED 
1794  && !replicaInfo.getBlockFile().exists()) {
{code}
You can't release the lock this way.  Once you release the lock, replicaInfo 
could be mutated at any time.  So you need to do all the checks under the lock.

{code}
1795//
1796// Report back to NN bad block caused by non-existent block 
file.
1797// WATCH-OUT: be sure the conditions checked above matches the 
following
1798// method in FsDatasetImpl.java:
1799//   boolean isValidBlock(ExtendedBlock b)
1800// all other conditions need to be true except that 
1801// replicaInfo.getBlockFile().exists() returns false.
1802//
{code}
I don't think we need the "WATCH-OUT" part.  We shouldn't be calling 
{{isValidBlock}}, so why do we care if the check is the same as that check?

I generally agree with this approach and I think we can get this in if that's 
fixed.


was (Author: cmccabe):
{code}
1787  ReplicaInfo replicaInfo = null;
1788  synchronized(data) {
1789replicaInfo = (ReplicaInfo) data.getReplica( 
block.getBlockPoolId(),
1790block.getBlockId());
1791  }
1792  if (replicaInfo != null 
1793  && replicaInfo.getState() == ReplicaState.FINALIZED 
1794  && !replicaInfo.getBlockFile().exists()) {
{code}
You can't release the lock this way.  Once you release the lock, replicaInfo 
could be mutated at any time.  So you need to do all the checks under the lock.

{code}
1795//
1796// Report back to NN bad block caused by non-existent block 
file.
1797// WATCH-OUT: be sure the conditions checked above matches the 
following
1798// method in FsDatasetImpl.java:
1799//   boolean isValidBlock(ExtendedBlock b)
1800// all other conditions need to be true except that 
1801// replicaInfo.getBlockFile().exists() returns false.
1802//
{code}
I don't think we need the "WATCH-OUT" part.  We're not calling 
{{isValidBlock}}, so why do we care if the check is the same as that check?

I generally agree with this approach and I think we can get this in if that's 
fixed.
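To make the locking point concrete, a minimal sketch of the corrected pattern 
(names mirror the quoted patch; the surrounding DataNode context is assumed):

{code}
  boolean reportBad;
  synchronized (data) {
    // Look up AND check the replica while still holding the dataset lock,
    // so replicaInfo cannot be mutated between the lookup and the checks.
    ReplicaInfo replicaInfo = (ReplicaInfo) data.getReplica(
        block.getBlockPoolId(), block.getBlockId());
    reportBad = replicaInfo != null
        && replicaInfo.getState() == ReplicaState.FINALIZED
        && !replicaInfo.getBlockFile().exists();
  }
  if (reportBad) {
    // Report the bad block to the NameNode outside the lock.
  }
{code}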

> Can not decommission DN which has invalid block due to bad disk
> ---
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch
>
>
> When trying to decommission a DN, the process hangs. 
> What happens is, when the NN chooses a replica as a source to replicate data 
> on the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see 
> BlockManager.java). However, because of the bad disk, the DN would detect the 
> source block to be transferred as an invalid block, with the following logic 
> in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
>   private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
> final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), 
> b.getLocalBlock());
> return replicaInfo != null
> && replicaInfo.getState() == state
> && replicaInfo.getBlockFile().exists();
>   }
> {code}
> The reason this method returns false (detecting an invalid block) is that the 
> block file doesn't exist, due to the bad disk in this case. 
> The key issue we found here is that after the DN detects an invalid block for 
> the above reason, it doesn't report the invalid block back to the NN; thus the 
> NN doesn't know that the block is corrupted and keeps sending the data 
> transfer request to the same to-be-decommissioned DN, again and again. This 
> causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7231) rollingupgrade needs some guard rails

2014-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180312#comment-14180312
 ] 

Suresh Srinivas commented on HDFS-7231:
---

Allen, I just rewrote the steps with additional details to clarify:
# Upgrade the 2.0.5 cluster to 2.2
# Do not run -finalizeUpgrade
# Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1.
# Start the namenode with the -upgrade option.
# The namenode start fails because the 2.0.5 to 2.2 upgrade is still in progress
# Leave the 2.4.1 DNs running
# Install the 2.2 binaries on the NN
# Start the NN on 2.2 with no upgrade-related options

So far things are clear. Then you go on to say the following:
bq. DNs now do a partial roll-forward, rendering them unable to continue
What do you mean by this?

bq. admins manually repair version files on those broken directories
This, as you know, is a recipe for disaster.

Let me ask you a question: before you go on to 2.4.1, what happens if you 
finalize the upgrade?

> rollingupgrade needs some guard rails
> -
>
> Key: HDFS-7231
> URL: https://issues.apache.org/jira/browse/HDFS-7231
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> See first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180307#comment-14180307
 ] 

Brandon Li commented on HDFS-7180:
--

The unit test seems tricky to add. I did some file-uploading tests to verify 
that the pending non-sequential writes were under control. 

> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only), and we start and mount the NFSv3 
> gateway on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon frequently seems to get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. work just fine). The last hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node and on the same node the NFS is 
> mounted.
> From the node where the NFS is mounted:
> dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-196

[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180305#comment-14180305
 ] 

Brandon Li commented on HDFS-7180:
--

Nice catch, Jing.
I've uploaded a new patch. It lets the dumper notify waiting threads even when 
an error happens. I also did some code cleanup.
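The underlying pattern is the classic notify-in-finally idiom; a 
self-contained sketch (hypothetical names, not the patch itself):

{code}
public class DumperSketch {
  private final Object lock = new Object();

  void dumpWithNotify() {
    synchronized (lock) {
      try {
        dumpToDisk();      // may throw at any point
      } finally {
        lock.notifyAll();  // wake waiting writers even when dumping fails
      }
    }
  }

  private void dumpToDisk() { /* write pending data to the dump file */ }
}
{code}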


> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only), and we start and mount the NFSv3 
> gateway on one node in the cluster to let users upload data with rsync.
> However, we find the NFSv3 daemon frequently seems to get stuck while HDFS 
> itself keeps working well (hdfs dfs -ls etc. work just fine). The last hang 
> we found happened after around 1 day of running and several hundred GBs of 
> data uploaded.
> The NFSv3 daemon is started on one node and on the same node the NFS is 
> mounted.
> From the node where the NFS is mounted:
> dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block

[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7180:
-
Attachment: HDFS-7180.002.patch

> NFSv3 gateway frequently gets stuck
> ---
>
> Key: HDFS-7180
> URL: https://issues.apache.org/jira/browse/HDFS-7180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.5.0
> Environment: Linux, Fedora 19 x86-64
>Reporter: Eric Zhiqiang Ma
>Assignee: Brandon Li
>Priority: Critical
> Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
> on one node in the cluster to let users upload data with rsync.
> However, the NFSv3 daemon frequently seems to get stuck while HDFS itself 
> keeps working well (hdfs dfs -ls etc. work just fine). The latest hang we 
> found occurred after around 1 day of running and several hundred GBs of data 
> uploaded.
> The NFSv3 daemon is started on one node and on the same node the NFS is 
> mounted.
> From the node where the NFS is mounted:
> dmesg shows lines like this:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
> java.io.IOException: Bad response ERROR for block 
> BP-1960069741-10.0.3.170-1410430543652:blk

[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-22 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180290#comment-14180290
 ] 

Yongjun Zhang commented on HDFS-7226:
-

Thanks a lot [~jingzhao]! Hopefully the next hdfs build will be clean.



> TestDNFencing.testQueueingWithAppend failed often in latest test
> 
>
> Key: HDFS-7226
> URL: https://issues.apache.org/jira/browse/HDFS-7226
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.7.0
>
> Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, 
> HDFS-7226.003.patch
>
>
> Using the tool from HADOOP-11045, I got the following report:
> {code}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
> PreCommit-HDFS-Build -n 1 
> Recently FAILED builds in url: 
> https://builds.apache.org//job/PreCommit-HDFS-Build
> THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
> as listed below:
> ..
> Among 9 runs examined, all failed tests <#failedRuns: testName>:
> 7: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
> 6: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
> 3: 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
> 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
> 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
> ..
> {code}
> TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
> Creating this jira for TestDNFencing.testQueueingWithAppend.
> Symptom:
> {code}
> Failed
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
> Failing for the past 1 build (Since Failed#8390 )
> Took 2.9 sec.
> Error Message
> expected:<18> but was:<12>
> Stacktrace
> java.lang.AssertionError: expected:<18> but was:<12>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180239#comment-14180239
 ] 

Hudson commented on HDFS-7228:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6311 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6311/])
HDFS-7228. Fix TestDNFencing.testQueueingWithAppend. Contributed by Yongjun 
Zhang. (jing9: rev 1c8d191117de3d2e035bd728bccfde0f4b81296f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Add an SSD policy into the default BlockStoragePolicySuite
> --
>
> Key: HDFS-7228
> URL: https://issues.apache.org/jira/browse/HDFS-7228
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.6.0
>
> Attachments: HDFS-7228.000.patch, HDFS-7228.001.patch, 
> HDFS-7228.002.patch, HDFS-7228.003.patch, HDFS-7228.003.patch
>
>
> Currently in the default BlockStoragePolicySuite, we've defined 4 storage 
> policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined 
> the SSD storage type, it will be useful to also include an SSD-related 
> storage policy in the default suite.
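
Conceptually, a storage policy is just a preferred medium per replica plus 
fallbacks for when that medium is full. Below is a minimal, self-contained 
sketch of what an ALL_SSD-style entry amounts to; all names here are 
illustrative, and this is not the actual BlockStoragePolicySuite API.

{code}
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only -- not the actual BlockStoragePolicySuite API.
class StoragePolicySketch {
  enum StorageType { RAM_DISK, SSD, DISK, ARCHIVE }

  final String name;
  final List<StorageType> preferred; // where new replicas should be placed
  final List<StorageType> fallbacks; // used when the preferred media are full

  StoragePolicySketch(String name, List<StorageType> preferred,
      List<StorageType> fallbacks) {
    this.name = name;
    this.preferred = preferred;
    this.fallbacks = fallbacks;
  }

  /** Pick a medium for one replica, falling back when the preferred one is full. */
  StorageType choose(boolean preferredHasSpace) {
    return preferredHasSpace ? preferred.get(0) : fallbacks.get(0);
  }

  static StoragePolicySketch allSsd() {
    return new StoragePolicySketch("ALL_SSD",
        Arrays.asList(StorageType.SSD), Arrays.asList(StorageType.DISK));
  }
}
{code}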



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7226:

   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the fix, Yongjun! I've committed this to trunk and branch-2.

> TestDNFencing.testQueueingWithAppend failed often in latest test
> 
>
> Key: HDFS-7226
> URL: https://issues.apache.org/jira/browse/HDFS-7226
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.7.0
>
> Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, 
> HDFS-7226.003.patch
>
>
> Using the tool from HADOOP-11045, I got the following report:
> {code}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
> PreCommit-HDFS-Build -n 1 
> Recently FAILED builds in url: 
> https://builds.apache.org//job/PreCommit-HDFS-Build
> THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
> as listed below:
> ..
> Among 9 runs examined, all failed tests <#failedRuns: testName>:
> 7: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
> 6: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
> 3: 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
> 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
> 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
> ..
> {code}
> TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
> Creating this jira for TestDNFencing.testQueueingWithAppend.
> Symptom:
> {code}
> Failed
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
> Failing for the past 1 build (Since Failed#8390 )
> Took 2.9 sec.
> Error Message
> expected:<18> but was:<12>
> Stacktrace
> java.lang.AssertionError: expected:<18> but was:<12>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()

2014-10-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180209#comment-14180209
 ] 

Ted Yu commented on HDFS-6291:
--

With image.close() in the finally block, the catch block doesn't need to call 
it, right?

> FSImage may be left unclosed in BootstrapStandby#doRun()
> 
>
> Key: HDFS-6291
> URL: https://issues.apache.org/jira/browse/HDFS-6291
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ted Yu
>Priority: Minor
> Attachments: HDFS-6291.patch
>
>
> At around line 203:
> {code}
>   if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) {
> return ERR_CODE_LOGS_UNAVAILABLE;
>   }
> {code}
> If we return following the above check, image is not closed.
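
A minimal sketch of the shape being suggested here (the class, parameters, and 
error constant are illustrative, not the actual BootstrapStandby code): with 
close() in a finally block, the early return, the success path, and any 
exception path all release the image, so no catch block needs its own close.

{code}
import java.io.Closeable;
import java.io.IOException;

// Illustrative sketch only -- not the actual BootstrapStandby#doRun().
class DoRunSketch {
  static final int ERR_CODE_LOGS_UNAVAILABLE = 6; // assumed value

  int doRun(Closeable image, boolean logsAvailable) throws IOException {
    try {
      if (!logsAvailable) {
        // finally still runs here, so the early return no longer leaks the image
        return ERR_CODE_LOGS_UNAVAILABLE;
      }
      // ... bootstrap work that uses the image ...
      return 0;
    } finally {
      image.close(); // single close point covers return, success, and exceptions
    }
  }
}
{code}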



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7259) Unresponsive NFS mount point due to deferred COMMIT response

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180013#comment-14180013
 ] 

Hudson commented on HDFS-7259:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/])
HDFS-7259. Unresponsive NFS mount point due to deferred COMMIT response. 
Contributed by Brandon Li (brandonli: rev 
b6f9d5538cf2b425652687e99503f3d566b2056a)
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java


> Unresponsive NFS mount point due to deferred COMMIT response
> -
>
> Key: HDFS-7259
> URL: https://issues.apache.org/jira/browse/HDFS-7259
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Fix For: 2.6.0
>
> Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch
>
>
> Since the gateway can't commit random writes, it caches COMMIT requests in 
> a queue and sends back a response only when the data can be committed or the 
> stream times out (a failure in the latter case). This can cause problems in 
> two patterns:
> (1) file uploads fail, or 
> (2) the mount dir is stuck on the affected client, while other NFS clients 
> can still access the NFS gateway.
> Error pattern (2) occurs because too many COMMIT requests are pending, so 
> the NFS client can't send any other requests (e.g., for "ls") to the NFS 
> gateway once it reaches its pending-request limit.
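
For illustration, here is a minimal, self-contained sketch of that deferral 
mechanism (names are illustrative; this is not the actual OpenFileCtx code): a 
COMMIT whose offset is already flushed is answered immediately, otherwise its 
reply is parked until a flush covers it. While replies are parked, the NFS 
client keeps those requests outstanding, which is how the pending-request 
limit gets exhausted.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch only -- not the real OpenFileCtx implementation.
class DeferredCommitSketch {
  // Pending COMMIT replies, keyed by the file offset each one waits for.
  private final ConcurrentSkipListMap<Long, Runnable> pending =
      new ConcurrentSkipListMap<Long, Runnable>();
  private volatile long flushedOffset = 0;

  void onCommit(long offset, Runnable reply) {
    if (offset <= flushedOffset) {
      reply.run();                // data already durable: answer immediately
    } else {
      pending.put(offset, reply); // defer; the client keeps the slot occupied
    }
  }

  void onFlush(long newFlushedOffset) {
    flushedOffset = newFlushedOffset;
    // Release every COMMIT whose offset is now covered by the flush.
    Map<Long, Runnable> ready = pending.headMap(newFlushedOffset, true);
    for (Runnable reply : ready.values()) {
      reply.run();
    }
    ready.clear();
  }
}
{code}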



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6581) Write to single replica in memory

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180007#comment-14180007
 ] 

Hudson commented on HDFS-6581:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/])
Updated CHANGES.txt for HDFS-6581 merge into branch-2.6. (jitendra: rev 
b85919feef64ed8b05b84ab8c372844a815cc139)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs-client, namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: 2.6.0
>
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, 
> HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, 
> HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.
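
On the client side, the feature surfaces as the LAZY_PERSIST create flag. A 
hedged usage sketch follows: the path and buffer size are illustrative, and 
the flag only takes effect on DataNodes configured with RAM_DISK storage.

{code}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hedged usage sketch: requires a DN with RAM_DISK storage configured.
public class LazyPersistWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/lazy-persist-demo"); // illustrative path

    // LAZY_PERSIST asks for a single replica buffered in DN memory,
    // persisted to disk lazily in the background.
    FSDataOutputStream out = fs.create(file,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096,                          // buffer size
        (short) 1,                     // single replica, as the feature requires
        fs.getDefaultBlockSize(file),
        null);                         // no progress callback
    out.write("hello".getBytes("UTF-8"));
    out.close();
  }
}
{code}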



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180006#comment-14180006
 ] 

Hudson commented on HDFS-7215:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/])
HDFS-7215. Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li 
(brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea)
* hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java


> Add JvmPauseMonitor to NFS gateway
> --
>
> Key: HDFS-7215
> URL: https://issues.apache.org/jira/browse/HDFS-7215
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Brandon Li
>Assignee: Brandon Li
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: HDFS-7215.001.patch
>
>
> As with the NN/DN, a GC pause log would help debug issues in the NFS gateway.
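
The monitor's technique is simple enough to show in miniature: sleep a fixed 
interval and treat any extra elapsed time as a JVM-wide pause (most often GC). 
A self-contained sketch, with the interval and threshold chosen for 
illustration rather than taken from the real JvmPauseMonitor:

{code}
// Self-contained sketch of the sleep-drift technique JvmPauseMonitor uses;
// the interval and threshold values here are illustrative.
public class PauseMonitorSketch implements Runnable {
  private static final long SLEEP_MS = 500;
  private static final long WARN_THRESHOLD_MS = 10000;

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long start = System.nanoTime();
      try {
        Thread.sleep(SLEEP_MS);
      } catch (InterruptedException e) {
        return;
      }
      // Any time beyond the requested sleep is time the JVM was stalled.
      long pauseMs = (System.nanoTime() - start) / 1000000L - SLEEP_MS;
      if (pauseMs > WARN_THRESHOLD_MS) {
        System.err.println("Detected JVM pause of approximately " + pauseMs + " ms");
      }
    }
  }

  public static void main(String[] args) {
    new Thread(new PauseMonitorSketch(), "pause-monitor").start();
  }
}
{code}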



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180014#comment-14180014
 ] 

Hudson commented on HDFS-7221:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/])
HDFS-7221. TestDNFencingWithReplication fails consistently. Contributed by 
Charles Lamb. (wang: rev ac56b0637e55465d3b7f7719c8689bff2a572dc0)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HAStressTestHarness.java


> TestDNFencingWithReplication fails consistently
> ---
>
> Key: HDFS-7221
> URL: https://issues.apache.org/jira/browse/HDFS-7221
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, 
> HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch
>
>
> TestDNFencingWithReplication consistently fails with a timeout, both in 
> Jenkins runs and on my local machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180008#comment-14180008
 ] 

Hudson commented on HDFS-7204:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/])
HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 
4baca311ffb5489fbbe08288502db68875834920)
* hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh


> balancer doesn't run as a daemon
> 
>
> Key: HDFS-7204
> URL: https://issues.apache.org/jira/browse/HDFS-7204
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HDFS-7204-01.patch, HDFS-7204.patch
>
>
> From HDFS-7184, minor issues with balancer:
> * daemon isn't set to true in the hdfs script to enable daemonization
> * start-balancer script has usage instead of hadoop_usage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7259) Unresponsive NFS mount point due to deferred COMMIT response

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179922#comment-14179922
 ] 

Hudson commented on HDFS-7259:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/])
HDFS-7259. Unresponsive NFS mount point due to deferred COMMIT response. 
Contributed by Brandon Li (brandonli: rev 
b6f9d5538cf2b425652687e99503f3d566b2056a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java


> Unresponsive NFS mount point due to deferred COMMIT response
> -
>
> Key: HDFS-7259
> URL: https://issues.apache.org/jira/browse/HDFS-7259
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Fix For: 2.6.0
>
> Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch
>
>
> Since the gateway can't commit random writes, it caches COMMIT requests in 
> a queue and sends back a response only when the data can be committed or the 
> stream times out (a failure in the latter case). This can cause problems in 
> two patterns:
> (1) file uploads fail, or 
> (2) the mount dir is stuck on the affected client, while other NFS clients 
> can still access the NFS gateway.
> Error pattern (2) occurs because too many COMMIT requests are pending, so 
> the NFS client can't send any other requests (e.g., for "ls") to the NFS 
> gateway once it reaches its pending-request limit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway

2014-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179914#comment-14179914
 ] 

Hudson commented on HDFS-7215:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/])
HDFS-7215. Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li 
(brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea)
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm


> Add JvmPauseMonitor to NFS gateway
> --
>
> Key: HDFS-7215
> URL: https://issues.apache.org/jira/browse/HDFS-7215
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Brandon Li
>Assignee: Brandon Li
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: HDFS-7215.001.patch
>
>
> As with the NN/DN, a GC pause log would help debug issues in the NFS gateway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

