[ https://issues.apache.org/jira/browse/HDFS-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079048#comment-14079048 ]

Liang Xie commented on HDFS-289:
--------------------------------

We hit an HBase write performance degradation several days ago; the root 
cause turned out to be a slow network to/from one particular datanode, caused 
by a switch buffer problem. I am now interested in implementing a simple 
heuristic inside DFSOutputStream that excludes such slow DNs. Will post more 
here later :)

> HDFS should blacklist datanodes that are not performing well
> ------------------------------------------------------------
>
>                 Key: HDFS-289
>                 URL: https://issues.apache.org/jira/browse/HDFS-289
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>
> On a large cluster, a few datanodes could be under-performing. There were 
> cases where the network connectivity of a few of these bad datanodes was 
> degraded, resulting in very long times (on the order of two hours) to 
> transfer blocks to and from these datanodes.  
> A similar issue arises when a single disk on a datanode fails or changes 
> to read-only mode: in this case the entire datanode shuts down. 
> HDFS should detect and handle network and disk performance degradation more 
> gracefully. One option would be to blacklist these datanodes, de-prioritise 
> their use and alert the administrator.
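One way to read "de-prioritise" is to demote blacklisted datanodes to the end 
of a candidate list rather than dropping them, so they are still used when 
nothing healthier is available. A minimal sketch under that assumption; 
NodePrioritizer and prioritise are hypothetical names, and the stderr line 
only stands in for whatever alerting an administrator would actually receive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class NodePrioritizer {
    /** Healthy nodes first in their original order; blacklisted nodes last rather than dropped. */
    static List<String> prioritise(List<String> candidates, Set<String> blacklisted) {
        List<String> healthy = new ArrayList<>();
        List<String> demoted = new ArrayList<>();
        for (String node : candidates) {
            if (blacklisted.contains(node)) {
                demoted.add(node);
                // Placeholder alert hook (assumption): a real deployment would report to monitoring.
                System.err.println("WARN: de-prioritising slow datanode " + node);
            } else {
                healthy.add(node);
            }
        }
        healthy.addAll(demoted);
        return healthy;
    }

    public static void main(String[] args) {
        List<String> order = prioritise(List.of("dn1", "dn2", "dn3"), Set.of("dn2"));
        System.out.println(order); // [dn1, dn3, dn2]
    }
}
```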



--
This message was sent by Atlassian JIRA
(v6.2#6252)
