[ https://issues.apache.org/jira/browse/HDFS-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079048#comment-14079048 ]
Liang Xie commented on HDFS-289:
--------------------------------

We hit an HBase write performance degradation several days ago. The root cause turned out to be a slow network path to and from a specific datanode, caused by a switch buffer problem. I am now interested in implementing a simple heuristic for excluding slow datanodes inside DFSOutputStream. Will post more here later :)

> HDFS should blacklist datanodes that are not performing well
> ------------------------------------------------------------
>
>                 Key: HDFS-289
>                 URL: https://issues.apache.org/jira/browse/HDFS-289
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>
> On a large cluster, a few datanodes could be under-performing. There were
> cases when the network connectivity of a few of these bad datanodes was
> degraded, resulting in very long times (on the order of two hours) to
> transfer blocks to and from these datanodes.
> A similar issue arises when a single disk on a datanode fails or changes
> to read-only mode: in this case the entire datanode shuts down.
> HDFS should detect and handle network and disk performance degradation more
> gracefully. One option would be to blacklist these datanodes, de-prioritise
> their use and alert the administrator.
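
For illustration only, here is a minimal sketch of the kind of client-side heuristic mentioned in the comment above. It is not the patch for this issue and does not use any HDFS API: the class name SlowNodeTracker, its parameters (alpha, slowFactor, minNodes) and the median-based threshold are all assumptions made for this example. The idea is to keep a smoothed latency per datanode and flag nodes whose value sits far above what the other nodes show.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of HDFS or DFSOutputStream. */
class SlowNodeTracker {
    private final double alpha;       // EWMA smoothing factor, in (0, 1]
    private final double slowFactor;  // flag a node when EWMA > slowFactor * median
    private final int minNodes;       // do not judge until this many nodes are seen
    private final Map<String, Double> ewmaMillis = new HashMap<>();

    SlowNodeTracker(double alpha, double slowFactor, int minNodes) {
        this.alpha = alpha;
        this.slowFactor = slowFactor;
        this.minNodes = minNodes;
    }

    /** Record one observed transfer or ack latency (milliseconds) for a node. */
    void record(String node, double latencyMillis) {
        // First sample is stored as-is; later samples are blended into the average.
        ewmaMillis.merge(node, latencyMillis,
                (old, cur) -> alpha * cur + (1 - alpha) * old);
    }

    /** Nodes whose smoothed latency is well above the (upper) median of all nodes. */
    Set<String> excluded() {
        Set<String> result = new TreeSet<>();
        if (ewmaMillis.size() < minNodes) {
            return result; // too few nodes to compare against
        }
        List<Double> sorted = new ArrayList<>(ewmaMillis.values());
        Collections.sort(sorted);
        double median = sorted.get(sorted.size() / 2);
        for (Map.Entry<String, Double> e : ewmaMillis.entrySet()) {
            if (e.getValue() > slowFactor * median) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A caller would invoke record() after each packet ack or block transfer and consult excluded() when choosing pipeline targets. Because the average decays, a node that recovers drops off the list on its own after a few fast samples. Comparing against the median rather than a fixed limit is a design choice of this sketch, so that a uniformly slow cluster does not exclude every node.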