[ https://issues.apache.org/jira/browse/ACCUMULO-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051850#comment-14051850 ]
Sean Busbey commented on ACCUMULO-2976:
---------------------------------------

Also, it should be relatively easy to roll this into a "maintenance mode" for the existing "take this node offline" admin stuff.

> blacklist problematic tservers
> ------------------------------
>
>                 Key: ACCUMULO-2976
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2976
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master
>            Reporter: Sean Busbey
>            Priority: Minor
>
> It would be nice if the master kept track of tservers that misbehave and
> eventually blacklisted them, similar to how HDFS handles datanodes and
> MapReduce/YARN handle trackers.
> Right now the closest we come is having the master kill the zoolock for
> tservers that are behaving poorly. This causes them to exit if they're not in
> a zombie state.
> On deployments with a watchdog that relaunches failed processes, this doesn't
> help much because the tserver comes back. In the case of, e.g., flaky network
> failures for the node, this just means repeating the process and impacting
> cluster performance while the master works out that it should kill the node
> again.
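
For illustration only, the "keep track of tservers that misbehave and eventually blacklist them" idea from the description could be sketched roughly as follows. Nothing here is Accumulo code: the class name TServerBlacklist, its methods, and the failure-count and time-window parameters are all hypothetical, and the policy (N failures within a sliding window) is just one assumption about what "misbehave" might mean.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a sliding-window blacklist; not part of Accumulo. */
class TServerBlacklist {
  private final int maxFailures;     // assumed threshold before blacklisting
  private final long windowMillis;   // assumed window in which failures count
  private final Map<String, Deque<Long>> failures = new HashMap<>();
  private final Set<String> blacklisted = new HashSet<>();

  TServerBlacklist(int maxFailures, long windowMillis) {
    this.maxFailures = maxFailures;
    this.windowMillis = windowMillis;
  }

  /** Records one failure for a tserver; returns true if it is now blacklisted. */
  synchronized boolean recordFailure(String tserver, long nowMillis) {
    Deque<Long> times = failures.computeIfAbsent(tserver, k -> new ArrayDeque<>());
    times.addLast(nowMillis);
    // Drop failures that have aged out of the window.
    while (!times.isEmpty() && nowMillis - times.peekFirst() > windowMillis) {
      times.removeFirst();
    }
    if (times.size() >= maxFailures) {
      blacklisted.add(tserver);
    }
    return blacklisted.contains(tserver);
  }

  synchronized boolean isBlacklisted(String tserver) {
    return blacklisted.contains(tserver);
  }

  /** Clears state for a tserver, e.g. after an admin brings the node back. */
  synchronized void clear(String tserver) {
    failures.remove(tserver);
    blacklisted.remove(tserver);
  }
}
```

In this sketch a blacklisted tserver stays blacklisted until clear() is called, so a watchdog relaunching the process would not by itself put the node back into service. Whether the real master should behave that way is an open design question for the issue.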