Have you examined the region server logs (for the servers with bad performance) to see if there is some clue?
Taking a few jstacks may also help reveal something.

BTW, 0.98.11 has been released. You may want to consider upgrading.

Cheers

On Mon, Mar 16, 2015 at 6:40 AM, Dejan Menges <[email protected]> wrote:

> Hi All,
>
> We have a strange issue with HBase performance (overall cluster
> performance) when one of the DataNodes in the cluster unexpectedly goes
> down.
>
> The scenario is as follows:
> - Cluster works fine, it's stable.
> - One DataNode unexpectedly goes down (PSU issue, network issue, anything).
> - Whole HBase cluster goes down (performance becomes so bad that we have to
>   restart all RegionServers to get it back to life).
>
> The funniest and latest occurrence was when we added a new node to the
> cluster (with 8 x 4T SATA disks) and left just the DataNode running on it
> to give it a couple of days to get some data. At some point, due to a
> hardware issue, the server rebooted (twice within three hours) at a moment
> when it had maybe 5% of the data it would have had in a couple of days.
> Nothing else besides the DataNode was running on it, and once it went down,
> it affected literally everything. Restarting the RegionServers in the end
> fixed it.
>
> We are using HBase 0.98.0 with Hadoop 2.4.0
