Can you post some redacted log files from the period after the data node
failed, up to the restart?

-- 
Sean
On Mar 16, 2015 8:41 AM, "Dejan Menges" <[email protected]> wrote:

> Hi All,
>
> We have a strange issue with HBase performance (overall cluster
> performance) when one of the DataNodes in the cluster unexpectedly goes
> down.
>
> The scenario is as follows:
> - The cluster works fine; it's stable.
> - One DataNode unexpectedly goes down (PSU issue, network issue, anything).
> - The whole HBase cluster goes down (performance becomes so bad that we
> have to restart all RegionServers to bring it back to life).
>
> The most recent, and oddest, incident was when we added a new node to the
> cluster (with 8 x 4T SATA disks) and left only the DataNode running on it,
> to give it a couple of days to receive some data. At some point, due to a
> hardware issue, the server rebooted (twice within three hours) at a moment
> when it held maybe 5% of the data it would have had after a couple of
> days. Nothing besides the DataNode was running on it, and once it went
> down, it affected literally everything; restarting the RegionServers
> fixed it in the end.
>
> We are using HBase 0.98.0 with Hadoop 2.4.0.
>
