Hello everybody,
I've run into a strange problem. We run a 6-RS cluster, and suddenly
the client application started reporting "region not online" errors. In
the web console, all regionservers appeared to be up. I ran hbck and got
strange results:
Number of Tables: 2
Number of live region servers: 6
Number of dead region servers: 12
The cluster was in an inconsistent state. Running status 'detailed' in
the hbase shell listed the dead machines:
12 dead servers
search-hadoop-eu006.v300.gmx.net,60020,1305025929461
search-hadoop-eu002.v300.gmx.net,60020,1305019508570
search-hadoop-eu004.v300.gmx.net,60020,1305019551236
search-hadoop-eu003.v300.gmx.net,60020,1305025688666
search-hadoop-eu005.v300.gmx.net,60020,1305025841017
search-hadoop-eu006.v300.gmx.net,60020,1306156842070
search-hadoop-eu005.v300.gmx.net,60020,1305019568146
search-hadoop-eu001.v300.gmx.net,60020,1305025543786
search-hadoop-eu004.v300.gmx.net,60020,1305025761173
search-hadoop-eu002.v300.gmx.net,60020,1305025611163
search-hadoop-eu006.v300.gmx.net,60020,1305019572576
search-hadoop-eu003.v300.gmx.net,60020,1305019547053
It appears that all the live regionservers are also listed as dead. I
ran hbck -fix, and the cluster is now in an OK state, but it still
reports the same 12 dead machines listed above.
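To check the "live servers also listed as dead" observation, the dead list can be grouped by host. This is a minimal sketch, assuming each entry follows HBase's host,port,startcode server-name format and that the start code is the regionserver process start time in epoch milliseconds. The helpers parse_server_name, group_by_host and startcode_to_utc are hypothetical names for illustration, not HBase APIs.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Dead-server entries copied verbatim from the status 'detailed' output above.
DEAD = """\
search-hadoop-eu006.v300.gmx.net,60020,1305025929461
search-hadoop-eu002.v300.gmx.net,60020,1305019508570
search-hadoop-eu004.v300.gmx.net,60020,1305019551236
search-hadoop-eu003.v300.gmx.net,60020,1305025688666
search-hadoop-eu005.v300.gmx.net,60020,1305025841017
search-hadoop-eu006.v300.gmx.net,60020,1306156842070
search-hadoop-eu005.v300.gmx.net,60020,1305019568146
search-hadoop-eu001.v300.gmx.net,60020,1305025543786
search-hadoop-eu004.v300.gmx.net,60020,1305025761173
search-hadoop-eu002.v300.gmx.net,60020,1305025611163
search-hadoop-eu006.v300.gmx.net,60020,1305019572576
search-hadoop-eu003.v300.gmx.net,60020,1305019547053
"""


def parse_server_name(name):
    """Split a 'host,port,startcode' server name into (host, port, startcode)."""
    host, port, startcode = name.strip().split(",")
    return host, int(port), int(startcode)


def group_by_host(lines):
    """Map each host to the sorted start codes of its dead-server entries."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            host, _port, startcode = parse_server_name(line)
            groups[host].append(startcode)
    return {host: sorted(codes) for host, codes in groups.items()}


def startcode_to_utc(startcode):
    """Assumption: the start code is a start timestamp in epoch milliseconds."""
    return datetime.fromtimestamp(startcode / 1000, tz=timezone.utc)


if __name__ == "__main__":
    for host, codes in sorted(group_by_host(DEAD.splitlines()).items()):
        stamps = ", ".join(
            startcode_to_utc(c).strftime("%Y-%m-%d %H:%M") for c in codes
        )
        print(f"{host}: {len(codes)} dead entries, started {stamps}")
```

All six hosts show up in the dead list, five of them more than once. Entries with the same host and port but different start codes can only come from different regionserver processes on that host, so the "dead" entries look like records left over from earlier restarts rather than the currently running servers.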
I've checked the logs but found nothing obvious. Any ideas? We use CDH3u0.
Thanks
Daniel