Hi,

I have set up HBase in pseudo-distributed mode. It was working fine for six days, but this morning both the HMaster and HRegionServer processes went down. I checked the logs of both Hadoop and HBase; the relevant snippets are below. Please help.
*Datanode logs:*

2013-06-05 05:12:51,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_1597245478875608321_2818 java.io.EOFException: while trying to read 2347 bytes
2013-06-05 05:12:51,442 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1597245478875608321_2818 received exception java.io.EOFException: while trying to read 2347 bytes
2013-06-05 05:12:51,442 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver java.io.EOFException: while trying to read 2347 bytes
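The datanode errors begin at 05:12:51, the same moment the region server below comes back from its long pause, so the EOFException looks to me like the datanode giving up on a writer that went silent mid-block rather than a disk or network fault. To check that both sides are talking about the same write pipeline, I traced the block id through both logs (the log paths below are just how my install names them; adjust to wherever your logs live):

grep 'blk_1597245478875608321_2818' /var/log/hadoop/*datanode*.log
grep 'blk_1597245478875608321_2818' /var/log/hbase/*regionserver*.log

If both greps land in the same second, the datanode error is a symptom, not the cause.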
*HRegionServer logs:*

2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694929ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:51,045 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1597245478875608321_2818 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:44333 remote=/192.168.20.30:50010]
2013-06-05 05:12:51,046 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 11695345ms instead of 10000000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:51,048 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1597245478875608321_2818 bad datanode[0] 192.168.20.30:50010
2013-06-05 05:12:51,075 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
2013-06-05 05:12:51,110 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog java.io.IOException: Reflection
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.io.IOException: DFSOutputStream is closed
2013-06-05 05:12:51,180 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog java.io.IOException: Reflection
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.io.IOException: DFSOutputStream is closed
2013-06-05 05:12:51,183 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Failed close of HLog writer java.io.IOException: Reflection
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.io.IOException: DFSOutputStream is closed
2013-06-05 05:12:51,184 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Riding over HLog close failure! error count=1
2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hbase.rummycircle.com,60020,1369877672964: regionserver:60020-0x13ef31264d00001 regionserver:60020-0x13ef31264d00001 received expired from ZooKeeper, aborting org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2013-06-05 05:12:52,621 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
java.io.InterruptedIOException: Aborting compaction of store cfp_info in region event_data,244630,1369879570539.3ebddcd11a3c22585a690bf40911cb1e. because user requested stop.
2013-06-05 05:12:53,425 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:12:55,426 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:12:59,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:13:07,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:13:07,427 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
2013-06-05 05:13:07,436 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/hbase.rummycircle.com,60020,1369877672964/hbase.rummycircle.com%2C60020%2C1369877672964.1370382721642: java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
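My reading of the above: the region server slept 4694929 ms (roughly 78 minutes) instead of 3000 ms, so its ZooKeeper session expired long before it woke up, and an expired session forces the region server to abort by design ("received expired from ZooKeeper, aborting"). Raising zookeeper.session.timeout in hbase-site.xml would only paper over shorter stalls, since no sane timeout survives a 78-minute pause, but for reference this is the knob as I understand it; the value is in milliseconds, 120000 is just an example, and with an external ZooKeeper it must also fit under the server's maxSessionTimeout:

<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>

Since I run HBase-managed ZooKeeper in pseudo-distributed mode, I believe HBase passes this value through to the bundled ZooKeeper itself.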
*HMaster logs:*

2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4702394ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988731ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988726ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4698291ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,711 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694502ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,714 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694492ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,715 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4695589ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:52,263 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-06-05 05:12:52,465 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-06-05 05:12:52,561 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hbase.rummycircle.com,60020,1369877672964 reported a fatal error: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
2013-06-05 05:12:53,970 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-06-05 05:12:55,476 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 3012 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-06-05 05:12:56,981 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 1, slept for 4517 ms, expecting minimum of 1, maximum of 2147483647, master is running.
2013-06-05 05:12:57,019 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of -ROOT-,,0 at address=hbase.rummycircle.com,60020,1369877672964; java.io.EOFException
2013-06-05 05:17:52,302 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://192.168.20.30:9000/hbase/.logs/hbase.rummycircle.com,60020,1369877672964-splitting] installed = 19 but only 0 done
2013-06-05 05:17:52,321 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x13ef31264d00000 master:60000-0x13ef31264d00000 received expired from ZooKeeper, aborting org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
java.io.IOException: Giving up after tries=1
Caused by: java.lang.InterruptedException: sleep interrupted
2013-06-05 05:17:52,381 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
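Every Sleeper warning in both daemons points at the same stall of roughly 78 minutes ending at 05:12:50, which seems far too long for garbage collection alone, so I also suspect the box was swapping or otherwise starved during that window. To get hard evidence if it happens again, I am planning to enable GC logging and switch to the CMS collector in hbase-env.sh; the log path and occupancy fraction below are just my choices, not anything the stock config ships with:

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"

If the GC log then shows no pause anywhere near that length, I will look at swap activity (vmstat, /proc/meminfo) and at vm.swappiness instead of the JVM.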
--
Thanks and Regards,
Vimal Jain
