Have you looked at the NameNode log? The snippet you posted seems to imply an issue with data block placement.

Cheers
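For anyone following along, one way to act on that suggestion, assuming a stock Hadoop 1.x layout (the NameNode log path below is a guess; adjust it to your install):

    # Look for the failing block in the NameNode log:
    grep 'blk_1597245478875608321' "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log

    # fsck reports block placement, replication, and corrupt/missing blocks:
    hadoop fsck /hbase -files -blocks -locations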
On Jun 5, 2013, at 4:12 AM, Vimal Jain <[email protected]> wrote:

> I am running HBase in pseudo-distributed mode, so there is only one
> machine involved.
> I am using Hadoop version 1.1.2 and HBase version 0.94.7.
>
> On Wed, Jun 5, 2013 at 4:38 PM, Ted Yu <[email protected]> wrote:
>
>> How many region servers / data nodes do you have?
>>
>> What Hadoop / HBase version are you using?
>>
>> Thanks
>>
>> On Jun 5, 2013, at 3:54 AM, Vimal Jain <[email protected]> wrote:
>>
>>> Yes, I did check those.
>>> But I am not sure those parameter settings are the issue, as there are
>>> some other exceptions in the logs ("DFSOutputStream ResponseProcessor
>>> exception", etc.).
>>>
>>> On Wed, Jun 5, 2013 at 4:19 PM, Ted Yu <[email protected]> wrote:
>>>
>>>> There are a few tips under:
>>>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>
>>>> Can you check?
>>>>
>>>> Thanks
>>>>
>>>> On Jun 5, 2013, at 2:05 AM, Vimal Jain <[email protected]> wrote:
>>>>
>>>>> I don't think so, as I don't find any issues in the data node logs.
>>>>> Also there are lots of exceptions like "session expired" and "slept
>>>>> more than configured time". What are these?
>>>>>
>>>>> On Wed, Jun 5, 2013 at 2:27 PM, Azuryy Yu <[email protected]> wrote:
>>>>>
>>>>>> Because your data node 192.168.20.30 broke down, which led to the RS
>>>>>> going down.
>>>>>>
>>>>>> On Wed, Jun 5, 2013 at 3:19 PM, Vimal Jain <[email protected]> wrote:
>>>>>>
>>>>>>> Here is the complete log:
>>>>>>>
>>>>>>> http://bin.cakephp.org/saved/103001 - HRegionServer
>>>>>>> http://bin.cakephp.org/saved/103000 - HMaster
>>>>>>> http://bin.cakephp.org/saved/103002 - Datanode
>>>>>>>
>>>>>>> On Wed, Jun 5, 2013 at 11:58 AM, Vimal Jain <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I have set up HBase in pseudo-distributed mode.
>>>>>>>> It was working fine for 6 days, but suddenly this morning both the
>>>>>>>> HMaster and HRegionServer processes went down.
>>>>>>>> I checked the logs of both Hadoop and HBase.
>>>>>>>> Please help here.
>>>>>>>> Here are the snippets:
>>>>>>>>
>>>>>>>> *Datanode logs:*
>>>>>>>> 2013-06-05 05:12:51,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_1597245478875608321_2818 java.io.EOFException: while trying to read 2347 bytes
>>>>>>>> 2013-06-05 05:12:51,442 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1597245478875608321_2818 received exception java.io.EOFException: while trying to read 2347 bytes
>>>>>>>> 2013-06-05 05:12:51,442 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
>>>>>>>> java.io.EOFException: while trying to read 2347 bytes
>>>>>>>>
>>>>>>>> *HRegionServer logs:*
>>>>>>>> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694929ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:51,045 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1597245478875608321_2818 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:44333 remote=/192.168.20.30:50010]
>>>>>>>> 2013-06-05 05:12:51,046 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 11695345ms instead of 10000000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:51,048 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1597245478875608321_2818 bad datanode[0] 192.168.20.30:50010
>>>>>>>> 2013-06-05 05:12:51,075 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
>>>>>>>> java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
>>>>>>>> 2013-06-05 05:12:51,110 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
>>>>>>>> java.io.IOException: Reflection
>>>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>>> Caused by: java.io.IOException: DFSOutputStream is closed
>>>>>>>> 2013-06-05 05:12:51,180 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
>>>>>>>> java.io.IOException: Reflection
>>>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>>> Caused by: java.io.IOException: DFSOutputStream is closed
>>>>>>>> 2013-06-05 05:12:51,183 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Failed close of HLog writer
>>>>>>>> java.io.IOException: Reflection
>>>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>>> Caused by: java.io.IOException: DFSOutputStream is closed
>>>>>>>> 2013-06-05 05:12:51,184 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Riding over HLog close failure! error count=1
>>>>>>>> 2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hbase.rummycircle.com,60020,1369877672964: regionserver:60020-0x13ef31264d00001 regionserver:60020-0x13ef31264d00001 received expired from ZooKeeper, aborting
>>>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>>>>>>>> 2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
>>>>>>>> 2013-06-05 05:12:52,621 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
>>>>>>>> java.io.InterruptedIOException: Aborting compaction of store cfp_info in region event_data,244630,1369879570539.3ebddcd11a3c22585a690bf40911cb1e. because user requested stop.
>>>>>>>> 2013-06-05 05:12:53,425 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
>>>>>>>> 2013-06-05 05:12:55,426 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
>>>>>>>> 2013-06-05 05:12:59,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
>>>>>>>> 2013-06-05 05:13:07,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
>>>>>>>> 2013-06-05 05:13:07,427 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
>>>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
>>>>>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>>>>>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>>>>> 2013-06-05 05:13:07,436 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/hbase.rummycircle.com,60020,1369877672964/hbase.rummycircle.com%2C60020%2C1369877672964.1370382721642 : java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
>>>>>>>> java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
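A note on the region server log above: the 63000 ms ResponseProcessor timeout looks like dfs.socket.timeout's 60 s default plus the small per-datanode extension the 1.x client adds, and with a single datanode (replication 1) pipeline recovery has no other node to fall back on, hence "All datanodes ... are bad". If stalls rather than a real datanode failure are suspected, the relevant Hadoop 1.x timeouts can be raised in hdfs-site.xml. A sketch only, with illustrative values:

    <property>
      <name>dfs.socket.timeout</name>
      <value>180000</value>   <!-- client/datanode read timeout; default 60000 ms -->
    </property>
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>960000</value>   <!-- pipeline write timeout; default 480000 ms -->
    </property>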
>>>>>>>> *HMaster logs:*
>>>>>>>> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4702394ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988731ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988726ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4698291ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:50,711 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694502ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:50,714 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694492ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:50,715 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4695589ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-06-05 05:12:52,263 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
>>>>>>>> 2013-06-05 05:12:52,465 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
>>>>>>>> 2013-06-05 05:12:52,561 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hbase.rummycircle.com,60020,1369877672964 reported a fatal error:
>>>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>>>>>>>> 2013-06-05 05:12:53,970 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
>>>>>>>> 2013-06-05 05:12:55,476 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 3012 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
>>>>>>>> 2013-06-05 05:12:56,981 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 1, slept for 4517 ms, expecting minimum of 1, maximum of 2147483647, master is running.
>>>>>>>> 2013-06-05 05:12:57,019 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of -ROOT-,,0 at address=hbase.rummycircle.com,60020,1369877672964; java.io.EOFException
>>>>>>>> 2013-06-05 05:17:52,302 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://192.168.20.30:9000/hbase/.logs/hbase.rummycircle.com,60020,1369877672964-splitting] installed = 19 but only 0 done
>>>>>>>> 2013-06-05 05:17:52,321 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x13ef31264d00000 master:60000-0x13ef31264d00000 received expired from ZooKeeper, aborting
>>>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>>>>>>>> java.io.IOException: Giving up after tries=1
>>>>>>>> Caused by: java.lang.InterruptedException: sleep interrupted
>>>>>>>> 2013-06-05 05:17:52,381 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
>>>>>>>> java.lang.RuntimeException: HMaster Aborted
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks and Regards,
>>>>>>>> Vimal Jain
>>>>>>>
>>>>>>> --
>>>>>>> Thanks and Regards,
>>>>>>> Vimal Jain
>>>>>
>>>>> --
>>>>> Thanks and Regards,
>>>>> Vimal Jain
>>>
>>> --
>>> Thanks and Regards,
>>> Vimal Jain
>
> --
> Thanks and Regards,
> Vimal Jain
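For reference, the book section cited throughout the thread (trouble.rs.runtime.zkexpired) comes down to two knobs: give the ZooKeeper session more headroom, and keep GC pauses short. A minimal sketch with illustrative values follows; note that when HBase manages ZooKeeper, the ZK server's maxSessionTimeout (20 x tickTime by default) also caps whatever the client requests. And the "slept 4694929ms" lines above amount to a ~78-minute pause, far beyond what any session timeout can absorb, so these settings only mitigate much shorter stalls.

    <!-- hbase-site.xml -->
    <property>
      <name>zookeeper.session.timeout</name>
      <value>300000</value>   <!-- illustrative; bounded by the ZK server's maxSessionTimeout -->
    </property>

    # hbase-env.sh -- CMS flags commonly suggested for 0.94-era heaps; tune for your own:
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"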
