Also, please check your NameNode log.
On Thu, Jun 6, 2013 at 1:20 PM, Azuryy Yu <[email protected]> wrote:
> Can you reproduce the problem? If yes, add the following to your
> hbase-env.sh:
>
>     export HBASE_MASTER_OPTS="-verbose:gc -XX:+PrintGCDateStamps \
>         -XX:+PrintGCDetails -Xloggc:$HBASE_LOG_DIR/hmaster_gc.log \
>         $HBASE_MASTER_OPTS"
>
>     export HBASE_REGIONSERVER_OPTS="-verbose:gc -XX:+PrintGCDateStamps \
>         -XX:+PrintGCDetails -Xloggc:$HBASE_LOG_DIR/hregionserver_gc.log \
>         $HBASE_REGIONSERVER_OPTS"
>
> Then you will get GC logs. My guess is that this problem was caused by GC.
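Once those files exist, long pauses can be spotted quickly from the shell. A minimal sketch, assuming the log names from the snippet above and the "real=" wall-clock field that -XX:+PrintGCDetails prints:

    # Flag GC events whose wall-clock ("real") time hit 10 seconds or more;
    # pauses of that size are what make ZooKeeper sessions expire.
    grep -E 'real=[0-9]{2,}\.' "$HBASE_LOG_DIR"/hmaster_gc.log \
                               "$HBASE_LOG_DIR"/hregionserver_gc.log

    # Full collections are the usual culprits; list them with timestamps.
    grep 'Full GC' "$HBASE_LOG_DIR"/hmaster_gc.log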
On Thu, Jun 6, 2013 at 10:53 AM, Vimal Jain <[email protected]> wrote:
> Hi Azuryy/Ted,
> Can you please help here...

On Jun 5, 2013 7:23 PM, "Kevin O'dell" <[email protected]> wrote:
> No!
>
> Just kidding. You can unsubscribe by going to the Apache site:
> http://hbase.apache.org/mail-lists.html
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera

On Wed, Jun 5, 2013 at 9:34 AM, Joseph Coleman <[email protected]> wrote:
> Please remove me from this list.

On 6/5/13 8:32 AM, "Vimal Jain" <[email protected]> wrote:
> OK. I don't have any batch reads from or writes to HBase.

On Wed, Jun 5, 2013 at 6:08 PM, Azuryy Yu <[email protected]> wrote:
> GC logs are not produced by default; they need some configuration. Do you
> have any batch reads or writes to HBase?

On Jun 5, 2013 8:25 PM, "Vimal Jain" <[email protected]> wrote:
> I don't have GC logs. Do you get them by default, or do they have to be
> configured?
> After I came to know about the crash, I checked which processes were
> still running using "jps". It displayed 4 processes: "namenode",
> "datanode", "secondarynamenode" and "HQuorumPeer".
> So I stopped DFS by running $HADOOP_HOME/bin/stop-dfs.sh, and then I
> stopped HBase by running $HBASE_HOME/bin/stop-hbase.sh.
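A side note on the shutdown sequence in that last message: HBase is normally stopped before HDFS, because the region servers need a live filesystem to flush their data and close their WALs; stopping DFS first leaves HBase with nowhere to write. A minimal sketch of the usual order, assuming the standard $HBASE_HOME/$HADOOP_HOME layout mentioned above:

    # See which daemons are still alive (here HMaster and HRegionServer
    # were already gone, leaving only the HDFS daemons and HQuorumPeer)
    jps

    # Stop HBase first so region servers can flush and close cleanly...
    $HBASE_HOME/bin/stop-hbase.sh

    # ...then take down HDFS underneath it.
    $HADOOP_HOME/bin/stop-dfs.sh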
On Wed, Jun 5, 2013 at 5:49 PM, Azuryy Yu <[email protected]> wrote:
> Do you have a GC log? What were you doing during the crash? And what are
> your GC options?
>
> As for the DN error: that is generally a network issue, because the DN
> received an incomplete packet.

On Jun 5, 2013 8:10 PM, "Vimal Jain" <[email protected]> wrote:
> Yes, that's true.
> There are some errors in all 3 logs during the same period, i.e.
> datanode, master and region, but I am unable to deduce the exact cause.
> Can you please help in detecting the problem?
>
> So far I suspect the following: I have a 1 GB heap (the default)
> allocated to each of the 3 processes, i.e. Master, RegionServer and
> ZooKeeper. Both Master and RegionServer spent too long in GC (as inferred
> from log lines like "slept more time than configured" etc.). Because of
> this, the ZooKeeper sessions of both Master and RegionServer timed out,
> and hence both went down.
>
> I am a newbie to HBase, so my findings may not be correct. I want to be
> 100% sure before increasing the heap space for both Master and
> RegionServer (to around 2 GB each) to solve this. At present I have
> restarted the cluster with the default heap space (1 GB) only.
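If the GC logs confirm long pauses on a 1 GB heap, the increase being considered above would go into hbase-env.sh. A minimal sketch; the 2 GB figure is the one floated in the message, not a tested recommendation:

    # hbase-env.sh -- maximum heap, in MB, used by every HBase daemon
    # started on this host (default is 1000)
    export HBASE_HEAPSIZE=2000

    # Alternatively, size the daemons individually:
    export HBASE_MASTER_OPTS="-Xmx2g $HBASE_MASTER_OPTS"
    export HBASE_REGIONSERVER_OPTS="-Xmx2g $HBASE_REGIONSERVER_OPTS"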
On Wed, Jun 5, 2013 at 5:23 PM, Azuryy Yu <[email protected]> wrote:
> There are errors in your datanode log, and their timestamps match the
> error times in the RS log.

On Jun 5, 2013 5:06 PM, "Vimal Jain" <[email protected]> wrote:
> I don't think so, as I don't find any issues in the datanode logs.
> Also, there are a lot of exceptions like "session expired" and "slept
> more than configured time". What are these?

On Wed, Jun 5, 2013 at 2:27 PM, Azuryy Yu <[email protected]> wrote:
> Because your datanode 192.168.20.30 broke down, which led to the RS going
> down.

On Wed, Jun 5, 2013 at 3:19 PM, Vimal Jain <[email protected]> wrote:
> Here are the complete logs:
>
> http://bin.cakephp.org/saved/103001 - HRegion
> http://bin.cakephp.org/saved/103000 - HMaster
> http://bin.cakephp.org/saved/103002 - Datanode
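To check whether the datanode and region server errors really do line up in time, the same window can be pulled out of all three logs and interleaved by timestamp. A minimal sketch; the log paths are hypothetical placeholders, and 05:12-05:13 on 2013-06-05 is the window the snippets below fall into:

    # Extract the two minutes around the crash from each daemon's log and
    # merge the lines chronologically (-h drops the file-name prefixes)
    grep -h "2013-06-05 05:1[23]" \
        /var/log/hadoop/hadoop-*-datanode-*.log \
        /var/log/hbase/hbase-*-regionserver-*.log \
        /var/log/hbase/hbase-*-master-*.log | sort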
On Wed, Jun 5, 2013 at 11:58 AM, Vimal Jain <[email protected]> wrote:
> Hi,
> I have set up HBase in pseudo-distributed mode.
> It was working fine for 6 days, but suddenly this morning both the
> HMaster and HRegionServer processes went down.
> I checked the logs of both Hadoop and HBase. Please help here.
> Here are the snippets:
>
> *Datanode logs:*
> 2013-06-05 05:12:51,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_1597245478875608321_2818 java.io.EOFException: while trying to read 2347 bytes
> 2013-06-05 05:12:51,442 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1597245478875608321_2818 received exception java.io.EOFException: while trying to read 2347 bytes
> 2013-06-05 05:12:51,442 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 2347 bytes
> *HRegion logs:*
> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694929ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:51,045 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1597245478875608321_2818java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:44333 remote=/192.168.20.30:50010]
> 2013-06-05 05:12:51,046 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 11695345ms instead of 10000000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:51,048 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1597245478875608321_2818 bad datanode[0] 192.168.20.30:50010
> 2013-06-05 05:12:51,075 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
> 2013-06-05 05:12:51,110 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
> java.io.IOException: Reflection
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: java.io.IOException: DFSOutputStream is closed
> 2013-06-05 05:12:51,180 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
> java.io.IOException: Reflection
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: java.io.IOException: DFSOutputStream is closed
> 2013-06-05 05:12:51,183 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Failed close of HLog writer
> java.io.IOException: Reflection
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: java.io.IOException: DFSOutputStream is closed
> 2013-06-05 05:12:51,184 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Riding over HLog close failure! error count=1
> 2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hbase.rummycircle.com,60020,1369877672964: regionserver:60020-0x13ef31264d00001 regionserver:60020-0x13ef31264d00001 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> 2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
> 2013-06-05 05:12:52,621 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
> java.io.InterruptedIOException: Aborting compaction of store cfp_info in region event_data,244630,1369879570539.3ebddcd11a3c22585a690bf40911cb1e. because user requested stop.
> 2013-06-05 05:12:53,425 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> 2013-06-05 05:12:55,426 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> 2013-06-05 05:12:59,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> 2013-06-05 05:13:07,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> 2013-06-05 05:13:07,427 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 2013-06-05 05:13:07,436 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/hbase.rummycircle.com,60020,1369877672964/hbase.rummycircle.com%2C60020%2C1369877672964.1370382721642 : java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
> java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
> *HMaster logs:*
> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4702394ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988731ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988726ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4698291ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:50,711 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694502ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:50,714 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694492ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:50,715 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4695589ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-06-05 05:12:52,263 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> 2013-06-05 05:12:52,465 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
> 2013-06-05 05:12:52,561 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hbase.rummycircle.com,60020,1369877672964 reported a fatal error:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> 2013-06-05 05:12:53,970 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
> 2013-06-05 05:12:55,476 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 3012 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
> 2013-06-05 05:12:56,981 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 1, slept for 4517 ms, expecting minimum of 1, maximum of 2147483647, master is running.
> 2013-06-05 05:12:57,019 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of -ROOT-,,0 at address=hbase.rummycircle.com,60020,1369877672964; java.io.EOFException
> 2013-06-05 05:17:52,302 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://192.168.20.30:9000/hbase/.logs/hbase.rummycircle.com,60020,1369877672964-splitting] installed = 19 but only 0 done
> 2013-06-05 05:17:52,321 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x13ef31264d00000 master:60000-0x13ef31264d00000 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> java.io.IOException: Giving up after tries=1
> Caused by: java.lang.InterruptedException: sleep interrupted
> 2013-06-05 05:17:52,381 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
> java.lang.RuntimeException: HMaster Aborted
>
> --
> Thanks and Regards,
> Vimal Jain
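The SessionExpiredException entries in both the region server and master logs above are exactly the failure mode the troubleshooting link in those logs (trouble.rs.runtime.zkexpired) describes: the JVMs paused in GC for longer than the ZooKeeper session timeout, so ZooKeeper declared them dead. Besides curing the GC pauses themselves, the session timeout can be raised in hbase-site.xml. A hedged sketch: the 120000 ms value is illustrative only, and the ZooKeeper server must be willing to grant a session that long (its maximum defaults to 20 x the tick time):

    <!-- hbase-site.xml: allow longer GC pauses before ZooKeeper expires a
         daemon's session (milliseconds; the value here is illustrative) -->
    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
    </property>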
