On 6/5/13 8:32 AM, "Vimal Jain" <[email protected]> wrote:

Ok. I don't have any batch reads or writes to HBase.

On Wed, Jun 5, 2013 at 6:08 PM, Azuryy Yu <[email protected]> wrote:

A GC log is not produced by default; it needs some configuration. Do you have any batch reads or writes to HBase?

--Send from my Sony mobile.

On Jun 5, 2013 8:25 PM, "Vimal Jain" <[email protected]> wrote:

I don't have GC logs. Are they produced by default, or do they have to be configured?
After I learned about the crash, I checked which processes were running using "jps". It displayed four processes: "namenode", "datanode", "secondarynamenode" and "HQuorumpeer". So I stopped DFS by running $HADOOP_HOME/bin/stop-dfs.sh and then stopped HBase by running $HBASE_HOME/bin/stop-hbase.sh.

On Wed, Jun 5, 2013 at 5:49 PM, Azuryy Yu <[email protected]> wrote:

Do you have a GC log? What did you do during the crash, and what are your GC options?
As for the datanode error, that is generally a network issue, because the datanode received an incomplete packet.

On Jun 5, 2013 8:10 PM, "Vimal Jain" <[email protected]> wrote:

Yes, that's true. There are errors in all three logs during the same period, i.e. datanode, master and region, but I am unable to deduce the exact cause. Can you please help in detecting the problem?

So far I suspect the following: I have the default 1 GB heap allocated to all three processes, i.e. Master, RegionServer and ZooKeeper. Both Master and RegionServer spent a long time in GC (as inferred from log lines like "slept more time than configured" etc.). Because of this, the ZooKeeper sessions of both Master and RegionServer timed out, and hence both went down.

I am a newbie to HBase, so my findings may not be correct. I want to be 100% sure before increasing the heap space for both Master and RegionServer (to around 2 GB each) to solve this. At present I have restarted the cluster with the default heap space (1 GB) only.

On Wed, Jun 5, 2013 at 5:23 PM, Azuryy Yu <[email protected]> wrote:

There are errors in your datanode log, and their times match the error times in the region server log.

On Jun 5, 2013 5:06 PM, "Vimal Jain" <[email protected]> wrote:

I don't think so, as I don't find any issues in the datanode logs. Also, there are a lot of exceptions like "session expired" and "slept more than configured time". What are these?

On Wed, Jun 5, 2013 at 2:27 PM, Azuryy Yu <[email protected]> wrote:

Because your datanode 192.168.20.30 broke down, which led to the region server going down.

On Wed, Jun 5, 2013 at 3:19 PM, Vimal Jain <[email protected]> wrote:

Here are the complete logs:

http://bin.cakephp.org/saved/103001 - Hregion
http://bin.cakephp.org/saved/103000 - Hmaster
http://bin.cakephp.org/saved/103002 - Datanode

On Wed, Jun 5, 2013 at 11:58 AM, Vimal Jain <[email protected]> wrote:

Hi,
I have set up HBase in pseudo-distributed mode. It was working fine for 6 days, but suddenly this morning both the HMaster and HRegionServer processes went down.
I checked the logs of both Hadoop and HBase. Please help here. Here are the snippets:

*Datanode logs:*
2013-06-05 05:12:51,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_1597245478875608321_2818 java.io.EOFException: while trying to read 2347 bytes
2013-06-05 05:12:51,442 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1597245478875608321_2818 received exception java.io.EOFException: while trying to read 2347 bytes
2013-06-05 05:12:51,442 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 2347 bytes

*HRegion logs:*
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694929ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:51,045 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1597245478875608321_2818 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:44333 remote=/192.168.20.30:50010]
2013-06-05 05:12:51,046 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 11695345ms instead of 10000000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:51,048 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1597245478875608321_2818 bad datanode[0] 192.168.20.30:50010
2013-06-05 05:12:51,075 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
2013-06-05 05:12:51,110 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.io.IOException: DFSOutputStream is closed
2013-06-05 05:12:51,180 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.io.IOException: DFSOutputStream is closed
2013-06-05 05:12:51,183 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Failed close of HLog writer
java.io.IOException: Reflection
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.io.IOException: DFSOutputStream is closed
2013-06-05 05:12:51,184 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Riding over HLog close failure! error count=1
2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hbase.rummycircle.com,60020,1369877672964: regionserver:60020-0x13ef31264d00001 regionserver:60020-0x13ef31264d00001 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2013-06-05 05:12:52,621 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
java.io.InterruptedIOException: Aborting compaction of store cfp_info in region event_data,244630,1369879570539.3ebddcd11a3c22585a690bf40911cb1e. because user requested stop.
2013-06-05 05:12:53,425 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:12:55,426 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:12:59,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:13:07,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
2013-06-05 05:13:07,427 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
2013-06-05 05:13:07,436 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/hbase.rummycircle.com,60020,1369877672964/hbase.rummycircle.com%2C60020%2C1369877672964.1370382721642 : java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)

*HMaster logs:*
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4702394ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988731ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988726ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4698291ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,711 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694502ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,714 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694492ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:50,715 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4695589ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-06-05 05:12:52,263 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-06-05 05:12:52,465 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-06-05 05:12:52,561 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hbase.rummycircle.com,60020,1369877672964 reported a fatal error:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
2013-06-05 05:12:53,970 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-06-05 05:12:55,476 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 3012 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-06-05 05:12:56,981 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 1, slept for 4517 ms, expecting minimum of 1, maximum of 2147483647, master is running.
2013-06-05 05:12:57,019 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of -ROOT-,,0 at address=hbase.rummycircle.com,60020,1369877672964; java.io.EOFException
2013-06-05 05:17:52,302 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://192.168.20.30:9000/hbase/.logs/hbase.rummycircle.com,60020,1369877672964-splitting] installed = 19 but only 0 done
2013-06-05 05:17:52,321 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x13ef31264d00000 master:60000-0x13ef31264d00000 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
java.io.IOException: Giving up after tries=1
Caused by: java.lang.InterruptedException: sleep interrupted
2013-06-05 05:17:52,381 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted

--
Thanks and Regards,
Vimal Jain
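[Editor's note] As noted upthread, a GC log is not produced by default; the JVM has to be told to write one. A minimal sketch of how that could be enabled in conf/hbase-env.sh, using standard HotSpot flags of that era; the log path is a placeholder, not something from the thread:

```shell
# Append standard HotSpot GC-logging flags to the options passed to the
# HBase daemons (Master and RegionServer both read HBASE_OPTS).
# /var/log/hbase/gc-hbase.log is a hypothetical path - pick your own.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```

With a GC log in place, one could confirm whether a stop-the-world pause around 05:12 really spanned the ZooKeeper session timeout, instead of inferring it from the Sleeper warnings.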
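[Editor's note] The fix proposed in the thread, raising the daemon heaps from the 1 GB default to about 2 GB, would be a one-line change in conf/hbase-env.sh. A sketch, assuming the HBASE_HEAPSIZE knob (value in MB) of HBase releases from that period:

```shell
# HBASE_HEAPSIZE sets the maximum heap, in MB, for the HBase daemons;
# 2048 matches the ~2 GB figure discussed in the thread.
export HBASE_HEAPSIZE=2048
```

A larger heap only helps if the pauses come from heap pressure; raising zookeeper.session.timeout in hbase-site.xml is a complementary mitigation for long GC pauses, at the cost of slower failure detection.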
