All the nodes are running, but the master does not run a region server; the master is limited to running the NameNode, the quorum peer, and HMaster. Do you mean I should run a region server on the master node as well?
On Thu, Aug 8, 2013 at 2:48 PM, Jimmy Xiang <[email protected]> wrote:

> Can you start the master as well (besides region servers)?
>
> On Thu, Aug 8, 2013 at 2:41 PM, oc tsdb <[email protected]> wrote:
>
> > I am using hbase-0.92
> >
> > Region server was not running on any of the nodes.
> >
> > Restarted the cluster. It started region servers on all nodes except
> > the HMaster node, but the cluster is still unresponsive.
> >
> > processes running on master are
> > TSDMain
> > HMaster
> > SecondaryNameNode
> > NameNode
> > JobTracker
> > HQuorumPeer
> >
> > processes running on all other nodes are
> > DataNode
> > TaskTracker
> > RegionServer
> > TSDMain
> >
> > This time, I see the error messages in the attached log.
> >
> > Could you please suggest if I can recover/restore the data and get the
> > cluster up.
> >
> > Thanks & Regards,
> > VSR
> >
> > On Thu, Aug 8, 2013 at 1:40 PM, Ted Yu <[email protected]> wrote:
> >
> >> Can you tell us the version of HBase you're using?
> >>
> >> Do you find something in region server logs on the 4 remaining nodes?
> >>
> >> Cheers
> >>
> >> On Thu, Aug 8, 2013 at 1:36 PM, oc tsdb <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am running a cluster with 6 nodes.
> >> > Two of the 6 nodes in my cluster went down (due to another application's
> >> > failure) and came back after some time (I had to do a power reboot).
> >> > When these nodes came back I used to get "WARN org.apache.hadoop.DFSClient:
> >> > Failed to connect to , add to deadnodes and continue".
> >> > Now these messages have stopped and I am getting continuous debug messages
> >> > as follows.
> >> >
> >> > 2013-08-08 12:57:36,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:37,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-3.corp.oc.com%2C60020%2C1375466447768-splitting%2Fmb-3.corp.oc.com%252C60020%252C1375466447768.1375631802971 ver = 0
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-6.corp.oc.com%2C60020%2C1375466460755-splitting%2Fmb-6.corp.oc.com%252C60020%252C1375466460755.1375623787557 ver = 0
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-6.corp.oc.com%2C60020%2C1375466460755-splitting%2Fmb-6.corp.oc.com%252C60020%252C1375466460755.1375619231059 ver = 3
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-2.corp.oc.com%2C60020%2C1375466479427-splitting%2Fmb-2.corp.oc.com%252C60020%252C1375466479427.1375639017535 ver = 0
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-6.corp.oc.com%2C60020%2C1375466460755-splitting%2Fmb-6.corp.oc.com%252C60020%252C1375466460755.1375623021175 ver = 0
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-3.corp.oc.com%2C60020%2C1375466447768-splitting%2Fmb-3.corp.oc.com%252C60020%252C1375466447768.1375630425141 ver = 0
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
> >> > 2013-08-08 12:57:37,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-6.corp.oc.com%2C60020%2C1375466460755-splitting%2Fmb-6.corp.oc.com%252C60020%252C1375466460755.1375620714514 ver = 3
> >> > 2013-08-08 12:57:37,630 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-6.corp.oc.com%2C60020%2C1375924525310-splitting%2Fmb-6.corp.oc.com%252C60020%252C1375924525310.1375924529658 ver = 0
> >> > 2013-08-08 12:57:37,630 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-4.corp.oc.com%2C60020%2C1375466551673-splitting%2Fmb-4.corp.oc.com%252C60020%252C1375466551673.1375641592581 ver = 0
> >> > 2013-08-08 12:57:37,630 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-5.corp.oc.com%2C60020%2C1375924528073-splitting%2Fmb-5.corp.oc.com%252C60020%252C1375924528073.1375924532442 ver = 0
> >> > 2013-08-08 12:57:37,630 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-6.corp.oc.com%2C60020%2C1375466460755-splitting%2Fmb-6.corp.oc.com%252C60020%252C1375466460755.1375622290167 ver = 3
> >> > 2013-08-08 12:57:37,630 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-5.corp.oc.com%2C60020%2C1375466463385-splitting%2Fmb-5.corp.oc.com%252C60020%252C1375466463385.1375638183425 ver = 0
> >> > 2013-08-08 12:57:37,630 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-5.corp.oc.com%2C60020%2C1375466463385-splitting%2Fmb-5.corp.oc.com%252C60020%252C1375466463385.1375639599559 ver = 0
> >> > 2013-08-08 12:57:37,630 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs%2Fmb-5.corp.oc.com%2C60020%2C1375466463385-splitting%2Fmb-5.corp.oc.com%252C60020%252C1375466463385.1375641710787 ver = 3
> >> > 2013-08-08 12:57:37,633 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000006975 entered state done mb-1.corp.oc.com,60000,1375924508669
> >> > 2013-08-08 12:57:37,633 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/RESCAN0000006975
> >> > 2013-08-08 12:57:37,633 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: deleted task without in memory state /hbase/splitlog/RESCAN0000006975
> >> > 2013-08-08 12:57:38,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:39,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:40,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:41,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:42,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:43,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:44,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:45,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:46,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:47,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:48,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:49,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:50,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:51,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:52,628 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:53,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:54,487 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@24ddb5c9; serverName=
> >> > 2013-08-08 12:57:54,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:55,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:56,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:57,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:58,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:57:59,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> > 2013-08-08 12:58:00,629 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 14 unassigned = 14
> >> >
> >> > The cluster is unresponsive. I cannot access port 4242 on any of the
> >> > cluster nodes.
> >> > When I try to run the tsdb command "tsdb uid grep metrics .", I get the
> >> > following error messages:
> >> > ERROR [main-EventThread] HBaseClient: The znode for the -ROOT- region doesn't exist!
> >> > ERROR [main-EventThread] HBaseClient: The znode for the -ROOT- region doesn't exist!
> >> >
> >> > Could you please suggest what I can do to stop it?
> >> >
> >> > Thanks in advance.
> >> >
> >> > Regards,
> >> > OC.
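A note for reading the SplitLogManager output quoted above: each task name under /hbase/splitlog/ appears to be the URL-encoded HDFS path of the log file waiting to be split. Below is a minimal sketch, assuming Python 3 is at hand, that decodes one task name and pulls out the region server that owned the log. The helper names (wal_path_from_task, server_from_wal_path) are hypothetical and not part of HBase or OpenTSDB; the sample task name is the first one from the log, with the mail line-wrapping removed.

```python
from urllib.parse import unquote

SPLITLOG_PREFIX = "/hbase/splitlog/"   # znode parent seen in the quoted log
SPLITTING_SUFFIX = "-splitting"        # suffix on the per-server log directory


def wal_path_from_task(znode: str) -> str:
    """Decode a split-log task znode name into the HDFS path it encodes."""
    if znode.startswith(SPLITLOG_PREFIX):
        znode = znode[len(SPLITLOG_PREFIX):]
    # Decode exactly once: the log file name itself contains a literal "%2C",
    # which is why it shows up as "%252C" inside the znode name.
    return unquote(znode)


def server_from_wal_path(path: str) -> str:
    """Return "<host>,<port>,<startcode>" of the owning server, or "" if absent."""
    _, sep, rest = path.partition("/.logs/")
    if not sep:
        return ""
    directory = rest.split("/", 1)[0]
    if directory.endswith(SPLITTING_SUFFIX):
        directory = directory[: -len(SPLITTING_SUFFIX)]
    return directory


if __name__ == "__main__":
    task = (
        "/hbase/splitlog/hdfs%3A%2F%2Fmb-1.corp.oc.com%3A54310%2Fhbase%2F.logs"
        "%2Fmb-3.corp.oc.com%2C60020%2C1375466447768-splitting"
        "%2Fmb-3.corp.oc.com%252C60020%252C1375466447768.1375631802971"
    )
    path = wal_path_from_task(task)
    print(path)
    print(server_from_wal_path(path))
```

Running this over all 14 task names groups the pending logs by server (mb-2 through mb-6 here), which makes it easier to see whose logs are still unassigned.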
