> 2014-07-15 20:27:21,471 INFO  [main-SendThread(localhost:2181)]

The master tried to connect to ZooKeeper on localhost rather than the quorum hosts you configured (psyDebian, centos1, centos2).
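A common cause is the default Debian /etc/hosts, which maps the machine's own hostname to a loopback address (127.0.1.1), so psyDebian resolves to the local loopback and the master dials localhost. A sketch of an /etc/hosts that avoids this; the 192.168.1.x addresses are placeholders you would replace with your real LAN addresses:

```
127.0.0.1     localhost
192.168.1.10  psyDebian
192.168.1.11  centos1
192.168.1.12  centos2
```

Every machine in the cluster should carry the same hostname-to-address mappings, and none of the cluster hostnames should point at 127.0.0.1 or 127.0.1.1.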

Please take a look at http://hbase.apache.org/book.html#trouble.zookeeper
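As a quick check from the master node, you can test name resolution and ZooKeeper liveness for each quorum member. This is a sketch using the hostnames from your hbase.zookeeper.quorum and assuming the default client port 2181:

```shell
# For each quorum host: show what the name resolves to, then send
# ZooKeeper's four-letter "ruok" command; a healthy server replies "imok".
for host in psyDebian centos1 centos2; do
  getent hosts "$host" || echo "$host does not resolve from this machine"
  echo ruok | nc -w 2 "$host" 2181 || true
  echo
done
```

If a quorum hostname resolves to 127.0.0.1 or 127.0.1.1 on the master, that matches the "Opening socket connection to server localhost" lines in your log.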


On Tue, Jul 15, 2014 at 6:13 AM, psy <[email protected]> wrote:

> Hi, everyone. I'm a student and a beginner with HBase. Recently I ran
> into a problem when trying to run HBase on three machines. Hadoop runs
> well, but when I start HBase, the "HMaster" on the master node and the
> "HRegionServer" processes on the slave nodes quit after a few seconds. On
> the master node, jps shows:
>
>         hadoop@psyDebian:/opt$ jps
>         5416 NameNode
>         5647 SecondaryNameNode
>         5505 DataNode
>         398 Jps
>         32745 HMaster
>         32670 HQuorumPeer
>
> and a short while later, it looks like this:
>
>         hadoop@psyDebian:/opt$ jps
>         5416 NameNode
>         5647 SecondaryNameNode
>         5505 DataNode
>         423 Jps
>         32670 HQuorumPeer
>
> the master log:
>
> hadoop@psyDebian:/opt$ tail -n 30 /opt/hbase/logs/hbase-hadoop-master-psyDebian.log
> 2014-07-15 20:27:21,470 INFO  [main-SendThread(localhost:2181)]
> zookeeper.ClientCnxn: Opening socket connection to server
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
> (unknown error)
> 2014-07-15 20:27:21,471 INFO  [main-SendThread(localhost:2181)]
> zookeeper.ClientCnxn: Socket connection established to
> localhost/127.0.0.1:2181, initiating session
> 2014-07-15 20:27:21,471 INFO  [main-SendThread(localhost:2181)]
> zookeeper.ClientCnxn: Unable to read additional data from server
> sessionid 0x0, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2014-07-15 20:27:21,572 WARN  [main] zookeeper.RecoverableZooKeeper:
> Possibly transient ZooKeeper,
> quorum=centos1:2181,psyDebian:2181,centos2:2181,
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase
> 2014-07-15 20:27:21,572 ERROR [main] zookeeper.RecoverableZooKeeper:
> ZooKeeper create failed after 4 attempts
> 2014-07-15 20:27:21,572 ERROR [main] master.HMasterCommandLine: Master
> exiting
> java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
>         at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2789)
>         at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:186)
>         at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
>         at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2803)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:489)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:468)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1241)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1219)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:174)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:167)
>         at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:481)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
>         at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2784)
>
>
>
> the "out" log:
> hadoop@psyDebian:/opt$ tail /opt/hbase/logs/hbase-hadoop-master-psyDebian.out
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>
>
> the "zookeeper" log:
> 2014-07-15 20:48:20,572 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
> quorum.FollowerZooKeeperServer: Shutting down
> 2014-07-15 20:48:20,573 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
> server.ZooKeeperServer: shutting down
> 2014-07-15 20:48:20,573 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
> quorum.QuorumPeer: LOOKING
> 2014-07-15 20:48:20,574 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
> quorum.QuorumPeer: acceptedEpoch not found! Creating with a reasonable
> default of 0. This should only happen when you are upgrading your
> installation
> 2014-07-15 20:48:20,625 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
> quorum.FastLeaderElection: New election. My id =  0, proposed zxid=0x0
> 2014-07-15 20:48:20,626 INFO  [WorkerReceiver[myid=0]]
> quorum.FastLeaderElection: Notification: 0 (n.leader), 0x0 (n.zxid),
> 0x57 (n.round), LOOKING (n.state), 0 (n.sid), 0x0 (n.peerEPoch), LOOKING
> (my state)
> 2014-07-15 20:48:20,627 INFO  [WorkerReceiver[myid=0]]
> quorum.FastLeaderElection: Notification: 2 (n.leader), 0x0 (n.zxid),
> 0x55 (n.round), LEADING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING
> (my state)
> 2014-07-15 20:48:20,627 INFO  [WorkerReceiver[myid=0]]
> quorum.FastLeaderElection: Notification: 1 (n.leader), 0x0 (n.zxid),
> 0x56 (n.round), LEADING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING
> (my state)
> 2014-07-15 20:48:20,827 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
> quorum.FastLeaderElection: Notification time out: 400
> 2014-07-15 20:48:20,827 INFO  [WorkerReceiver[myid=0]]
> quorum.FastLeaderElection: Notification: 0 (n.leader), 0x0 (n.zxid),
> 0x57 (n.round), LOOKING (n.state), 0 (n.sid), 0x0 (n.peerEPoch), LOOKING
> (my state)
> 2014-07-15 20:48:20,828 INFO  [WorkerReceiver[myid=0]]
> quorum.FastLeaderElection: Notification: 2 (n.leader), 0x0 (n.zxid),
> 0x55 (n.round), LEADING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING
> (my state)
> 2014-07-15 20:48:20,828 INFO  [WorkerReceiver[myid=0]]
> quorum.FastLeaderElection: Notification: 1 (n.leader), 0x0 (n.zxid),
> 0x56 (n.round), LEADING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING
> (my state)
> 2014-07-15 20:48:21,229 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
> quorum.FastLeaderElection: Notification time out: 800
>
>
> These are my configuration files:
>
> core-site.xml:
> <configuration>
>         <property>
>                 <name>fs.default.name</name>
>                 <value>hdfs://psyDebian:9000</value>
>         </property>
>
>         <property>
>                 <name>hadoop.tmp.dir</name>
>                 <value>/home/hadoop/hadoop_tmp</value>
>         </property>
> </configuration>
>
> hdfs-site.xml:
> <configuration>
>         <property>
>                 <name>dfs.datanode.data.dir</name>
>                 <value>/home/hadoop/hadoop_tmp/dfs/data</value>
>         </property>
>
>         <property>
>                 <name>dfs.namenode.name.dir</name>
>                 <value>/home/hadoop/hadoop_tmp/dfs/name</value>
>         </property>
>
>         <property>
>                 <name>dfs.replication</name>
>                 <value>3</value>
>         </property>
> </configuration>
>
> hbase-site.xml:
> <configuration>
>         <property>
>                 <name>hbase.rootdir</name>
>                 <value>hdfs://psyDebian:9000/hbase</value>
>         </property>
>
>         <property>
>                 <name>hbase.cluster.distributed</name>
>                 <value>true</value>
>         </property>
>
>         <property>
>                 <name>hbase.master</name>
>                 <value>psyDebian:60000</value>
>         </property>
>
>         <property>
>                 <name>hbase.zookeeper.quorum</name>
>                 <value>psyDebian,centos1,centos2</value>
>         </property>
>
>         <property>
>                 <name>hbase.zookeeper.property.dataDir</name>
>                 <value>/home/hadoop/zookeeper_tmp</value>
>         </property>
>
>         <property>
>                 <name>zookeeper.session.timeout</name>
>                 <value>90000</value>
>         </property>
>
>         <property>
>                 <name>hbase.reginserver.restart.on.zk.expire</name>
>                 <value>true</value>
>         </property>
> </configuration>
>
>
>
> The master node runs Debian 7.5, and the two slaves both run CentOS 6.5.
> Hadoop is 2.2.0 and HBase is 0.98.3. The clocks of the three machines are
> synchronized and the firewalls (iptables) are disabled. The Java version
> is java-1.6.0-openjdk. I'm not very familiar with HBase, so I can't
> understand the ERRORs in the logs, and I haven't found any useful
> information on the Internet these past days. Could you help me, or tell
> me what I should do to find the cause of this problem?
> Thank you so much.