Is there any complaint in the HDFS log?

________________________________________
From: [email protected] [[email protected]]
Sent: October 16, 2012 4:35
To: [email protected]
Subject: RE: could not start HMaster
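For anyone following along, one quick way to look for such complaints is to grep the Hadoop daemon logs. This is a sketch under the assumption of a default tarball install; `LOG_DIR` is a placeholder and should point at your `$HADOOP_HOME/logs`:

```shell
# Scan the Hadoop daemon logs (NameNode/DataNode) for errors.
# LOG_DIR is an assumption; adjust it to your installation.
LOG_DIR="${HADOOP_LOG_DIR:-/var/log/hadoop}"
grep -hE "ERROR|FATAL" "$LOG_DIR"/*.log 2>/dev/null | tail -n 20
```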
No, I don't think so. This is a dedicated testing machine and there is no automatic cleanup on the /tmp folder...

Thanks,
YuLing

-----Original Message-----
From: Jimmy Xiang [mailto:[email protected]]
Sent: Monday, October 15, 2012 1:32 PM
To: [email protected]
Subject: Re: could not start HMaster

Is your /tmp folder cleaned up automatically and some files are gone?

Thanks,
Jimmy

On Mon, Oct 15, 2012 at 12:26 PM, <[email protected]> wrote:
> Hi,
>
> I set up a single-node HBase server on top of Hadoop and it has been working
> fine with most of my testing scenarios, such as creating tables and inserting
> data. Over the weekend, I accidentally left a testing script running that
> inserted about 67 rows every minute for three days. Today when I looked at
> the environment, I found that the HBase master could not be started anymore.
> Digging into the logs, I could see that starting from the second day, HBase
> first got an exception as follows:
>
> 2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350155105992, entries=7981, filesize=3754556. for /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350158707364
> 2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442 whose highest sequenceid is 4 to /tmp/hbase-root/hbase/.oldlogs/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442
> 2012-10-13 13:05:07,379 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sflow-linux02.santanet.dell.com,47137,1348606516541: IOE in log roller
> java.io.FileNotFoundException: File file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442 does not exist.
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:213)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:163)
>         at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:287)
>         at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:428)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.archiveLogFile(HLog.java:825)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:708)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:603)
>         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>         at java.lang.Thread.run(Thread.java:662)
>
> Then SplitLogManager kept splitting the logs for about two days:
>
> 2012-10-13 13:05:09,061 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x139ff3656b30003, likely client has closed socket
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-10-13 13:05:09,061 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:52573 which had sessionid 0x139ff3656b30003
> 2012-10-13 13:05:09,082 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2012-10-13 13:05:09,085 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for sflow-linux02.santanet.dell.com,47137,1348606516541
> 2012-10-13 13:05:09,086 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker sflow-linux02.santanet.dell.com,47137,1348606516541
> 2012-10-13 13:05:09,101 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541-splitting]
> 2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker closing leases
> 2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker closed leases
> 2012-10-13 13:08:09,275 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000028 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
> 2012-10-13 13:11:09,730 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000029 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
> 2012-10-13 13:14:10,171 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000030 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
>
> When I tried to restart the HBase server today, the following exception occurred:
>
> 2012-10-15 11:54:10,122 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x13a65c6a8090002, negotiated timeout = 40000
> 2012-10-15 11:54:10,124 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes
> 2012-10-15 11:54:10,238 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.util.FileSystemVersionException: File system needs to be upgraded. You have version null and I want version 7. Run the '${HBASE_HOME}/bin/hbase migrate' script.
>         at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:245)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:347)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
>
> Just wondering what happened, and is there any way to recover from this situation? Is reinstalling HBase my only choice at this moment?
>
> Thanks very much,
>
> YuLing
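For what it's worth: the FileSystemVersionException above ("You have version null") usually means the hbase.version marker file under hbase.rootdir has gone missing. The `file:/tmp/hbase-root/hbase/...` paths in the logs suggest this install is running on the local filesystem with a rootdir under /tmp, so anything that removes files there takes the WALs and the version file with it. A sketch of pointing hbase.rootdir at durable storage in hbase-site.xml instead (the NameNode host and port are placeholders; they must match your Hadoop core-site.xml):

```xml
<!-- hbase-site.xml: point HBase at durable storage instead of a
     rootdir under /tmp. The hdfs:// host and port below are
     placeholders; match them to fs.default.name in core-site.xml. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
```

With the files already gone, in-place recovery is unlikely; one common way out of this state is to clear the stale /tmp/hbase-root directory and restart with a durable rootdir, accepting the loss of the test data.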
