Hi,
I set up a single-node HBase server on top of Hadoop, and it has been working fine for most of my testing scenarios, such as creating tables and inserting data. Over the weekend, I accidentally left a test script running that inserts about 67 rows every minute; it ran for three days. When I looked at the environment today, I found that the HBase master could no longer be started. Digging into the logs, I saw that, starting on the second day, HBase first hit the following exception:
2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350155105992, entries=7981, filesize=3754556. for /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350158707364
2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442 whose highest sequenceid is 4 to /tmp/hbase-root/hbase/.oldlogs/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442
2012-10-13 13:05:07,379 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sflow-linux02.santanet.dell.com,47137,1348606516541: IOE in log roller
java.io.FileNotFoundException: File file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442 does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:213)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:163)
    at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:287)
    at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:428)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.archiveLogFile(HLog.java:825)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:708)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:603)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
Then the SplitLogManager kept retrying the log split for about two days:
2012-10-13 13:05:09,061 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x139ff3656b30003, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
    at java.lang.Thread.run(Thread.java:662)
2012-10-13 13:05:09,061 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:52573 which had sessionid 0x139ff3656b30003
2012-10-13 13:05:09,082 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-10-13 13:05:09,085 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for sflow-linux02.santanet.dell.com,47137,1348606516541
2012-10-13 13:05:09,086 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker sflow-linux02.santanet.dell.com,47137,1348606516541
2012-10-13 13:05:09,101 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541-splitting]
2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker closing leases
2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker closed leases
2012-10-13 13:08:09,275 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000028 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
2012-10-13 13:11:09,730 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000029 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
2012-10-13 13:14:10,171 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000030 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
When I tried to restart the HBase server today, the following exception occurred:
2012-10-15 11:54:10,122 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x13a65c6a8090002, negotiated timeout = 40000
2012-10-15 11:54:10,124 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes
2012-10-15 11:54:10,238 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.util.FileSystemVersionException: File system needs to be upgraded. You have version null and I want version 7. Run the '${HBASE_HOME}/bin/hbase migrate' script.
    at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:245)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:347)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
I am wondering what happened, and whether there is any way to recover from this situation. Is re-installing HBase my only option at this point?
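One thing I notice from the log paths above is that my HBase root directory is apparently on the local filesystem under /tmp (file:/tmp/hbase-root/hbase), which I suspect the OS may clean up periodically. Would moving hbase.rootdir to a durable location prevent this in the future? This is the hbase-site.xml change I am considering (the /var/hbase path is just a placeholder for whatever durable directory I would actually pick):

```xml
<!-- hbase-site.xml: point HBase at a directory the OS will not clean up.
     file:///var/hbase is a placeholder; an HDFS URL would serve the same purpose. -->
<property>
  <name>hbase.rootdir</name>
  <value>file:///var/hbase</value>
</property>
```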
Thanks very much,
YuLing