We have a 6 node cluster, 5 with region serves. 2 of the region servers have
been stable for days, but 3 of them keep crashing. Here are the logs around
around when the crash occurs. (btw, we are shoving approximately the twitter
firehose into hbase via flume) I'm an hbase newbie, but I have been reading.
Not sure what else is needed to help debug. When the problem occurs, it is
concurrent with issues that appear all the way down in the data node. Also,
interesting to note, all 3 servers seem to fail differently somewhat:
Servers (6):
00.hadoop is the master
01.hadoop, 03.hadoop (there is no 02.hadoop - flaky machine that got taken out)
- are stable Region Servers
04-06.hadoop crash at different times, very different logs, Region Servers
Master Attributes:
HBase Version 0.89.20100924+28, r
Hadoop Version 0.20.2+737, r98c55c28258aa6f42250569bd7fa431ac657bdbd
HBase Root Directory hdfs://00.hadoop.****:54310/hbase
Load average
103.0
Average number of regions per regionserver. Naive computation.
Regions On FS
577
(This is after recently restarting 04-06)
Address Start Code Load
01.hadoop:60030 1291268910638 requests=223, regions=118, usedHeap=519,
maxHeap=987
03.hadoop:60030 1291269219610 requests=23, regions=111, usedHeap=862,
maxHeap=987
04.hadoop:60030 1291910665912 requests=169, regions=82, usedHeap=194,
maxHeap=888
05.hadoop:60030 1291909584060 requests=232, regions=110, usedHeap=477,
maxHeap=888
06.hadoop60030 1291909723787 requests=99, regions=94, usedHeap=394,
maxHeap=888
Total:
servers: 5
requests=746, regions=515
Following are various logs around the time of the failures for 04-06
************************************************************************************
04.hadoop
************************************************************************************
REGION SERVER ->
2010-12-09 05:01:58,640 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region
article,a83858a08f2270d319f75a7b43c756c2453988e7,1291811929868.58f6d9fc80
c78f3ca490b0280b4f1226. in 0sec
2010-12-09 05:04:30,898 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
LRU Stats: total=1.46 MB, free=176.33 MB, max=177.79 MB, blocks=0,
accesses=200583, hits=0, hitRatio=
0.00%%, evictions=0, evicted=0, evictedPerRun=NaN
2010-12-09 05:04:59,612 INFO
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs --
HDFS-200
2010-12-09 05:04:59,618 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll
/hbase/.logs/04.hadoop.****,60020,1291858770125/10.100.154.103%3A60020.1291870585253,
entries=5318
6, filesize=63776311. New hlog
/hbase/.logs/04.hadoop.****,60020,1291858770125/10.100.154.103%3A60020.1291871099582
2010-12-09 05:08:02,033 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Found 1 hlogs to remove out of total 6; oldest outstanding sequenceid is
32891801 from region article
,4b2039b791e894dd479b90661ca97087f61645d3,1291813201039.7dff47c63bb5648bb1ee6670c60553a5.
2010-12-09 05:08:02,034 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC
Server handler 13 on 60020 took 182416ms appending an edit to hlog; editcount=0
2010-12-09 05:08:02,066 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
moving old hlog file
/hbase/.logs/04.hadoop.****,60020,1291858770125/10.100.154.103%3A60020.12918679476
65 whose highest sequenceid is 32880227 to
/hbase/.oldlogs/10.100.154.103%3A60020.1291867947665
2010-12-09 05:08:02,066 INFO org.apache.zookeeper.ClientCnxn: Client session
timed out, have not heard from server in 182936ms for sessionid
0x12ca58c08ff0048, closing socket conn
ection and attempting reconnect
2010-12-09 05:08:02,089 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region server
serverName=04.hadoop.****,60020,1291858770125, load=(requests=24, regions=
64, usedHeap=548, maxHeap=888): Unhandled exception
org.apache.hadoop.hbase.YouAreDeadException:
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently
processing 04.hadoop.****,60020,1291858770125 as dead s
erver
at
org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:217)
at
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:271)
at
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:744)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:561)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.doRun(HRegionServer.java:635)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.access$000(HRegionServer.java:126)
at
org.apache.hadoop.hbase.regionserver.HRegionServer$1.run(HRegionServer.java:518)
at
org.apache.hadoop.hbase.regionserver.HRegionServer$1.run(HRegionServer.java:516)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:337)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1046)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:516)
at java.lang.Thread.run(Thread.java:662)
2010-12-09 05:08:02,090 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=64, stores=64, storefiles=136, storefileIndexSize=27,
memsto
reSize=353, compactionQueueSize=0, usedHeap=549, maxHeap=888,
blockCacheSize=1530552, blockCacheFree=184893160, blockCacheCount=0,
blockCacheHitRatio=0
2010-12-09 05:08:02,090 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server
on 60020
2010-12-09 05:08:02,090 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 0 on 60020: exiting2010-12-09 05:08:02,090 INFO
org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
MASTER HBASE ->
2010-12-09 05:05:06,072 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1
.META. region(s) scanned
2010-12-09 05:05:40,004 INFO org.apache.hadoop.hbase.master.ServerManager:
04.hadoop.jive,60020,1291858770125 znode expired
2010-12-09 05:05:40,005 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Added=04.hadoop.jive,60020,1291858770125 to dead servers, added shutdown
processing operation
2010-12-09 05:05:40,005 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing todo:
ProcessServerShutdown of 04.hadoop.jive,60020,12918587701252010-12-09
05:05:40,005 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Process
shutdown of server 04.hadoop.jive,60020,1291858770125: logSplit: false,
rootRescanned: f
alse, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2010-12-09 05:05:40,008 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Splitting 7 hlog(s) in
hdfs://00.hadoop.jive:54310/hbase/.logs/04.hadoop.jive,60020,12918587701252010-12-09
05:05:40,008 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting
hlog 1 of 7:
hdfs://00.hadoop.jive:54310/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.1
00.154.103%3A60020.1291867947665, length=637577092010-12-09 05:05:40,008 INFO
org.apache.hadoop.hbase.util.FSUtils: Recovering
filehdfs://00.hadoop.jive:54310/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020
.1291867947665
NAMENODE ->
2010-12-09 05:08:02,471 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 10.100.159.13:50010 is added to
blk_1531008743226086399_251615 size 63757709
2010-12-09 05:08:02,473 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9
on 54310, call
complete(/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665,
DFSClient_hb_m_10.194.194.79:60000_1291788452343) from 10.194.194.79:44117:
error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
on
/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665
File does not exist. [Lease. Holder:
DFSClient_hb_m_10.194.194.79:60000_1291788452343, pendingcreates: 1]
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665
File does not exist. [Lease. Holder:
DFSClient_hb_m_10.194.194.79:60000_1291788452343, pendingcreates: 1]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1488)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1479)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1534)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1522)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:610)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1313)
2010-12-09 05:08:04,206 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 10.99.97.106:50010 is added to
blk_-734280257049179934_251614 size 7330
DATANODE ->
2010-12-09 05:08:02,212 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
IOException in BlockReceiver.lastNodeRun: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100) at
sun.nio.ch.IOUtil.write(IOUtil.java:71)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTransferProtocol.java:133)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:840)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
at java.lang.Thread.run(Thread.java:662)
2010-12-09 05:08:02,213 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
checkDiskError: exception:
2010-12-09 05:08:02,213 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
checkDiskError: exception: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
at sun.nio.ch.IOUtil.write(IOUtil.java:71)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTransferProtocol.java:133)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:840)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
at java.lang.Thread.run(Thread.java:662)
2010-12-09 05:08:02,221 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder blk_-8817504198034990390_251613 0 Exception
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
at sun.nio.ch.IOUtil.write(IOUtil.java:71)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTransferProtocol.java:133)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:840)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
at java.lang.Thread.run(Thread.java:662)
2010-12-09 05:08:02,221 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder 0 for block blk_-8817504198034990390_251613 terminating
2010-12-09 05:08:02,224 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
writeBlock blk_-8817504198034990390_251613 received exception
java.io.IOException: Connection reset by peer
2010-12-09 05:08:02,224 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Scheduling block blk_4505306716668305567_250752 file
/dist1/data/hadoop-data/current/subdir19/blk_4505306716668305567 for deletion
2010-12-09 05:08:02,226 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.100.154.103:50010,
storageID=DS-866555524-10.100.154.103-50010-1291262739506, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
at sun.nio.ch.IOUtil.read(IOUtil.java:210)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
at
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:267)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
2010-12-09 05:08:02,226 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Deleted block blk_1422641301942366074_250732 at file
/dist1/data/hadoop-data/current/subdir19/blk_1422641301942366074
2010-12-09 05:08:02,227 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Deleted block blk_4505306716668305567_250752 at file
/dist1/data/hadoop-data/current/subdir19/blk_4505306716668305567
************************************************************************************
05.hadoop
************************************************************************************
REGION SERVER ->
2010-12-09 04:29:25,047 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Compaction size of data: 250.5m; Skipped 1 file(s), size: 206896367
2010-12-09 04:29:25,047 INFO org.apache.hadoop.hbase.regionserver.Store:
Started compaction of 2 file(s) in data of
article,060c4b75067378c045eead9edec2d4fb7d8f3e5e,1291812563166.95e92e97823e11e57850720a63433a62.
into
hdfs://00.hadoop.****:54310/hbase/article/95e92e97823e11e57850720a63433a62/.tmp,
sequenceid=33050221
2010-12-09 04:29:26,328 INFO org.apache.hadoop.hbase.regionserver.Store:
Completed compaction of 2 file(s) in data of
article,060c4b75067378c045eead9edec2d4fb7d8f3e5e,1291812563166.95e92e97823e11e57850720a63433a62.;
new storefile is
hdfs://00.hadoop.****:54310/hbase/article/95e92e97823e11e57850720a63433a62/data/2787768957794924068;
store size is 250.5m
2010-12-09 04:29:26,328 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region
article,060c4b75067378c045eead9edec2d4fb7d8f3e5e,1291812563166.95e92e97823e11e57850720a63433a62.
in 1sec
2010-12-09 04:29:26,328 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on region
article,ed0cdc7ae6db73432e2ca47c973d545bee11ddd8,1291813400028.85a8d31d099f409dd36f3bcc4be1eba6.
2010-12-09 04:29:26,330 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region
article,ed0cdc7ae6db73432e2ca47c973d545bee11ddd8,1291813400028.85a8d31d099f409dd36f3bcc4be1eba6.
in 0sec
2010-12-09 04:32:33,081 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
2010-12-09 04:32:33,081 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server
on 60020
2010-12-09 04:32:33,089 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream
ResponseProcessor exception for block
blk_2012253533858803341_250248java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2621)
2010-12-09 04:32:33,089 INFO org.apache.zookeeper.ClientCnxn: Unable to read
additional data from server sessionid 0x12ca58c08ff0049, likely server has
closed socket, closing sock
et connection and attempting reconnect
2010-12-09 04:32:33,089 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC
Server handler 5 on 60020 took 75446ms appending an edit to hlog;
editcount=50505
2010-12-09 04:32:33,090 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 18 on 60020: exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.hdfs.DFSClient: Error Recovery
for block blk_2012253533858803341_250248 waiting for responder to exit.
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 1 on 60020: exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 10 on 60020: exiting
2010-12-09 04:32:33,140 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 0 on 60020: exiting2010-12-09 04:32:33,140 INFO
org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on 60020: exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 8 on 60020: exiting2010-12-09 04:32:33,140 INFO
org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on 60020: exiting
2010-12-09 04:32:33,140 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 3 on 60020: exiting2010-12-09 04:32:33,142 INFO
org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: exiting
2010-12-09 04:32:33,142 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 9 on 60020: exiting2010-12-09 04:32:33,143 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2010-12-09 04:32:33,143 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 2 on 60020: exiting
2010-12-09 04:32:33,143 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2010-12-09 04:32:33,143 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 6 on 60020: exiting
2010-12-09 04:32:33,144 INFO org.mortbay.log: Stopped
[email protected]:60030
NAMENODE ->
2010-12-09 04:32:08,408 WARN org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.startFile: failed to create file
/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555
for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79,
because this file is already being created by NN_Recovery on 10.100.159.13
2010-12-09 04:32:08,408 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6
on 54310, call
append(/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555,
DFSClient_hb_m_10.194.194.79:60000_1291788452343) from 10.194.194.79:59118:
error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
create file /hba
se/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555
for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79,
because this file is already being created by NN_Recovery on 10.100.159.13
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create
file
/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555
for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79,
because this file is already being created by NN_Recovery on 10.100.159.13
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1194)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1282)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:541)
at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1313)2010-12-09
04:32:09,411 WARN org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.startFile: failed to create file
/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A6
0020.1291868481555 for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on
client 10.194.194.79, because this file is already being created by NN_Recovery
on 10.100.159.132010-12-09 04:32:09,411 INFO org.apache.hadoop.ipc.Server: IPC
Server handler 9 on 54310, call
append(/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868
481555, DFSClient_hb_m_10.194.194.79:60000_1291788452343) from
10.194.194.79:59118: error:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create
file
/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555
for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79,
because this file is
already being created by NN_Recovery on
10.100.159.13org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file
/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555
for DFSClie
nt_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79, because this
file is already being created by NN_Recovery on 10.100.159.13 at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1194)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1282)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:541)
at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
DATANODE
2010-12-09 04:32:09,534 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder blk_2012253533858803341_250248 1 : Thread is interrupted.
2010-12-09 04:32:09,534 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.100.159.13:50010,
storageID=DS-145997007-10.100.159.13-50010-1291262743182, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.IOException: Interrupted receiveBlock
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:579)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
2010-12-09 04:32:09,534 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder 1 for block blk_2012253533858803341_250248 terminating
************************************************************************************
06.hadoop
************************************************************************************
REGION SERVER ->
2010-12-09 06:17:47,215 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,430 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,549 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,557 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,563 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,703 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,703 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,703 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,705 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,729 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
A bunch of these ^
Then a little later::
....
2010-12-09 06:23:23,561 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested
for region article,993143fca3613014990cc6975cacdd62f4147e03,12918133247
53.19413f33fc660536d9d4f315a1283440. because: regionserver60020.cacheFlusher
2010-12-09 06:23:27,058 INFO
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs --
HDFS-200
2010-12-09 06:23:27,080 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll
/hbase/.logs/06.hadoop.****,60020,1291864620851/10.99.97.106%3A60020.1291875047947,
entries=53315,
filesize=63781925. New hlog
/hbase/.logs/06.hadoop.****,60020,1291864620851/10.99.97.106%3A60020.1291875807006
2010-12-09 06:23:27,080 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Found 1 hlogs to remove out of total 8; oldest outstanding sequenceid is
33526881 from region article
,1221b0013c61fd7446cb822f698d0bcafa487342,1291872610788.bb2aa6ae8c694a096a00e7bb91fc46c3.
2010-12-09 06:23:27,080 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
moving old hlog file
/hbase/.logs/06.hadoop.****,60020,1291864620851/10.99.97.106%3A60020.1291872064424
whose highest sequenceid is 33506639 to
/hbase/.oldlogs/10.99.97.106%3A60020.1291872064424
2010-12-09 06:25:39,162 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC
Server handler 12 on 60020 took 119034ms appending an edit to hlog;
editcount=3683
2010-12-09 06:25:39,162 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
at sun.nio.ch.IOUtil.write(IOUtil.java:71)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2509)
2010-12-09 06:25:39,163 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream
ResponseProcessor exception for block
blk_1141147913164105086_258331java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2621)
2010-12-09 06:25:39,163 INFO org.apache.hadoop.hdfs.DFSClient: Error Recovery
for block blk_1141147913164105086_258331 waiting for responder to exit.
2010-12-09 06:25:39,164 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream
ResponseProcessor exception for block
blk_-7076574401277381592_258324java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2621)
2010-12-09 06:25:39,164 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery
for block blk_-7076574401277381592_258324 bad datanode[0] 10.99.97.106:50010
2010-12-09 06:25:39,164 INFO org.apache.zookeeper.ClientCnxn: Unable to read
additional data from server sessionid 0x12ca58c08ff004b, likely server has
closed socket, closing socket connection and attempting reconnect
2010-12-09 06:25:39,164 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery
for block blk_-7076574401277381592_258324 in pipeline 10.99.97.106:50010,
10.100.154.103:50010: bad datanode 10.99.97.106:50010
2010-12-09 06:25:39,165 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery
for block blk_1141147913164105086_258331 bad datanode[0] 10.99.97.106:50010
2010-12-09 06:25:39,167 INFO org.apache.zookeeper.ClientCnxn: Unable to read
additional data from server sessionid 0x12ca58c08ff004c, likely server has
closed socket, closing socket connection and attempting reconnect
2010-12-09 06:25:39,169 INFO org.apache.hadoop.ipc.Client: Retrying connect to
server: /10.100.154.103:50020. Already tried 0 time(s).
2010-12-09 06:25:39,169 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery
for block blk_1141147913164105086_258331 in pipeline 10.99.97.106:50010,
10.100.159.13:50010: bad datanode 10.99.97.106:50010
2010-12-09 06:25:39,209 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server
on 60020
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 24 on 60020: exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 1 on 60020: exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 15 on 60020: exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 0 on 60020: exiting
2010-12-09 06:25:39,214 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 3 on 60020: exiting
2010-12-09 06:25:39,214 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 60020
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 2 on 60020: exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 7 on 60020: exiting
2010-12-09 06:25:39,213 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 21 on 60020: exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 4 on 60020: exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 5 on 60020: exiting
2010-12-09 06:25:39,216 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 8 on 60020: exiting
2010-12-09 06:25:39,216 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 9 on 60020: exiting
2010-12-09 06:25:39,216 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 6 on 60020: exiting
2010-12-09 06:25:39,216 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2010-12-09 06:25:39,223 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery
attempt #0 from primary datanode 10.100.159.13:50010