I've deployed a "2+4" cluster which has been running normally for a long
time.
The cluster holds more than 40 TB of data. When I deliberately shut down the
HBase service and try to restart it, the regionserver dies.
The regionserver log shows that all the regions are opened, but the datanode
logs contain WARN and ERROR entries.
The log details are below:
2014-11-07 14:47:21,584 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/10.230.63.12:50010, dest: /10.230.63.9:39405, bytes: 4696, op: HDFS_READ,
cliID:
DFSClient_hb_rs_salve1,60020,1415342303886_-2037622978_29, offset: 31996928,
srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid:
BP-1731746090-10.230.63.3-1406195669990:blk_1078709392_4968828, duration:
7978822
2014-11-07 14:47:21,596 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
at java.lang.Thread.run(Thread.java:744)
2014-11-07 14:47:21,599 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/10.230.63.12:50010, dest: /10.230.63.11:41511, bytes: 726528, op: HDFS_READ,
cliID: DFSClient_hb_rs_salve3,60020,1415342303807_1094119849_29, offset: 0,
srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid:
BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168, duration:
480190668115
2014-11-07 14:47:21,599 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.230.63.12,
datanodeUuid=bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, infoPort=50075,
ipcPort=50020, storageInfo=lv=-55;cid=cluster12;nsid=395652542;c=0):Got
exception while serving
BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168 to
/10.230.63.11:41511
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
at java.lang.Thread.run(Thread.java:744)
2014-11-07 14:47:21,600 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
salve4:50010:DataXceiver error processing READ_BLOCK operation src:
/10.230.63.11:41511 dest: /10.230.63.12:50010
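As a rough reading of the log above: the second clienttrace entry reports a
duration of 480190668115, which lines up with the 480000 ms write timeout in
the exception if the duration field is in nanoseconds (that unit is my
assumption). As far as I know, 480000 ms is also the default value of
dfs.datanode.socket.write.timeout. A minimal Python sketch of that check, with
hypothetical helper names:

```python
# Values copied from the DataNode log above.
WRITE_TIMEOUT_MS = 480000  # from the SocketTimeoutException message

# Assumption: the clienttrace "duration" field is in nanoseconds.
durations_ns = {
    "blk_1078709392_4968828": 7978822,
    "blk_1078034913_4294168": 480190668115,
}


def ns_to_seconds(ns):
    """Convert a nanosecond duration to seconds."""
    return ns / 1e9


def hit_write_timeout(duration_ns, timeout_ms=WRITE_TIMEOUT_MS):
    """Return True if the transfer lasted at least as long as the timeout."""
    return duration_ns >= timeout_ms * 1_000_000


for block, ns in durations_ns.items():
    print(block, round(ns_to_seconds(ns), 3), hit_write_timeout(ns))
```

Under that assumption the first read finished in about 8 ms, while the second
one sat for roughly 480 s, i.e. the datanode waited the full timeout for
/10.230.63.11 to accept more data before giving up.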
I suspect this is caused by the load during the open stage, when the disk
I/O of the cluster can be very high and the pressure on the cluster can be
huge.
I wonder what causes the read errors while reading HFiles, and what leads
to the timeout.
Is there any way to throttle the speed of loading on open and reduce the
pressure on the cluster?
I need help!
Thanks!