I have deployed a "2+4" cluster that had been running normally for a long time.
The cluster holds more than 40 TB of data. When I deliberately shut down the HBase
service and try to restart it, the region servers die.

    The region server log shows that all the regions are opened, but the DataNode
logs contain WARN and ERROR messages.
    Below are the logs in detail:
    
    2014-11-07 14:47:21,584 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.9:39405, bytes: 4696, op: HDFS_READ, cliID: DFSClient_hb_rs_salve1,60020,1415342303886_-2037622978_29, offset: 31996928, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078709392_4968828, duration: 7978822
    2014-11-07 14:47:21,596 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:744)
    2014-11-07 14:47:21,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.11:41511, bytes: 726528, op: HDFS_READ, cliID: DFSClient_hb_rs_salve3,60020,1415342303807_1094119849_29, offset: 0, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168, duration: 480190668115
    2014-11-07 14:47:21,599 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.230.63.12, datanodeUuid=bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=cluster12;nsid=395652542;c=0):Got exception while serving BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168 to /10.230.63.11:41511
    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:744)
    2014-11-07 14:47:21,600 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: salve4:50010:DataXceiver error processing READ_BLOCK operation src: /10.230.63.11:41511 dest: /10.230.63.12:50010


    I personally think this is caused by the load during the region-opening stage,
when the disk I/O of the cluster can be very high and the pressure can be huge.
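
    One detail I noticed: the 480000 ms in the SocketTimeoutException looks like the
DataNode's default socket write timeout (dfs.datanode.socket.write.timeout, 8 minutes),
so my guess is that the region server stopped reading from the socket for that long
while it was busy opening regions, and the DataNode then gave up on the transfer.
If that reading is right, one thing I could try is raising the timeout, although that
would only hide the stall rather than reduce the load. A rough sketch of what I mean
(please correct me if the property name or default differs in other versions):

    <!-- hdfs-site.xml: sketch only, assuming dfs.datanode.socket.write.timeout is the relevant knob -->
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <!-- default is 480000 ms (8 minutes); a larger value just gives slow readers more time -->
      <value>960000</value>
    </property>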

    I wonder what causes the read errors while reading HFiles, and what leads to the
timeout. Is there any way to throttle the region-opening process and reduce the
pressure on the cluster?
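
    For the throttling question, the only knob I have found so far is the size of the
region server's region-open executor pool. I am not sure this is the right lever, or
that the property name below is correct for my version, so please treat this snippet
as a rough sketch rather than a known fix:

    <!-- hbase-site.xml: sketch only; hbase.regionserver.executor.openregion.threads assumed (default 3, as I understand it) -->
    <property>
      <name>hbase.regionserver.executor.openregion.threads</name>
      <!-- fewer worker threads should mean fewer regions opening (and reading HFiles) concurrently -->
      <value>1</value>
    </property>

    If there is a better or more standard way to slow down region opening after a full
restart, I would be glad to hear it.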

I need help!

Thanks!



