Hi all!
I'm having trouble with my HBase cluster: the following error appears more
and more often (every 2 to 15 minutes, on each node):
2012-06-25 10:25:30,646 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.120.0.5:50010, storageID=DS-1339564791-127.0.0.1-50010-1296151113818, infoPort=50075, ipcPort=50020):Got exception while serving blk_4839251368515801234_555101 to /10.120.0.5:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.120.0.5:50010 remote=/10.120.0.5:42564]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)
2012-06-25 10:25:30,646 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.120.0.5:50010, storageID=DS-1339564791-127.0.0.1-50010-1296151113818, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.120.0.5:50010 remote=/10.120.0.5:42564]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)
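For what it's worth, the 480000 ms in the message seems to match the default DataNode write timeout (8 minutes), so the DataNode apparently sits blocked for the full timeout before giving up on the reader. If I understand correctly, that timeout is controlled by the following property in hdfs-site.xml (the value shown is the default as I understand it; I have not changed it):

```xml
<!-- hdfs-site.xml: my understanding is that the "480000 millis timeout"
     in the log corresponds to this DataNode socket write timeout
     (default 480000 ms = 8 minutes). -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>
```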
You might have guessed that the local machine is 10.120.0.5. Unsurprisingly,
the process on port 50010 is the DataNode. Port 42564 changes with each
error instance, and seems to correspond to the RegionServer process. If I
list the processes connected to port 50010 with 'lsof -i :50010', I see an
impressive number of sockets (around 400). Is that normal?
I should add that the current load (requests, I/O, CPU, ...) is rather low.
I can't find any related error in the NameNode or RegionServer logs.
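In case it helps, here is roughly how I am counting those sockets, grouped by TCP state (the lsof lines below are illustrative samples, not real output from my node; on the node itself I pipe `lsof -nP -i :50010` into the same awk filter):

```shell
# Group connections to port 50010 by TCP state and count each state.
# Sample lsof-style lines are piped in here so the filter can be tried
# anywhere; replace the printf with: lsof -nP -i :50010
printf '%s\n' \
  'java 1234 hdfs 55u IPv4 0x1 0t0 TCP 10.120.0.5:50010->10.120.0.5:42564 (ESTABLISHED)' \
  'java 1234 hdfs 56u IPv4 0x2 0t0 TCP 10.120.0.5:50010->10.120.0.5:42565 (CLOSE_WAIT)' \
  'java 1234 hdfs 57u IPv4 0x3 0t0 TCP 10.120.0.5:50010->10.120.0.5:42566 (ESTABLISHED)' \
| awk '{gsub(/[()]/, "", $NF); count[$NF]++} END {for (s in count) print s, count[s]}' \
| sort
```

On my nodes the vast majority of the ~400 sockets show up in a single state, which is why I am wondering whether that count is normal.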
All the best,
Frédéric.