Hi all !

I'm having trouble with my HBase cluster, as the following error appears more and more often (every 2 to 15 minutes on each node):

2012-06-25 10:25:30,646 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.120.0.5:50010, storageID=DS-1339564791-127.0.0.1-50010-1296151113818, infoPort=50075, ipcPort=50020):Got exception while serving blk_4839251368515801234_555101 to /10.120.0.5:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.120.0.5:50010 remote=/10.120.0.5:42564]
	at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
	at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)

2012-06-25 10:25:30,646 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.120.0.5:50010, storageID=DS-1339564791-127.0.0.1-50010-1296151113818, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.120.0.5:50010 remote=/10.120.0.5:42564]
	at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
	at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)



You might have guessed that the local machine is 10.120.0.5. Unsurprisingly, the process on port 50010 is the DataNode. Port 42564 changes with each error instance, and seems to correspond to the RegionServer process. If I list the processes connected to port 50010 with 'lsof -i :50010', I get an impressive number of sockets (~400). Is that normal?
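In case it helps anyone reproduce the measurement: this is roughly how I count the sockets. It's just a sketch assuming GNU userland and that lsof is installed; the port number is the stock DataNode data-transfer port.

```shell
#!/bin/sh
# Count sockets open on the DataNode data-transfer port (50010).
# Each concurrent HDFS block read/write holds one such socket plus a
# DataXceiver thread, so this number can be compared against the
# dfs.datanode.max.xcievers setting on the DataNode.
PORT=50010
if command -v lsof >/dev/null 2>&1; then
    # Drop the lsof header line, then count the remaining socket entries.
    count=$(lsof -nP -i :"$PORT" 2>/dev/null | tail -n +2 | wc -l)
else
    # lsof not available on this box; report zero rather than failing.
    count=0
fi
echo "sockets on port $PORT: $count"
```

On my nodes this prints around 400, which is what prompted the question above.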

I should add that the current load (requests, I/O, CPU, ...) is rather low.

I can't find any other errors in the NameNode or RegionServer logs.

All the best,

Frédéric.
