Hi, I'm seeing this exception once in a while on every HDFS node in one of our clusters:
2015-05-26 13:37:31,831 INFO  datanode.DataNode (BlockSender.java:sendPacket(566)) - Failed to send data: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/172.22.5.34:31684]
2015-05-26 13:37:31,831 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.22.5.34:50010, dest: /172.22.5.34:31684, bytes: 12451840, op: HDFS_READ, cliID: DFSClient_hb_rs_my-hadoop-node-fqdn,60020,1432041913240_-1351889511_35, offset: 47212032, srvID: 9bfc58b8-94b0-40a5-ba33-6d712fa1faa2, blockid: BP-1988583858-172.22.5.40-1424448407690:blk_1105314202_31576629, duration: 10486866121
2015-05-26 13:37:31,831 WARN  datanode.DataNode (DataXceiver.java:readBlock(541)) - DatanodeRegistration(172.22.5.34, datanodeUuid=9bfc58b8-94b0-40a5-ba33-6d712fa1faa2, infoPort=50075, ipcPort=8010, storageInfo=lv=-55;cid=CID-962af1ea-201a-4d27-ae80-e4a7b712f1ac;nsid=109597947;c=0):Got exception while serving BP-1988583858-172.22.5.40-1424448407690:blk_1105314202_31576629 to /172.22.5.34:31684
java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/172.22.5.34:31684]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:745)
2015-05-26 13:37:31,831 ERROR datanode.DataNode (DataXceiver.java:run(250)) - my-hadoop-node-fqdn:50010:DataXceiver error processing READ_BLOCK operation src: /172.22.5.34:31684 dst: /172.22.5.34:50010
java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/172.22.5.34:31684]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:745)

...so the node is basically only complaining about itself: local and remote are both 172.22.5.34, the DataNode's own address. On the same node we run HDFS, a RegionServer and YARN. I'm struggling a little bit with how to interpret this.
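One thing I plan to do is confirm where the 10 second limit in that message comes from. Below is a minimal sketch of how I'd dump the effective socket timeouts from the node's configuration; I'm assuming (not yet verified) that the limit maps to the standard dfs.datanode.socket.write.timeout / dfs.client.socket-timeout keys:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class PrintHdfsTimeouts {
        public static void main(String[] args) {
            // HdfsConfiguration loads hdfs-default.xml and hdfs-site.xml from the
            // classpath, so running this with the DataNode's conf dir on the
            // classpath prints the values the DataNode actually uses.
            Configuration conf = new HdfsConfiguration();
            // Timeout the DataNode applies while writing block data back to a
            // reader (ships with an 8 minute default).
            System.out.println("dfs.datanode.socket.write.timeout = "
                    + conf.getLong("dfs.datanode.socket.write.timeout", 8 * 60 * 1000L));
            // Read timeout on the DFSClient / RegionServer side (60 s default).
            System.out.println("dfs.client.socket-timeout = "
                    + conf.getLong("dfs.client.socket-timeout", 60 * 1000L));
        }
    }

If one of those really has been set to 10000 ms somewhere, finding out what set it (and whether it should be raised) would be my first step, but I'd like a second opinion before touching it.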
The funny thing is that this is our live cluster, the one we write everything to. I'm wondering whether the HBase flush size (256M) could be a problem given that the HDFS block size is 128M. Any advice on where to look is welcome!

Thanks,
Dejan
