All - this issue showed up when I was tearing down a SparkContext and creating a new one; afterwards I was often unable to write to HDFS because of this error. I subsequently switched to a different implementation: instead of tearing down and re-initializing the SparkContext, I submit a separate application request to YARN (a rough sketch of that approach is at the end of this mail, below the quoted thread).

On Fri, May 15, 2015 at 2:35 PM Puneet Kapoor <puneet.cse.i...@gmail.com> wrote:
> I am seeing this on Hadoop version 2.4.0.
>
> Thanks for your suggestions, I will try those and let you know if they
> help!
>
> On Sat, May 16, 2015 at 1:57 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>> What version of Hadoop are you seeing this on?
>>
>> On 15 May 2015, at 20:03, Puneet Kapoor <puneet.cse.i...@gmail.com>
>> wrote:
>>
>> Hey,
>>
>> Did you find any solution for this issue? We are seeing similar logs in
>> our DataNode logs. Appreciate any help.
>>
>> 2015-05-15 10:51:43,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> NttUpgradeDN1:50010:DataXceiver error processing WRITE_BLOCK operation
>> src: /192.168.112.190:46253 dst: /192.168.151.104:50010
>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.151.104:50010
>> remote=/192.168.112.190:46253]
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>     at java.io.BufferedInputStream.fill(Unknown Source)
>>     at java.io.BufferedInputStream.read1(Unknown Source)
>>     at java.io.BufferedInputStream.read(Unknown Source)
>>     at java.io.DataInputStream.read(Unknown Source)
>>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:742)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>>     at java.lang.Thread.run(Unknown Source)
>>
>> That's being logged at ERROR level in the DN. It doesn't mean the DN has
>> crashed, only that it timed out waiting for data: something has gone wrong
>> elsewhere.
>>
>> https://issues.apache.org/jira/browse/HDFS-693
>>
>> There are a couple of properties you can set to extend the timeouts:
>>
>> <property>
>>   <name>dfs.socket.timeout</name>
>>   <value>20000</value>
>> </property>
>>
>> <property>
>>   <name>dfs.datanode.socket.write.timeout</name>
>>   <value>20000</value>
>> </property>
>>
>> You can also increase the number of DataNode transceiver threads that
>> handle data IO across the network:
>>
>> <property>
>>   <name>dfs.datanode.max.xcievers</name>
>>   <value>4096</value>
>> </property>
>>
>> Yes, that property really has that spelling; it's easy to get wrong.
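
For anyone who prefers setting those values in client code rather than hdfs-site.xml, here is a minimal, untested sketch using the Hadoop Configuration API with the same property names and values quoted above. Note that dfs.datanode.max.xcievers is read by the DataNode itself, so it only takes effect in the DataNode's own hdfs-site.xml; the timeouts are the ones that matter on the client side.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsTimeoutConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Socket read timeout in milliseconds (same value as the XML above).
        conf.set("dfs.socket.timeout", "20000");
        // Socket write timeout in milliseconds for the write pipeline.
        conf.set("dfs.datanode.socket.write.timeout", "20000");
        // dfs.datanode.max.xcievers is a DataNode-side setting and belongs in
        // the DataNode's hdfs-site.xml, not in client code.

        // Use the configured FileSystem as usual.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}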
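
And for reference on the workaround mentioned at the top of this mail: one way to submit a separate application to YARN instead of re-initializing a SparkContext in-process is the SparkLauncher API (available from Spark 1.4 on); shelling out to spark-submit works just as well. This is only a hypothetical sketch, not the actual code behind that change, and the jar path and class name are placeholders.

import org.apache.spark.launcher.SparkLauncher;

public class SubmitToYarn {
    public static void main(String[] args) throws Exception {
        // Launch a separate spark-submit process targeting YARN instead of
        // reusing an in-process SparkContext.
        Process spark = new SparkLauncher()
            .setAppResource("/path/to/my-app.jar")   // placeholder application jar
            .setMainClass("com.example.MyJob")       // placeholder main class
            .setMaster("yarn-cluster")               // Spark 1.x YARN cluster-mode master
            .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
            .launch();

        // Wait for spark-submit to exit; the job itself runs on the cluster.
        int exitCode = spark.waitFor();
        System.out.println("spark-submit exited with code " + exitCode);
    }
}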