Also, just to add: the datanode transfer-threads setting is 16384 and the handler count is 10.
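
For reference, I believe these map to the following hdfs-site.xml properties (the values are the ones we have set; I'm going from the Hadoop 2.6 defaults, so please correct me if the property names are off):

  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>16384</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>

I've also put a rough sketch of the timeout settings I'm considering below the quoted message.
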
Thanks,
Hariharan

On Sat, Jan 21, 2017 at 1:38 AM, Hariharan <[email protected]> wrote:
> I had an application (a Hive command) fail with the following error:
>
> 2017-01-20 16:40:24,279 <some part of stack trace stripped>
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: All datanodes 10.152.76.225:50010 are bad. Aborting...
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:860)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>         at org.apache.hadoop.hive.ql.exec.UnionOperator.process(UnionOperator.java:137)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
>         at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
>
> Investigating the affected datanode (10.152.76.225), I found an error logged just before the one above:
>
> 2017-01-20 16:40:23,743 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1380197209-10.152.76.187-1484669381166:blk_1074615677_876595
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.152.76.225:50010 remote=/10.152.76.226:33610]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:734)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:745)
>
> Following further along the write pipeline (10.152.76.226), there was an error about a minute before the stack above:
>
> 2017-01-20 16:39:00,223 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ip-10-152-76-226.us-west-2.compute.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.152.76.226:46168 dst: /10.152.76.226:50010
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.152.76.226:50010 remote=/10.152.76.226:46168]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:734)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:745)
>
> Note that in this stack the src and dst are the same node, so even a network error seems improbable.
> I checked the obvious possibilities on both nodes - disk space (~1.8 TB available) and ulimit (set to 32768) - and they seemed fine. I'm at a loss as to why this error happens, and it happens quite frequently. Any ideas on why this would happen, or what I should check next?
>
> Environment:
> Hadoop 2.6
> CentOS Linux
> Running on AWS
>
> Thanks,
> Hariharan
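
One more observation: the 60000 ms in both SocketTimeoutExceptions matches the default HDFS socket read timeout, so the pipeline seems to be stalling for a full minute rather than failing outright. As a stopgap I'm thinking of raising the read/write timeouts in hdfs-site.xml. This is only a sketch of what I have in mind - I'm not certain these are the right properties for 2.6, and the values are illustrative; it also only masks whatever is causing the stall:

  <property>
    <!-- socket read timeout used by clients and the datanode write pipeline; default 60000 ms -->
    <name>dfs.client.socket-timeout</name>
    <value>120000</value>
  </property>
  <property>
    <!-- socket write timeout on the datanode side; default 480000 ms -->
    <name>dfs.datanode.socket.write.timeout</name>
    <value>600000</value>
  </property>

If anyone knows a better knob to look at, or why a datanode would time out talking to itself, I'd appreciate pointers.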
