Gopal V created HDFS-13010:
------------------------------

             Summary: DataNode: Listen queue is always 128
                 Key: HDFS-13010
                 URL: https://issues.apache.org/jira/browse/HDFS-13010
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 3.0.0
            Reporter: Gopal V

DFS write-heavy workloads are failing with

{code}
18/01/11 05:02:34 INFO mapreduce.Job: Task Id : attempt_1515660475578_0007_m_000387_0, Status : FAILED
Error: java.io.IOException: Could not get block locations. Source file "/tmp/tpcds-generate/10000/_temporary/1/_temporary/attempt_1515660475578_0007_m_000387_0/inventory/data-m-00387" - Aborting...block==null
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477)
	at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
{code}

This was tracked to

{code}
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
	at org.apache.hadoop.hdfs.DataStreamer$StreamerStreams.<init>(DataStreamer.java:162)
	at org.apache.hadoop.hdfs.DataStreamer.transfer(DataStreamer.java:1450)
	at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1407)
	at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1598)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1499)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
	at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
{code}

{code}
# ss -tl | grep 50010
LISTEN     0      128     *:50010     *:*
{code}

However, the system is configured with a much higher somaxconn

{code}
# sysctl -a | grep somaxconn
net.core.somaxconn = 16000
{code}

Yet, the SNMP counters show connections being refused with {{127 times the listen queue of a socket overflowed}}
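
For context on why a listen queue can stay at 128 while somaxconn is 16000: on Linux, the accept queue length of a listening socket is the smaller of the backlog the application passes to listen() and net.core.somaxconn, so raising the sysctl alone does not enlarge a queue whose owner asked for less. The following is a minimal Java sketch of requesting an explicit backlog at bind time. The class name {{BacklogSketch}} and the helper {{bindWithBacklog}} are hypothetical and are not DataNode code; the sketch does not claim to show how the DataNode actually binds its socket.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BacklogSketch {

    // Hypothetical helper (not Hadoop code): bind a listening socket and
    // request an explicit accept-queue length.
    static ServerSocket bindWithBacklog(int port, int backlog) throws IOException {
        ServerSocket server = new ServerSocket(); // created unbound
        server.setReuseAddress(true);
        // The backlog is only a request; the kernel caps it at
        // net.core.somaxconn, and the JDK substitutes its own default
        // when the value is zero or negative.
        server.bind(new InetSocketAddress(port), backlog);
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; 16000 mirrors the
        // somaxconn value quoted in the report above.
        try (ServerSocket server = bindWithBacklog(0, 16000)) {
            System.out.println("listening on port " + server.getLocalPort());
        }
    }
}
```

While such a process is running, {{ss -tl}} reports the effective queue length in the Send-Q column of the LISTEN row, which is where the 128 in the output above comes from.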