Sean Chow created HDFS-15402:
--------------------------------

             Summary: Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
                 Key: HDFS-15402
                 URL: https://issues.apache.org/jira/browse/HDFS-15402
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: metrics
    Affects Versions: 3.1.3
            Reporter: Sean Chow
We periodically request {{http://127.0.0.1:50075/jmx}} to collect datanode metrics. This leaves a large number of sockets in the CLOSE-WAIT state on the datanode, which causes normal WebHDFS requests to fail.

{code:java}
$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l
6729
$ lsof -i:37296
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 101015 hdfs 3044u IPv4 271157177 0t0 TCP localhost:50075->localhost:37296 (CLOSE_WAIT)
{code}

PID 101015 is the datanode process. I run {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in production, and both have the same issue. When the metrics-retrieving script is stopped, the number of CLOSE-WAIT sockets stops increasing. apache-hadoop-2.9.2 does not have this issue with the same metrics-retrieving script.
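To watch the CLOSE-WAIT count over time without {{ss}}, the same number can be read from the kernel's TCP socket table. The following is a minimal sketch, assuming a Linux host and IPv4 sockets only (it reads {{/proc/net/tcp}}, not {{/proc/net/tcp6}}). The helper name {{count_close_wait}} is hypothetical, and unlike the {{grep 127.0.0.1:50075}} pipeline above it matches only the local port, so sockets on any local address bound to 50075 are counted.

{code:python}
import os

# Linux TCP state code for CLOSE_WAIT as printed in the "st" column of
# /proc/net/tcp (TCP_CLOSE_WAIT = 8 in the kernel's tcp_states.h).
TCP_CLOSE_WAIT = "08"


def count_close_wait(lines, local_port):
    """Count CLOSE_WAIT sockets whose local port equals local_port.

    `lines` are rows in the /proc/net/tcp format; the header row is skipped.
    Only the local port is compared (the local address is ignored).
    """
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "sl":
            continue  # header row or malformed line
        # local_address looks like "0100007F:C39B"; the port is hex.
        _, port_hex = fields[1].rsplit(":", 1)
        if int(port_hex, 16) == local_port and fields[3] == TCP_CLOSE_WAIT:
            count += 1
    return count


if __name__ == "__main__":
    # 50075 is the datanode HTTP port used in this report.
    path = "/proc/net/tcp"
    if os.path.exists(path):
        with open(path) as f:
            print(count_close_wait(f, 50075))
{code}

Running this in a loop while the metrics script is active, and again after it stops, should reproduce the pattern described above: a growing count while {{/jmx}} is polled and a flat count afterwards.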