[ https://issues.apache.org/jira/browse/HDFS-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129949#comment-17129949 ]
Sean Chow edited comment on HDFS-15402 at 6/10/20, 2:01 AM:
------------------------------------------------------------

My clients use webhdfs to put files a lot. I think this issue is caused not by webhdfs but by the jmx endpoint. Occasionally the http port cannot be accessed:
{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}
After the datanodes are restarted, the CLOSE-WAIT sockets disappear. I currently have no clue about the cause, because no exception could be found.

> Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
> --------------------------------------------------------------------
>
>                 Key: HDFS-15402
>                 URL: https://issues.apache.org/jira/browse/HDFS-15402
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 3.1.3
>            Reporter: Sean Chow
>            Priority: Major
>
> We access {{http://127.0.0.1:50075/jmx}} periodically to get datanode metrics. But too many sockets end up in the CLOSE-WAIT state, which causes normal webhdfs requests to fail.
>
> {code:java}
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l
> 6729
> $ lsof -i:37296
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 101015 hdfs 3044u IPv4 271157177 0t0 TCP localhost:50075->localhost:37296 (CLOSE_WAIT)
> {code}
>
> The pid 101015 is the datanode's process id.
> I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in production, and both have the same issue. When the metric-retrieving script stops, the number of CLOSE-WAIT sockets no longer increases.
> The version apache-hadoop-2.9.2 does not have this issue with the same metric-retrieving script.
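
The CLOSE-WAIT count in the description can be tracked over time with a small helper. The following is only a sketch: count_close_wait is a hypothetical function name, and it assumes the five-column {{ss -ant}} layout shown in the description (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port). Unlike the grep pipeline, it matches the port only in the local address column, on any local IP, so sockets where 50075 is the peer port are not counted.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop or iproute2.
# Reads `ss -ant` style output on stdin and prints the number of sockets
# in CLOSE-WAIT whose local port equals the first argument.
count_close_wait() {
    port="$1"
    awk -v port="$port" '
        # Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port
        $1 == "CLOSE-WAIT" {
            # The port is the last colon-separated piece of the local address.
            n = split($4, parts, ":")
            if (parts[n] == port) count++
        }
        # "count + 0" prints 0 instead of an empty line when nothing matched.
        END { print count + 0 }
    '
}
```

On the datanode host it could be called as {{ss -ant | count_close_wait 50075}}, for example from the same cron job or loop that runs the metric script, to see whether the count keeps growing between polls.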