[ 
https://issues.apache.org/jira/browse/HDFS-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129949#comment-17129949
 ] 

Sean Chow edited comment on HDFS-15402 at 6/10/20, 2:01 AM:
------------------------------------------------------------

My clients use webhdfs to put files a lot. I think this issue is caused not by 
webhdfs, but by the jmx endpoint.

Occasionally the http port cannot be accessed:
{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}
 

After restarting the datanodes, the CLOSE-WAIT sockets disappear.

Currently I have no clue about the cause, because no Exception could be found.
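
To see whether the number of CLOSE-WAIT sockets grows between polls, the sockets on the datanode http port can be counted with a small helper. This is only a minimal sketch and not part of the original report: the function name count_close_wait is hypothetical, and it assumes the `ss -ant` column layout shown in the issue description (state in column 1, local address:port in column 4).

```shell
# Count sockets in CLOSE-WAIT whose local address:port equals $1.
# Reads `ss -ant` style output on stdin, so it works on a live
# command or on a saved capture.
count_close_wait() {
    # $1: local address:port to match, e.g. 127.0.0.1:50075
    awk -v addr="$1" '
        $1 == "CLOSE-WAIT" && $4 == addr { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}

# Possible usage on the datanode host:
#   ss -ant | count_close_wait 127.0.0.1:50075
```

Matching on the local address column counts only sockets held by the process listening on that port, not the client ends of the same connections.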



> Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
> --------------------------------------------------------------------
>
>                 Key: HDFS-15402
>                 URL: https://issues.apache.org/jira/browse/HDFS-15402
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 3.1.3
>            Reporter: Sean Chow
>            Priority: Major
>
> We periodically access  {{http://127.0.0.1:50075/jmx}}  to get datanode 
> metrics, but too many sockets end up in the CLOSE-WAIT state, which causes 
> normal webhdfs requests to fail.
>  
> {code:java}
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
> 6729
> $ lsof -i:37296
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 101015 hdfs 3044u IPv4 271157177 0t0 TCP localhost:50075->localhost:37296 (CLOSE_WAIT)
> {code}
>  
> Pid 101015 is the datanode's process id.
> I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in production, and both of 
> them have the same issue. When the metric-retrieving script stops, the number 
> of CLOSE-WAIT sockets stops increasing.
>  apache-hadoop-2.9.2 does not have this issue with the same 
> metric-retrieving script.
>  


