[ https://issues.apache.org/jira/browse/YARN-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719655#comment-17719655 ]
hansonhe edited comment on YARN-4754 at 5/5/23 6:56 AM:
--------------------------------------------------------
My production environment runs hadoop-3.1.4 and has the same problem when using TimelineV1.
(1) sh1-int-data-bigdata-dw-inv-prod-1 runs the timeline server. It shows many
connections in FIN_WAIT2 or TIME_WAIT:
root@sh1-int-data-bigdata-dw-inv-prod-1 ~ $ netstat -anp|grep 8188
tcp 0 0 10.2.51.214:8188 0.0.0.0:* LISTEN 8949/java
tcp 0 0 10.2.51.214:8188 10.2.51.215:52490 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52498 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52538 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52552 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52556 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34080 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52540 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34098 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52562 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34074 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34076 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34092 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52496 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34070 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34068 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34096 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52508 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52494 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52510 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52520 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.214:58984 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52542 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52536 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34078 TIME_WAIT -
tcp 0 0 10.2.51.214:58986 10.2.51.214:8188 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34072 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52512 FIN_WAIT2 -
tcp 1 0 10.2.51.214:58984 10.2.51.214:8188 CLOSE_WAIT 27743/java
tcp 0 0 10.2.51.214:8188 10.2.51.22:34082 TIME_WAIT -
(2) sh1-int-data-bigdata-dw-inv-prod-2 runs the ResourceManager. It shows many
connections in CLOSE_WAIT, and the count can grow to more than 10,000:
root@sh1-int-data-bigdata-dw-inv-prod-2 ~ $ netstat -anp|grep 8188
tcp 1 0 10.2.51.215:52496 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52520 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52540 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52494 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52542 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52522 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52510 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52536 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52498 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52556 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52538 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52562 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52564 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52490 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52502 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52512 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52508 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52548 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52552 10.2.51.214:8188 CLOSE_WAIT 20846/java
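To follow how these counts change over time, a small helper can tally the connection states in saved netstat -anp output for one port. This is only a sketch: the class and method names are hypothetical, and it assumes the state is the sixth whitespace-separated column of each tcp line, as in the output above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NetstatStateCounter {
    /**
     * Counts TCP states for lines whose local or remote address ends with
     * ":port". Lines that do not look like a tcp row are skipped.
     */
    public static Map<String, Integer> countStates(List<String> lines, int port) {
        String suffix = ":" + port;
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Assumed columns: proto recv-q send-q local remote state [pid/program]
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[0].startsWith("tcp")) {
                continue;
            }
            if (!f[3].endsWith(suffix) && !f[4].endsWith(suffix)) {
                continue;
            }
            counts.merge(f[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "tcp 1 0 10.2.51.215:52496 10.2.51.214:8188 CLOSE_WAIT 20846/java",
            "tcp 1 0 10.2.51.215:52520 10.2.51.214:8188 CLOSE_WAIT 20846/java",
            "tcp 0 0 10.2.51.215:52522 10.2.51.214:8188 TIME_WAIT -");
        // Prints {CLOSE_WAIT=2, TIME_WAIT=1}
        System.out.println(countStates(sample, 8188));
    }
}
```

Running it against snapshots taken a few minutes apart would show whether CLOSE_WAIT keeps growing while the other states stay flat.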
(3) @[~Ying Zhang] I have tested this approach, but it doesn't work:
    if (resp != null) {
      msg += " HTTP error code: " + resp.getStatus();
      // Debug guard commented out for this test, so the entity is always read.
      // if (LOG.isDebugEnabled()) {
      String output = resp.getEntity(String.class);
      LOG.debug("HTTP error code: " + resp.getStatus()
          + " Server response : \n" + output);
      // }
    }
(4) I used the Arthas tool to inspect the ResourceManager process. It finds that
getStatusInfo().getStatusCode() on the ClientResponse instances is always
ClientResponse.Status.OK, returning @Integer[200]:
[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances.length' --limit 20000
@Integer[3823]
[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances[117].getStatusInfo().getStatusCode()' --limit 20000
@Integer[200]
[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances[120].getStatusInfo().getStatusCode()' --limit 20000
@Integer[200]
[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances[2010].getStatusInfo().getStatusCode()' --limit 20000
@Integer[200]
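Since the sampled instances all report status 200, one thing that may be worth checking (an assumption, not a confirmed diagnosis) is whether responses on the success path are released as well, not only those on the error path. The sketch below shows the general "always close in finally" pattern. HttpResponse is a hypothetical stand-in interface, not the Jersey class; with Jersey 1.x the corresponding calls would be ClientResponse.getStatus(), getEntity(String.class) and close().

```java
import java.util.concurrent.atomic.AtomicInteger;

class ResponseCloser {
    /** Hypothetical stand-in for the few ClientResponse methods used here. */
    interface HttpResponse {
        int getStatus();
        String readBody(); // stands in for getEntity(String.class)
        void close();      // stands in for ClientResponse.close()
    }

    /** Builds a short summary and releases the response on every path. */
    static String summarize(HttpResponse resp) {
        if (resp == null) {
            return "no response";
        }
        try {
            int status = resp.getStatus();
            if (status == 200) {
                return "OK";
            }
            return "HTTP error code: " + status
                + " Server response: " + resp.readBody();
        } finally {
            // Runs for success and error alike, so the connection is released.
            resp.close();
        }
    }

    public static void main(String[] args) {
        AtomicInteger closed = new AtomicInteger();
        HttpResponse ok = new HttpResponse() {
            public int getStatus() { return 200; }
            public String readBody() { return ""; }
            public void close() { closed.incrementAndGet(); }
        };
        // Prints "OK, closed=1"
        System.out.println(summarize(ok) + ", closed=" + closed.get());
    }
}
```

Whether the actual TimelineV1 client code in hadoop-3.1.4 follows this pattern on the 200 path is not something this sketch establishes; it only illustrates what to look for.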
(5) So I still can't find a solution. @[~varun_saxena] @[~Naganarasimha]
When running SQL on Hive, the number of CLOSE_WAIT connections keeps increasing.
Are there any patches that have solved this? Thanks!
> Too many connection opened to TimelineServer while publishing entities
> ----------------------------------------------------------------------
>
> Key: YARN-4754
> URL: https://issues.apache.org/jira/browse/YARN-4754
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Priority: Critical
> Attachments: ConnectionLeak.rar
>
>
> It is observed that too many connections are kept open to TimelineServer
> while publishing entities via SystemMetricsPublisher. This sometimes
> causes a resource shortage for other processes or the RM itself.
> {noformat}
> tcp 0 0 10.18.99.110:3999 10.18.214.60:59265 ESTABLISHED 115302/java
> tcp 0 0 10.18.99.110:25001 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25002 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25003 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25004 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25005 :::* LISTEN 115302/java
> tcp 1 0 10.18.99.110:48866 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48137 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47553 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48424 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48139 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48096 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47558 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:49270 10.18.99.110:8188 CLOSE_WAIT 115302/java
> {noformat}