[ https://issues.apache.org/jira/browse/YARN-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719655#comment-17719655 ]
hansonhe edited comment on YARN-4754 at 5/5/23 6:56 AM:
--------------------------------------------------------

My production environment (hadoop-3.1.4) has the same problem when using TimelineV1.

(1) sh1-int-data-bigdata-dw-inv-prod-1 runs the timeline server. It shows many connections in FIN_WAIT2 and TIME_WAIT:

root@sh1-int-data-bigdata-dw-inv-prod-1 ~ $ netstat -anp|grep 8188
tcp 0 0 10.2.51.214:8188 0.0.0.0:* LISTEN 8949/java
tcp 0 0 10.2.51.214:8188 10.2.51.215:52490 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52498 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52538 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52552 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52556 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34080 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52540 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34098 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52562 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34074 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34076 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34092 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52496 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34070 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34068 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34096 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52508 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52494 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52510 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52520 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.214:58984 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52542 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52536 FIN_WAIT2 -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34078 TIME_WAIT -
tcp 0 0 10.2.51.214:58986 10.2.51.214:8188 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.22:34072 TIME_WAIT -
tcp 0 0 10.2.51.214:8188 10.2.51.215:52512 FIN_WAIT2 -
tcp 1 0 10.2.51.214:58984 10.2.51.214:8188 CLOSE_WAIT 27743/java
tcp 0 0 10.2.51.214:8188 10.2.51.22:34082 TIME_WAIT -

(2) sh1-int-data-bigdata-dw-inv-prod-2 runs the ResourceManager. It shows many connections in CLOSE_WAIT, and the count has grown to more than 10,000:

root@sh1-int-data-bigdata-dw-inv-prod-2 ~ $ netstat -anp|grep 8188
tcp 1 0 10.2.51.215:52496 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52520 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52540 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52494 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52542 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52522 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52510 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52536 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52498 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52556 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52538 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52562 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52564 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52490 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52502 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52512 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 1 0 10.2.51.215:52508 10.2.51.214:8188 CLOSE_WAIT 20846/java
tcp 0 0 10.2.51.215:52548 10.2.51.214:8188 TIME_WAIT -
tcp 1 0 10.2.51.215:52552 10.2.51.214:8188 CLOSE_WAIT 20846/java

(3) @[~Ying Zhang] I tested the change below, but it doesn't work:

    if (resp != null) {
      msg += " HTTP error code: " + resp.getStatus();
      // Debug guard commented out so the response entity is always read.
      //if (LOG.isDebugEnabled()){
      String output = resp.getEntity(String.class);
      LOG.debug("HTTP error code: " + resp.getStatus()
          + " Server response : \n" + output);
      //}
    }

(4) I used the arthas tool to inspect the ResourceManager process. For every ClientResponse instance I checked, getStatusInfo().getStatusCode() equals ClientResponse.Status.OK and returns @Integer[200]:

[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances.length' --limit 20000
@Integer[3823]
[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances[117].getStatusInfo().getStatusCode()' --limit 20000
@Integer[200]
[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances[120].getStatusInfo().getStatusCode()' --limit 20000
@Integer[200]
[arthas@20846]$ vmtool --action getInstances --className com.sun.jersey.api.client.ClientResponse --classLoaderClass sun.misc.Launcher$AppClassLoader --express 'instances[2010].getStatusInfo().getStatusCode()' --limit 20000
@Integer[200]

(5) So I still can't find a solution. @[~varun_saxena] @[~Naganarasimha]
My production environment (hadoop-3.1.4) has the same problem when using TimelineV1. When SQL queries run on Hive, the number of connections keeps increasing.
Is there a patch that fixes this? Thanks!


> Too many connection opened to TimelineServer while publishing entities
> ----------------------------------------------------------------------
>
>                 Key: YARN-4754
>                 URL: https://issues.apache.org/jira/browse/YARN-4754
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Priority: Critical
>         Attachments: ConnectionLeak.rar
>
>
> It is observed that too many connections are kept open to
> TimelineServer while publishing entities via SystemMetricsPublisher.
> This sometimes causes a resource shortage for other processes or the RM itself.
> {noformat}
> tcp 0 0 10.18.99.110:3999 10.18.214.60:59265 ESTABLISHED 115302/java
> tcp 0 0 10.18.99.110:25001 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25002 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25003 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25004 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25005 :::* LISTEN 115302/java
> tcp 1 0 10.18.99.110:48866 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48137 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47553 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48424 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48139 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48096 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47558 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:49270 10.18.99.110:8188 CLOSE_WAIT 115302/java
> {noformat}
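
To check whether the CLOSE_WAIT count on port 8188 keeps growing, the netstat listings above could be reduced to a per-state tally. The following is only a minimal sketch: the class NetstatStateTally and its tally method are hypothetical helpers, not part of Hadoop, and the parsing assumes the Linux net-tools column layout seen in the listings (proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Hadoop): tallies TCP states for one port
// from `netstat -anp` output.
final class NetstatStateTally {

    // Counts connections per TCP state among rows whose local or foreign
    // address ends with ":<port>". Assumes the column order:
    // proto, recv-q, send-q, local address, foreign address, state, pid/program.
    static Map<String, Integer> tally(List<String> lines, int port) {
        String suffix = ":" + port;
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, blank lines, and non-TCP rows.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            if (cols[3].endsWith(suffix) || cols[4].endsWith(suffix)) {
                counts.merge(cols[5], 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reads netstat output from stdin; the port defaults to 8188.
    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8188;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            tally(lines, port).forEach((state, n) -> System.out.println(state + " " + n));
        }
    }
}
```

After compiling, it could be run as `netstat -anp | java NetstatStateTally 8188` on the ResourceManager host at intervals. A CLOSE_WAIT total that only ever rises would match the leak described in this issue.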