[ https://issues.apache.org/jira/browse/YARN-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175325#comment-15175325 ]
Rohith Sharma K S commented on YARN-4754:
-----------------------------------------
As a result of the above, the RM itself sometimes cannot get the resources it
needs to publish, which causes entity publishing to fail.
Exception trace:
{noformat}
2016-03-01 11:34:34,325 ERROR org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: Error when publishing entity [YARN_APPLICATION,application_1456545891178_0950]
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: Too many open files
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:235)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:184)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:246)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:481)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:324)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:321)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1711)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:306)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:456)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationACLsUpdatedEvent(SystemMetricsPublisher.java:320)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:232)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:473)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:468)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:189)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:117)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
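Since the trace ends in "Too many open files", it may help to watch how close a JVM is to its file descriptor limit. The following is only a rough diagnostic sketch, not something taken from the RM code: the class name `FdUsage` and the method `fdUsage` are made up for illustration, and it assumes a Unix-like JVM where the platform bean implements `com.sun.management.UnixOperatingSystemMXBean`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class FdUsage {

    /**
     * Returns {open, max} file descriptor counts for the current JVM,
     * or null when the platform bean does not expose them (e.g. non-Unix).
     */
    public static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fdUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this platform");
        } else {
            System.out.println("open fds = " + usage[0] + ", limit = " + usage[1]);
        }
    }
}
```

This only reports on the JVM it runs in, so for a live RM the same two attributes would have to be read remotely (for example over JMX) rather than by running this class standalone.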
> Too many connection opened to TimelineServer while publishing entities
> ----------------------------------------------------------------------
>
> Key: YARN-4754
> URL: https://issues.apache.org/jira/browse/YARN-4754
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Priority: Critical
>
> It is observed that too many connections are kept open to the
> TimelineServer while publishing entities via SystemMetricsPublisher. This
> sometimes causes a resource shortage for other processes or for the RM itself.
> {noformat}
> tcp 0 0 10.18.99.110:3999 10.18.214.60:59265 ESTABLISHED 115302/java
> tcp 0 0 10.18.99.110:25001 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25002 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25003 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25004 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25005 :::* LISTEN 115302/java
> tcp 1 0 10.18.99.110:48866 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48137 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47553 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48424 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48139 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48096 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47558 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:49270 10.18.99.110:8188 CLOSE_WAIT 115302/java
> {noformat}
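To put a number on the listing above, one could count how many sockets sit in CLOSE_WAIT towards port 8188, the remote port seen in those lines. The sketch below is only an illustration: the class `CloseWaitCounter` and its `count` method are hypothetical names, and it assumes each line follows the column order shown in the description (proto, recv-q, send-q, local address, foreign address, state, pid/program) with any leading "> " quote marker already stripped.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class CloseWaitCounter {

    /**
     * Counts netstat-style lines whose foreign address ends with ":" + remotePort
     * and whose state column equals the given state.
     * Assumed columns: proto recv-q send-q local foreign state pid/program
     */
    public static long count(List<String> lines, int remotePort, String state) {
        String suffix = ":" + remotePort;
        return lines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(cols -> cols.length >= 6)          // skip short or blank lines
                .filter(cols -> cols[4].endsWith(suffix))  // foreign address column
                .filter(cols -> cols[5].equals(state))     // state column
                .count();
    }

    public static void main(String[] args) {
        // Four lines copied from the listing in the description.
        List<String> sample = Arrays.asList(
            "tcp 0 0 10.18.99.110:3999 10.18.214.60:59265 ESTABLISHED 115302/java",
            "tcp 0 0 10.18.99.110:25001 :::* LISTEN 115302/java",
            "tcp 1 0 10.18.99.110:48866 10.18.99.110:8188 CLOSE_WAIT 115302/java",
            "tcp 1 0 10.18.99.110:48137 10.18.99.110:8188 CLOSE_WAIT 115302/java");
        System.out.println(count(sample, 8188, "CLOSE_WAIT")); // prints 2
    }
}
```

Sampling this count periodically would show whether the number of CLOSE_WAIT sockets keeps growing while entities are being published.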