[
https://issues.apache.org/jira/browse/YARN-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091187#comment-16091187
]
Vrushali C commented on YARN-6767:
----------------------------------
Here is my observation in one case. I started up a job and then killed the NM
of the node that the AM was running on. The job ran successfully and I also
have an history file.
I see the following error messages in the timeline service context in the AM
log.
{code}
2017-07-18 06:31:55,772 ERROR [pool-8-thread-1]
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: TimelineClient has
reached to max retry times : 30 for service address: hostname:port
2017-07-18 06:31:55,773 ERROR [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Failed to
process Event JOB_FINISHED for the job : job_1500067716904_0256
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:425)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:121)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1289)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:590)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:339)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: TimelineClient has reached to max retry times :
30 for service address: hostname:port
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.checkRetryWithSleep(TimelineV2ClientImpl.java:179)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:151)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:254)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:248)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:375)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:313)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.io.IOException:
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException:
Connection refused (Connection refused)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:195)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:147)
... 8 more
Caused by: com.sun.jersey.api.client.ClientHandlerException:
java.net.ConnectException: Connection refused (Connection refused)
at
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at
com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:533)
at
org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:188)
... 9 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
at
sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
at
org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:76)
at
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
at
org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
at
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
at
org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory$1.run(TimelineConnector.java:261)
at
org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory$1.run(TimelineConnector.java:258)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1645)
at
org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory.getHttpURLConnection(TimelineConnector.java:258)
at
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
at
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
... 14 more
{code}
> Timeline client won't be able to write when TimelineCollector is not up yet,
> or NM is down
> ------------------------------------------------------------------------------------------
>
> Key: YARN-6767
> URL: https://issues.apache.org/jira/browse/YARN-6767
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineclient
> Affects Versions: 3.0.0-alpha4
> Reporter: Haibo Chen
>
> As discussed in the call, when an application first starts to run, its
> corresponding TimelineCollector instance may not be up yet, or if the
> TimelineCollector goes down when node manager dies (TimelineCollector now
> runs as part of NM auxiliary services), the timeline client
> will not able to write entities. We need to address or mitigate the issue if
> possible, or at least call it out.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]