[ 
https://issues.apache.org/jira/browse/YARN-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091187#comment-16091187
 ] 

Vrushali C commented on YARN-6767:
----------------------------------

Here is my observation from one case. I started a job and then killed the NM 
on the node where the AM was running. The job ran successfully, and I also 
have a history file. 

I see the following timeline service error messages in the AM log.

{code}


2017-07-18 06:31:55,772 ERROR [pool-8-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: TimelineClient has reached to max retry times : 30 for service address: hostname:port
2017-07-18 06:31:55,773 ERROR [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Failed to process Event JOB_FINISHED for the job : job_1500067716904_0256
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:425)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:121)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1289)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:590)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:339)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: TimelineClient has reached to max retry times : 30 for service address: hostname:port
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.checkRetryWithSleep(TimelineV2ClientImpl.java:179)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:151)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:254)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:248)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:375)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:313)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more
Caused by: java.io.IOException: com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:195)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:147)
        ... 8 more
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
        at com.sun.jersey.api.client.Client.handle(Client.java:648)
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:533)
        at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:188)
        ... 9 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
        at sun.net.www.http.HttpClient.New(HttpClient.java:339)
        at sun.net.www.http.HttpClient.New(HttpClient.java:357)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
        at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:76)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
        at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory$1.run(TimelineConnector.java:261)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory$1.run(TimelineConnector.java:258)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1645)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineURLConnectionFactory.getHttpURLConnection(TimelineConnector.java:258)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
        ... 14 more

{code}
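
For readers following the trace, the shape of the failure looks like a bounded 
retry loop: each attempt hits "Connection refused", the client sleeps, and once 
the attempts run out the last error is wrapped and rethrown. The sketch below 
only illustrates that pattern. It is not the TimelineV2ClientImpl code; the 
class name BoundedRetrySketch, the method callWithRetries, and its parameters 
are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical illustration of a bounded retry loop; not the Hadoop implementation.
final class BoundedRetrySketch {

    // Calls op up to maxAttempts times, sleeping sleepMillis between failed attempts.
    // An IOException triggers a retry; any other exception is wrapped and thrown at once.
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // remember the most recent failure as the eventual cause
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            } catch (Exception e) {
                throw new IOException(e);
            }
        }
        // All attempts failed: surface the last failure as the cause.
        throw new IOException("Reached max attempts: " + maxAttempts, last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        try {
            // Simulates a collector endpoint that never comes back up.
            callWithRetries(() -> {
                calls[0]++;
                throw new ConnectException("Connection refused");
            }, 3, 10L);
        } catch (IOException e) {
            System.out.println(e.getMessage() + " after " + calls[0]
                    + " calls; cause: " + e.getCause());
        }
    }
}
```

With a loop of this shape, a collector that stays down for longer than the 
total retry window means the write is dropped, which matches the JOB_FINISHED 
event failing in the log above.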



> Timeline client won't be able to write when TimelineCollector is not up yet, 
> or NM is down
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-6767
>                 URL: https://issues.apache.org/jira/browse/YARN-6767
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineclient
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Haibo Chen
>
> As discussed in the call, when an application first starts to run, its 
> corresponding TimelineCollector instance may not be up yet, or the 
> TimelineCollector may go down when the node manager dies (TimelineCollector 
> now runs as part of the NM auxiliary services). In either case, the timeline 
> client will not be able to write entities. We need to address or mitigate 
> the issue if possible, or at least call it out.


