[ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528420#comment-14528420
 ] 

Junping Du commented on YARN-3477:
----------------------------------

Adding a little background (for timeline service v2) on why we might prefer 
DEBUG over INFO in the retry logic here: 
In timeline service version 2, the timeline service address (one per 
application, per agent; we call it the AppTimelineCollector) is discovered 
automatically. The current flow is: 
1. For a new application, when the AM gets launched in an NM, the container 
launch auxiliary service triggers initialization of the AppTimelineCollector, 
which reports its bind address to the NM (we add a new RPC there); 
2. The NM notifies the RM about this new AppTimelineCollector address in its 
next heartbeat; 
3. Other NMs (those with containers running for this app) get this address 
from the RM.
Both the AM and the NMs use TimelineClient to publish events/metrics info to 
the timeline service. Unlike a static pre-configured address, this 
auto-discovery process needs some time (several heartbeat intervals) to 
resolve the address, so if we log at INFO level in the retry path we will 
always see some noisy messages during that window. 
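
To make the logging point concrete, here is a rough sketch of a retry loop 
that waits for the discovered address. It is not the TimelineClientImpl code: 
the class, method, and parameter names are hypothetical, and it uses 
java.util.logging (where FINE stands in for DEBUG) only to stay 
self-contained. 

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, for illustration only.
final class CollectorAddressRetry {
    private static final Logger LOG =
        Logger.getLogger(CollectorAddressRetry.class.getName());

    /**
     * Polls the lookup until it yields an address or the attempts run out.
     * The lookup stands in for "ask the NM/RM for the collector address".
     */
    static Optional<String> awaitCollectorAddress(
            Supplier<Optional<String>> lookup, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> address = lookup.get();
            if (address.isPresent()) {
                return address;
            }
            // Expected while discovery is still in flight, so log at
            // FINE (DEBUG) rather than INFO to keep normal logs quiet.
            LOG.log(Level.FINE, "Collector address not known yet (attempt {0} of {1})",
                new Object[] {attempt, maxAttempts});
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        // Running out of attempts is unexpected, so this one is worth a warning.
        LOG.warning("Collector address still unknown after " + maxAttempts + " attempts");
        return Optional.empty();
    }
}
```

With this shape, only the final failure shows up at the default log level; 
the per-attempt messages appear when DEBUG-level logging is enabled. 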
Thoughts? 

> TimelineClientImpl swallows exceptions
> --------------------------------------
>
>                 Key: YARN-3477
>                 URL: https://issues.apache.org/jira/browse/YARN-3477
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: YARN-3477-001.patch, YARN-3477-002.patch
>
>
> If timeline client fails more than the retry count, the original exception is 
> not thrown. Instead some runtime exception is raised saying "retries run out"
> # the failing exception should be rethrown, ideally via 
> NetUtils.wrapException to include URL of the failing endpoint
> # Otherwise, the raised RTE should (a) state that URL and (b) set the 
> original fault as the inner cause
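
As a minimal sketch of the second option in the description above (a runtime 
exception that names the URL and keeps the original fault as its cause), 
something like the following could work. The class and method names are 
hypothetical, this is not the attached patch, and it does not call 
NetUtils.wrapException. 

```java
import java.net.URI;
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only.
final class RetryWithCause {
    /**
     * Runs op, retrying up to maxRetries times after the first failure.
     * If every attempt fails, the thrown exception names the endpoint and
     * carries the last failure as its cause instead of swallowing it.
     */
    static <T> T callWithRetries(URI endpoint, int maxRetries, Callable<T> op) {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Remember the failure so it is not lost if retries run out.
                last = e;
            }
        }
        throw new RuntimeException(
            "Failed to reach timeline service at " + endpoint
                + " after " + maxRetries + " retries: " + last,
            last);
    }
}
```

A caller that catches the RuntimeException can then use getCause() to reach 
the underlying failure rather than only seeing a "retries run out" message. 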



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
