[ https://issues.apache.org/jira/browse/YARN-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569568#comment-15569568 ]
Junping Du commented on YARN-5718:
----------------------------------
Thanks, Vrushali, for the quick comments. I think the compile error is a bit
misleading, but there is indeed an issue that needs fixing in
TestFSRMStateStore (due to a stupid mistake in generating the v2 patch). v2.1
should fix the issue.
> TimelineClient (and other places in YARN) shouldn't over-write HDFS client
> retry settings which could cause unexpected behavior
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-5718
> URL: https://issues.apache.org/jira/browse/YARN-5718
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, timelineclient
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: YARN-5718-v2.1.patch, YARN-5718-v2.patch, YARN-5718.patch
>
>
> In one HA cluster, after the NN failed over, we noticed that a job was
> failing because TimelineClient failed to retry its connection to the proper
> NN. This is because we overwrite the HDFS client settings and hard-code the
> retry policy to be enabled, which conflicts with the NN failover case: the
> HDFS client should fail fast so that it can retry on another NN.
> We shouldn't assume any retry policy for the HDFS client anywhere in YARN.
> YARN should stay consistent with the HDFS settings, which use different
> retry policies in different deployment cases. Thus, we should clean up these
> hard-coded settings in YARN, including: FileSystemTimelineWriter,
> FileSystemRMStateStore and FileSystemNodeLabelsStore.
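
For illustration only, here is a minimal sketch of the pattern described above. It is not the actual YARN code. A plain Map stands in for Hadoop's Configuration so the example runs without Hadoop on the classpath. The two keys are existing HDFS client settings, but that they are the ones overwritten in YARN is an assumption, since the description does not name them. The class name, method names and the "2000, 500" spec value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class RetrySettingsSketch {
    // HDFS client keys; assumed (not confirmed by the issue text) to be
    // the settings that YARN overwrites.
    static final String RETRY_ENABLED = "dfs.client.retry.policy.enabled";
    static final String RETRY_SPEC = "dfs.client.retry.policy.spec";

    // Problematic pattern: the YARN component copies the cluster
    // configuration and then forces its own retry policy on top of it,
    // regardless of what the HDFS deployment configured.
    static Map<String, String> withHardCodedRetry(Map<String, String> clusterConf) {
        Map<String, String> fsConf = new HashMap<>(clusterConf);
        fsConf.put(RETRY_ENABLED, "true");
        fsConf.put(RETRY_SPEC, "2000, 500"); // illustrative value only
        return fsConf;
    }

    // Proposed direction: copy the configuration untouched, so the HDFS
    // client keeps whatever retry policy the deployment chose.
    static Map<String, String> withInheritedRetry(Map<String, String> clusterConf) {
        return new HashMap<>(clusterConf);
    }

    public static void main(String[] args) {
        // An HA deployment that wants the client to fail fast so that
        // failover to the other NN can take place.
        Map<String, String> clusterConf = new HashMap<>();
        clusterConf.put(RETRY_ENABLED, "false");

        System.out.println("hard-coded: "
            + withHardCodedRetry(clusterConf).get(RETRY_ENABLED));
        System.out.println("inherited:  "
            + withInheritedRetry(clusterConf).get(RETRY_ENABLED));
    }
}
```

In the first method the deployment's fail-fast choice is silently lost. In the second it survives, which is the behavior the description asks for.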