Junping Du created YARN-5718:
--------------------------------
Summary: TimelineClient (and other places in YARN) shouldn't
overwrite HDFS client retry settings, which could cause unexpected behavior
Key: YARN-5718
URL: https://issues.apache.org/jira/browse/YARN-5718
Project: Hadoop YARN
Issue Type: Bug
Components: timelineclient, resourcemanager
Reporter: Junping Du
Assignee: Junping Du
In an HA cluster, after the NN failed over, we noticed that jobs were failing
because TimelineClient could not retry its connection against the proper NN.
This is because we overwrite the HDFS client settings and hard-code the retry
policy to be enabled, which conflicts with the NN failover case: the HDFS
client should fail fast so that it can retry on another NN.
We shouldn't assume any retry policy for the HDFS client anywhere in YARN.
YARN should stay consistent with the HDFS settings, which use different retry
policies for different deployment cases. Thus, we should clean up these
hard-coded settings in YARN, including FileSystemTimelineWriter,
FileSystemRMStateStore and FileSystemNodeLabelsStore.
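
To illustrate the pattern described above, here is a minimal sketch, not the
actual YARN code. It assumes the hard-coded setting is the HDFS client key
dfs.client.retry.policy.enabled, and it uses a plain Map as a stand-in for a
Hadoop Configuration so that it runs without Hadoop on the classpath. The
class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class RetrySettingsSketch {
    // HDFS client key assumed to be the one being overridden.
    static final String RETRY_ENABLED = "dfs.client.retry.policy.enabled";

    // Problematic pattern: the YARN component copies the cluster settings
    // and then forces the retry policy on, whatever the cluster configured.
    static Map<String, String> withHardCodedRetry(Map<String, String> clusterConf) {
        Map<String, String> fsConf = new HashMap<>(clusterConf);
        fsConf.put(RETRY_ENABLED, "true");
        return fsConf;
    }

    // Proposed pattern: copy the cluster settings and leave the retry
    // policy exactly as the HDFS deployment defined it.
    static Map<String, String> withClusterRetrySettings(Map<String, String> clusterConf) {
        return new HashMap<>(clusterConf);
    }

    public static void main(String[] args) {
        // Hypothetical HA deployment that wants the client to fail fast.
        Map<String, String> haConf = new HashMap<>();
        haConf.put(RETRY_ENABLED, "false");

        System.out.println("hard-coded:  " + withHardCodedRetry(haConf).get(RETRY_ENABLED));
        System.out.println("cluster-set: " + withClusterRetrySettings(haConf).get(RETRY_ENABLED));
    }
}
```

With the first method, the client keeps retrying regardless of what the
deployment asked for; with the second, an HA cluster configured to fail fast
keeps that behavior and the client can move on to the other NN.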
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)