Marouane RAJI created YARN-9506:
-----------------------------------
Summary: Node Managers fail to update cached IP entries of
Resource Managers
Key: YARN-9506
URL: https://issues.apache.org/jira/browse/YARN-9506
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.7.1
Reporter: Marouane RAJI
Attachments: NM_logs.txt
Hi,
We are running a YARN cluster (for Samza jobs) on AWS in HA mode, with
yarn.resourcemanager.ha.automatic-failover.enabled=true.
To reproduce the issue:
# Have a running cluster with 2 NodeManagers and 2 ResourceManagers in HA
mode, with failover enabled.
** The ResourceManagers need DNS entries, and those host names must be set in
the config:
*** e.g. yarnrm1.me.local and yarnrm2.me.local
# Stop the active ResourceManager (yarnrm1.me.local) and retire its
instance. (The NodeManagers will fall back to the standby, yarnrm2.me.local.)
# Provision a new ResourceManager with a new IP. Make sure the DNS entry
yarnrm1.me.local is assigned to it.
# Stop the now-active ResourceManager (yarnrm2.me.local).
# Check the NodeManager logs: the NodeManagers fail to reach the newly
provisioned ResourceManager because they keep trying to connect to the old IP.
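One mechanism that could produce this symptom in a long-running Java client is that java.net.InetSocketAddress resolves its host name once, at construction, and keeps that IP for the lifetime of the object. Whether this is what happens inside the NodeManager is an assumption; this report does not confirm it. The sketch below only illustrates the general behaviour with the standard library. The class name StaleAddressDemo, its helper methods, the port 8031, and the address 10.0.0.1 are all hypothetical, and "localhost" stands in for yarnrm1.me.local so the example runs without the cluster's DNS.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class StaleAddressDemo {

    /** Builds an address whose host name is pinned to a given IP, with no DNS lookup. */
    static InetSocketAddress pinned(String host, byte[] ip, int port) {
        try {
            return new InetSocketAddress(InetAddress.getByAddress(host, ip), port);
        } catch (UnknownHostException e) {
            // Only thrown when the byte array is not a valid IPv4/IPv6 length.
            throw new IllegalArgumentException(e);
        }
    }

    /** True if the address is unresolved or its stored IP differs from a fresh lookup. */
    static boolean isStale(InetSocketAddress cached) {
        if (cached.isUnresolved()) {
            return true;
        }
        try {
            InetAddress current = InetAddress.getByName(cached.getHostString());
            return !current.equals(cached.getAddress());
        } catch (UnknownHostException e) {
            return true; // the name no longer resolves at all
        }
    }

    /** Creates a new address object, which forces a new lookup of the host name. */
    static InetSocketAddress reResolve(InetSocketAddress old) {
        return new InetSocketAddress(old.getHostString(), old.getPort());
    }

    public static void main(String[] args) {
        // Simulates a client that resolved the name before the DNS entry moved.
        InetSocketAddress cached = pinned("localhost", new byte[] {10, 0, 0, 1}, 8031);
        System.out.println("cached address: " + cached);
        System.out.println("stale: " + isStale(cached));

        // A client that wants to follow DNS changes has to build a new object.
        InetSocketAddress fresh = reResolve(cached);
        System.out.println("re-resolved address: " + fresh);
        System.out.println("stale: " + isStale(fresh));
    }
}
```

Independently of this, the JVM keeps its own cache of successful lookups, controlled by the networkaddress.cache.ttl security property, so a re-resolution can still return the old IP until that cache entry expires.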
I can provide config files, yarn-site and core-site if needed.
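For orientation until the real files are attached, an HA setup of the kind described above would typically carry yarn-site.xml properties along these lines. This is an illustrative sketch, not the actual config from this cluster: the rm-ids (rm1, rm2) and the cluster-id value are assumptions, and the ZooKeeper settings that automatic failover also needs are left out.

```xml
<!-- Illustrative only: values other than the two host names are assumed -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>example-cluster</value> <!-- hypothetical -->
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value> <!-- assumed ids -->
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>yarnrm1.me.local</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>yarnrm2.me.local</value>
</property>
```

The point relevant to this issue is that the NodeManagers are given host names rather than IPs, so they depend on the names being resolved again after the DNS entry changes.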