[
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606769#comment-16606769
]
Rahul Anand commented on YARN-7592:
-----------------------------------
As per my understanding, in a non-HA setup with the default configuration,
this will always cause a problem. My analysis is listed below.
NodeManager registration starts from {{NodeManager#main}} and eventually invokes
{{NodeStatusUpdaterImpl#serviceStart}}
{code:java}
protected void serviceStart() throws Exception {
...
this.resourceTracker = getRMClient();
..
} catch (Exception e) {
String errorMessage = "Unexpected error starting NodeStatusUpdater";
LOG.error(errorMessage, e);
throw new YarnRuntimeException(e);
}
}
{code}
Then, NodeStatusUpdaterImpl#getRMClient tries to create an RM proxy for the
resource tracker protocol. The federation-enabled check in RMProxy#newProxyInstance
{code:java}
if (HAUtil.isHAEnabled(conf) || HAUtil.isFederationEnabled(conf)) {
RMFailoverProxyProvider<T> provider =
instance.createRMFailoverProxyProvider(conf, protocol);{code}
is what causes the NodeManager registration to fail. By default,
RMProxy#createRMFailoverProxyProvider will always select
ConfiguredRMFailoverProxyProvider
{code:java}
RMFailoverProxyProvider<T> provider = ReflectionUtils.newInstance(
conf.getClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER,
defaultProviderClass, RMFailoverProxyProvider.class), conf);
provider.init(conf, (RMProxy<T>) this, protocol);{code}
and eventually, it will try to get the RM ids in
ConfiguredRMFailoverProxyProvider#init
{code:java}
Collection<String> rmIds = HAUtil.getRMHAIds(conf);
{code}
These ids would have been set only in an HA setup, according to
ResourceManager#serviceInit:
{code:java}
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) {
HAUtil.verifyAndSetConfiguration(this.conf);
}
{code}
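The chain above can be modelled with a small standalone sketch. This is not
Hadoop code: a plain Map stands in for the YARN configuration, and the class and
method names (RmProxyPathSketch, usesFailoverProvider, rmIds, describe) are
hypothetical. It assumes federation is enabled while HA is not, and that no RM
ids are configured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the checks described above.
// A plain Map stands in for the YARN configuration; no Hadoop classes are used.
final class RmProxyPathSketch {
    static final String HA_ENABLED = "yarn.resourcemanager.ha.enabled";
    static final String HA_RM_IDS = "yarn.resourcemanager.ha.rm-ids";
    static final String FEDERATION_ENABLED = "yarn.federation.enabled";

    // Reads a boolean key, treating an unset key as false.
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Mirrors the condition quoted from RMProxy#newProxyInstance.
    static boolean usesFailoverProvider(Map<String, String> conf) {
        return flag(conf, HA_ENABLED) || flag(conf, FEDERATION_ENABLED);
    }

    // Stand-in for the RM id lookup: returns an empty list when the key is unset.
    static List<String> rmIds(Map<String, String> conf) {
        List<String> ids = new ArrayList<>();
        String raw = conf.get(HA_RM_IDS);
        if (raw == null) {
            return ids;
        }
        for (String id : raw.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                ids.add(trimmed);
            }
        }
        return ids;
    }

    // Summarises which path a given configuration would take in this model.
    static String describe(Map<String, String> conf) {
        if (!usesFailoverProvider(conf)) {
            return "direct proxy, no failover provider";
        }
        List<String> ids = rmIds(conf);
        return ids.isEmpty()
            ? "failover provider selected, but no RM ids configured"
            : "failover provider selected with RM ids " + ids;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FEDERATION_ENABLED, "true"); // federation on, HA left at its default
        System.out.println(describe(conf));
    }
}
```

In this model, a configuration with federation enabled and HA disabled takes
the failover-provider branch but finds no RM ids, which matches the situation
described above.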
When I set the proxy provider to FederationRMFailoverProxyProvider, the
NodeManager started, but I believe this is suitable only when there is a
single RM.
{code:xml}
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider</value>
</property>{code}
Please correct me if I am wrong at any point.
> yarn.federation.failover.enabled missing in yarn-default.xml
> ------------------------------------------------------------
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation
> Affects Versions: 3.0.0-beta1
> Reporter: Gera Shegalov
> Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I
> am also not sure why it should be true by default and force the HA retry
> policy in {{RMProxy#createRMProxy}}