[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606769#comment-16606769
 ] 

Rahul Anand commented on YARN-7592:
-----------------------------------

As I understand it, in a non-HA setup with the default configuration this 
will always cause a problem. My analysis follows.

NodeManager registration starts from {{NodeManager#main}} and eventually invokes 
{{NodeStatusUpdaterImpl#serviceStart}}:
{code:java}
protected void serviceStart() throws Exception {
  ...
    this.resourceTracker = getRMClient();
  ...
  } catch (Exception e) {
    String errorMessage = "Unexpected error starting NodeStatusUpdater";
    LOG.error(errorMessage, e);
    throw new YarnRuntimeException(e);
  }
}
{code}
Then NodeStatusUpdaterImpl#getRMClient tries to create an RM proxy for the 
resource tracker protocol. The federation-enabled check in RMProxy#newProxyInstance 
{code:java}
if (HAUtil.isHAEnabled(conf) || HAUtil.isFederationEnabled(conf)) {
   RMFailoverProxyProvider<T> provider =
       instance.createRMFailoverProxyProvider(conf, protocol);{code}
is what makes the nodemanager registration fail. By default, 
RMProxy#createRMFailoverProxyProvider will always select 
ConfiguredRMFailoverProxyProvider 
{code:java}
RMFailoverProxyProvider<T> provider = ReflectionUtils.newInstance(
      conf.getClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER,
         defaultProviderClass, RMFailoverProxyProvider.class), conf);
provider.init(conf, (RMProxy<T>) this, protocol);{code}
and eventually it will try to get the RM ids in 
ConfiguredRMFailoverProxyProvider#init
{code:java}
Collection<String> rmIds = HAUtil.getRMHAIds(conf);
{code}
which would have been set only in an HA setup, according to 
ResourceManager#serviceInit:
{code:java}
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) {
    HAUtil.verifyAndSetConfiguration(this.conf);
}
{code}
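
To make the chain above easier to follow, here is a minimal, Hadoop-free sketch of it. It is only a model under stated assumptions: the configuration is a plain map, federation is assumed to be enabled ({{yarn.federation.enabled=true}}) while HA is left at its default, and the class and method names ({{RmProxySelectionSketch}}, {{usesFailoverProvider}}, {{rmHaIds}}, {{initConfiguredProvider}}) are hypothetical stand-ins, not Hadoop APIs. The exact failure mode of the real ConfiguredRMFailoverProxyProvider#init is not modeled; the sketch simply throws when no RM ids are configured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, Hadoop-free model of the registration path described above. */
public class RmProxySelectionSketch {

  // YARN property names; the lookup helpers below are simplified stand-ins.
  static final String HA_ENABLED = "yarn.resourcemanager.ha.enabled";
  static final String HA_RM_IDS = "yarn.resourcemanager.ha.rm-ids";
  static final String FEDERATION_ENABLED = "yarn.federation.enabled";

  /** Reads a boolean property, treating an unset key as false. */
  static boolean getBoolean(Map<String, String> conf, String key) {
    return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
  }

  /** Models the condition quoted from RMProxy#newProxyInstance. */
  static boolean usesFailoverProvider(Map<String, String> conf) {
    return getBoolean(conf, HA_ENABLED) || getBoolean(conf, FEDERATION_ENABLED);
  }

  /** Models the RM id lookup: empty unless rm-ids is configured. */
  static List<String> rmHaIds(Map<String, String> conf) {
    List<String> ids = new ArrayList<>();
    for (String id : conf.getOrDefault(HA_RM_IDS, "").split(",")) {
      if (!id.trim().isEmpty()) {
        ids.add(id.trim());
      }
    }
    return ids;
  }

  /** Hypothetical init step: a provider that needs at least one RM id. */
  static String initConfiguredProvider(Map<String, String> conf) {
    List<String> ids = rmHaIds(conf);
    if (ids.isEmpty()) {
      // Stand-in for whatever the real provider does with an empty id list.
      throw new IllegalStateException(
          "no RM ids configured (" + HA_RM_IDS + " is unset)");
    }
    return ids.get(0);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(FEDERATION_ENABLED, "true"); // assumption: federation on, HA off

    System.out.println("failover provider used: " + usesFailoverProvider(conf));
    try {
      initConfiguredProvider(conf);
    } catch (IllegalStateException e) {
      System.out.println("init failed: " + e.getMessage());
    }
  }
}
```

In this model the failover branch is taken because of the federation flag alone, while the id list it depends on is populated only by HA configuration, which is the mismatch described above.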
 

When I ran with FederationRMFailoverProxyProvider as the proxy provider, the 
nodemanager started, but this is only suitable when there is a single RM. 
{code:xml}
<property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider</value>
</property>
{code}
Please correct me if I am wrong at any point. 

 

> yarn.federation.failover.enabled missing in yarn-default.xml
> ------------------------------------------------------------
>
>                 Key: YARN-7592
>                 URL: https://issues.apache.org/jira/browse/YARN-7592
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: federation
>    Affects Versions: 3.0.0-beta1
>            Reporter: Gera Shegalov
>            Priority: Major
>         Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
