Vinod Kumar Vavilapalli commented on YARN-2884:

[~jianhe] mentioned this offline and the configuration approach concerns me too.

Stepping back, I think the current discovery of the Scheduler by apps is 
completely broken. Distributed Shell, for example, works only because it is a 
Java application and the NM happens to put HADOOP_CONF_DIR on the classpath. 
Irrespective of this JIRA, we need to fix scheduler discovery for apps. The 
current approach of depending on server configuration is unreliable in the 
face of rolling upgrades.

The specific solution in this JIRA further breaks rolling upgrades and 
configuration updates. If and when an admin forces client configuration 
changes, the config written by the Node will go out of sync. Overall, this 
makes the situation worse.

I'd suggest that we start moving towards a better scheduler-discovery model. We 
have already done similar work with the Timeline service (YARN-3039). We can 
implement part of that here as environment-based discovery: the NodeManager 
sets an environment variable, say YARN_SCHEDULER_ADDRESS for now, in the 
AM env, and it is respected as the first-level discovery mechanism. As we add 
more first-class discovery mechanisms, this env can take lower precedence. 
This approach isn't too far from your current solution either: instead of 
pointing to a conf-dir env, you are pointing to a scheduler-address env.
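
To make the proposed precedence concrete, here is a minimal sketch of how an AM could resolve the scheduler address: the YARN_SCHEDULER_ADDRESS env first, then the existing yarn.resourcemanager.scheduler.address configuration key, then that key's default. The class and method names are hypothetical and this is not existing YARN code. Plain maps stand in for the process environment and the loaded configuration so the sketch has no Hadoop dependency.

```java
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper illustrating env-first scheduler discovery. */
public class SchedulerDiscovery {
    // Env name proposed in this comment; not an existing YARN constant.
    static final String ENV_KEY = "YARN_SCHEDULER_ADDRESS";
    // Existing YARN configuration key and its default value.
    static final String CONF_KEY = "yarn.resourcemanager.scheduler.address";
    static final String DEFAULT_ADDRESS = "0.0.0.0:8030";

    /** The env wins when set and non-blank; otherwise fall back to the configuration. */
    static InetSocketAddress resolve(Map<String, String> env,
                                     Map<String, String> conf) {
        String raw = env.get(ENV_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            raw = conf.getOrDefault(CONF_KEY, DEFAULT_ADDRESS);
        }
        return parse(raw.trim());
    }

    /** Parses "host:port" without triggering a DNS lookup. */
    static InetSocketAddress parse(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx <= 0 || idx == hostPort.length() - 1) {
            throw new IllegalArgumentException(
                "Expected host:port but got: " + hostPort);
        }
        String host = hostPort.substring(0, idx);
        int port = Integer.parseInt(hostPort.substring(idx + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        // In an AM, the env would be whatever the NodeManager set at launch.
        InetSocketAddress addr =
            resolve(System.getenv(), Collections.<String, String>emptyMap());
        System.out.println(addr.getHostString() + ":" + addr.getPort());
    }
}
```

Because the env is consulted first, the node can redirect an AM without rewriting any client configuration, and a later first-class discovery mechanism could be slotted in ahead of the env lookup.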

> Proxying all AM-RM communications
> ---------------------------------
>                 Key: YARN-2884
>                 URL: https://issues.apache.org/jira/browse/YARN-2884
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Kishore Chaliparambil
>         Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs

This message was sent by Atlassian JIRA
