[ 
https://issues.apache.org/jira/browse/YARN-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848061#comment-13848061
 ] 

Xuan Gong commented on YARN-1459:
---------------------------------

Several configuration sources might be changed while the RM is running:
1. refreshQueues : capacity-scheduler.xml/fair-scheduler.xml
2. refreshNodes : the file listing the included rm_nodes, and the file listing 
the excluded rm_nodes
3. refreshSuperUserGroupsConfiguration : core-site.xml
4. refreshUserToGroupsMappings : core-site.xml
5. refreshAdminAcls : yarn-site.xml
6. refreshServiceAcls : hadoop-policy.xml

Basically, the admin user can change those configuration files on one RM and 
use rmadminCLI to refresh and pick up the latest values. If a failover happens, 
this RM goes to standby state, and the RM that becomes active has no way to get 
the updated values.

Here is my proposal:

We can create a configuration folder in HDFS and set the proper permissions on 
it. Whenever the admin user calls any of the refresh functions, we upload the 
latest configuration files into this folder. For example, if the admin user 
calls refreshNodes through rmadminCLI, we need to upload two files: the file 
listing the included rm_nodes and the file listing the excluded rm_nodes. 
After a failover, the RM that transitions to active state needs to download 
all the configuration files and run all the refresh functions to get the 
latest values. The drawback of this approach is that we save a lot of 
unrelated information, since we store the whole core-site.xml or 
yarn-site.xml.
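
To make the first approach concrete, here is a minimal sketch in Python 
(standard library only). It is not RM code: a local directory stands in for 
the HDFS folder, the function names and the nodes.include/nodes.exclude file 
names are made up for illustration, and on failover it only returns the 
refresh calls whose files were found instead of invoking anything.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed mapping from each refresh call to the files it reads.
# "nodes.include" / "nodes.exclude" are placeholder names for the
# include and exclude node files.
REFRESH_FILES = {
    "refreshQueues": ["capacity-scheduler.xml"],
    "refreshNodes": ["nodes.include", "nodes.exclude"],
    "refreshSuperUserGroupsConfiguration": ["core-site.xml"],
    "refreshUserToGroupsMappings": ["core-site.xml"],
    "refreshAdminAcls": ["yarn-site.xml"],
    "refreshServiceAcls": ["hadoop-policy.xml"],
}


def upload_on_refresh(op, local_conf, shared_conf):
    """Copy the files that `op` depends on into the shared folder."""
    shared_conf.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in REFRESH_FILES[op]:
        src = local_conf / name
        if src.exists():
            shutil.copy2(src, shared_conf / name)
            copied.append(name)
    return copied


def download_on_failover(shared_conf, local_conf):
    """Copy every shared file to the new active RM's conf dir and
    return the refresh calls that should be replayed."""
    local_conf.mkdir(parents=True, exist_ok=True)
    present = set()
    for src in shared_conf.iterdir():
        shutil.copy2(src, local_conf / src.name)
        present.add(src.name)
    return [op for op, files in REFRESH_FILES.items()
            if any(name in present for name in files)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        rm1, rm2, shared = root / "rm1", root / "rm2", root / "shared"
        rm1.mkdir()
        (rm1 / "yarn-site.xml").write_text("<configuration/>")
        upload_on_refresh("refreshAdminAcls", rm1, shared)
        print(download_on_failover(shared, rm2))
```

Note that the whole file is copied even when only one property in it 
changed, which is exactly the overhead mentioned above.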

The other way is to create several PB files (just as we do for saving the 
application info, appAttempt info, etc.) that hold just enough information 
for queue info, node info, superusergroup info, usertoGroupsmapping info, etc. 
We can save them in the same place where we save the app information (in a 
different folder, of course). When the RM transitions to active state, it 
reads them back (it reads all the app information back anyway). The good thing 
about this approach is that we only save the information we need. 
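
A rough sketch of the second approach, again in Python with the standard 
library only. JSON stands in for the PB encoding here, a local directory 
stands in for the state store, and the AdminState record with its field names 
is hypothetical; the real records would be defined as protobuf messages. The 
point is only that the stored record carries the refreshed values and nothing 
else.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path

STATE_FILE = "admin-state.json"  # placeholder file name


@dataclass
class AdminState:
    """Hypothetical record with only the values the refresh calls need."""
    admin_acl: str = ""
    include_nodes: list = field(default_factory=list)
    exclude_nodes: list = field(default_factory=list)
    user_to_groups: dict = field(default_factory=dict)


def save_state(state, store_dir):
    """Persist the record after a refresh call on the active RM."""
    store_dir.mkdir(parents=True, exist_ok=True)
    tmp = store_dir / (STATE_FILE + ".tmp")
    tmp.write_text(json.dumps(asdict(state), sort_keys=True))
    # Write to a temp file, then rename, so a reader never sees a
    # half-written record.
    tmp.replace(store_dir / STATE_FILE)


def load_state(store_dir):
    """Read the record back when an RM transitions to active."""
    path = store_dir / STATE_FILE
    if not path.exists():
        return AdminState()  # nothing was refreshed yet
    return AdminState(**json.loads(path.read_text()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "rmstore" / "admin"
        save_state(AdminState(admin_acl="yarn,admin",
                              exclude_nodes=["host3"]), store)
        print(load_state(store))
```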

Any other suggestions?



> Handle supergroups, usergroups and ACLs across RMs during failover
> ------------------------------------------------------------------
>
>                 Key: YARN-1459
>                 URL: https://issues.apache.org/jira/browse/YARN-1459
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Xuan Gong
>
> The supergroups, usergroups and ACL configurations are per RM and might have 
> been changed while the RM is running. After failing over, the new Active RM 
> should have the latest configuration from the previously Active RM.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
