[ 
https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236885#comment-16236885
 ] 

Botong Huang commented on YARN-6128:
------------------------------------

Thanks [~subru] for the review. v4 patch uploaded addressing the comments. 
bq. Why are we using both synchronized and concurrent HashMap in 
FederationRegistryClient? What's the access pattern? I am wondering if 
read-write locks might be more efficient.
Synchronized block removed. 
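To illustrate why the synchronized block can go, here is a minimal, hypothetical sketch (names are illustrative, not the actual {{FederationRegistryClient}} code): a {{ConcurrentHashMap}} already gives atomic per-key updates via {{computeIfAbsent}}, so no external locking is needed for this access pattern.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: per-app, per-sub-cluster token cache backed purely by
// ConcurrentHashMap, with no synchronized blocks.
public class TokenCacheSketch {
    // appId -> (subClusterId -> token)
    private final Map<String, Map<String, String>> appTokens =
        new ConcurrentHashMap<>();

    public void putToken(String appId, String subClusterId, String token) {
        // computeIfAbsent is atomic per key; no external locking required
        appTokens.computeIfAbsent(appId, k -> new ConcurrentHashMap<>())
                 .put(subClusterId, token);
    }

    public String getToken(String appId, String subClusterId) {
        Map<String, String> perApp = appTokens.get(appId);
        return perApp == null ? null : perApp.get(subClusterId);
    }
}
```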
bq. Can we load the entire state at one shot in 
FederationRegistryClient::loadStateFromRegistry instead of multiple looping?
I don't understand what you mean. The current code first calls list to find all 
sub-clusters in the registry, then reads the token for each sub-cluster in a loop. 
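The access pattern above can be sketched as follows. This is a hedged, self-contained illustration: the {{Registry}} interface here is a stand-in for the real registry API, and the method names are assumptions, not the patch's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the described pattern: one list() call to enumerate
// sub-cluster entries for the app, then one read per entry in a loop.
public class RegistryLoadSketch {
    public interface Registry {
        List<String> list(String path); // child entry names under a path
        String read(String path);       // value stored at a path
    }

    public static Map<String, String> loadStateFromRegistry(
            Registry registry, String appPath) {
        Map<String, String> tokens = new HashMap<>();
        // first: find all sub-clusters registered for this app
        for (String subCluster : registry.list(appPath)) {
            // then: read the token stored for each sub-cluster
            tokens.put(subCluster, registry.read(appPath + "/" + subCluster));
        }
        return tokens;
    }
}
```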
bq. I don't understand the need for ignoreMemoryState (relates to first comment 
about Javadoc)?
This flag indicates whether to trust the in-memory data when deciding whether 
to go into the registry and delete. For {{FederationInterceptor}}, the 
in-memory data is always in sync, so it sets ignoreMemoryState = false. 
However, for registry cleanup (which will be done inside GPG), GPG will not 
have the in-memory data, so it will pass in true here to force deletion. 
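The contract can be sketched like this (a hedged illustration with assumed names, not the patch's actual code): with ignoreMemoryState = true the caller always proceeds to delete from the registry; with false, the in-memory view is trusted to skip deletes for entries known to be live.

```java
import java.util.Set;

// Illustrative sketch of the ignoreMemoryState contract described above.
public class RegistryCleanupSketch {
    public static boolean shouldDeleteFromRegistry(
            boolean ignoreMemoryState, Set<String> liveAppsInMemory,
            String appId) {
        if (ignoreMemoryState) {
            // no trustworthy in-memory state (e.g. GPG cleanup): force delete
            return true;
        }
        // trust memory (e.g. FederationInterceptor): delete only stale entries
        return !liveAppsInMemory.contains(appId);
    }
}
```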
bq. In UnmanagedAMPoolManager, we are always setting 
setKeepContainersAcrossApplicationAttempts, is that intended or should it only 
be done when HA or restart is enabled?
Fixed, exposing this flag in the UAM interface. 
bq. I don't grok why are we additionally persisting the Credentials in 
AMRMProxyApplicationContext. Where's it being consumed in the patch?
For the registry-based store implementation, the credentials might be needed 
to access the store. 
bq. I am not able to follow 
FederationInterceptor::reAttachUAMAndMergeRegisterResponse invocation. 
Shouldn't this be done during recovery? and why are we invoking this right 
after register?
The register call is for the home RM. After that, to return the full list of 
containers from the previous attempt, we re-attach to all existing UAMs, get 
the running containers in the secondary sub-clusters, merge all of them, and 
return the result to the AM. Later in YARN-6704, inside 
{{FederationInterceptor::recover}}, I will be calling re-attach as well. 
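The merge step described above can be sketched as follows (an illustrative, self-contained example with assumed names, not the actual {{FederationInterceptor}} code): the home RM's register response contributes one list of containers from the previous attempt, each re-attached UAM in a secondary sub-cluster contributes another, and the AM sees the union.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: merge containers from the previous attempt reported by
// the home RM with those reported by each re-attached UAM.
public class RegisterMergeSketch {
    public static List<String> mergeContainersFromPreviousAttempt(
            List<String> homeContainers,
            List<List<String>> secondarySubClusterContainers) {
        List<String> merged = new ArrayList<>(homeContainers);
        for (List<String> uamContainers : secondarySubClusterContainers) {
            // containers still running in one secondary sub-cluster
            merged.addAll(uamContainers);
        }
        return merged;
    }
}
```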

> Add support for AMRMProxy HA
> ----------------------------
>
>                 Key: YARN-6128
>                 URL: https://issues.apache.org/jira/browse/YARN-6128
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy, nodemanager
>            Reporter: Subru Krishnan
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-6128.v0.patch, YARN-6128.v1.patch, 
> YARN-6128.v1.patch, YARN-6128.v2.patch, YARN-6128.v3.patch, 
> YARN-6128.v3.patch, YARN-6128.v4.patch
>
>
> YARN-556 added the ability for RM failover without losing any running 
> applications. In a Federated YARN environment, there's additional state in 
> the {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we 
> need to enhance {{AMRMProxy}} to support HA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
