[
https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236885#comment-16236885
]
Botong Huang commented on YARN-6128:
------------------------------------
Thanks [~subru] for the review. v4 patch uploaded addressing the comments.
bq. Why are using both synchronized and concurrent HashMap in
FederationRegistryClient? What's the access pattern as I am wondering if
read-write locks might be more efficient?
Synchronize block removed.
bq. Can we load the entire state at one shot in
FederationRegistryClient::loadStateFromRegistry instead of multiple looping?
I don't understand what you mean. The current code first call list to find all
subclusters in registry, then read the token for each subcluster in a loop.
bq. I don't understand the need for ignoreMemoryState (relates to first comment
about Javadoc)?
This is to say whether to trust in memory data to decide whether to go into
registry and delete. For {{FederationInterceptor}}, the in memory data is
always in sync, so it sets ignoreMemoryState = false. However for registry
cleanup (will do inside GPG), GPG will not have the in memory data, so will
pass in true here to force deletion.
bq. In UnmanagedAMPoolManager, we are always setting
setKeepContainersAcrossApplicationAttempts, is that intended or should it only
be done when HA or restart is enabled?
Fixed, exposing this flag in UAM interface.
bq. I don't grok why are we additionally persisting the Credentials in
AMRMProxyApplicationContext. Where's it being consumed in the patch?
For store impl of registry, credential might be needed to access store.
bq. I am not able to follow
FederationInterceptor::reAttachUAMAndMergeRegisterResponse invocation.
Shouldn't this be done during recovery? and why are we invoking this right
after register?
The register is for home RM, after that, to return the full list of containers
from previous attempt, we reattach to all existing UAMs and get running
containers in secondary sub-clusters, merge all of them and return it to AM. In
YARN-6704 later, inside {{FederationInterceptor::recover}}, I will be calling
reattach as well.
> Add support for AMRMProxy HA
> ----------------------------
>
> Key: YARN-6128
> URL: https://issues.apache.org/jira/browse/YARN-6128
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: amrmproxy, nodemanager
> Reporter: Subru Krishnan
> Assignee: Botong Huang
> Priority: Major
> Attachments: YARN-6128.v0.patch, YARN-6128.v1.patch,
> YARN-6128.v1.patch, YARN-6128.v2.patch, YARN-6128.v3.patch,
> YARN-6128.v3.patch, YARN-6128.v4.patch
>
>
> YARN-556 added the ability for RM failover without loosing any running
> applications. In a Federated YARN environment, there's additional state in
> the {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we
> need to enhance {{AMRMProxy}} to support HA.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]