[
https://issues.apache.org/jira/browse/YARN-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Botong Huang updated YARN-6511:
-------------------------------
Attachment: YARN-6511-YARN-2915.v3.patch
Thanks [~subru] for the review! I've addressed most of the comments in the v3
patch (as well as the ones from [~jianhe]). For the rest, please see below:
bq. Do we need a {{UnmanagedAMPoolManager}} per interceptor instance or can we
use one at {{AMRMProxyService}} level?
It is easier the current way: we constantly need to fetch all UAMs associated
with one application, and a per-application pool (keyed by subClusterId) makes
that lookup trivial.
If we kept one pool per AMRMProxy, we would probably need to key the UAMs by
appId+subClusterId, and the search for all UAMs associated with one application
would no longer be straightforward.
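For illustration, a minimal sketch of the two keying schemes (all class and
type names here are hypothetical, not the actual patch code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class UAM { /* hypothetical stand-in for the UAM handle */ }

// Current design: one pool per interceptor/application, keyed by
// sub-cluster id only. "All UAMs of this app" is simply values().
class PerApplicationPool {
  private final Map<String, UAM> uamMap = new ConcurrentHashMap<>();

  Iterable<UAM> getAllUAMs() {
    return uamMap.values();
  }
}

// Alternative: one pool shared by the whole AMRMProxyService. The key
// must combine appId and subClusterId, and collecting all UAMs of one
// application becomes a scan over every entry.
class SharedPool {
  private final Map<String, UAM> uamMap =
      new ConcurrentHashMap<>(); // key: appId + "/" + subClusterId

  List<UAM> getUAMsForApp(String appId) {
    List<UAM> result = new ArrayList<>();
    for (Map.Entry<String, UAM> e : uamMap.entrySet()) {
      if (e.getKey().startsWith(appId + "/")) {
        result.add(e.getValue());
      }
    }
    return result;
  }
}
{code}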
bq. Is updating the queue below safe in *loadAMRMPolicy*
Yes, _queue_ is a local string, used only by the policy manager.
bq. I feel the *finishApplicationMaster* of the pool should be moved to
{{UnmanagedAMPoolManager}}.
Yes, we could. However, it would then likely become a blocking call, where we
lose the freedom to schedule the tasks, synchronously call finish in home, and
then wait for the secondaries to come back. Alternatively, we would need
additional interfaces in UAMPoolManager, one to schedule and one to fetch the
results. I've added a TODO for this.
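Roughly, the current flow looks like the following sketch (method names are
placeholders, not the actual patch code):

{code:java}
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FinishFlowSketch {
  private final ExecutorService threadpool = Executors.newCachedThreadPool();

  void finishApplication(int numSecondaries) throws Exception {
    ExecutorCompletionService<Boolean> compSvc =
        new ExecutorCompletionService<>(threadpool);

    // 1. Schedule finish on every secondary sub-cluster (non-blocking).
    for (int i = 0; i < numSecondaries; i++) {
      compSvc.submit(() -> finishInSecondary());
    }

    // 2. Finish in the home sub-cluster synchronously, overlapping with
    //    the secondary calls already in flight.
    finishInHome();

    // 3. Only now block and wait for the secondaries to come back.
    for (int i = 0; i < numSecondaries; i++) {
      compSvc.take().get();
    }
  }

  private boolean finishInSecondary() { return true; } // placeholder
  private void finishInHome() {}                       // placeholder
}
{code}

Folding all of this into a single blocking call inside
{{UnmanagedAMPoolManager}} would remove the interleaving in step 2.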
bq. I see dynamic instantiations of {{ExecutorCompletionService}} in finish,
register, etc invocations. Wouldn't we be better served by pre-initializing it?
We need to create them locally because of concurrency: the allocate and finish
calls can be invoked concurrently, and sharing the same completion service
object would mix up the tasks submitted from both sides.
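A small sketch of the pattern (illustrative only): each invocation builds its
own {{ExecutorCompletionService}} on top of the shared executor, so its result
queue can only ever contain that invocation's tasks:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LocalCompletionServiceSketch {
  // The executor is shared across all callers.
  private final ExecutorService threadpool = Executors.newCachedThreadPool();

  // Both allocate and finish would fan out through a method like this,
  // possibly at the same time.
  List<String> fanOut(List<Callable<String>> tasks) throws Exception {
    // A fresh completion service per invocation: its internal queue only
    // holds results of the tasks submitted just below, so a concurrent
    // caller can never steal (or receive) this caller's results.
    ExecutorCompletionService<String> compSvc =
        new ExecutorCompletionService<>(threadpool);
    for (Callable<String> task : tasks) {
      compSvc.submit(task);
    }
    List<String> results = new ArrayList<>();
    for (int i = 0; i < tasks.size(); i++) {
      results.add(compSvc.take().get());
    }
    return results;
  }
}
{code}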
bq. Is *getSubClusterForNode* required as the resolver should be doing this
instead of every client?
_AbstractSubClusterResolver.getSubClusterForNode_ throws when resolving an
unknown node. We don't want to throw in this case, so we need to catch the
exception and log a warning instead.
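Something along these lines (the wrapper class is illustrative; only the
resolver call itself matches the actual API):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.federation.resolver.SubClusterResolver;
import org.apache.hadoop.yarn.server.federation.store.records.SubClusterId;

class NodeResolutionSketch {
  private static final Log LOG = LogFactory.getLog(NodeResolutionSketch.class);
  private final SubClusterResolver resolver;

  NodeResolutionSketch(SubClusterResolver resolver) {
    this.resolver = resolver;
  }

  // Returns null for unknown nodes instead of propagating the exception,
  // so a single unresolvable node cannot fail the whole call.
  SubClusterId getSubClusterForNode(String nodeName) {
    try {
      return resolver.getSubClusterForNode(nodeName);
    } catch (YarnException e) {
      LOG.warn("Failed to resolve sub-cluster for node " + nodeName, e);
      return null;
    }
  }
}
{code}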
bq. Move _YarnConfiguration_ outside the for loop in
*registerWithNewSubClusters*
We cannot, because we need a different config per UAM, each loaded with its
own sub-cluster id.
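In sketch form (the property name and helper are hypothetical):

{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class PerUamConfigSketch {
  // Each iteration builds a fresh YarnConfiguration so that every UAM
  // talks to its own sub-cluster RM; hoisting the instance out of the
  // loop would make all UAMs share (and overwrite) a single config.
  void registerWithNewSubClusters(Configuration baseConf,
      List<String> newSubClusters) {
    for (String subClusterId : newSubClusters) {
      YarnConfiguration config = new YarnConfiguration(baseConf);
      // Hypothetical property: point this copy at the sub-cluster's RM.
      config.set("yarn.federation.subcluster-id", subClusterId);
      launchUAM(subClusterId, config); // hypothetical helper
    }
  }

  private void launchUAM(String subClusterId, Configuration conf) {}
}
{code}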
bq. Consider looping on _registrations_ in lieu of _requests_ in
*sendRequestsToSecondaryResourceManagers*
_registrations_ only contains the newly added secondary sub-clusters, while
here we need to loop over (send a heartbeat to) all known secondaries.
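For clarity, a toy sketch of the distinction (all names hypothetical):

{code:java}
import java.util.Map;

class HeartbeatFanOutSketch {
  // `requests` is keyed by every known secondary sub-cluster, while
  // `registrations` only holds the ones added in this round; looping on
  // the latter would skip heartbeats to previously registered secondaries.
  void sendRequestsToSecondaryResourceManagers(
      Map<String, Object> requests,        // all known secondaries
      Map<String, Object> registrations) { // only the newly added ones
    for (Map.Entry<String, Object> entry : requests.entrySet()) {
      heartbeat(entry.getKey(), entry.getValue());
    }
  }

  private void heartbeat(String subClusterId, Object request) {}
}
{code}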
> Federation Intercepting and propagating AM-RM communications (part two:
> secondary subclusters added)
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-6511
> URL: https://issues.apache.org/jira/browse/YARN-6511
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Botong Huang
> Assignee: Botong Huang
> Attachments: YARN-6511-YARN-2915.v1.patch,
> YARN-6511-YARN-2915.v2.patch, YARN-6511-YARN-2915.v3.patch
>
>
> In order to support transparent "spanning" of jobs across sub-clusters, all
> AM-RM communications are proxied (via YARN-2884).
> This JIRA tracks federation-specific mechanisms that decide how to
> "split/broadcast" requests to the RMs and "merge" answers to
> the AM.
> This is the part two JIRA, which adds secondary subclusters and does the
> full split-merge for requests. Part one is in YARN-3666.