[
https://issues.apache.org/jira/browse/YARN-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Botong Huang updated YARN-6511:
-------------------------------
Attachment: YARN-6511-YARN-2915.v3.patch
Thanks [~subru] for the review! I've addressed most of the comments in the v3
patch (as well as the ones from [~jianhe]). For the rest, please see below:
bq. Do we need a {{UnmanagedAMPoolManager}} per interceptor instance or can we
use one at {{AMRMProxyService}} level?
It is easier the current way: we constantly need to fetch all UAMs associated
with one application, and a per-application pool (keyed by subClusterId) makes
that lookup trivial.
If we kept one pool per AMRMProxy, we would probably need to key the UAMs by
appId+subClusterId, and the search for all UAMs associated with one application
would no longer be straightforward.
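For illustration, a minimal sketch of the two keying schemes (all class and
type names here are hypothetical, not the actual patch code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class UAM { /* hypothetical stand-in for the UAM handle */ }

// Current design: one pool per interceptor/application, keyed by
// sub-cluster id only. "All UAMs of this app" is simply values().
class PerApplicationPool {
  private final Map<String, UAM> uamMap = new ConcurrentHashMap<>();

  Iterable<UAM> getAllUAMs() {
    return uamMap.values();
  }
}

// Alternative: one pool shared by the whole AMRMProxyService. The key
// must combine appId and subClusterId, and collecting all UAMs of one
// application becomes a scan over every entry.
class SharedPool {
  private final Map<String, UAM> uamMap =
      new ConcurrentHashMap<>(); // key: appId + "/" + subClusterId

  List<UAM> getUAMsForApp(String appId) {
    List<UAM> result = new ArrayList<>();
    for (Map.Entry<String, UAM> e : uamMap.entrySet()) {
      if (e.getKey().startsWith(appId + "/")) {
        result.add(e.getValue());
      }
    }
    return result;
  }
}
{code}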
bq. Is updating the queue below safe in *loadAMRMPolicy*
Yes, _queue_ is a local string, used only by the policy manager.
bq. I feel the *finishApplicationMaster* of the pool should be moved to
{{UnmanagedAMPoolManager}}.
Yes, we could. However, it would then likely become a blocking call, where we
lose the freedom to schedule the tasks, synchronously call finish in home, and
then wait for the secondaries to come back. Alternatively, we would need
additional interfaces in UAMPoolManager, one to schedule and one to fetch the
results. I've added a TODO for this.
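Roughly, the current flow looks like the following sketch (method names are
placeholders, not the actual patch code):

{code:java}
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FinishFlowSketch {
  private final ExecutorService threadpool = Executors.newCachedThreadPool();

  void finishApplication(int numSecondaries) throws Exception {
    ExecutorCompletionService<Boolean> compSvc =
        new ExecutorCompletionService<>(threadpool);

    // 1. Schedule finish on every secondary sub-cluster (non-blocking).
    for (int i = 0; i < numSecondaries; i++) {
      compSvc.submit(() -> finishInSecondary());
    }

    // 2. Finish in the home sub-cluster synchronously, overlapping with
    //    the secondary calls already in flight.
    finishInHome();

    // 3. Only now block and wait for the secondaries to come back.
    for (int i = 0; i < numSecondaries; i++) {
      compSvc.take().get();
    }
  }

  private boolean finishInSecondary() { return true; } // placeholder
  private void finishInHome() {}                       // placeholder
}
{code}

Folding all of this into a single blocking call inside
{{UnmanagedAMPoolManager}} would remove the interleaving in step 2.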
bq. I see dynamic instantiations of {{ExecutorCompletionService}} in finish,
register, etc invocations. Wouldn't we be better served by pre-initializing it?
We need to create them locally because of concurrency: the allocate and finish
calls can be invoked concurrently, and sharing the same completion service
object would mix up the tasks submitted from both sides.
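A small sketch of the pattern (illustrative only): each invocation builds its
own {{ExecutorCompletionService}} on top of the shared executor, so its result
queue can only ever contain that invocation's tasks:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LocalCompletionServiceSketch {
  // The executor is shared across all callers.
  private final ExecutorService threadpool = Executors.newCachedThreadPool();

  // Both allocate and finish would fan out through a method like this,
  // possibly at the same time.
  List<String> fanOut(List<Callable<String>> tasks) throws Exception {
    // A fresh completion service per invocation: its internal queue only
    // holds results of the tasks submitted just below, so a concurrent
    // caller can never steal (or receive) this caller's results.
    ExecutorCompletionService<String> compSvc =
        new ExecutorCompletionService<>(threadpool);
    for (Callable<String> task : tasks) {
      compSvc.submit(task);
    }
    List<String> results = new ArrayList<>();
    for (int i = 0; i < tasks.size(); i++) {
      results.add(compSvc.take().get());
    }
    return results;
  }
}
{code}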
bq. Is *getSubClusterForNode* required as the resolver should be doing this
instead of every client?
_AbstractSubClusterResolver.getSubClusterForNode_ throws when resolving an
unknown node. We don't want to throw in this case, so we need to catch the
exception and log a warning instead.
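Something along these lines (the wrapper class is illustrative; only the
resolver call itself matches the actual API):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.federation.resolver.SubClusterResolver;
import org.apache.hadoop.yarn.server.federation.store.records.SubClusterId;

class NodeResolutionSketch {
  private static final Log LOG = LogFactory.getLog(NodeResolutionSketch.class);
  private final SubClusterResolver resolver;

  NodeResolutionSketch(SubClusterResolver resolver) {
    this.resolver = resolver;
  }

  // Returns null for unknown nodes instead of propagating the exception,
  // so a single unresolvable node cannot fail the whole call.
  SubClusterId getSubClusterForNode(String nodeName) {
    try {
      return resolver.getSubClusterForNode(nodeName);
    } catch (YarnException e) {
      LOG.warn("Failed to resolve sub-cluster for node " + nodeName, e);
      return null;
    }
  }
}
{code}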
bq. Move _YarnConfiguration_ outside the for loop in
*registerWithNewSubClusters*
We cannot, because we need a different config per UAM, each loaded with its
own sub-cluster id.
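In sketch form (the property name and helper are hypothetical):

{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class PerUamConfigSketch {
  // Each iteration builds a fresh YarnConfiguration so that every UAM
  // talks to its own sub-cluster RM; hoisting the instance out of the
  // loop would make all UAMs share (and overwrite) a single config.
  void registerWithNewSubClusters(Configuration baseConf,
      List<String> newSubClusters) {
    for (String subClusterId : newSubClusters) {
      YarnConfiguration config = new YarnConfiguration(baseConf);
      // Hypothetical property: point this copy at the sub-cluster's RM.
      config.set("yarn.federation.subcluster-id", subClusterId);
      launchUAM(subClusterId, config); // hypothetical helper
    }
  }

  private void launchUAM(String subClusterId, Configuration conf) {}
}
{code}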
bq. Consider looping on _registrations_ in lieu of _requests_ in
*sendRequestsToSecondaryResourceManagers*
_registrations_ only contains the newly added secondary sub-clusters, while
here we need to loop over (send a heartbeat to) all known secondaries.
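For clarity, a toy sketch of the distinction (all names hypothetical):

{code:java}
import java.util.Map;

class HeartbeatFanOutSketch {
  // `requests` is keyed by every known secondary sub-cluster, while
  // `registrations` only holds the ones added in this round; looping on
  // the latter would skip heartbeats to previously registered secondaries.
  void sendRequestsToSecondaryResourceManagers(
      Map<String, Object> requests,        // all known secondaries
      Map<String, Object> registrations) { // only the newly added ones
    for (Map.Entry<String, Object> entry : requests.entrySet()) {
      heartbeat(entry.getKey(), entry.getValue());
    }
  }

  private void heartbeat(String subClusterId, Object request) {}
}
{code}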
> Federation Intercepting and propagating AM-RM communications (part two:
> secondary subclusters added)
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-6511
> URL: https://issues.apache.org/jira/browse/YARN-6511
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Botong Huang
> Assignee: Botong Huang
> Attachments: YARN-6511-YARN-2915.v1.patch,
> YARN-6511-YARN-2915.v2.patch, YARN-6511-YARN-2915.v3.patch
>
>
> In order to support transparent "spanning" of jobs across sub-clusters, all
> AM-RM communications are proxied (via YARN-2884).
> This JIRA tracks federation-specific mechanisms that decide how to
> "split/broadcast" requests to the RMs and "merge" answers to
> the AM.
> This is the part two JIRA, which adds secondary subclusters and does the
> full split-merge for requests. Part one is in YARN-3666.