[ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781
 ] 

Tao Yang commented on YARN-9313:
--------------------------------

The key changes in this patch are described below; I hope someone can help 
review it:
1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to represent 
multiple nodes.
2. Move the start/finish points of scheduler activities from 
CapacityScheduler#nodeUpdate to CapacityScheduler#allocateContainersToNode, 
placing them immediately before and after the allocation, whether it is based 
on a single node (input node is a real node) or on multiple nodes (input node 
is ActivitiesManager#MULTI_NODES_AGENT). This unified entry and exit point 
covers more scenarios.
3. After initializing activities, remove the current active node from 
activeRecordedNodes in ActivitiesManager#startNodeUpdateRecording, so that the 
current activities process can be started only once.
4. Maintain the relationship between the input node and the activities key. In 
the multi-nodes placement scenario, the input node can be a specific node or 
null; the activities key should be ActivitiesManager#MULTI_NODES_AGENT, while 
the node recorded in the activities info should be the specific node or 
ActivitiesManager#MULTI_NODES_AGENT. The nodeId therefore has to be translated 
in two places: (1) In ActivitiesLogger, the input nodeId should be the specific 
nodeId if the input node is not null, and ActivitiesManager#MULTI_NODES_AGENT 
if the input node is null and multi-nodes recording is active. (2) When 
recording activities, the input node may be a specific node while the 
activities key must be ActivitiesManager#MULTI_NODES_AGENT, so the correct 
recording key is resolved at the head of 
ActivitiesManager#getCurrentNodeAllocation, and the specific nodeId is still 
recorded in the activities info.
5. Update the if clauses at the head of several methods in ActivitiesLogger to 
relax the restrictions on scheduler activities (currently only a non-null node 
is accepted).
6. Make ActivitiesManager#recordingNodesAllocation a thread-local variable to 
avoid mixing activities recorded by multiple scheduling processes in 
asynchronous scheduling mode.
7. Add TestActivitiesManager to verify that multiple threads can run without 
interfering with each other in both the normal scenario and the multi-nodes 
enabled scenario.
8. Update the check logic in 
TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat, 
since the collection logic of scheduler activities changes with this patch and 
only one allocation should be recorded in all scenarios.
9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test 
recording scheduler activities with multi-nodes enabled.
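
To make items 4 and 6 easier to review, here is a rough, self-contained sketch 
of the idea; it is not code from the patch. The class ActivitiesKeySketch and 
its methods (recordingKey, recordedNodeId, record, current) are hypothetical 
names, node ids are plain Strings instead of NodeId, and treating a null node 
as multi-nodes even when the flag is off is my own assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for ActivitiesManager; not the patch's code.
class ActivitiesKeySketch {
    // Stand-in for ActivitiesManager#MULTI_NODES_AGENT (a String here for brevity).
    static final String MULTI_NODES_AGENT = "MULTI_NODES_AGENT";

    // Item 6: each scheduling thread gets its own recording map, so
    // activities from concurrent allocations cannot mix.
    private static final ThreadLocal<Map<String, List<String>>> RECORDING =
        ThreadLocal.withInitial(HashMap::new);

    // Item 4 (2): key under which activities are stored.
    // Assumption: a null node is also keyed by the agent.
    static String recordingKey(String nodeId, boolean multiNodesEnabled) {
        return (multiNodesEnabled || nodeId == null) ? MULTI_NODES_AGENT : nodeId;
    }

    // Item 4 (1): node id written into the activities info.
    static String recordedNodeId(String nodeId) {
        return nodeId != null ? nodeId : MULTI_NODES_AGENT;
    }

    // Stores "<recorded node id>=<state>" under the resolved key.
    static void record(String nodeId, boolean multiNodesEnabled, String state) {
        RECORDING.get()
            .computeIfAbsent(recordingKey(nodeId, multiNodesEnabled),
                k -> new ArrayList<>())
            .add(recordedNodeId(nodeId) + "=" + state);
    }

    // Returns the calling thread's own recording map.
    static Map<String, List<String>> current() {
        return RECORDING.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread singleNode = new Thread(() -> {
            record("host1:8041", false, "ALLOCATED");
            System.out.println("single-node thread: " + current());
        });
        Thread multiNodes = new Thread(() -> {
            record("host2:8041", true, "SKIPPED");
            record(null, true, "REJECTED");
            System.out.println("multi-nodes thread: " + current());
        });
        singleNode.start();
        multiNodes.start();
        singleNode.join();
        multiNodes.join();
        // The main thread recorded nothing, so its own map stays empty.
        System.out.println("main thread: " + current());
    }
}
```

In this sketch each thread only ever sees its own map, and in the multi-nodes 
case both entries land under the MULTI_NODES_AGENT key while the entry for 
host2:8041 still carries the specific node id.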

> Support asynchronized scheduling mode and multi-node lookup mechanism for 
> scheduler activities
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-9313
>                 URL: https://issues.apache.org/jira/browse/YARN-9313
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9313.001.patch
>
>
> [Design 
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]


