[ https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781 ]

Tao Yang edited comment on YARN-9313 at 2/20/19 9:38 AM:
---------------------------------------------------------

The key changes in this patch are described below. I hope someone can help
with the review:
 1. Add a fake node ID, MULTI_NODES_AGENT, to ActivitiesManager to represent
multiple nodes.
 2. Move the start/finish points of scheduler activities from
CapacityScheduler#nodeUpdate to CapacityScheduler#allocateContainersToNode,
placing them immediately before and after the allocation for both single-node
allocation (the input node is a real node) and multi-node allocation (the input
node is ActivitiesManager#MULTI_NODES_AGENT). This unified entry and exit point
extends scheduler activities to more scenarios.
 3. In ActivitiesManager#startNodeUpdateRecording, remove the current active
node from activeRecordedNodes after initializing activities, to make sure the
current activities-recording process can only be started once.
 4. Maintain the relationships among the input node, the nodeId key of
recordingNodesAllocation, and the nodeId in activities info. In the multi-node
placement scenario, the input node can be a specific node or null; the nodeId
key of recordingNodesAllocation should be ActivitiesManager#MULTI_NODES_AGENT,
and the nodeId in activities info should be either that specific node or
ActivitiesManager#MULTI_NODES_AGENT. We therefore need to derive the correct
nodeId for the recording key and for activities info from the input node:
    (1) The nodeId should be the input node's nodeId when the input node is not
    null, and ActivitiesManager#MULTI_NODES_AGENT when the input node is null
    and multi-node lookup is enabled; the affected places in ActivitiesLogger
    are updated accordingly.
    (2) When recording activities, the nodeId in activities info may be a
    specific node while the nodeId key of recordingNodesAllocation should be
    ActivitiesManager#MULTI_NODES_AGENT, so the correct recording key is
    resolved at the head of ActivitiesManager#getCurrentNodeAllocation while
    the input node's nodeId is still recorded in activities info.
 5. Update the if-conditions at the head of several methods in ActivitiesLogger
to relax the restriction on scheduler activities (currently they are recorded
only for a non-null node).
 6. Change ActivitiesManager#recordingNodesAllocation to a thread-local
variable so that activities from multiple scheduling processes are not mixed
together in asynchronous scheduling mode.
 7. Add TestActivitiesManager to test that multiple threads can run without
interfering with each other, in both the normal scenario and the multi-node
enabled scenario.
 8. Update the check logic in
TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat,
since this patch changes how scheduler activities are collected and only one
allocation should be recorded in all scenarios.
 9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test
recording scheduler activities with multi-node lookup enabled.
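Items 1, 4 and 5 above boil down to two small mapping rules from the input node to (a) the recording key and (b) the nodeId in activities info. The following is a minimal sketch of those rules as described in this comment, assuming plain `String` node IDs; `NodeIdResolution`, `recordingKey`, `activitiesInfoNodeId` and `shouldRecord` are hypothetical names, not the patch's ActivitiesManager/ActivitiesLogger code (which works with YARN `NodeId` objects).

```java
// Hypothetical, simplified model of the node-ID rules in items 1, 4 and 5.
// Node IDs are plain strings here; the real code uses YARN NodeId objects.
final class NodeIdResolution {
  // Stand-in for the fake node ID ActivitiesManager#MULTI_NODES_AGENT (item 1).
  static final String MULTI_NODES_AGENT = "MULTI_NODES_AGENT";

  private NodeIdResolution() {}

  // Item 4 (1): nodeId recorded in activities info. Use the input node when
  // present; fall back to the fake agent when the input node is null and
  // multi-node lookup is enabled; otherwise there is nothing to record.
  static String activitiesInfoNodeId(String inputNodeId, boolean multiNodesEnabled) {
    if (inputNodeId != null) {
      return inputNodeId;
    }
    return multiNodesEnabled ? MULTI_NODES_AGENT : null;
  }

  // Item 4 (2): key into the recording map. With multi-node lookup enabled,
  // every allocation is recorded under the fake agent, even when the
  // activities info names a specific node.
  static String recordingKey(String inputNodeId, boolean multiNodesEnabled) {
    return multiNodesEnabled ? MULTI_NODES_AGENT : inputNodeId;
  }

  // Item 5 (assumed shape of the relaxed guard): a null node is no longer
  // rejected outright; it is accepted when multi-node lookup is enabled.
  static boolean shouldRecord(String inputNodeId, boolean multiNodesEnabled) {
    return activitiesInfoNodeId(inputNodeId, multiNodesEnabled) != null;
  }

  public static void main(String[] args) {
    System.out.println(recordingKey("host1:8041", true));         // MULTI_NODES_AGENT
    System.out.println(activitiesInfoNodeId("host1:8041", true)); // host1:8041
    System.out.println(activitiesInfoNodeId(null, true));         // MULTI_NODES_AGENT
    System.out.println(shouldRecord(null, false));                // false
  }
}
```

The split into two functions mirrors the point of item 4: with multi-node lookup enabled, the recording key and the nodeId shown in activities info can legitimately differ.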
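Items 2 and 3 above can be illustrated with a toy lifecycle: recording starts right before an allocation and finishes right after it, and a requested node is dropped from activeRecordedNodes once its recording starts. This is a hedged sketch only; `RecordingLifecycle`, the `events` trace and the `(String, Runnable)` signature of allocateContainersToNode are simplifications for illustration, not the real CapacityScheduler or ActivitiesManager signatures.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of items 2 and 3 with simplified, illustrative signatures.
final class RecordingLifecycle {
  // Nodes (or the multi-node agent) for which recording has been requested.
  final Set<String> activeRecordedNodes = new HashSet<>();
  // Trace of lifecycle events, kept only so the behavior can be observed.
  final List<String> events = new ArrayList<>();

  // Item 3: initialize activities, then remove the node from
  // activeRecordedNodes so this recording can only be started once.
  boolean startNodeUpdateRecording(String nodeId) {
    if (!activeRecordedNodes.contains(nodeId)) {
      return false; // not requested, or already started
    }
    events.add("start " + nodeId); // stands in for initializing activities
    activeRecordedNodes.remove(nodeId);
    return true;
  }

  void finishNodeUpdateRecording(String nodeId) {
    events.add("finish " + nodeId);
  }

  // Item 2: start/finish wrap the allocation itself, so the same entry and
  // exit apply whether the input is a real node or the multi-node agent.
  void allocateContainersToNode(String nodeId, Runnable allocation) {
    boolean started = startNodeUpdateRecording(nodeId);
    try {
      allocation.run();
    } finally {
      if (started) {
        finishNodeUpdateRecording(nodeId);
      }
    }
  }

  public static void main(String[] args) {
    RecordingLifecycle lifecycle = new RecordingLifecycle();
    String agent = "MULTI_NODES_AGENT";
    lifecycle.activeRecordedNodes.add(agent);
    lifecycle.allocateContainersToNode(agent, () -> lifecycle.events.add("allocate"));
    lifecycle.allocateContainersToNode(agent, () -> lifecycle.events.add("allocate"));
    // prints [start MULTI_NODES_AGENT, allocate, finish MULTI_NODES_AGENT, allocate]
    System.out.println(lifecycle.events);
  }
}
```

The second allocation is not recorded because the agent was removed from activeRecordedNodes when the first recording started. The sketch uses a plain HashSet with a separate check-then-remove, so it is meant for single-threaded illustration only.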
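Items 6 and 7 above rely on each scheduling thread having its own recording state. The sketch below shows the general `java.lang.ThreadLocal` pattern behind that idea, assuming string keys and allocation labels; `ThreadLocalRecording`, `record`, `finish` and `runThreads` are hypothetical names, and `runThreads` is only a miniature of the kind of check item 7 describes, not the actual TestActivitiesManager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of item 6: each scheduling thread gets its own
// in-progress recording map, so concurrent allocations cannot interleave.
final class ThreadLocalRecording {
  // Per-thread map from recording key to the allocations recorded by the
  // current scheduling pass; withInitial gives each thread a fresh HashMap.
  private static final ThreadLocal<Map<String, List<String>>> RECORDING =
      ThreadLocal.withInitial(HashMap::new);

  static void record(String key, String allocation) {
    RECORDING.get().computeIfAbsent(key, k -> new ArrayList<>()).add(allocation);
  }

  // Returns and clears this thread's recording, as a finish step would.
  static Map<String, List<String>> finish() {
    Map<String, List<String>> done = RECORDING.get();
    RECORDING.remove();
    return done;
  }

  // Item 7 in miniature: several threads record under the same key at the
  // same time; each thread's result should contain only its own allocations.
  static Map<String, List<String>> runThreads(int threads) {
    Map<String, List<String>> results = new ConcurrentHashMap<>();
    CountDownLatch startTogether = new CountDownLatch(1);
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      String name = "t" + i;
      Thread worker = new Thread(() -> {
        try {
          startTogether.await(); // release all threads at once to overlap them
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int j = 0; j < 3; j++) {
          record("MULTI_NODES_AGENT", name + "-alloc" + j);
        }
        results.put(name, finish().get("MULTI_NODES_AGENT"));
      });
      workers.add(worker);
      worker.start();
    }
    startTogether.countDown();
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return results;
  }

  public static void main(String[] args) {
    // Print order is not guaranteed; each line shows one thread's allocations.
    runThreads(4).forEach((name, allocations) -> System.out.println(name + " -> " + allocations));
  }
}
```

Because `ThreadLocal.withInitial` gives every thread its own map and `finish()` calls `remove()`, all four threads can record under the same MULTI_NODES_AGENT key and each still sees only its own three allocations; with a single shared map, the lists from different threads would be interleaved.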


> Support asynchronized scheduling mode and multi-node lookup mechanism for 
> scheduler activities
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-9313
>                 URL: https://issues.apache.org/jira/browse/YARN-9313
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9313.001.patch
>
>
> [Design 
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
