[
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781
]
Tao Yang edited comment on YARN-9313 at 2/20/19 9:38 AM:
---------------------------------------------------------
The key changes in this patch are described below; I hope someone can help
review it:
1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to
represent multiple nodes.
2. Move the start/finish points of scheduler activities recording from
CapacityScheduler#nodeUpdate to CapacityScheduler#allocateContainersToNode, so
that they sit immediately before/after the allocation, whether it is based on a
single node (input node is a real node) or on multiple nodes (input node is
ActivitiesManager#MULTI_NODES_AGENT). This unified entry and exit point covers
more scenarios.
3. After initializing activities in ActivitiesManager#startNodeUpdateRecording,
remove the current active node from activeRecordedNodes to make sure the
current activities process can only be started once.
4. Maintain the relationships among the input node, the nodeId key of
recordingNodeAllocation, and the nodeId in activities info. In the multi-nodes
placement scenario, the input node can be a specific node or null, the nodeId
key of recordingNodeAllocation should be ActivitiesManager#MULTI_NODES_AGENT,
and the nodeId in activities info should be a specific node or
ActivitiesManager#MULTI_NODES_AGENT. So we need to derive the correct recording
key and the correct nodeId in activities info from the input node: (1) The
nodeId should be the nodeId of the input node when it is not null, and
ActivitiesManager#MULTI_NODES_AGENT when the input node is null and multi-nodes
is enabled; several places in ActivitiesLogger are updated accordingly. (2)
When recording activities, the nodeId in activities info can be a specific node
while the nodeId key of recordingNodeAllocation should be
ActivitiesManager#MULTI_NODES_AGENT, so we resolve the correct recording key at
the head of ActivitiesManager#getCurrentNodeAllocation and still record the
nodeId of the input node in activities info.
5. Update the if clauses at the head of several methods in ActivitiesLogger to
relax the restrictions on scheduler activities (activities are currently
recorded only for a non-null node).
6. Make ActivitiesManager#recordingNodesAllocation a thread-local variable to
avoid mixing activities recorded by multiple scheduling processes in
asynchronous scheduling mode.
7. Add TestActivitiesManager to verify that multiple threads can run without
interfering with each other, both in the normal scenario and in the multi-nodes
enabled scenario.
8. Update the check logic in
TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat
because the collection logic of scheduler activities changes with this patch
and only one allocation should be recorded in all scenarios.
9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test
recording scheduler activities with multi-nodes enabled.
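
To make items 4 and 6 easier to review, here is a minimal, simplified sketch of
the intended key/nodeId resolution and the per-thread recording map. It is not
the code in the patch: the class name RecordingKeySketch, its method names, the
use of String instead of the YARN NodeId type, and the placeholder value of
MULTI_NODES_AGENT are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the relevant parts of ActivitiesManager. */
class RecordingKeySketch {
  // Placeholder value; the real constant lives in ActivitiesManager.
  static final String MULTI_NODES_AGENT = "MULTI_NODES_AGENT";

  // Item 6: each scheduling thread gets its own recording map, so
  // activities from concurrent scheduling threads are never mixed.
  private static final ThreadLocal<Map<String, List<String>>> RECORDING =
      ThreadLocal.withInitial(HashMap::new);

  /** Item 4 (1): the nodeId stored in activities info. */
  static String activityNodeId(String inputNodeId, boolean multiNodesEnabled) {
    if (inputNodeId != null) {
      return inputNodeId;
    }
    return multiNodesEnabled ? MULTI_NODES_AGENT : null;
  }

  /** Item 4 (2): the key used in the recording map. */
  static String recordingKey(String inputNodeId, boolean multiNodesEnabled) {
    return multiNodesEnabled ? MULTI_NODES_AGENT : inputNodeId;
  }

  /** Records one activity under the resolved key for the calling thread. */
  static void record(String inputNodeId, boolean multiNodesEnabled,
      String activity) {
    String key = recordingKey(inputNodeId, multiNodesEnabled);
    String nodeId = activityNodeId(inputNodeId, multiNodesEnabled);
    if (key == null || nodeId == null) {
      return; // nothing to record without a usable node
    }
    RECORDING.get()
        .computeIfAbsent(key, k -> new ArrayList<>())
        .add(nodeId + ":" + activity);
  }

  /** Returns the recording map of the calling thread only. */
  static Map<String, List<String>> current() {
    return RECORDING.get();
  }
}
```

With multi-nodes enabled, activities for a specific node and for a null input
node both land under the MULTI_NODES_AGENT key, while each entry keeps the
nodeId it was recorded with.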
> Support asynchronized scheduling mode and multi-node lookup mechanism for
> scheduler activities
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-9313
> URL: https://issues.apache.org/jira/browse/YARN-9313
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-9313.001.patch
>
>
> [Design
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)