[
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781
]
Tao Yang commented on YARN-9313:
--------------------------------
The key changes in this patch are described below; I hope someone can help
review them:
1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to represent
multiple nodes.
2. Move the start/finish points of scheduler activities from
CapacityScheduler#nodeUpdate to CapacityScheduler#allocateContainersToNode,
placing them right before/after the allocation, whether it is based on a single
node (input node is a real node) or on multiple nodes (input node is
ActivitiesManager#MULTI_NODES_AGENT). This gives a unified entry and exit point
and covers more scenarios.
3. After initializing activities, ActivitiesManager#startNodeUpdateRecording
should remove the current active node from activeRecordedNodes, to make sure
the current activities process can be started only once.
4. Maintain the relationship between the input node and the activities key. In
the multi-node placement scenario the input node can be a specific node or
null; the activities index should be ActivitiesManager#MULTI_NODES_AGENT, while
the node recorded in the activities info should be the specific node or
ActivitiesManager#MULTI_NODES_AGENT. The nodeId therefore has to be translated
in two places: (1) The input nodeId should be the specific nodeId if the input
node is not null, and ActivitiesManager#MULTI_NODES_AGENT if the input node is
null and multi-node recording is in progress; the input nodeId should be
updated accordingly in ActivitiesLogger. (2) When recording activities, the
input node may be a specific node while the activities key should be
ActivitiesManager#MULTI_NODES_AGENT, so we need to resolve the correct
recording key at the head of ActivitiesManager#getCurrentNodeAllocation while
still recording the specific nodeId in the activities info.
5. Update the if clauses at the head of several methods in ActivitiesLogger to
relax the restrictions on scheduler activities (currently they are recorded
only for a non-null node).
6. ActivitiesManager#recordingNodesAllocation should be changed to a
thread-local variable, to avoid mixing activities recorded by multiple
scheduling threads in asynchronous scheduling mode.
7. Add TestActivitiesManager to test that multiple threads can run without
interfering with each other, in both the normal scenario and the multi-nodes
enabled scenario.
8. Update the check logic in
TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat,
since the collection logic of scheduler activities changes with this patch and
only one allocation should be recorded in all scenarios.
9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test
recording scheduler activities with multi-nodes enabled.
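To make items 4 and 6 easier to review, here is a rough sketch of the intended
behavior. It is not code from the patch: apart from the MULTI_NODES_AGENT name,
the class name, method names, host names, and the use of plain strings for node
ids are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: apart from the MULTI_NODES_AGENT name, every
// identifier here is hypothetical and is not taken from the patch.
final class ActivitiesKeySketch {
  // Stand-in for ActivitiesManager#MULTI_NODES_AGENT; using a plain string
  // is an assumption made for brevity.
  static final String MULTI_NODES_AGENT = "MULTI_NODES_AGENT";

  // Item 6: each scheduling thread gets its own recording map, so
  // concurrent scheduling threads cannot mix their activities.
  private static final ThreadLocal<Map<String, List<String>>> RECORDING =
      ThreadLocal.withInitial(HashMap::new);

  // Item 4 (1): the node id stored in the activities info.
  static String effectiveNodeId(String inputNodeId, boolean multiNodeRecording) {
    if (inputNodeId != null) {
      return inputNodeId;
    }
    return multiNodeRecording ? MULTI_NODES_AGENT : null;
  }

  // Item 4 (2): the key under which the activities are indexed.
  static String recordingKey(String inputNodeId, boolean multiNodeRecording) {
    return multiNodeRecording ? MULTI_NODES_AGENT : inputNodeId;
  }

  // Records one activity for the current thread; does nothing when there
  // is no key to record under.
  static void record(String inputNodeId, boolean multiNodeRecording,
      String activity) {
    String key = recordingKey(inputNodeId, multiNodeRecording);
    if (key == null) {
      return;
    }
    String nodeId = effectiveNodeId(inputNodeId, multiNodeRecording);
    RECORDING.get().computeIfAbsent(key, k -> new ArrayList<>())
        .add(nodeId + ": " + activity);
  }

  // Activities recorded by the current thread under the given key.
  static List<String> recorded(String key) {
    return RECORDING.get().getOrDefault(key, new ArrayList<>());
  }

  public static void main(String[] args) throws InterruptedException {
    Thread other = new Thread(() -> record("host1:8041", false, "ALLOCATED"));
    other.start();
    other.join();
    record("host2:8041", true, "SKIPPED");
    // Only this thread's entry, indexed under the agent key.
    System.out.println(recorded(MULTI_NODES_AGENT));
    // Empty here: that entry was recorded by the other thread.
    System.out.println(recorded("host1:8041"));
  }
}
```

The point of the sketch is that the index key and the recorded node id are
resolved separately: with multi-nodes recording, a specific input node is still
kept in the activities info while the entry is indexed under the agent key.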
> Support asynchronized scheduling mode and multi-node lookup mechanism for
> scheduler activities
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-9313
> URL: https://issues.apache.org/jira/browse/YARN-9313
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-9313.001.patch
>
>
> [Design doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]