[
https://issues.apache.org/jira/browse/YARN-10968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lee young gon updated YARN-10968:
---------------------------------
Attachment: YARN-10968.002.patch
> SchedulingRequests can be wrong when multiple containers stopped at the same
> time
> ---------------------------------------------------------------------------------
>
> Key: YARN-10968
> URL: https://issues.apache.org/jira/browse/YARN-10968
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: Lee young gon
> Priority: Major
> Attachments: YARN-10968.001.patch, YARN-10968.002.patch
>
>
> There are two ways to request containers from the RM through AMRMClientImpl:
> # addContainerRequest
> # addSchedulingRequests
> Each of these corresponds to a parameter of the Scheduler's allocate():
> {code:java}
> // addContainerRequest   <-> ask
> // addSchedulingRequests <-> schedulingRequests
> public Allocation allocate(ApplicationAttemptId applicationAttemptId,
>     List<ResourceRequest> ask, List<SchedulingRequest> schedulingRequests,
>     List<ContainerId> release, List<String> blacklistAdditions,
>     List<String> blacklistRemovals, ContainerUpdates updateRequests) {
>   FiCaSchedulerApp application =
>       getApplicationAttempt(applicationAttemptId);
> {code}
>
> We are using yarn-service with a placement_policy, in which case
> addSchedulingRequests is used.
> addSchedulingRequests has a problem.
> When two containers terminate at the same time while a placement_policy is
> in effect, the AM submits a scheduling request twice, as follows.
> {code:java}
> 2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component -
> [COMPONENT sleep] Requesting for 1 container(s)
> 2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component -
> [COMPONENT sleep] Submitting scheduling request:
> SchedulingRequestPBImpl{priority=0, allocationReqId=0,
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true},
> allocationTags=[testapp],
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512,
> vCores:1>},
> placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp}
> 2021-03-31 17:56:07,486 [Component dispatcher] INFO component.Component -
> [COMPONENT sleep] Requesting for 1 container(s)
> 2021-03-31 17:56:07,487 [Component dispatcher] INFO component.Component -
> [COMPONENT sleep] Submitting scheduling request:
> SchedulingRequestPBImpl{priority=0, allocationReqId=0,
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true},
> allocationTags=[testapp],
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512,
> vCores:1>},
> placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp}
> {code}
> Each of these requests reaches the RM separately.
> When the RM receives the requests above, the
> SingleConstraintAppPlacementAllocator keeps only the last one.
> In other words, if multiple containers die at the same time, multiple
> requests are created, but the RM accepts only the final request and
> allocates a container for it alone.
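
The "last one wins" behavior described above can be illustrated with a small, self-contained toy model. This is not Hadoop code: `PendingAskModel`, `Request`, `ReplacingAllocator`, and `AccumulatingAllocator` are hypothetical names invented for this sketch, and the accumulating variant is only one conceivable alternative, not necessarily what the attached patches do. It assumes, as in the log above, that both requests carry allocationReqId=0 and numAllocations=1.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the reported behavior. NOT Hadoop code: every class and
 * method name here is hypothetical and exists only for illustration.
 */
public class PendingAskModel {

    /** Minimal stand-in for a scheduling request (hypothetical type). */
    public static final class Request {
        final long allocationReqId;
        final int numAllocations;

        public Request(long allocationReqId, int numAllocations) {
            this.allocationReqId = allocationReqId;
            this.numAllocations = numAllocations;
        }
    }

    /** Keeps only the most recent request per allocationReqId. */
    public static final class ReplacingAllocator {
        private final Map<Long, Integer> pending = new HashMap<>();

        public void update(Request r) {
            // A later request overwrites the earlier one with the same id.
            pending.put(r.allocationReqId, r.numAllocations);
        }

        public int pendingFor(long allocationReqId) {
            return pending.getOrDefault(allocationReqId, 0);
        }
    }

    /** Assumed alternative: sums requests that share an allocationReqId. */
    public static final class AccumulatingAllocator {
        private final Map<Long, Integer> pending = new HashMap<>();

        public void update(Request r) {
            // Add the new count to whatever is already pending for this id.
            pending.merge(r.allocationReqId, r.numAllocations, Integer::sum);
        }

        public int pendingFor(long allocationReqId) {
            return pending.getOrDefault(allocationReqId, 0);
        }
    }

    public static void main(String[] args) {
        ReplacingAllocator replacing = new ReplacingAllocator();
        AccumulatingAllocator accumulating = new AccumulatingAllocator();

        // Two containers stop at once, so two identical requests are sent.
        for (int i = 0; i < 2; i++) {
            replacing.update(new Request(0L, 1));
            accumulating.update(new Request(0L, 1));
        }

        System.out.println("replacing:    " + replacing.pendingFor(0L));
        System.out.println("accumulating: " + accumulating.pendingFor(0L));
    }
}
```

In this model the replacing allocator ends with one pending allocation although two containers were lost, which matches the symptom in the report; the accumulating allocator ends with two.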
--
This message was sent by Atlassian Jira
(v8.3.4#803005)