Lee young gon created YARN-10968:
------------------------------------
Summary: SchedulingRequests can be wrong when multiple containers
stopped at the same time
Key: YARN-10968
URL: https://issues.apache.org/jira/browse/YARN-10968
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Lee young gon
There are two ways to request containers to RM through AMRMClientImpl.
# addContainerRequest
# addSchedulingRequests
These two request types map to separate parameters of the Scheduler's allocate():
# addContainerRequest <-> ask
# addSchedulingRequests <-> schedulingRequests
{code:java}
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
    List<ResourceRequest> ask, List<SchedulingRequest> schedulingRequests,
    List<ContainerId> release, List<String> blacklistAdditions,
    List<String> blacklistRemovals, ContainerUpdates updateRequests) {
  FiCaSchedulerApp application = getApplicationAttempt(applicationAttemptId);
{code}
We are using yarn-service with a placement_policy, in which case
addSchedulingRequests is used.
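For reference, a minimal yarn-service spec of the kind we run (the service name, component name, and tag are illustrative, not our actual spec) — a placement_policy with constraints like this is what makes the AM go through addSchedulingRequests:
{code:json}
{
  "name": "testapp",
  "version": "1.0",
  "components": [
    {
      "name": "sleep",
      "number_of_containers": 2,
      "launch_command": "sleep 900000",
      "resource": { "cpus": 1, "memory": "512" },
      "placement_policy": {
        "constraints": [
          { "type": "ANTI_AFFINITY", "scope": "NODE", "target_tags": ["testapp"] }
        ]
      }
    }
  ]
}
{code}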
addSchedulingRequests has a problem: when two containers are terminated at
the same time under a placement_policy, the AM submits the scheduling request
twice, as follows.
{code:java}
2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component -
[COMPONENT sleep] Requesting for 1 container(s)
2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component -
[COMPONENT sleep] Submitting scheduling request:
SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution
Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[testapp],
resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512,
vCores:1>},
placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp}
2021-03-31 17:56:07,486 [Component dispatcher] INFO component.Component -
[COMPONENT sleep] Requesting for 1 container(s)
2021-03-31 17:56:07,487 [Component dispatcher] INFO component.Component -
[COMPONENT sleep] Submitting scheduling request:
SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution
Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[testapp],
resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512,
vCores:1>},
placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp}
{code}
Each of these requests reaches the RM separately.
When the second request is received, the SingleConstraintAppPlacementAllocator
retains only the last value.
In other words, when multiple containers die at the same time, multiple
scheduling requests are created, but the RM accepts only the final request and
allocates just one container.
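To make the failure mode concrete, here is a simplified stdlib-only simulation (not the actual YARN code; class and field names are invented for illustration). Both requests carry the same allocationRequestId with numAllocations=1, so an allocator that overwrites its stored request — which is effectively what happens here — ends up with one pending container, whereas accumulating the sizing would keep both:

```java
import java.util.HashMap;
import java.util.Map;

public class SchedulingRequestOverwriteDemo {
    // Minimal stand-in for a SchedulingRequest: id plus requested count.
    static final class Request {
        final long allocationRequestId;
        int numAllocations;
        Request(long id, int n) { allocationRequestId = id; numAllocations = n; }
    }

    public static void main(String[] args) {
        // Two containers died at the same time, so the AM sends two
        // requests with the same allocationRequestId and numAllocations=1.
        Request first  = new Request(0L, 1);
        Request second = new Request(0L, 1);

        // Overwrite semantics: the later request replaces the earlier one,
        // so only one container ends up pending (the reported bug).
        Map<Long, Request> overwrite = new HashMap<>();
        overwrite.put(first.allocationRequestId, first);
        overwrite.put(second.allocationRequestId, second);
        System.out.println("overwrite pending = "
            + overwrite.get(0L).numAllocations);   // 1

        // Accumulate semantics: merge sizing for the same id, so both
        // replacement containers stay pending.
        Map<Long, Request> accumulate = new HashMap<>();
        for (Request r : new Request[] { first, second }) {
            accumulate.merge(r.allocationRequestId, r, (oldR, newR) -> {
                oldR.numAllocations += newR.numAllocations;
                return oldR;
            });
        }
        System.out.println("accumulate pending = "
            + accumulate.get(0L).numAllocations);  // 2
    }
}
```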