[
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tarun Parimi updated YARN-9921:
-------------------------------
Attachment: YARN-9921.001.patch
> Issue in PlacementConstraint when YARN Service AM retries allocation on
> component failure.
> ------------------------------------------------------------------------------------------
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When the YARN Service AM tries to relaunch a container after a failure, we
> encounter the error below in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException:
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException:
> Invalid updated SchedulingRequest added to scheduler, we only allows changing
> numAllocations for the updated SchedulingRequest.
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0,
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true},
> allocationTags=[component],
> resourceSizing=ResourceSizingPBImpl{numAllocations=0,
> resources=<memory:557056, vCores:1>},
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]}
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0,
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true},
> allocationTags=[component],
> resourceSizing=ResourceSizingPBImpl{numAllocations=1,
> resources=<memory:557056, vCores:1>},
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
> if any fields need to be updated, please cancel the old request (by setting
> numAllocations to 0) and send a SchedulingRequest with different combination
> of priority/allocationId
> {code}
> From the message, the SchedulingRequest appears to be valid: everything is
> the same except numAllocations, as expected. Even so, the equals check below
> in SingleConstraintAppPlacementAllocator fails.
> {code:java}
> // Compare two objects
> if (!schedulingRequest.equals(newSchedulingRequest)) {
>   // Rollback #numAllocations
>   sizing.setNumAllocations(newNumAllocations);
>   throw new SchedulerInvalidResoureRequestException(
>       "Invalid updated SchedulingRequest added to scheduler, "
>           + " we only allows changing numAllocations for the updated "
>           + "SchedulingRequest. Old=" + schedulingRequest.toString()
>           + " new=" + newSchedulingRequest.toString()
>           + ", if any fields need to be updated, please cancel the "
>           + "old request (by setting numAllocations to 0) and send a "
>           + "SchedulingRequest with different combination of "
>           + "priority/allocationId");
> }
> {code}
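
The check quoted above follows a "compare everything except numAllocations" pattern. The following is a minimal, self-contained sketch of that pattern, not the YARN implementation: the class UpdateCheckSketch, the Request type, and the methods isValidUpdate and differingFields are all hypothetical names introduced only for illustration. It assumes a request whose equals covers every field, and adds a small helper that lists which fields differ, which may help when two requests print similarly but still compare as unequal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class UpdateCheckSketch {

  /** Hypothetical stand-in for a scheduling request; NOT the YARN class. */
  static final class Request {
    final int priority;
    final Set<String> allocationTags;
    final String placementConstraint;
    int numAllocations; // the only field an update is allowed to change

    Request(int priority, Set<String> allocationTags,
        String placementConstraint, int numAllocations) {
      this.priority = priority;
      this.allocationTags = allocationTags;
      this.placementConstraint = placementConstraint;
      this.numAllocations = numAllocations;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Request)) {
        return false;
      }
      Request r = (Request) o;
      return priority == r.priority
          && numAllocations == r.numAllocations
          && allocationTags.equals(r.allocationTags)
          && placementConstraint.equals(r.placementConstraint);
    }

    @Override
    public int hashCode() {
      // The mutable numAllocations field is deliberately left out.
      return Objects.hash(priority, allocationTags, placementConstraint);
    }
  }

  /** True if 'updated' equals 'old' once numAllocations is ignored. */
  static boolean isValidUpdate(Request old, Request updated) {
    int requested = updated.numAllocations;
    // Temporarily align numAllocations so equals() compares the rest.
    updated.numAllocations = old.numAllocations;
    try {
      return old.equals(updated);
    } finally {
      // Roll back numAllocations whether or not the comparison passed.
      updated.numAllocations = requested;
    }
  }

  /** Names of the fields (other than numAllocations) that differ. */
  static List<String> differingFields(Request old, Request updated) {
    List<String> diffs = new ArrayList<>();
    if (old.priority != updated.priority) {
      diffs.add("priority");
    }
    if (!old.allocationTags.equals(updated.allocationTags)) {
      diffs.add("allocationTags");
    }
    if (!old.placementConstraint.equals(updated.placementConstraint)) {
      diffs.add("placementConstraint");
    }
    return diffs;
  }
}
```

In this sketch the rollback sits in a finally block so the caller's numAllocations is restored on every path. Logging the result of a helper like differingFields next to the two toString() values is one possible way to narrow down which field makes equals return false.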