[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958613#comment-16958613 ]
Tarun Parimi commented on YARN-9921:
------------------------------------

Thanks for the reviews [~tangzhankun] and [~prabhujoseph#1]

> Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-9921
>                 URL: https://issues.apache.org/jira/browse/YARN-9921
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major
>             Fix For: 3.3.0, 3.1.4
>
>         Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When the YARN Service AM tries to relaunch a container after a failure, we
> encounter the error below in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException:
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException:
> Invalid updated SchedulingRequest added to scheduler, we only allows changing
> numAllocations for the updated SchedulingRequest.
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0,
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true},
> allocationTags=[component],
> resourceSizing=ResourceSizingPBImpl{numAllocations=0,
> resources=<memory:557056, vCores:1>},
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]}
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0,
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true},
> allocationTags=[component],
> resourceSizing=ResourceSizingPBImpl{numAllocations=1,
> resources=<memory:557056, vCores:1>},
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
> if any fields need to be updated, please cancel the old request (by setting
> numAllocations to 0) and send a SchedulingRequest with different combination
> of priority/allocationId
> {code}
> The message shows that the SchedulingRequest is valid: as expected,
> everything is the same except numAllocations. Yet the equals check below in
> SingleConstraintAppPlacementAllocator still fails.
> {code:java}
>       // Compare two objects
>       if (!schedulingRequest.equals(newSchedulingRequest)) {
>         // Rollback #numAllocations
>         sizing.setNumAllocations(newNumAllocations);
>         throw new SchedulerInvalidResoureRequestException(
>             "Invalid updated SchedulingRequest added to scheduler, "
>                 + " we only allows changing numAllocations for the updated "
>                 + "SchedulingRequest. Old=" + schedulingRequest.toString()
>                 + " new=" + newSchedulingRequest.toString()
>                 + ", if any fields need to be updated, please cancel the "
>                 + "old request (by setting numAllocations to 0) and send a "
>                 + "SchedulingRequest with different combination of "
>                 + "priority/allocationId");
>       }
> {code}
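The rule that the quoted check enforces (an updated request may differ from the old one only in numAllocations) can be modelled in a few lines. This is a simplified illustration, not YARN code: `UpdateCheckSketch`, `Request` and `checkUpdate` are hypothetical names, and the sketch assumes, as the quoted rollback of `sizing` suggests, that the new request's numAllocations is temporarily set to the old value before `equals` is called.

```java
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical, simplified model of the "only numAllocations may change"
 * rule. None of these names are part of the YARN API.
 */
public class UpdateCheckSketch {

  /** Hypothetical stand-in for a SchedulingRequest (not the YARN class). */
  static final class Request {
    final int priority;
    final long allocationRequestId;
    final Set<String> allocationTags;
    final String placementConstraint;
    int numAllocations; // the only field an update is allowed to change

    Request(int priority, long allocationRequestId, Set<String> allocationTags,
            String placementConstraint, int numAllocations) {
      this.priority = priority;
      this.allocationRequestId = allocationRequestId;
      this.allocationTags = allocationTags;
      this.placementConstraint = placementConstraint;
      this.numAllocations = numAllocations;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Request)) return false;
      Request r = (Request) o;
      return priority == r.priority
          && allocationRequestId == r.allocationRequestId
          && numAllocations == r.numAllocations
          && Objects.equals(allocationTags, r.allocationTags)
          && Objects.equals(placementConstraint, r.placementConstraint);
    }

    @Override
    public int hashCode() {
      return Objects.hash(priority, allocationRequestId, allocationTags,
          placementConstraint, numAllocations);
    }
  }

  /**
   * Accepts the update only if every field except numAllocations is equal.
   * Returns the new numAllocations on success, throws otherwise.
   */
  static int checkUpdate(Request old, Request updated) {
    int newNumAllocations = updated.numAllocations;
    // Temporarily align numAllocations so equals() only compares the rest.
    updated.numAllocations = old.numAllocations;
    boolean same = old.equals(updated);
    // Roll back to the caller's value whether or not the check passed.
    updated.numAllocations = newNumAllocations;
    if (!same) {
      throw new IllegalArgumentException(
          "only numAllocations may change; cancel the old request instead");
    }
    return newNumAllocations;
  }

  public static void main(String[] args) {
    Request old = new Request(0, 0L, Set.of("component"), "notin,node,component", 0);
    Request retry = new Request(0, 0L, Set.of("component"), "notin,node,component", 1);
    System.out.println(checkUpdate(old, retry)); // prints 1

    Request changed = new Request(0, 0L, Set.of("component"), "in,node,component", 1);
    try {
      checkUpdate(old, changed);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected"); // constraint differs, so the update is refused
    }
  }
}
```

Because the whole decision rests on `equals`, a retried request whose printed form looks the same as the old one can still be rejected if `equals` compares state that `toString` does not show.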