[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958613#comment-16958613 ] Tarun Parimi commented on YARN-9921:

Thanks for the reviews [~tangzhankun] and [~prabhujoseph].

> Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.
> --
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
> Fix For: 3.3.0, 3.1.4
>
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
> When the YARN Service AM tries to relaunch a container after a failure, we encounter the error below in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException: Invalid updated SchedulingRequest added to scheduler, we only allows changing numAllocations for the updated SchedulingRequest.
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[component], resourceSizing=ResourceSizingPBImpl{numAllocations=0, resources=}, placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]}
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[component], resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=}, placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
> if any fields need to be updated, please cancel the old request (by setting numAllocations to 0) and send a SchedulingRequest with different combination of priority/allocationId
> {code}
> But we can see from the message that the SchedulingRequest is in fact valid: everything is the same except numAllocations, as expected. Still, the equals check below in SingleConstraintAppPlacementAllocator fails.
> {code:java}
> // Compare two objects
> if (!schedulingRequest.equals(newSchedulingRequest)) {
>   // Rollback #numAllocations
>   sizing.setNumAllocations(newNumAllocations);
>   throw new SchedulerInvalidResoureRequestException(
>       "Invalid updated SchedulingRequest added to scheduler, "
>           + " we only allows changing numAllocations for the updated "
>           + "SchedulingRequest. Old=" + schedulingRequest.toString()
>           + " new=" + newSchedulingRequest.toString()
>           + ", if any fields need to be updated, please cancel the "
>           + "old request (by setting numAllocations to 0) and send a "
>           + "SchedulingRequest with different combination of "
>           + "priority/allocationId");
> }
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
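The equals failure described above can be reproduced without any YARN classes. As diagnosed later in this thread, an unordered set of constraint targets flattened into an order-sensitive list (the way a protobuf repeated field is compared) can make logically identical requests look unequal. A minimal JDK-only sketch; the target names are taken from the log above, and the types are plain java.util stand-ins, not YARN classes:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * JDK-only illustration of the failure mode: the same two target
 * expressions inserted in different orders are equal as Sets, but
 * unequal once flattened into ordered lists, which is how an
 * order-sensitive repeated-field comparison sees them.
 */
public class OrderSensitiveEquals {
    public static void main(String[] args) {
        Set<String> oldTargets =
            new LinkedHashSet<>(Arrays.asList("llap", "yarn_node_partition"));
        Set<String> newTargets =
            new LinkedHashSet<>(Arrays.asList("yarn_node_partition", "llap"));

        // Set semantics: equal regardless of insertion order.
        System.out.println(oldTargets.equals(newTargets)); // true

        // Flattened into ordered lists, the same elements compare unequal.
        List<String> oldRepeated = List.copyOf(oldTargets);
        List<String> newRepeated = List.copyOf(newTargets);
        System.out.println(oldRepeated.equals(newRepeated)); // false
    }
}
```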
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958468#comment-16958468 ] Hudson commented on YARN-9921:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17565 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17565/])
YARN-9921. Issue in PlacementConstraint when YARN Service AM retries (ztang: rev fd84ca5161d171f7e754b9b06623c6118e048066)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SchedulingRequestPBImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/TestSingleConstraintAppPlacementAllocator.java
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958465#comment-16958465 ] Zhankun Tang commented on YARN-9921:

[~prabhujoseph], Thanks for the review. [~tarunparimi], Thanks for the patch. Committed to trunk and branch-3.1.
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957707#comment-16957707 ] Prabhu Joseph commented on YARN-9921:

[~tangzhankun] The patch looks good. +1. Thanks [~tarunparimi] for the patch.
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957693#comment-16957693 ] Zhankun Tang commented on YARN-9921:

[~Prabhu Joseph], [~sunilg], if there are no more comments, I'll commit it soon.
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955955#comment-16955955 ] Tarun Parimi commented on YARN-9921:

The Findbugs warning is due to the changes done in YARN-9773 and is not related to the patch.
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955881#comment-16955881 ] Hadoop QA commented on YARN-9921:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 41s | Docker mode activated. |

Prechecks:
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |

trunk Compile Tests:
| 0 | mvndep | 0m 40s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 15s | trunk passed |
| +1 | compile | 8m 5s | trunk passed |
| +1 | checkstyle | 1m 20s | trunk passed |
| +1 | mvnsite | 1m 39s | trunk passed |
| +1 | shadedclient | 16m 7s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 19s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 1m 29s | trunk passed |

Patch Compile Tests:
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 20s | the patch passed |
| +1 | compile | 7m 14s | the patch passed |
| +1 | javac | 7m 14s | the patch passed |
| -0 | checkstyle | 1m 19s | hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) |
| +1 | mvnsite | 1m 34s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 34s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 2s | the patch passed |
| +1 | javadoc | 1m 25s | the patch passed |

Other Tests:
| +1 | unit | 3m 47s | hadoop-yarn-common in the patch passed. |
| +1 | unit | 85m 21s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 170m 54s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9921 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983572/YARN-9921.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a4cac15d457d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 447f46d |
| maven | version: Apache Maven 3.3.
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955803#comment-16955803 ] Tarun Parimi commented on YARN-9921:

Thanks for the review [~tangzhankun].
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955769#comment-16955769 ] Zhankun Tang commented on YARN-9921:

[~tarunparimi], Thanks for reproducing it and finding the root cause! The patch looks good to me. +1
[ https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955755#comment-16955755 ] Tarun Parimi commented on YARN-9921:

On debugging this, I found that protobuf considers the targetExpressions objects unequal. This is because protobuf expects the order of elements in targetExpressions to be the same, but the order can change, as we can see below.

!differenceProtobuf.png!

The order changes because we have defined targetExpressions as an unordered Set.

{code:java}
/**
 * Get the target expressions of the constraint.
 *
 * @return the set of target expressions
 */
public Set<TargetExpression> getTargetExpressions() {
  return targetExpressions;
}
{code}

But the proto field is defined as a repeated field, and I see in https://github.com/protocolbuffers/protobuf/issues/2116 that order is strictly checked for repeated fields.

{code:java}
repeated PlacementConstraintTargetProto targetExpressions = 2;
{code}

I don't think it is safe to change the proto to handle this issue, as that can cause backward compatibility/upgrade and other problems. A simple fix is to change the equals method in SchedulingRequestPBImpl to not depend on the equals method of protobuf. Will submit a working patch for this soon.
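To make the fix direction concrete: a field-by-field comparison on the Java side is order-insensitive for Set-valued fields, unlike the protobuf message's equals. This is a hypothetical sketch only; SimpleRequest and its fields are illustrative stand-ins, not the actual SchedulingRequestPBImpl patch.

```java
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical sketch: compare a scheduling request field by field
 * instead of relying on the protobuf message's order-sensitive equals.
 * Set.equals ignores element order, so two requests that differ only
 * in numAllocations (and in target-expression ordering) now match.
 */
public class FieldWiseEquals {
    static final class SimpleRequest {
        final int priority;
        final long allocationRequestId;
        final Set<String> allocationTags;    // unordered
        final Set<String> targetExpressions; // unordered: order must not matter

        final int numAllocations;            // the one field allowed to change

        SimpleRequest(int priority, long allocationRequestId,
                      Set<String> allocationTags, Set<String> targetExpressions,
                      int numAllocations) {
            this.priority = priority;
            this.allocationRequestId = allocationRequestId;
            this.allocationTags = allocationTags;
            this.targetExpressions = targetExpressions;
            this.numAllocations = numAllocations;
        }

        /** True when the requests match on every field that must not change. */
        boolean sameExceptNumAllocations(SimpleRequest other) {
            return priority == other.priority
                && allocationRequestId == other.allocationRequestId
                // Set.equals is order-insensitive, unlike a repeated proto field.
                && Objects.equals(allocationTags, other.allocationTags)
                && Objects.equals(targetExpressions, other.targetExpressions);
        }
    }

    public static void main(String[] args) {
        SimpleRequest oldReq = new SimpleRequest(0, 0L, Set.of("component"),
            Set.of("llap", "yarn_node_partition"), 0);
        SimpleRequest newReq = new SimpleRequest(0, 0L, Set.of("component"),
            Set.of("yarn_node_partition", "llap"), 1);
        // Only numAllocations differs, so the update should be accepted.
        System.out.println(oldReq.sameExceptNumAllocations(newReq)); // true
    }
}
```

The same idea applies in the allocator's update check: an equals defined over the Java-side fields accepts the retried request, whereas delegating to the serialized proto rejects it whenever the repeated field's ordering shifts.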