[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-24 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958613#comment-16958613
 ] 

Tarun Parimi commented on YARN-9921:


Thanks for the reviews, [~tangzhankun] and [~prabhujoseph].

> Issue in PlacementConstraint when YARN Service AM retries allocation on 
> component failure.
> --
>
> Key: YARN-9921
> URL: https://issues.apache.org/jira/browse/YARN-9921
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Fix For: 3.3.0, 3.1.4
>
> Attachments: YARN-9921.001.patch, differenceProtobuf.png
>
>
> When the YARN Service AM tries to relaunch a container after a failure, we 
> encounter the below error in PlacementConstraints.
> {code:java}
> ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.yarn.exceptions.SchedulerInvalidResoureRequestException: 
> Invalid updated SchedulingRequest added to scheduler, we only allows changing 
> numAllocations for the updated SchedulingRequest. 
> Old=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=0, 
> resources=}, 
> placementConstraint=notin,node,llap:notin,node,yarn_node_partition/=[label]} 
> new=SchedulingRequestPBImpl{priority=0, allocationReqId=0, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[component], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, 
> resources=}, 
> placementConstraint=notin,node,component:notin,node,yarn_node_partition/=[label]},
>  if any fields need to be updated, please cancel the old request (by setting 
> numAllocations to 0) and send a SchedulingRequest with different combination 
> of priority/allocationId
> {code}
> But we can see from the message that the SchedulingRequest is indeed valid: 
> everything is the same except numAllocations, as expected. Yet the equals 
> check below in SingleConstraintAppPlacementAllocator still fails.
> {code:java}
> // Compare two objects
>   if (!schedulingRequest.equals(newSchedulingRequest)) {
> // Rollback #numAllocations
> sizing.setNumAllocations(newNumAllocations);
> throw new SchedulerInvalidResoureRequestException(
> "Invalid updated SchedulingRequest added to scheduler, "
> + " we only allows changing numAllocations for the updated "
> + "SchedulingRequest. Old=" + schedulingRequest.toString()
> + " new=" + newSchedulingRequest.toString()
> + ", if any fields need to be updated, please cancel the "
> + "old request (by setting numAllocations to 0) and send a "
> + "SchedulingRequest with different combination of "
> + "priority/allocationId");
>   }
> {code}






[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958468#comment-16958468
 ] 

Hudson commented on YARN-9921:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17565 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17565/])
YARN-9921. Issue in PlacementConstraint when YARN Service AM retries (ztang: 
rev fd84ca5161d171f7e754b9b06623c6118e048066)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SchedulingRequestPBImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/TestSingleConstraintAppPlacementAllocator.java





[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958465#comment-16958465
 ] 

Zhankun Tang commented on YARN-9921:


[~prabhujoseph], Thanks for the review.

[~tarunparimi], Thanks for the patch. Committed to trunk and branch-3.1.




[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957707#comment-16957707
 ] 

Prabhu Joseph commented on YARN-9921:
-

[~tangzhankun] The patch looks good. +1 

Thanks [~tarunparimi] for the patch.




[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-23 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957693#comment-16957693
 ] 

Zhankun Tang commented on YARN-9921:


[~Prabhu Joseph], [~sunilg], if there are no more comments, I'll commit it soon.




[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-21 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955955#comment-16955955
 ] 

Tarun Parimi commented on YARN-9921:


The Findbugs warning is due to the changes made in YARN-9773 and is not 
related to this patch.




[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-21 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955881#comment-16955881
 ] 

Hadoop QA commented on YARN-9921:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
19s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
47s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
21s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9921 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983572/YARN-9921.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a4cac15d457d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 447f46d |
| maven | version: Apache Maven 3.3.

[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-20 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955803#comment-16955803
 ] 

Tarun Parimi commented on YARN-9921:


Thanks for the review [~tangzhankun].




[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-20 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955769#comment-16955769
 ] 

Zhankun Tang commented on YARN-9921:


[~tarunparimi], Thanks for reproducing it and finding the root cause! The patch 
looks good to me. +1




[jira] [Commented] (YARN-9921) Issue in PlacementConstraint when YARN Service AM retries allocation on component failure.

2019-10-20 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955755#comment-16955755
 ] 

Tarun Parimi commented on YARN-9921:


On debugging this, I found that protobuf considers the targetExpressions 
objects unequal.

This is because protobuf expects the elements of targetExpressions to be in the 
same order, but the order can change, as we can see below. !differenceProtobuf.png!

The order changes because targetExpressions is defined as an unordered Set.

{code:java}
/**
 * Get the target expressions of the constraint.
 *
 * @return the set of target expressions
 */
public Set<TargetExpression> getTargetExpressions() {
  return targetExpressions;
}
{code}

But the corresponding proto field is defined as repeated, and 
https://github.com/protocolbuffers/protobuf/issues/2116 indicates that order is 
strictly checked when comparing repeated fields.

{code:java}
  repeated PlacementConstraintTargetProto targetExpressions = 2;
{code}

I don't think it is safe to change the proto to handle this issue, since that 
could cause backward compatibility, upgrade, and other problems.

A simple fix is to change the equals method in SchedulingRequestPBImpl so that 
it does not depend on the protobuf equals method. I will submit a working patch 
for this soon.
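
As a rough sketch of that idea (not the committed patch; the class and method 
names below are made up, and the accessors are assumed from the public 
SchedulingRequest API), the comparison can be done field by field on the Java 
objects, where the allocation tags are a Set and therefore order-insensitive, 
instead of delegating to the underlying protobuf message:

{code:java}
import java.util.Objects;

import org.apache.hadoop.yarn.api.records.SchedulingRequest;

// Sketch only: compare the Java-level fields of two SchedulingRequests instead
// of relying on the protobuf message equals, which is order-sensitive for
// repeated fields. Class and method names are hypothetical.
final class SchedulingRequestComparator {

  private SchedulingRequestComparator() {
  }

  static boolean sameRequest(SchedulingRequest a, SchedulingRequest b) {
    return a.getAllocationRequestId() == b.getAllocationRequestId()
        && Objects.equals(a.getPriority(), b.getPriority())
        && Objects.equals(a.getExecutionType(), b.getExecutionType())
        // Allocation tags are a Set, so this comparison ignores ordering.
        && Objects.equals(a.getAllocationTags(), b.getAllocationTags())
        && Objects.equals(a.getResourceSizing(), b.getResourceSizing())
        && Objects.equals(a.getPlacementConstraint(), b.getPlacementConstraint());
  }
}
{code}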
 

 
