[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763298#comment-17763298
 ] 

ASF GitHub Bot commented on YARN-8980:
--

slfan1989 commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1712381663

   @zhengchenyu Thanks for your contribution! Merged Into Trunk.




> Mapreduce application container start fail after AM restart.
> -------------------------------------------------------------
>
>                 Key: YARN-8980
>                 URL: https://issues.apache.org/jira/browse/YARN-8980
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin Chundatt
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>
> UAMs to subclusters are always launched with keepContainers.
> In AM restart scenarios, the UAM registers again with the RM and receives its
> running containers along with NMTokens. But the NMTokens the UAM receives via
> getNMTokensFromPreviousAttempts are never used by the mapreduce application.
> The Federation Interceptor should take care of such scenarios too: merge the
> NMTokens received at registration into the allocate response.
> Otherwise, a container allocation response on the same node will have an empty
> NMToken.
> issue credits: [~Nallasivan]






[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763297#comment-17763297
 ] 

ASF GitHub Bot commented on YARN-8980:
--

slfan1989 merged PR #5975:
URL: https://github.com/apache/hadoop/pull/5975







[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763081#comment-17763081
 ] 

ASF GitHub Bot commented on YARN-8980:
--

hadoop-yetus commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1711615691

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: | reexec | 1m 36s |  | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s |  | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: | mvndep | 14m 13s |  | Maven dependency ordering for branch |
   | +1 :green_heart: | mvninstall | 39m 27s |  | trunk passed |
   | +1 :green_heart: | compile | 2m 39s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | compile | 2m 19s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | checkstyle | 1m 23s |  | trunk passed |
   | +1 :green_heart: | mvnsite | 1m 42s |  | trunk passed |
   | +1 :green_heart: | javadoc | 1m 36s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 25s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 3m 25s |  | trunk passed |
   | +1 :green_heart: | shadedclient | 38m 57s |  | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: | mvndep | 0m 30s |  | Maven dependency ordering for patch |
   | +1 :green_heart: | mvninstall | 1m 21s |  | the patch passed |
   | +1 :green_heart: | compile | 2m 32s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javac | 2m 32s |  | the patch passed |
   | +1 :green_heart: | compile | 2m 14s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | javac | 2m 14s |  | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 1m 15s |  | the patch passed |
   | +1 :green_heart: | mvnsite | 1m 26s |  | the patch passed |
   | +1 :green_heart: | javadoc | 1m 21s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 14s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 3m 34s |  | the patch passed |
   | +1 :green_heart: | shadedclient | 38m 49s |  | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 118m 46s |  | hadoop-yarn-server-resourcemanager in the patch passed. |
   | +1 :green_heart: | unit | 24m 1s |  | hadoop-yarn-server-nodemanager in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 36s |  | The patch does not generate ASF License warnings. |
   |  |  | 310m 32s |  |  |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5975 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c9f4fe6406da 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ed529f049b6f82bdf7876b5f8f923430c8551f68 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/6/testReport/ |
   | Max. process+thread count | 900 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager

[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760904#comment-17760904
 ] 

ASF GitHub Bot commented on YARN-8980:
--

slfan1989 commented on code in PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#discussion_r1311548384


   ##########
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingUnmanagedAM.java:
   ##########
   @@ -142,14 +156,124 @@ protected void testUAMRestart(boolean keepContainers) throws Exception {
        numContainers = 1;
        am.allocate("127.0.0.1", 1000, numContainers, new ArrayList<ContainerId>());
        nm.nodeHeartbeat(true);
   -    conts = am.allocate(new ArrayList<ResourceRequest>(),
   -        new ArrayList<ContainerId>()).getAllocatedContainers();
   +    allocateResponse = am.allocate(new ArrayList<ResourceRequest>(),
   +        new ArrayList<ContainerId>());
   +    allocateResponse.getNMTokens().forEach(token ->
   +        tokenCacheClientSide.add(token.getNodeId()));
   +    conts = allocateResponse.getAllocatedContainers();
        while (conts.size() < numContainers) {
          nm.nodeHeartbeat(true);
   -      conts.addAll(am.allocate(new ArrayList<ResourceRequest>(),
   -          new ArrayList<ContainerId>()).getAllocatedContainers());
   +      allocateResponse =
   +          am.allocate(new ArrayList<ResourceRequest>(), new ArrayList<ContainerId>());
   +      allocateResponse.getNMTokens().forEach(token ->
   +          tokenCacheClientSide.add(token.getNodeId()));
   +      conts.addAll(allocateResponse.getAllocatedContainers());
          Thread.sleep(100);
        }
   +    checkNMTokenForContainer(tokenCacheClientSide, conts);
   +
   +    rm.stop();
   +  }
   +
   +  protected void testUAMRestartWithoutTransferContainer(boolean keepContainers) throws Exception {
   +    // start RM
   +    MockRM rm = new MockRM();
   +    rm.start();
   +    MockNM nm =
   +        new MockNM("127.0.0.1:1234", 15120, rm.getResourceTrackerService());
   +    nm.registerNode();
   +    Set<NodeId> tokenCacheClientSide = new HashSet<NodeId>();
   +
   +    // create app and launch the UAM
   +    boolean unmanaged = true;
   +    int maxAttempts = 1;
   +    boolean waitForAccepted = true;
   +    MockRMAppSubmissionData data =
   +        MockRMAppSubmissionData.Builder.createWithMemory(200, rm)
   +            .withAppName("")
   +            .withUser(UserGroupInformation.getCurrentUser().getShortUserName())
   +            .withAcls(null)
   +            .withUnmanagedAM(unmanaged)
   +            .withQueue(null)
   +            .withMaxAppAttempts(maxAttempts)
   +            .withCredentials(null)
   +            .withAppType(null)
   +            .withWaitForAppAcceptedState(waitForAccepted)
   +            .withKeepContainers(keepContainers)
   +            .build();
   +    RMApp app = MockRMAppSubmitter.submit(rm, data);
   +
   +    MockAM am = MockRM.launchUAM(app, rm, nm);
   +
   +    // Register for the first time
   +    am.registerAppAttempt();
   +
   +    // Allocate three containers to the UAM
   +    int numContainers = 3;
   +    AllocateResponse allocateResponse =
   +        am.allocate("127.0.0.1", 1000, numContainers, new ArrayList<ContainerId>());
   +    allocateResponse.getNMTokens().forEach(token ->
   +        tokenCacheClientSide.add(token.getNodeId()));
   +    List<Container> conts = allocateResponse.getAllocatedContainers();
   +    while (conts.size() < numContainers) {
   +      nm.nodeHeartbeat(true);
   +      allocateResponse =
   +          am.allocate(new ArrayList<ResourceRequest>(), new ArrayList<ContainerId>());
   +      allocateResponse.getNMTokens().forEach(token ->
   +          tokenCacheClientSide.add(token.getNodeId()));
   +      conts.addAll(allocateResponse.getAllocatedContainers());
   +      Thread.sleep(100);
   +    }
   +    checkNMTokenForContainer(tokenCacheClientSide, conts);
   +
   +    // Release all containers, so no containers transfer to the next app attempt
   +    List<ContainerId> releaseList = new ArrayList<ContainerId>();
   +    releaseList.add(conts.get(0).getId());
   +    releaseList.add(conts.get(1).getId());
   +    releaseList.add(conts.get(2).getId());
   +    List<ContainerStatus> finishedConts =
   +        am.allocate(new ArrayList<ResourceRequest>(), releaseList)
   +            .getCompletedContainersStatuses();
   +    while (finishedConts.size() < releaseList.size()) {
   +      nm.nodeHeartbeat(true);
   +      finishedConts
   +          .addAll(am
   +              .allocate(new ArrayList<ResourceRequest>(),
   +                  new ArrayList<ContainerId>())
   +              .getCompletedContainersStatuses());
   +      Thread.sleep(100);
   +    }
   +
   +    // Register for the second time
   +    RegisterApplicationMasterResponse response = null;
   +    try {
   +      response = am.registerAppAttempt(false);
   +      // After the AM restarts, the NMTokens cached on the client side are gone
   +      tokenCacheClientSide.clear();
   +      response.getNMTokensFromPreviousAttempts()
   +          .forEach(token -> tokenCacheClientSide.add(token.getNodeId()));
   +    } catch (InvalidApplicationMasterRequestException e) {
   +      Assert.assertEquals(false, keepContainers);
   +      return;
   +    }
   +    Assert.assertEquals("RM should not allow second register"
   +        + " for UAM without keep container flag ", true, keepContainers);
   +
   +    // Expecting the zero running containers

[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760902#comment-17760902
 ] 

ASF GitHub Bot commented on YARN-8980:
--

slfan1989 commented on code in PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#discussion_r1311543472


   ##########
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java:
   ##########
   @@ -1434,6 +1449,17 @@ private void mergeAllocateResponses(AllocateResponse mergedResponse) {
            }
          }
        }
   +    // When re-registering with the RM, the client may not have cached the
   +    // NMTokens from the register response. Pass these NMTokens along in the
   +    // allocate stage.
   +    if (nmTokenMapFromRegisterSecondaryCluster.size() > 0) {
   +      List<NMToken> duplicateNmToken =
   +          new ArrayList<>(nmTokenMapFromRegisterSecondaryCluster);

   Review Comment:
      Why do we need to remove the token data from `nmTokenMapFromRegisterSecondaryCluster`?








[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760900#comment-17760900
 ] 

ASF GitHub Bot commented on YARN-8980:
--

slfan1989 commented on code in PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#discussion_r1311543472


   ##########
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java:
   ##########
   @@ -1434,6 +1449,17 @@ private void mergeAllocateResponses(AllocateResponse mergedResponse) {
            }
          }
        }
   +    // When re-registering with the RM, the client may not have cached the
   +    // NMTokens from the register response. Pass these NMTokens along in the
   +    // allocate stage.
   +    if (nmTokenMapFromRegisterSecondaryCluster.size() > 0) {
   +      List<NMToken> duplicateNmToken =
   +          new ArrayList<>(nmTokenMapFromRegisterSecondaryCluster);

   Review Comment:
      If `nmTokenMapFromRegisterSecondaryCluster` is already a Set, why is deduplication necessary?








[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760897#comment-17760897
 ] 

ASF GitHub Bot commented on YARN-8980:
--

slfan1989 commented on code in PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#discussion_r1311542321


   ##########
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java:
   ##########
   @@ -260,6 +261,16 @@ public class FederationInterceptor extends AbstractRequestInterceptor {

      private final MonotonicClock clock = new MonotonicClock();

   +  /*
   +   * For a UAM, keepContainersAcrossApplicationAttempts is always true.
   +   * On re-registration, the RM clears the node set and regenerates NMTokens
   +   * for the transferred containers. But if keepContainersAcrossApplicationAttempts
   +   * of the AM is false, the AM may never call getNMTokensFromPreviousAttempts,
   +   * so the NMTokens passed in the RegisterApplicationMasterResponse would be
   +   * missing. Here we cache these NMTokens, then pass them to the AM in the
   +   * allocate stage.
   +   */
   +  private Set<NMToken> nmTokenMapFromRegisterSecondaryCluster;

   Review Comment:
      Using a Set<NMToken> is feasible, but should we consider using a Map<SubClusterId, List<NMToken>> to better differentiate the NMToken lists for each subcluster?
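
   For readers of this archive, here is a minimal self-contained sketch of the caching scheme under review. Only `mergeAllocateResponses` and the field name come from the patch; the class and the `onSecondaryRegister` method are hypothetical illustrations:

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
   import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
   import org.apache.hadoop.yarn.api.records.NMToken;

   /** Sketch only: cache NMTokens from a secondary subcluster's re-register
    *  response, then drain them into the next merged allocate response. */
   class NMTokenRelaySketch {
     private final Set<NMToken> nmTokenMapFromRegisterSecondaryCluster =
         Collections.synchronizedSet(new HashSet<NMToken>());

     void onSecondaryRegister(RegisterApplicationMasterResponse response) {
       // The AM may never call getNMTokensFromPreviousAttempts itself,
       // so stash the tokens for the allocate stage.
       nmTokenMapFromRegisterSecondaryCluster
           .addAll(response.getNMTokensFromPreviousAttempts());
     }

     void mergeAllocateResponses(AllocateResponse mergedResponse) {
       // ... merging of the per-subcluster allocate responses happens here ...
       if (!nmTokenMapFromRegisterSecondaryCluster.isEmpty()) {
         List<NMToken> cached =
             new ArrayList<>(nmTokenMapFromRegisterSecondaryCluster);
         // Remove what we hand out, so each cached token is sent only once.
         nmTokenMapFromRegisterSecondaryCluster.removeAll(cached);
         cached.addAll(mergedResponse.getNMTokens());
         mergedResponse.setNMTokens(cached);
       }
     }
   }
   ```

   This also suggests the answer to the question above: the cached tokens are removed so that each one is forwarded to the AM exactly once.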








[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760891#comment-17760891
 ] 

ASF GitHub Bot commented on YARN-8980:
--

slfan1989 commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1700908318

   @zhengchenyu Thanks for your contribution! LGTM.







[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759052#comment-17759052
 ] 

ASF GitHub Bot commented on YARN-8980:
--

hadoop-yetus commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1693371460

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 53s |  | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s |  | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: | mvndep | 14m 11s |  | Maven dependency ordering for branch |
   | +1 :green_heart: | mvninstall | 36m 6s |  | trunk passed |
   | +1 :green_heart: | compile | 2m 46s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | compile | 2m 30s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | checkstyle | 1m 26s |  | trunk passed |
   | +1 :green_heart: | mvnsite | 1m 44s |  | trunk passed |
   | +1 :green_heart: | javadoc | 1m 42s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 32s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 3m 43s |  | trunk passed |
   | +1 :green_heart: | shadedclient | 39m 33s |  | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: | mvndep | 0m 29s |  | Maven dependency ordering for patch |
   | +1 :green_heart: | mvninstall | 1m 33s |  | the patch passed |
   | +1 :green_heart: | compile | 2m 39s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javac | 2m 39s |  | the patch passed |
   | +1 :green_heart: | compile | 2m 26s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | javac | 2m 26s |  | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 1m 19s |  | the patch passed |
   | +1 :green_heart: | mvnsite | 1m 41s |  | the patch passed |
   | +1 :green_heart: | javadoc | 1m 34s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 20s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 4m 7s |  | the patch passed |
   | +1 :green_heart: | shadedclient | 41m 35s |  | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 109m 32s |  | hadoop-yarn-server-resourcemanager in the patch passed. |
   | +1 :green_heart: | unit | 25m 13s |  | hadoop-yarn-server-nodemanager in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 39s |  | The patch does not generate ASF License warnings. |
   |  |  | 303m 47s |  |  |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5975 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 8e8419904a3c 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / dcefe058c926eda05c294168f4b62d5d3e28d373 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/3/testReport/ |
   | Max. process+thread count | 908 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager

[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759038#comment-17759038
 ] 

ASF GitHub Bot commented on YARN-8980:
--

hadoop-yetus commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1693327592

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 39s |  | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 1s |  | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 1s |  | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: | mvndep | 14m 8s |  | Maven dependency ordering for branch |
   | +1 :green_heart: | mvninstall | 30m 16s |  | trunk passed |
   | +1 :green_heart: | compile | 2m 25s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | compile | 2m 13s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | checkstyle | 1m 20s |  | trunk passed |
   | +1 :green_heart: | mvnsite | 1m 44s |  | trunk passed |
   | +1 :green_heart: | javadoc | 1m 41s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 32s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 3m 18s |  | trunk passed |
   | +1 :green_heart: | shadedclient | 32m 49s |  | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: | mvndep | 0m 30s |  | Maven dependency ordering for patch |
   | +1 :green_heart: | mvninstall | 1m 20s |  | the patch passed |
   | +1 :green_heart: | compile | 2m 15s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javac | 2m 15s |  | the patch passed |
   | +1 :green_heart: | compile | 2m 6s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | javac | 2m 6s |  | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 1m 11s |  | the patch passed |
   | +1 :green_heart: | mvnsite | 1m 27s |  | the patch passed |
   | +1 :green_heart: | javadoc | 1m 19s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 17s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 3m 19s |  | the patch passed |
   | +1 :green_heart: | shadedclient | 33m 29s |  | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 100m 43s |  | hadoop-yarn-server-resourcemanager in the patch passed. |
   | +1 :green_heart: | unit | 24m 23s |  | hadoop-yarn-server-nodemanager in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 39s |  | The patch does not generate ASF License warnings. |
   |  |  | 269m 53s |  |  |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5975 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d9d17e4b84ba 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / dcefe058c926eda05c294168f4b62d5d3e28d373 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/4/testReport/ |
   | Max. process+thread count | 937 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager

[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758890#comment-17758890
 ] 

ASF GitHub Bot commented on YARN-8980:
--

zhengchenyu opened a new pull request, #5975:
URL: https://github.com/apache/hadoop/pull/5975

   ### Description of PR
   
   To avoid repeatedly passing NMTokens to an application, the ResourceManager introduces NMTokenSecretManagerInRM, whose appAttemptToNodeKeyMap records, per AppAttempt, the nodes for which a token has already been issued.
   For a UAM there is only one AppAttempt, so after the UAM restarts, the previously issued NMTokens are lost on the client side. However, since NMTokenSecretManagerInRM::appAttemptToNodeKeyMap is not cleared, the ResourceManager will not resend the already-issued NMTokens, and container launches then fail because the NMToken is missing. The specific errors are as follows:
   
   ```
   No NMToken sent for XX_HOST:XX_PORT
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:262)
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:252)
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:137)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:433)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:146)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   ```
   
   For now, when the UAM is restarted and re-registered, appAttemptToNodeKeyMap is cleared only when there are transferredContainers.
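   
   As context, here is a minimal sketch of the fix direction, adapted from the DefaultAMSProcessor registration code quoted elsewhere in this thread. It is an illustration of "move the clear code forward", not the exact patch:
   
   ```java
   // Clear the per-attempt node set unconditionally when a work-preserving
   // UAM re-registers, instead of only when containers were transferred,
   // so NMTokens are re-issued even after all previous containers finished.
   List<Container> transferredContainers =
       getScheduler().getTransferredContainers(applicationAttemptId);
   rmContext.getNMTokenSecretManager()
       .clearNodeSetForAttempt(applicationAttemptId); // moved out of the if-block
   if (!transferredContainers.isEmpty()) {
     response.setContainersFromPreviousAttempts(transferredContainers);
   }
   ```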
   
   ### How was this patch tested?
   
   Unit tests, plus testing in a real cluster.
   
   
   ### For code changes:
   
   Just move the clearing code forward.
   
   | getKeepContainersAcrossApplicationAttempts | getUnmanagedAM | effect |
   | -




[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758845#comment-17758845
 ] 

ASF GitHub Bot commented on YARN-8980:
--

zhengchenyu closed pull request #5975: YARN-8980. Mapreduce application 
container start fail after AM restart.
URL: https://github.com/apache/hadoop/pull/5975







[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758796#comment-17758796
 ] 

ASF GitHub Bot commented on YARN-8980:
--

zhengchenyu commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1692675962

   > ```
   > List<Container> transferredContainers =
   >     getScheduler().getTransferredContainers(applicationAttemptId);
   > if (!transferredContainers.isEmpty()) {
   >   response.setContainersFromPreviousAttempts(transferredContainers);
   >   rmContext.getNMTokenSecretManager().clearNodeSetForAttempt(applicationAttemptId);
   > }
   > ```
   >
   > **In this code, the main operations are retrieving transferred containers,
   > updating the response, and clearing the node set. All of these operations are
   > O(1), and they are not nested or iterated over a collection. Therefore, the
   > overall time complexity of this code is O(1).**
   
   @whoami-anoint
   
   Do you mean the complexity of this code is no longer O(1) after this PR?
   I think the complexity of this code is still O(1) after this PR.







[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757873#comment-17757873
 ] 

ASF GitHub Bot commented on YARN-8980:
--

zhengchenyu commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1689488659

   @goiri @slfan1989 Can you please review this PR?
   
   There is another issue to be discussed here.
   When we submit the UAM for a federated application, keepContainersAcrossApplicationAttempts is always true.
   Now that YARN-8898 is resolved, do we need to pass this value according to the original applicationSubmissionContext?
   For my part, I think this value should stay fixed at 'true', so I did not change it. What do you think? (A small sketch of the current behavior follows.)
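   
   For reference, a minimal sketch of the current behavior. The helper class and method are hypothetical, but the two setters are the real ApplicationSubmissionContext API:
   
   ```java
   import org.apache.hadoop.yarn.api.records.ApplicationId;
   import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
   import org.apache.hadoop.yarn.util.Records;
   
   /** Sketch: submission context used when launching a UAM in a subcluster. */
   final class UamContextSketch {
     static ApplicationSubmissionContext buildUamContext(ApplicationId appId) {
       ApplicationSubmissionContext ctx =
           Records.newRecord(ApplicationSubmissionContext.class);
       ctx.setApplicationId(appId);
       ctx.setUnmanagedAM(true);
       // Fixed to true today, so the RM accepts re-registration of the single
       // UAM attempt (the question above is whether this should instead follow
       // the original app's submission context).
       ctx.setKeepContainersAcrossApplicationAttempts(true);
       return ctx;
     }
   }
   ```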







[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757813#comment-17757813
 ] 

ASF GitHub Bot commented on YARN-8980:
--

hadoop-yetus commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1689305261

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 30s |  | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s |  | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 33m 12s |  | trunk passed |
   | +1 :green_heart: | compile | 0m 45s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | compile | 0m 39s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | checkstyle | 0m 40s |  | trunk passed |
   | +1 :green_heart: | mvnsite | 0m 44s |  | trunk passed |
   | +1 :green_heart: | javadoc | 0m 45s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 0m 37s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 1m 19s |  | trunk passed |
   | +1 :green_heart: | shadedclient | 21m 29s |  | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 33s |  | the patch passed |
   | +1 :green_heart: | compile | 0m 35s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javac | 0m 35s |  | the patch passed |
   | +1 :green_heart: | compile | 0m 32s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | javac | 0m 32s |  | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 0m 28s |  | the patch passed |
   | +1 :green_heart: | mvnsite | 0m 33s |  | the patch passed |
   | +1 :green_heart: | javadoc | 0m 32s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 0m 30s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 1m 16s |  | the patch passed |
   | +1 :green_heart: | shadedclient | 21m 40s |  | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 86m 2s |  | hadoop-yarn-server-resourcemanager in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 28s |  | The patch does not generate ASF License warnings. |
   |  |  | 175m 29s |  |  |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5975 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 673b9bcf80d7 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 9e8388737e164b959fb345c927ed7933d36434ce |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/2/testReport/ |
   | Max. process+thread count | 950 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

   This message was automatically generated.

[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757458#comment-17757458
 ] 

ASF GitHub Bot commented on YARN-8980:
--

hadoop-yetus commented on PR #5975:
URL: https://github.com/apache/hadoop/pull/5975#issuecomment-1688157278

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 38s |  | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s |  | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 44m 5s |  | trunk passed |
   | +1 :green_heart: | compile | 1m 6s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | compile | 1m 2s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | checkstyle | 0m 57s |  | trunk passed |
   | +1 :green_heart: | mvnsite | 1m 5s |  | trunk passed |
   | +1 :green_heart: | javadoc | 1m 2s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 0m 53s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 2m 3s |  | trunk passed |
   | +1 :green_heart: | shadedclient | 35m 2s |  | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 52s |  | the patch passed |
   | +1 :green_heart: | compile | 0m 57s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javac | 0m 57s |  | the patch passed |
   | +1 :green_heart: | compile | 0m 48s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | javac | 0m 48s |  | the patch passed |
   | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/1/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
   | +1 :green_heart: | checkstyle | 0m 43s |  | the patch passed |
   | +1 :green_heart: | mvnsite | 0m 53s |  | the patch passed |
   | +1 :green_heart: | javadoc | 0m 46s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 0m 41s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 1m 57s |  | the patch passed |
   | +1 :green_heart: | shadedclient | 34m 51s |  | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 101m 8s |  | hadoop-yarn-server-resourcemanager in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 41s |  | The patch does not generate ASF License warnings. |
   |  |  | 234m 37s |  |  |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5975 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e3188c65ed23 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f6c6033c40f2539acb73648415b53651a7a339b3 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5975/1/testReport/ |
   | Max. process+thread count | 948 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
   | Console output |

[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-08-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757299#comment-17757299
 ] 

ASF GitHub Bot commented on YARN-8980:
--

zhengchenyu opened a new pull request, #5975:
URL: https://github.com/apache/hadoop/pull/5975

   
   ### Description of PR
   
   To avoid repeatedly passing NMTokens to an application, the ResourceManager introduces NMTokenSecretManagerInRM, whose appAttemptToNodeKeyMap records, per AppAttempt, the nodes for which a token has already been issued.
   For a UAM there is only one AppAttempt, so after the UAM restarts, the previously issued NMTokens are lost on the client side. However, since NMTokenSecretManagerInRM::appAttemptToNodeKeyMap is not cleared, the ResourceManager will not resend the already-issued NMTokens, and container launches then fail because the NMToken is missing. The specific errors are as follows:
   
   ```
   No NMToken sent for XX_HOST:XX_PORT
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:262)
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:252)
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:137)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:433)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:146)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   ```
   
   ### How was this patch tested?
   
   Unit tests, plus testing in a real cluster.
   
   
   ### For code changes:
   
   For now, when the current UAM is re-registered, appAttemptToNodeKeyMap is cleared only when there are transferredContainers. The change just moves the clearing code forward.
   







[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2023-05-18 Thread walhl.liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724130#comment-17724130
 ] 

walhl.liu commented on YARN-8980:
-

I wonder if UAM supporting multi-attempt could solve this problem?




[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2022-08-15 Thread fanshilun (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579956#comment-17579956
 ] 

fanshilun commented on YARN-8980:
-

I will continue to follow up on this PR.




[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2018-11-11 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682962#comment-16682962
 ] 

Botong Huang commented on YARN-8980:


I agree. I am also worried about container leaks, since the new attempt of the AM is not even aware of the existing containers from the UAMs. Note that the RM only supports one attempt for a UAM, and this single UAM attempt is used throughout all AM attempts in the home SC.

I think that on top of solution 1 you mentioned (clear the token cache in the RM), _FederationInterceptor_ needs to know the _keepContainer_ flag of the original AM. If it is false, then after reattaching to the UAMs in _registerApplicationMaster_ it needs to release all running containers from the UAMs (sketched below).
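
A minimal sketch of that idea (all names here are hypothetical illustrations; the real interceptor code differs, and no such patch exists yet):

{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;

/** Sketch: drop a UAM's transferred containers when the original AM was
 *  submitted without keepContainers, to avoid leaking them. */
final class UamReattachSketch {
  static void reattach(RegisterApplicationMasterResponse uamResponse,
      boolean keepContainers, AMRMClient<?> uamClient) {
    if (!keepContainers) {
      for (Container c : uamResponse.getContainersFromPreviousAttempts()) {
        // The new AM attempt never learns about these containers, so
        // release them instead of letting them run unaccounted for.
        uamClient.releaseAssignedContainer(c.getId());
      }
    }
  }
}
{code}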




[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2018-11-10 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682770#comment-16682770
 ] 

Bibin A Chundatt commented on YARN-8980:


[~botong]/[~subru]

The issue is not completely related to the YARN-8898 discussion, but one of the solutions depends on it (solution 2).

AMRMProxy HA works by registering the UAM with the same application attempt ID.
ApplicationMasterService#registerApplicationMaster:
{code:java}
if (!(appContext.getUnmanagedAM()
    && appContext.getKeepContainersAcrossApplicationAttempts())) {
{code}
Solutions
 # DefaultAMSProcessor#registerApplicationMaster clears the NMTokenSecretManager node set after the previous attempt's containers are set. This makes sure containers allocated on the same hostname get NMTokens again.
{code:java}
ApplicationSubmissionContext applicationSubmissionContext =
    app.getApplicationSubmissionContext();
if (applicationSubmissionContext.getUnmanagedAM()
    && applicationSubmissionContext
        .getKeepContainersAcrossApplicationAttempts()) {
  rmContext.getNMTokenSecretManager()
      .clearNodeSetForAttempt(applicationAttemptId);
}
response.setSchedulerResourceTypes(
    getScheduler().getSchedulingResourceTypes());
{code}
 # Handle it at the FederationInterceptor: add the tokens received during recovery to the first allocate response.




[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2018-11-10 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682485#comment-16682485
 ] 

Botong Huang commented on YARN-8980:


Thanks [~bibinchundatt] for reporting. This is in line with the discussion we are
having in YARN-8898. Basically, it is better to use the original
_ApplicationSubmissionContext_ for the app when launching the UAMs. We will
probably need to go with solution 2 discussed there: push the
applicationSubmissionContext to the federationStore at the router side as well.
[~subru], what do you think?




[jira] [Commented] (YARN-8980) Mapreduce application container start fail after AM restart.

2018-11-09 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682255#comment-16682255
 ] 

Bibin A Chundatt commented on YARN-8980:


cc: [~botong]

For a mapreduce application, the initial containers after an AM restart are
assigned without NMTokens, so container launches fail with an invalid-token
error for containers assigned from secondary subclusters.

