[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597578#comment-17597578
 ] 

ASF GitHub Bot commented on YARN-11177:
---

hadoop-yetus commented on PR #4764:
URL: https://github.com/apache/hadoop/pull/4764#issuecomment-1231217808

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  8s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 6 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  8s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  29m  4s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   4m 14s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   3m 31s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 20s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  2s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 37s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m  2s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   4m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 19s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 51s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |   2m 51s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4764/16/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt) |  hadoop-yarn-server-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 102m 16s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  unit  |   3m 33s |  |  hadoop-yarn-server-router in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 246m 31s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.yarn.server.federation.policies.router.TestLoadBasedRouterPolicy |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4764/16/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4764 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux fed9318e6f1f 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git r

[jira] [Commented] (YARN-11284) [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597530#comment-17597530
 ] 

ASF GitHub Bot commented on YARN-11284:
---

hadoop-yetus commented on PR #4814:
URL: https://github.com/apache/hadoop/pull/4814#issuecomment-1231150099

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 49s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 18s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 20s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4814/3/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt) |  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 38s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 46s |  |  hadoop-yarn-server-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 105m  8s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4814/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4814 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux ceb57739187d 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5d70401fb7e968fc45a9eadc9d337adba1116fcd |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4814/3/testReport/ |
   | Max

[jira] [Resolved] (YARN-11196) NUMA Awareness support in DefaultContainerExecutor

2022-08-29 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph resolved YARN-11196.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> NUMA Awareness support in DefaultContainerExecutor
> --
>
> Key: YARN-11196
> URL: https://issues.apache.org/jira/browse/YARN-11196
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.3.3
>Reporter: Prabhu Joseph
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> [YARN-5764|https://issues.apache.org/jira/browse/YARN-5764] has added support 
> of NUMA Awareness for Containers launched through LinuxContainerExecutor. 
> This feature is useful to have in DefaultContainerExecutor as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11196) NUMA Awareness support in DefaultContainerExecutor

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597527#comment-17597527
 ] 

ASF GitHub Bot commented on YARN-11196:
---

PrabhuJoseph merged PR #4742:
URL: https://github.com/apache/hadoop/pull/4742




> NUMA Awareness support in DefaultContainerExecutor
> --
>
> Key: YARN-11196
> URL: https://issues.apache.org/jira/browse/YARN-11196
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.3.3
>Reporter: Prabhu Joseph
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>
> [YARN-5764|https://issues.apache.org/jira/browse/YARN-5764] has added support 
> of NUMA Awareness for Containers launched through LinuxContainerExecutor. 
> This feature is useful to have in DefaultContainerExecutor as well.






[jira] [Commented] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597519#comment-17597519
 ] 

ASF GitHub Bot commented on YARN-6667:
--

hadoop-yetus commented on PR #4810:
URL: https://github.com/apache/hadoop/pull/4810#issuecomment-1231126792

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 43s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 32s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m  2s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 52s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 15s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  21m 40s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 37s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  24m  1s |  |  hadoop-yarn-server-nodemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 122m 12s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4810/9/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4810 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 980b2cd3c46d 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 554719b7b2c672abcb099a589b288861b2e9de31 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4810/9/testReport/ |
   | Max. process+thread count | 675 (vs. ulimit of 5500) |
   | modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-se

[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597508#comment-17597508
 ] 

ASF GitHub Bot commented on YARN-11273:
---

slfan1989 commented on code in PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#discussion_r957975795


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/SQLFederationStateStore.java:
##
@@ -1016,30 +1035,275 @@ private static byte[] getByteArray(ByteBuffer bb) {
   @Override
   public AddReservationHomeSubClusterResponse addReservationHomeSubCluster(

Review Comment:
   Thank you for your suggestion.
   
   I will explain this part of the test:
   
   In YARN-11272 we implemented storing Reservations in ZK. In the corresponding 
PR (https://github.com/apache/hadoop/pull/4781) we defined the tests for these 
methods in FederationStateStoreBaseTest.java:
   
   ```
   testAddReservationHomeSubCluster()
   testAddReservationHomeSubClusterReservationAlreadyExists()
   testAddReservationHomeSubClusterAppAlreadyExistsInTheSameSC()
   testDeleteReservationHomeSubCluster()
   testDeleteReservationHomeSubClusterUnknownApp()
   testUpdateReservationHomeSubCluster()
   ```
   
   Because TestSQLFederationStateStore extends FederationStateStoreBaseTest, we 
can reuse the YARN-11272 tests, and of course we will add some more tests.
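
A minimal sketch of the reuse pattern described above, in plain Java rather than JUnit. Every name in it (`ReservationStore`, `StoreContractTest`, `MapBackedStore`, `MapBackedStoreTest`, `ContractTestDemo`) is hypothetical and is not the Hadoop test code. The shared checks live in an abstract base class, and each backend-specific subclass only supplies its own store instance.

```java
import java.util.HashMap;
import java.util.Map;

// Entry point: runs the shared contract tests against one backend.
class ContractTestDemo {
  public static void main(String[] args) {
    int passed = new MapBackedStoreTest().runAll();
    System.out.println(passed + " shared tests passed");
  }
}

// Hypothetical stand-in for a reservation -> home sub-cluster store.
interface ReservationStore {
  // Returns the home sub-cluster recorded for the reservation after the call.
  String add(String reservationId, String subClusterId);

  String get(String reservationId);
}

// Shared checks, playing the role of a base test class.
abstract class StoreContractTest {
  // Each backend-specific subclass only supplies its own store.
  protected abstract ReservationStore createStore();

  void testAdd() {
    ReservationStore store = createStore();
    check("SC-1".equals(store.add("res-1", "SC-1")), "add should return the new home");
    check("SC-1".equals(store.get("res-1")), "get should see the stored home");
  }

  void testAddAlreadyExists() {
    ReservationStore store = createStore();
    store.add("res-1", "SC-1");
    // A second add must not overwrite the existing mapping.
    check("SC-1".equals(store.add("res-1", "SC-2")), "existing home should win");
  }

  // Runs every shared check and returns how many ran.
  int runAll() {
    testAdd();
    testAddAlreadyExists();
    return 2;
  }

  static void check(boolean ok, String message) {
    if (!ok) {
      throw new AssertionError(message);
    }
  }
}

// One illustrative backend: an in-memory map.
class MapBackedStore implements ReservationStore {
  private final Map<String, String> homes = new HashMap<>();

  @Override
  public String add(String reservationId, String subClusterId) {
    String previous = homes.putIfAbsent(reservationId, subClusterId);
    return previous == null ? subClusterId : previous;
  }

  @Override
  public String get(String reservationId) {
    return homes.get(reservationId);
  }
}

// The subclass inherits every shared test without redefining any of them.
class MapBackedStoreTest extends StoreContractTest {
  @Override
  protected ReservationStore createStore() {
    return new MapBackedStore();
  }
}
```

A second backend would need only another small subclass that overrides `createStore()`, which is the kind of reuse the comment relies on.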
   
   I need your help to merge YARN-11272 into the trunk branch, thank you very much!
   
   





> [RESERVATION] Federation StateStore: Support storage/retrieval of 
> Reservations With SQL
> ---
>
> Key: YARN-11273
> URL: https://issues.apache.org/jira/browse/YARN-11273
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597507#comment-17597507
 ] 

ASF GitHub Bot commented on YARN-11273:
---

slfan1989 commented on code in PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#discussion_r957975795


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/SQLFederationStateStore.java:
##
@@ -1016,30 +1035,275 @@ private static byte[] getByteArray(ByteBuffer bb) {
   @Override
   public AddReservationHomeSubClusterResponse addReservationHomeSubCluster(

Review Comment:
   Thank you for your suggestion.
   
   I will explain this part of the test:
   In YARN-11272 we implemented storing Reservations in ZK, and in that PR we 
defined the tests for these methods in FederationStateStoreBaseTest.java:
   
   ```
   testAddReservationHomeSubCluster()
   testAddReservationHomeSubClusterReservationAlreadyExists()
   testAddReservationHomeSubClusterAppAlreadyExistsInTheSameSC()
   testDeleteReservationHomeSubCluster()
   testDeleteReservationHomeSubClusterUnknownApp()
   testUpdateReservationHomeSubCluster()
   ```
   
   Because TestSQLFederationStateStore extends FederationStateStoreBaseTest, we 
can reuse the YARN-11272 tests, and of course we will add some more tests.
   
   





> [RESERVATION] Federation StateStore: Support storage/retrieval of 
> Reservations With SQL
> ---
>
> Key: YARN-11273
> URL: https://issues.apache.org/jira/browse/YARN-11273
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597506#comment-17597506
 ] 

ASF GitHub Bot commented on YARN-11273:
---

slfan1989 commented on code in PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#discussion_r957975795


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/SQLFederationStateStore.java:
##
@@ -1016,30 +1035,275 @@ private static byte[] getByteArray(ByteBuffer bb) {
   @Override
   public AddReservationHomeSubClusterResponse addReservationHomeSubCluster(

Review Comment:
   Thank you for your suggestion.
   
   I will explain this part of the test:
   In YARN-11272 we implemented storing Reservations in ZK, and in that PR we 
defined the tests for these methods in FederationStateStoreBaseTest.java.
   
   





> [RESERVATION] Federation StateStore: Support storage/retrieval of 
> Reservations With SQL
> ---
>
> Key: YARN-11273
> URL: https://issues.apache.org/jira/browse/YARN-11273
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597505#comment-17597505
 ] 

ASF GitHub Bot commented on YARN-11273:
---

slfan1989 commented on code in PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#discussion_r957972961


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/SQLFederationStateStore.java:
##
@@ -1016,30 +1035,275 @@ private static byte[] getByteArray(ByteBuffer bb) {
   @Override
   public AddReservationHomeSubClusterResponse addReservationHomeSubCluster(
   AddReservationHomeSubClusterRequest request) throws YarnException {
-throw new NotImplementedException("Code is not implemented");
+// validate
+FederationReservationHomeSubClusterStoreInputValidator.validate(request);
+CallableStatement cstmt = null;
+
+ReservationHomeSubCluster reservationHomeSubCluster = request.getReservationHomeSubCluster();
+ReservationId reservationId = reservationHomeSubCluster.getReservationId();
+SubClusterId subClusterId = reservationHomeSubCluster.getHomeSubCluster();
+SubClusterId subClusterHomeId = null;
+
+try {
+  // Call procedure
+  cstmt = getCallableStatement(CALL_SP_ADD_RESERVATION_HOME_SUBCLUSTER);
+
+  // Set the parameters for the stored procedure
+  cstmt.setString(1, reservationId.toString());
+  cstmt.setString(2, subClusterId.getId());
+  cstmt.registerOutParameter(3, java.sql.Types.VARCHAR);
+  cstmt.registerOutParameter(4, java.sql.Types.INTEGER);
+
+  // Execute the query
+  long startTime = clock.getTime();
+  cstmt.executeUpdate();
+  long stopTime = clock.getTime();
+
+  // Get SubClusterHome
+  String subClusterHomeIdString = cstmt.getString(3);
+  subClusterHomeId = SubClusterId.newInstance(subClusterHomeIdString);
+
+  // Get rowCount
+  int rowCount = cstmt.getInt(4);
+
+  // For failover reason, we check the returned subClusterId.
+  // 1.If it is equal to the subClusterId we sent, the call added the new
+  // reservation into FederationStateStore.
+  // 2.If the call returns a different subClusterId
+  // it means we already tried to insert this reservation
+  // but a component (Router/StateStore/RM) failed during the submission.
+  if (subClusterId.equals(subClusterHomeId)) {
+// if it is equal to 0
+// it means the call did not add a new reservation into FederationStateStore.
+if (rowCount == 0) {
+  LOG.info("The reservation {} was not inserted in the StateStore because it" +
+  " was already present in subCluster {}", reservationId, subClusterHomeId);
+} else if (rowCount != 1) {
+  // if it is different from 1
+  // it means the call had a wrong behavior. Maybe the database is not set correctly.
+  FederationStateStoreUtils.logAndThrowStoreException(LOG,
+  "Wrong behavior during the insertion of subCluster %s.", subClusterId);
+}
+  } else {
+// If it is different from 0,
+// it means that there is a data situation that does not meet the expectations,
+// and an exception should be thrown at this time
+if (rowCount != 0) {
+  FederationStateStoreUtils.logAndThrowStoreException(LOG,
+  "The reservation %s does exist but was overwritten.", reservationId);
+}
+LOG.info("Reservation: {} already present with subCluster: {}.",
+reservationId, subClusterHomeId);
+  }
+
+  // Record successful call time
+  FederationStateStoreClientMetrics.succeededStateStoreCall(stopTime - startTime);
+} catch (SQLException e) {
+  FederationStateStoreClientMetrics.failedStateStoreCall();
+  FederationStateStoreUtils.logAndThrowRetriableException(e, LOG,
+  "Unable to insert the newly generated reservation %s to subCluster %s.",
+  reservationId, subClusterId);
+} finally {
+  // Return to the pool the CallableStatement
+  FederationStateStoreUtils.returnToPool(LOG, cstmt);
+}
+
+return AddReservationHomeSubClusterResponse.newInstance(subClusterHomeId);
   }
 
   @Override
   public GetReservationHomeSubClusterResponse getReservationHomeSubCluster(
   GetReservationHomeSubClusterRequest request) throws YarnException {
-throw new NotImplementedException("Code is not implemented");
+// validate
+FederationReservationHomeSubClusterStoreInputValidator.validate(request);
+
+CallableStatement cstmt = null;
+ReservationId reservationId = request.getReservationId();
+SubClusterId subClusterId = null;
+
+try {
+  cstmt = getCallableStatement(CALL_SP_GET_RESERVATION_HOME_SUBCLUSTER);
+
+  // Set the parameters for the stored procedure
+  
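
The outcome handling in the `addReservationHomeSubCluster` hunk above can be summarized as a small decision function. This is a hedged sketch, not code from the PR: the class `ReservationInsertDemo`, the `Outcome` enum and the `classify` method are hypothetical names. It models only the branch on the returned sub-cluster id and the row count, and it returns `INCONSISTENT` where the patch throws a store exception.

```java
// Hypothetical summary of the branches in the quoted hunk; not part of the PR.
class ReservationInsertDemo {

  enum Outcome {
    INSERTED,                    // same sub-cluster returned, exactly one row added
    ALREADY_IN_SAME_SUBCLUSTER,  // same sub-cluster returned, no row added
    ALREADY_IN_OTHER_SUBCLUSTER, // different sub-cluster returned, no row added
    INCONSISTENT                 // any other combination; the patch throws here
  }

  // sent: sub-cluster id passed to the stored procedure.
  // returned: sub-cluster id read back from the OUT parameter.
  // rowCount: row count read back from the OUT parameter.
  static Outcome classify(String sent, String returned, int rowCount) {
    if (sent.equals(returned)) {
      if (rowCount == 1) {
        return Outcome.INSERTED;
      }
      if (rowCount == 0) {
        return Outcome.ALREADY_IN_SAME_SUBCLUSTER;
      }
      return Outcome.INCONSISTENT;
    }
    // A different home came back: an earlier attempt already stored the reservation.
    return rowCount == 0 ? Outcome.ALREADY_IN_OTHER_SUBCLUSTER : Outcome.INCONSISTENT;
  }

  public static void main(String[] args) {
    System.out.println(classify("SC-1", "SC-1", 1));
    System.out.println(classify("SC-1", "SC-1", 0));
    System.out.println(classify("SC-1", "SC-2", 0));
    System.out.println(classify("SC-1", "SC-2", 1));
  }
}
```

Written this way, the four cases can be checked without a database, while the stored procedure remains the only place that decides which sub-cluster id and row count come back.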

[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597503#comment-17597503
 ] 

ASF GitHub Bot commented on YARN-11273:
---

slfan1989 commented on code in PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#discussion_r957972465


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/SQLFederationStateStore.java:
##
@@ -1016,30 +1035,275 @@ private static byte[] getByteArray(ByteBuffer bb) {
   @Override
   public AddReservationHomeSubClusterResponse addReservationHomeSubCluster(
   AddReservationHomeSubClusterRequest request) throws YarnException {
-throw new NotImplementedException("Code is not implemented");
+// validate
+FederationReservationHomeSubClusterStoreInputValidator.validate(request);
+CallableStatement cstmt = null;
+
+ReservationHomeSubCluster reservationHomeSubCluster = request.getReservationHomeSubCluster();
+ReservationId reservationId = reservationHomeSubCluster.getReservationId();
+SubClusterId subClusterId = reservationHomeSubCluster.getHomeSubCluster();
+SubClusterId subClusterHomeId = null;
+
+try {
+  // Call procedure
+  cstmt = getCallableStatement(CALL_SP_ADD_RESERVATION_HOME_SUBCLUSTER);
+
+  // Set the parameters for the stored procedure
+  cstmt.setString(1, reservationId.toString());
+  cstmt.setString(2, subClusterId.getId());
+  cstmt.registerOutParameter(3, java.sql.Types.VARCHAR);
+  cstmt.registerOutParameter(4, java.sql.Types.INTEGER);
+
+  // Execute the query
+  long startTime = clock.getTime();
+  cstmt.executeUpdate();
+  long stopTime = clock.getTime();
+
+  // Get SubClusterHome
+  String subClusterHomeIdString = cstmt.getString(3);
+  subClusterHomeId = SubClusterId.newInstance(subClusterHomeIdString);
+
+  // Get rowCount
+  int rowCount = cstmt.getInt(4);
+
+  // For failover reason, we check the returned subClusterId.
+  // 1.If it is equal to the subClusterId we sent, the call added the new
+  // reservation into FederationStateStore.
+  // 2.If the call returns a different subClusterId
+  // it means we already tried to insert this reservation
+  // but a component (Router/StateStore/RM) failed during the submission.
+  if (subClusterId.equals(subClusterHomeId)) {
+// if it is equal to 0
+// it means the call did not add a new reservation into FederationStateStore.
+if (rowCount == 0) {
+  LOG.info("The reservation {} was not inserted in the StateStore because it" +
+  " was already present in subCluster {}", reservationId, subClusterHomeId);
+} else if (rowCount != 1) {
+  // if it is different from 1
+  // it means the call had a wrong behavior. Maybe the database is not set correctly.
+  FederationStateStoreUtils.logAndThrowStoreException(LOG,
+  "Wrong behavior during the insertion of subCluster %s.", subClusterId);
+}
+  } else {
+// If it is different from 0,
+// it means that there is a data situation that does not meet the expectations,
+// and an exception should be thrown at this time
+if (rowCount != 0) {
+  FederationStateStoreUtils.logAndThrowStoreException(LOG,
+  "The reservation %s does exist but was overwritten.", reservationId);
+}
+LOG.info("Reservation: {} already present with subCluster: {}.",
+reservationId, subClusterHomeId);
+  }
+
+  // Record successful call time
+  FederationStateStoreClientMetrics.succeededStateStoreCall(stopTime - startTime);
+} catch (SQLException e) {
+  FederationStateStoreClientMetrics.failedStateStoreCall();
+  FederationStateStoreUtils.logAndThrowRetriableException(e, LOG,
+  "Unable to insert the newly generated reservation %s to subCluster %s.",
+  reservationId, subClusterId);
+} finally {
+  // Return to the pool the CallableStatement
+  FederationStateStoreUtils.returnToPool(LOG, cstmt);
+}
+
+return AddReservationHomeSubClusterResponse.newInstance(subClusterHomeId);
   }
 
   @Override
   public GetReservationHomeSubClusterResponse getReservationHomeSubCluster(
   GetReservationHomeSubClusterRequest request) throws YarnException {
-throw new NotImplementedException("Code is not implemented");
+// validate
+FederationReservationHomeSubClusterStoreInputValidator.validate(request);
+
+CallableStatement cstmt = null;
+ReservationId reservationId = request.getReservationId();
+SubClusterId subClusterId = null;
+
+try {
+  cstmt = getCallableStatement(CALL_SP_GET_RESERVATION_HOME_SUBCLUSTER);
+
+  // Set the parameters for the stored procedure
+  

[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597502#comment-17597502
 ] 

ASF GitHub Bot commented on YARN-9708:
--

slfan1989 commented on code in PR #4746:
URL: https://github.com/apache/hadoop/pull/4746#discussion_r957969979


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/impl/pb/YARNDelegationTokenIdentifierPBImpl.java:
##
@@ -0,0 +1,200 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.security.client.impl.pb;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.thirdparty.protobuf.TextFormat;
+import 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.YARNDelegationTokenIdentifierProto;
+import 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.YARNDelegationTokenIdentifierProtoOrBuilder;
+import org.apache.hadoop.yarn.security.client.YARNDelegationTokenIdentifier;
+
+@InterfaceAudience.Private

Review Comment:
   Thank you very much for your help reviewing the code, I will fix it.





> Yarn Router Support DelegationToken
> ---
>
> Key: YARN-9708
> URL: https://issues.apache.org/jira/browse/YARN-9708
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: router
>Affects Versions: 3.1.1
>Reporter: Xie YiFan
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch, 
> RMDelegationTokenSecretManager_storeNewMasterKey.svg, 
> RouterDelegationTokenSecretManager_storeNewMasterKey.svg
>
>
> 1. We use the Router as a proxy to manage multiple clusters that are 
> independent of each other, so that clients have a unified entry point. To 
> that end, we implemented a customized AMRMProxyPolicy that doesn't broadcast 
> ResourceRequests to other clusters.
> 2. Our production environment needs Kerberos, but the Router doesn't support 
> SecureLogin for now.
> https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so we 
> improved it.
> 3. Some frameworks, like Oozie, get a token via yarnclient#getDelegationToken, 
> which the Router doesn't support. Our solution is to add homeCluster to 
> ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job is 
> submitted with a specified clusterid so that the Router knows which cluster to 
> submit it to. When a client calls getDelegationToken, the Router gets a token 
> from one RM according to the specified clusterid, and applies some mechanism 
> to save this token in memory.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11284) [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597501#comment-17597501
 ] 

ASF GitHub Bot commented on YARN-11284:
---

slfan1989 commented on code in PR #4814:
URL: https://github.com/apache/hadoop/pull/4814#discussion_r957953877


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java:
##
@@ -501,4 +472,51 @@ public Map 
batchFinishApplicationMaster
 
 return responseMap;
   }
+
+  Runnable createForceFinishApplicationThread() {
+return () -> {
+
+  ExecutorCompletionService completionService =
+  new ExecutorCompletionService<>(threadpool);
+
+  // Save a local copy of the key set so that it won't change with the map
+  Set addressList = new HashSet<>(unmanagedAppMasterMap.keySet());
+
+  LOG.warn("Abnormal shutdown of UAMPoolManager, still {} UAMs in map", addressList.size());
+
+  for (final String uamId : addressList) {
+completionService.submit(() -> {
+  try {
+ApplicationId appId = appIdMap.get(uamId);
+LOG.info("Force-killing UAM id {} for application {}", uamId, appId);
+return unmanagedAppMasterMap.remove(uamId).forceKillApplication();
+  } catch (Exception e) {
+LOG.error("Failed to kill unmanaged application master", e);
+return null;
+  }
+});
+  }
+
+  for (int i = 0; i < addressList.size(); ++i) {
+try {
+  Future future = completionService.take();
+  future.get();

Review Comment:
   The code in this part is unchanged from the original trunk version. It 
force-kills the application and does not check whether the kill succeeded, 
because the application has a timeout and will be killed once that timeout 
expires. A failed force kill should therefore have no lasting effect. 
   
   I agree with you that we should check the return status.
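
   As a rough illustration of checking the return status, here is a minimal, 
self-contained sketch. It is not the PR's code: `ForceKillSketch`, 
`countFailedKills`, and the use of a `Boolean` result are hypothetical 
stand-ins for the value returned by `forceKillApplication()`. It only shows 
how the result of each `Future` taken from an `ExecutorCompletionService` can 
be inspected instead of discarded.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

public class ForceKillSketch {

  /**
   * Runs every kill task on the pool and counts how many did not succeed.
   * A task counts as failed when it returns false or null, or when it throws.
   * Assumption: each task reports success as a Boolean; the real code
   * returns a YARN response object instead.
   */
  public static int countFailedKills(ExecutorService pool,
      List<Callable<Boolean>> killTasks) {
    ExecutorCompletionService<Boolean> completionService =
        new ExecutorCompletionService<>(pool);
    for (Callable<Boolean> task : killTasks) {
      completionService.submit(task);
    }

    int failed = 0;
    for (int i = 0; i < killTasks.size(); i++) {
      try {
        // take() blocks until the next task finishes; get() yields its result.
        Boolean killed = completionService.take().get();
        if (!Boolean.TRUE.equals(killed)) {
          failed++;
        }
      } catch (ExecutionException e) {
        // The kill task itself threw an exception.
        failed++;
      } catch (InterruptedException e) {
        // Restore the interrupt flag and treat the remaining tasks as failed.
        Thread.currentThread().interrupt();
        failed += killTasks.size() - i;
        break;
      }
    }
    return failed;
  }
}
```

   The caller can then log or surface the failure count rather than ignoring 
it, while still relying on the application timeout as the fallback.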





> [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop
> 
>
> Key: YARN-11284
> URL: https://issues.apache.org/jira/browse/YARN-11284
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>
> There is a todo in UnmanagedAMPoolManager#ServiceStop
> {code:java}
> TODO: move waiting for the kill to finish into a separate thread, without 
> blocking the serviceStop. {code}
> I use a separate thread for this work, so it no longer blocks 
> serviceStop.






[jira] [Commented] (YARN-11284) [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597483#comment-17597483
 ] 

ASF GitHub Bot commented on YARN-11284:
---

slfan1989 commented on code in PR #4814:
URL: https://github.com/apache/hadoop/pull/4814#discussion_r957953877


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java:
##
@@ -501,4 +472,51 @@ public Map 
batchFinishApplicationMaster
 
 return responseMap;
   }
+
+  Runnable createForceFinishApplicationThread() {
+return () -> {
+
+  ExecutorCompletionService completionService =
+  new ExecutorCompletionService<>(threadpool);
+
+  // Save a local copy of the key set so that it won't change with the map
+  Set addressList = new HashSet<>(unmanagedAppMasterMap.keySet());
+
+  LOG.warn("Abnormal shutdown of UAMPoolManager, still {} UAMs in map", addressList.size());
+
+  for (final String uamId : addressList) {
+completionService.submit(() -> {
+  try {
+ApplicationId appId = appIdMap.get(uamId);
+LOG.info("Force-killing UAM id {} for application {}", uamId, appId);
+return unmanagedAppMasterMap.remove(uamId).forceKillApplication();
+  } catch (Exception e) {
+LOG.error("Failed to kill unmanaged application master", e);
+return null;
+  }
+});
+  }
+
+  for (int i = 0; i < addressList.size(); ++i) {
+try {
+  Future future = completionService.take();
+  future.get();

Review Comment:
   The code in this part is unchanged from the original trunk version. It 
force-kills the application and does not check whether the kill succeeded, 
because the application has a timeout and will be killed once that timeout 
expires. A failed force kill should therefore have no lasting effect. 
   
   From an implementation perspective, though, we should determine the status 
of the force kill.





> [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop
> 
>
> Key: YARN-11284
> URL: https://issues.apache.org/jira/browse/YARN-11284
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>
> There is a todo in UnmanagedAMPoolManager#ServiceStop
> {code:java}
> TODO: move waiting for the kill to finish into a separate thread, without 
> blocking the serviceStop. {code}
> I use a separate thread for this work, so it no longer blocks 
> serviceStop.






[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597477#comment-17597477
 ] 

ASF GitHub Bot commented on YARN-11177:
---

slfan1989 commented on code in PR #4764:
URL: https://github.com/apache/hadoop/pull/4764#discussion_r957947734


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java:
##
@@ -1633,4 +1790,49 @@ public FederationStateStoreFacade getFederationFacade() {
   public Map getClientRMProxies() {
 return clientRMProxies;
   }
+
+  private Boolean isExistsReservationHomeSubCluster(ReservationId reservationId) {
+try {
+  SubClusterId subClusterId = federationFacade.getReservationHomeSubCluster(reservationId);
+  if (subClusterId != null) {
+return true;
+  }
+} catch (YarnException e) {
+  LOG.warn("get homeSubCluster by reservationId = {} error.", reservationId, e);
+}
+return false;
+  }
+
+  private void addReservationHomeSubCluster(ReservationId reservationId,
+  ReservationHomeSubCluster homeSubCluster) throws YarnException {
+try {
+  // persist the mapping of reservationId and the subClusterId which has
+  // been selected as its home
+  federationFacade.addReservationHomeSubCluster(homeSubCluster);
+} catch (YarnException e) {
+  RouterServerUtil.logAndThrowException(e,
+  "Unable to insert the ReservationId %s into the FederationStateStore.",
+   reservationId);

Review Comment:
   I will fix it.





> Support getNewReservation, submitReservation, updateReservation, 
> deleteReservation API's for Federation
> ---
>
> Key: YARN-11177
> URL: https://issues.apache.org/jira/browse/YARN-11177
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597474#comment-17597474
 ] 

ASF GitHub Bot commented on YARN-11177:
---

slfan1989 commented on code in PR #4764:
URL: https://github.com/apache/hadoop/pull/4764#discussion_r957946915


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java:
##
@@ -888,13 +890,88 @@ public MoveApplicationAcrossQueuesResponse 
moveApplicationAcrossQueues(
   @Override
   public GetNewReservationResponse getNewReservation(
   GetNewReservationRequest request) throws YarnException, IOException {
-throw new NotImplementedException("Code is not implemented");
+
+if (request == null) {
+  routerMetrics.incrGetNewReservationFailedRetrieved();
+  String errMsg = "Missing getNewReservation request.";
+  RouterServerUtil.logAndThrowException(errMsg, null);
+}
+
+long startTime = clock.getTime();
+Map subClustersActive = federationFacade.getSubClusters(true);
+
+for (int i = 0; i < numSubmitRetries; ++i) {
+  SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive);
+  LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId);
+  ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId);
+  try {
+GetNewReservationResponse response = clientRMProxy.getNewReservation(request);
+if (response != null) {
+  long stopTime = clock.getTime();
+  routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime);
+  return response;
+}
+  } catch (Exception e) {
+LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e);
+subClustersActive.remove(subClusterId);
+  }
+}
+
+routerMetrics.incrGetNewReservationFailedRetrieved();
+String errMsg = "Failed to create a new reservation.";
+throw new YarnException(errMsg);
   }
 
   @Override
   public ReservationSubmissionResponse submitReservation(
   ReservationSubmissionRequest request) throws YarnException, IOException {
-throw new NotImplementedException("Code is not implemented");
+
+if (request == null || request.getReservationId() == null
+|| request.getReservationDefinition() == null || request.getQueue() == null) {
+  routerMetrics.incrSubmitReservationFailedRetrieved();
+  RouterServerUtil.logAndThrowException(
+  "Missing submitReservation request or reservationId " +
+  "or reservation definition or queue.", null);
+}
+
+long startTime = clock.getTime();
+ReservationId reservationId = request.getReservationId();
+
+for (int i = 0; i < numSubmitRetries; i++) {
+  try {
+// First, Get SubClusterId according to specific strategy.
+SubClusterId subClusterId = policyFacade.getReservationHomeSubCluster(request);
+LOG.info("submitReservation ReservationId {} try #{} on SubCluster {}.",
+reservationId, i, subClusterId);
+ReservationHomeSubCluster reservationHomeSubCluster =
+ReservationHomeSubCluster.newInstance(reservationId, subClusterId);
+
+// Second, determine whether the current ReservationId has a corresponding subCluster.
+// If it does not exist, add it. If it exists, update it.
+Boolean exists = isExistsReservationHomeSubCluster(reservationId);
+if(!exists) {

Review Comment:
   I will fix it.





> Support getNewReservation, submitReservation, updateReservation, 
> deleteReservation API's for Federation
> ---
>
> Key: YARN-11177
> URL: https://issues.apache.org/jira/browse/YARN-11177
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597473#comment-17597473
 ] 

ASF GitHub Bot commented on YARN-11177:
---

slfan1989 commented on code in PR #4764:
URL: https://github.com/apache/hadoop/pull/4764#discussion_r957946401


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java:
##
@@ -1633,4 +1790,49 @@ public FederationStateStoreFacade getFederationFacade() {
   public Map getClientRMProxies() {
 return clientRMProxies;
   }
+
+  private Boolean isExistsReservationHomeSubCluster(ReservationId reservationId) {

Review Comment:
   Thanks for your suggestion, I will modify the code.





> Support getNewReservation, submitReservation, updateReservation, 
> deleteReservation API's for Federation
> ---
>
> Key: YARN-11177
> URL: https://issues.apache.org/jira/browse/YARN-11177
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Commented] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597472#comment-17597472
 ] 

ASF GitHub Bot commented on YARN-6667:
--

slfan1989 commented on code in PR #4810:
URL: https://github.com/apache/hadoop/pull/4810#discussion_r957943343


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java:
##
@@ -1761,4 +1794,26 @@ public static  boolean isNullOrEmpty(Collection c) 
{
   public static  boolean isNullOrEmpty(Map c) {
 return (c == null || c.size() == 0);
   }
+
+  @VisibleForTesting
+  protected void cacheAllocatedContainersForSubClusterId(
+  List containers, SubClusterId subClusterId) {
+cacheAllocatedContainers(containers, subClusterId);
+  }
+
+  @VisibleForTesting
+  protected Map getContainerIdToSubClusterIdMap() {
+return containerIdToSubClusterIdMap;
+  }
+
+  private boolean isSCHealth(SubClusterId subClusterId) throws YarnException {
+boolean isSCHealth = true;
+Set timeOutScs = getTimedOutSCs(true);
+SubClusterInfo subClusterInfo = 
federationFacade.getSubCluster(subClusterId);
+if (timeOutScs.contains(subClusterId) ||
+ subClusterInfo == null || subClusterInfo.getState().isUnusable()) {
+  isSCHealth = false;

Review Comment:
   Thanks for your suggestion, I will modify the code.





> Handle containerId duplicate without failing the heartbeat in Federation 
> Interceptor
> 
>
> Key: YARN-6667
> URL: https://issues.apache.org/jira/browse/YARN-6667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
>
> In practice, the probability of this happening is very low. 
> It can only be caused by a master-slave failover of YARN and a wrong 
> Epoch parameter configuration.
> We will try to be compatible with this situation and let the Application run 
> as much as possible, using the following measures:
> 1. Select a node whose heartbeat does not time out for allocation, and at the 
> same time require the node to be in the RUNNING state.
> 2. If the heartbeat of both RMs does not time out, and both are in the 
> RUNNING state, select the previously allocated RM for Container processing.






[jira] [Commented] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597471#comment-17597471
 ] 

ASF GitHub Bot commented on YARN-6667:
--

slfan1989 commented on code in PR #4810:
URL: https://github.com/apache/hadoop/pull/4810#discussion_r957943189


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java:
##
@@ -1761,4 +1794,26 @@ public static  boolean isNullOrEmpty(Collection c) 
{
   public static  boolean isNullOrEmpty(Map c) {
 return (c == null || c.size() == 0);
   }
+
+  @VisibleForTesting
+  protected void cacheAllocatedContainersForSubClusterId(
+  List containers, SubClusterId subClusterId) {
+cacheAllocatedContainers(containers, subClusterId);
+  }
+
+  @VisibleForTesting
+  protected Map getContainerIdToSubClusterIdMap() {
+return containerIdToSubClusterIdMap;
+  }
+
+  private boolean isSCHealth(SubClusterId subClusterId) throws YarnException {
+boolean isSCHealth = true;
+Set timeOutScs = getTimedOutSCs(true);
+SubClusterInfo subClusterInfo = 
federationFacade.getSubCluster(subClusterId);
+if (timeOutScs.contains(subClusterId) ||
+ subClusterInfo == null || subClusterInfo.getState().isUnusable()) {

Review Comment:
   I will fix it.





> Handle containerId duplicate without failing the heartbeat in Federation 
> Interceptor
> 
>
> Key: YARN-6667
> URL: https://issues.apache.org/jira/browse/YARN-6667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
>
> In practice, the probability of this happening is very low. 
> It can only be caused by a master-slave failover of YARN and a wrong 
> Epoch parameter configuration.
> We will try to be compatible with this situation and let the Application run 
> as much as possible, using the following measures:
> 1. Select a node whose heartbeat does not time out for allocation, and at the 
> same time require the node to be in the RUNNING state.
> 2. If the heartbeat of both RMs does not time out, and both are in the 
> RUNNING state, select the previously allocated RM for Container processing.






[jira] [Commented] (YARN-11286) Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter configurable

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597443#comment-17597443
 ] 

ASF GitHub Bot commented on YARN-11286:
---

hadoop-yetus commented on PR #4824:
URL: https://github.com/apache/hadoop/pull/4824#issuecomment-1231030004

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  0s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 31s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m 49s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   8m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   2m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 27s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 23s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 56s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m  9s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   9m  9s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 39s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   8m 39s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 57s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m 36s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 30s |  |  hadoop-yarn-api in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   5m 12s |  |  hadoop-yarn-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  9s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 162m 12s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4824/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4824 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 0fda6643ad02 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e82f3a19c312920e7aeb21292823ca90c0b20402 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranc

[jira] [Commented] (YARN-11286) Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter configurable

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597420#comment-17597420
 ] 

ASF GitHub Bot commented on YARN-11286:
---

slfan1989 opened a new pull request, #4824:
URL: https://github.com/apache/hadoop/pull/4824

   JIRA: YARN-11286. Make AsyncDispatcher#printEventDetailsExecutor thread pool 
parameter configurable.




> Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter 
> configurable
> -
>
> Key: YARN-11286
> URL: https://issues.apache.org/jira/browse/YARN-11286
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Minor
>
> The AsyncDispatcher#printEventDetailsExecutor thread pool parameters are 
> hard-coded. Extract these hard-coded parameters into the 
> configuration file.






[jira] [Updated] (YARN-11286) Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter configurable

2022-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11286:
--
Labels: pull-request-available  (was: )

> Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter 
> configurable
> -
>
> Key: YARN-11286
> URL: https://issues.apache.org/jira/browse/YARN-11286
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
>
> The AsyncDispatcher#printEventDetailsExecutor thread pool parameters are 
> hard-coded. Extract these hard-coded parameters into the 
> configuration file.






[jira] [Created] (YARN-11286) Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter configurable

2022-08-29 Thread fanshilun (Jira)
fanshilun created YARN-11286:


 Summary: Make AsyncDispatcher#printEventDetailsExecutor thread 
pool parameter configurable
 Key: YARN-11286
 URL: https://issues.apache.org/jira/browse/YARN-11286
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.4.0
Reporter: fanshilun
Assignee: fanshilun


The AsyncDispatcher#printEventDetailsExecutor thread pool parameters are 
hard-coded. Extract these hard-coded parameters into the 
configuration file.
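
A minimal sketch of the idea, assuming nothing about the eventual patch: the 
class name, the three property keys, and the default values below are 
hypothetical, and `java.util.Properties` stands in for the Hadoop 
configuration object so that the example is self-contained. It only shows 
thread pool sizes being read from configuration, with fallbacks, instead of 
being written as literals.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ConfigurablePoolSketch {

  // Hypothetical keys; the real property names are up to the patch.
  public static final String CORE_KEY = "sketch.print-events.core-pool-size";
  public static final String MAX_KEY = "sketch.print-events.max-pool-size";
  public static final String KEEP_ALIVE_KEY = "sketch.print-events.keep-alive-seconds";

  /** Reads an int property, falling back to a default when it is unset. */
  static int getInt(Properties conf, String key, int defaultValue) {
    String value = conf.getProperty(key);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  /** Builds the executor from configuration; defaults are illustrative. */
  public static ThreadPoolExecutor createExecutor(Properties conf) {
    int corePoolSize = getInt(conf, CORE_KEY, 1);
    int maxPoolSize = getInt(conf, MAX_KEY, 5);
    int keepAliveSeconds = getInt(conf, KEEP_ALIVE_KEY, 10);
    return new ThreadPoolExecutor(corePoolSize, maxPoolSize,
        keepAliveSeconds, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }
}
```

With this shape, an operator can tune the pool by setting the properties, and 
an unset property keeps the previous fixed behavior.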






[jira] [Commented] (YARN-11284) [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597411#comment-17597411
 ] 

ASF GitHub Bot commented on YARN-11284:
---

slfan1989 commented on code in PR #4814:
URL: https://github.com/apache/hadoop/pull/4814#discussion_r957817304


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/uam/TestUnmanagedApplicationManager.java:
##
@@ -58,6 +64,9 @@ public class TestUnmanagedApplicationManager {
 
   private ApplicationAttemptId attemptId;
 
+  private UnmanagedAMPoolManager uamPool;
+  private ExecutorService threadpool;

Review Comment:
   I will fix it.





> [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop
> 
>
> Key: YARN-11284
> URL: https://issues.apache.org/jira/browse/YARN-11284
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>
> There is a todo in UnmanagedAMPoolManager#ServiceStop
> {code:java}
> TODO: move waiting for the kill to finish into a separate thread, without 
> blocking the serviceStop. {code}
> I use a separate thread for this work, so it no longer blocks the 
> serviceStop.






[jira] [Commented] (YARN-11284) [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597410#comment-17597410
 ] 

ASF GitHub Bot commented on YARN-11284:
---

slfan1989 commented on code in PR #4814:
URL: https://github.com/apache/hadoop/pull/4814#discussion_r957817189


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java:
##
@@ -501,4 +472,51 @@ public Map batchFinishApplicationMaster
 
 return responseMap;
   }
+
+  Runnable createForceFinishApplicationThread() {
+return () -> {
+
+  ExecutorCompletionService completionService =
+  new ExecutorCompletionService<>(threadpool);
+
+  // Save a local copy of the key set so that it won't change with the map
+  Set addressList = new HashSet<>(unmanagedAppMasterMap.keySet());
+
+  LOG.warn("Abnormal shutdown of UAMPoolManager, still {} UAMs in map", addressList.size());
+
+  for (final String uamId : addressList) {
+completionService.submit(() -> {
+  try {
+ApplicationId appId = appIdMap.get(uamId);
+LOG.info("Force-killing UAM id {} for application {}", uamId, appId);
+return unmanagedAppMasterMap.remove(uamId).forceKillApplication();
+  } catch (Exception e) {
+LOG.error("Failed to kill unmanaged application master", e);
+return null;
+  }
+});
+  }
+
+  for (int i = 0; i < addressList.size(); ++i) {
+try {
+  Future future = completionService.take();
+  future.get();

Review Comment:
   Thank you very much for helping to review the code. I will modify the code.





> [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop
> 
>
> Key: YARN-11284
> URL: https://issues.apache.org/jira/browse/YARN-11284
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>
> There is a todo in UnmanagedAMPoolManager#ServiceStop
> {code:java}
> TODO: move waiting for the kill to finish into a separate thread, without 
> blocking the serviceStop. {code}
> I use a separate thread for this work, so it no longer blocks the 
> serviceStop.






[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597328#comment-17597328
 ] 

ASF GitHub Bot commented on YARN-11273:
---

hadoop-yetus commented on PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#issuecomment-1230645244

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 37s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 31s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m 51s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   8m 35s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   2m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m  5s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   4m  7s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 28s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  13m 38s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m  5s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   5m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 15s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   9m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 30s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   8m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 45s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   6m  0s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 52s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 20s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  13m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 14s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  36m 10s | [/patch-unit-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/5/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn.txt) |  hadoop-yarn in the patch failed.  |
   | +1 :green_heart: |  unit  |   3m 19s |  |  hadoop-yarn-server-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 19s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 226m 25s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4817 |
   | Optional Tests | dupname asflicense codespell detsecrets compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle |
   | uname | Linux 5e14e56856f1 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4333bcef9279bb3fc0f7378a008781a624bb8f25 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/5/testReport/ |
   | Ma

[jira] [Updated] (YARN-11285) LocalizedResources are leaked and its LocalPath are not cleared

2022-08-29 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-11285:
-
Attachment: TestConcurrency.java

> LocalizedResources are leaked and its LocalPath are not cleared
> ---
>
> Key: YARN-11285
> URL: https://issues.apache.org/jira/browse/YARN-11285
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.1
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: TestConcurrency.java
>
>
> LocalizedResources are leaked and their LocalPaths are not cleared from the 
> NM local directories. When multiple containers are initialized at the same 
> time, the LocalResourcesTrackerImpl REQUEST handler can create and handle 
> multiple LocalizedResource objects for the same input path because of a race 
> condition in the [code 
> below|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java#L149]
> {code}
> case REQUEST:
> LocalResourceRequest req = event.getLocalResourceRequest();
> LocalizedResource rsrc = localrsrc.get(req);
>  
>   if (null == rsrc) {
> rsrc = new LocalizedResource(req, dispatcher);
> localrsrc.put(req, rsrc);
>   }
>  rsrc.handle(event);
> {code}
> Each container will then have its own LocalizedResource object and its own 
> local path, as shown below.
> {code}
>/mnt/yarn/usercache/hive/filecache/6/2552419:
>total 28456
>-r-x-- 1 yarn yarn 29135164 Aug  7 10:24 
> hive-exec-2.3.4.50-3fd48f33b0c0b82ab431013f0fe794dfe75c31a5027567e6865cccbb49de862b.jar
>/mnt/yarn/usercache/hive/filecache/6/2552420:
>total 28456
>-r-x-- 1 yarn yarn 29135164 Aug  7 10:24 
> hive-exec-2.3.4.50-3fd48f33b0c0b82ab431013f0fe794dfe75c31a5027567e6865cccbb49de862b.jar
>/mnt/yarn/usercache/hive/filecache/6/2552421:
>total 28456
>-r-x-- 1 yarn yarn 29135164 Aug  7 10:24 
> hive-exec-2.3.4.50-3fd48f33b0c0b82ab431013f0fe794dfe75c31a5027567e6865cccbb49de862b.jar
>/mnt/yarn/usercache/hive/filecache/6/2552422:
>total 28456
> {code}






[jira] [Created] (YARN-11285) LocalizedResources are leaked and its LocalPath are not cleared

2022-08-29 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-11285:


 Summary: LocalizedResources are leaked and its LocalPath are not 
cleared
 Key: YARN-11285
 URL: https://issues.apache.org/jira/browse/YARN-11285
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.1
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


LocalizedResources are leaked and their LocalPaths are not cleared from the 
NM local directories. When multiple containers are initialized at the same 
time, the LocalResourcesTrackerImpl REQUEST handler can create and handle 
multiple LocalizedResource objects for the same input path because of a race 
condition in the [code 
below|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java#L149]

{code}
case REQUEST:
LocalResourceRequest req = event.getLocalResourceRequest();
LocalizedResource rsrc = localrsrc.get(req);
 
  if (null == rsrc) {
rsrc = new LocalizedResource(req, dispatcher);
localrsrc.put(req, rsrc);
  }
 rsrc.handle(event);
{code}


Each container will then have its own LocalizedResource object and its own 
local path, as shown below.
{code}
   /mnt/yarn/usercache/hive/filecache/6/2552419:
   total 28456
   -r-x-- 1 yarn yarn 29135164 Aug  7 10:24 
hive-exec-2.3.4.50-3fd48f33b0c0b82ab431013f0fe794dfe75c31a5027567e6865cccbb49de862b.jar

   /mnt/yarn/usercache/hive/filecache/6/2552420:
   total 28456
   -r-x-- 1 yarn yarn 29135164 Aug  7 10:24 
hive-exec-2.3.4.50-3fd48f33b0c0b82ab431013f0fe794dfe75c31a5027567e6865cccbb49de862b.jar

   /mnt/yarn/usercache/hive/filecache/6/2552421:
   total 28456
   -r-x-- 1 yarn yarn 29135164 Aug  7 10:24 
hive-exec-2.3.4.50-3fd48f33b0c0b82ab431013f0fe794dfe75c31a5027567e6865cccbb49de862b.jar

   /mnt/yarn/usercache/hive/filecache/6/2552422:
   total 28456
{code}
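
The get / null-check / put sequence in the REQUEST handler is a check-then-act pattern: if two events for the same key interleave between the `get` and the `put`, each can construct its own object. A minimal sketch of one way to close that kind of window, assuming the map can be a `ConcurrentHashMap`. The class, field, and method names are hypothetical stand-ins, and this is not presented as the fix adopted for this issue.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a resource tracker keyed by request path.
class TrackerSketch {
  static final class Resource {
    final String path;
    Resource(String path) { this.path = path; }
  }

  private final ConcurrentMap<String, Resource> resources = new ConcurrentHashMap<>();
  // Counts constructor calls so duplicates would be visible.
  final AtomicInteger created = new AtomicInteger();

  Resource getOrCreate(String path) {
    // computeIfAbsent performs the lookup and the insert as one atomic step,
    // so at most one Resource is created per key.
    return resources.computeIfAbsent(path, p -> {
      created.incrementAndGet();
      return new Resource(p);
    });
  }
}
```

Every caller asking for the same path receives the same instance, so only one local copy would need to be tracked and cleaned up.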








[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597290#comment-17597290
 ] 

ASF GitHub Bot commented on YARN-9708:
--

goiri commented on code in PR #4746:
URL: https://github.com/apache/hadoop/pull/4746#discussion_r957557488


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/impl/pb/YARNDelegationTokenIdentifierPBImpl.java:
##
@@ -0,0 +1,200 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.security.client.impl.pb;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.thirdparty.protobuf.TextFormat;
+import org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.YARNDelegationTokenIdentifierProto;
+import org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.YARNDelegationTokenIdentifierProtoOrBuilder;
+import org.apache.hadoop.yarn.security.client.YARNDelegationTokenIdentifier;
+
+@InterfaceAudience.Private

Review Comment:
   Import these things directly





> Yarn Router Support DelegationToken
> ---
>
> Key: YARN-9708
> URL: https://issues.apache.org/jira/browse/YARN-9708
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: router
>Affects Versions: 3.1.1
>Reporter: Xie YiFan
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch, 
> RMDelegationTokenSecretManager_storeNewMasterKey.svg, 
> RouterDelegationTokenSecretManager_storeNewMasterKey.svg
>
>
> 1. We use the router as a proxy to manage multiple clusters that are 
> independent of each other, in order to provide a unified client. Thus, we 
> implemented a customized AMRMProxyPolicy that doesn't broadcast 
> ResourceRequests to other clusters.
> 2. Our production environment needs Kerberos, but the router doesn't support 
> SecureLogin for now.
> https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so we 
> improved it.
> 3. Some frameworks, such as Oozie, get a token via 
> yarnclient#getDelegationToken, which the router doesn't support. Our solution 
> is to add homeCluster to ApplicationSubmissionContextProto & 
> GetDelegationTokenRequestProto. A job is submitted with a specified clusterid 
> so that the router knows which cluster to submit it to. When a client calls 
> getDelegation, the router gets a token from one RM according to the specified 
> clusterid and uses some mechanism to save this token in memory.
>  






[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597289#comment-17597289
 ] 

ASF GitHub Bot commented on YARN-11273:
---

goiri commented on code in PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#discussion_r957556044


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/SQLFederationStateStore.java:
##
@@ -1016,30 +1035,275 @@ private static byte[] getByteArray(ByteBuffer bb) {
   @Override
   public AddReservationHomeSubClusterResponse addReservationHomeSubCluster(
   AddReservationHomeSubClusterRequest request) throws YarnException {
-throw new NotImplementedException("Code is not implemented");
+// validate
+FederationReservationHomeSubClusterStoreInputValidator.validate(request);
+CallableStatement cstmt = null;
+
+ReservationHomeSubCluster reservationHomeSubCluster = request.getReservationHomeSubCluster();
+ReservationId reservationId = reservationHomeSubCluster.getReservationId();
+SubClusterId subClusterId = reservationHomeSubCluster.getHomeSubCluster();
+SubClusterId subClusterHomeId = null;
+
+try {
+  // Call procedure
+  cstmt = getCallableStatement(CALL_SP_ADD_RESERVATION_HOME_SUBCLUSTER);
+
+  // Set the parameters for the stored procedure
+  cstmt.setString(1, reservationId.toString());
+  cstmt.setString(2, subClusterId.getId());
+  cstmt.registerOutParameter(3, java.sql.Types.VARCHAR);
+  cstmt.registerOutParameter(4, java.sql.Types.INTEGER);
+
+  // Execute the query
+  long startTime = clock.getTime();
+  cstmt.executeUpdate();
+  long stopTime = clock.getTime();
+
+  // Get SubClusterHome
+  String subClusterHomeIdString = cstmt.getString(3);
+  subClusterHomeId = SubClusterId.newInstance(subClusterHomeIdString);
+
+  // Get rowCount
+  int rowCount = cstmt.getInt(4);
+
+  // For failover reason, we check the returned subClusterId.
+  // 1.If it is equal to the subClusterId we sent, the call added the new
+  // reservation into FederationStateStore.
+  // 2.If the call returns a different subClusterId
+  // it means we already tried to insert this reservation
+  // but a component (Router/StateStore/RM) failed during the submission.
+  if (subClusterId.equals(subClusterHomeId)) {
+// if it is equal to 0
+// it means the call did not add a new reservation into FederationStateStore.
+if (rowCount == 0) {
+  LOG.info("The reservation {} was not inserted in the StateStore because it" +
+  " was already present in subCluster {}", reservationId, subClusterHomeId);
+} else if (rowCount != 1) {
+  // if it is different from 1
+  // it means the call had a wrong behavior. Maybe the database is not set correctly.
+  FederationStateStoreUtils.logAndThrowStoreException(LOG,
+  "Wrong behavior during the insertion of subCluster %s.", subClusterId);
+}
+  } else {
+// If it is different from 0,
+// it means that there is a data situation that does not meet the expectations,
+// and an exception should be thrown at this time
+if (rowCount != 0) {
+  FederationStateStoreUtils.logAndThrowStoreException(LOG,
+  "The reservation %s does exist but was overwritten.", reservationId);
+}
+LOG.info("Reservation: {} already present with subCluster: {}.",
+reservationId, subClusterHomeId);
+  }
+
+  // Record successful call time
+  FederationStateStoreClientMetrics.succeededStateStoreCall(stopTime - startTime);
+} catch (SQLException e) {
+  FederationStateStoreClientMetrics.failedStateStoreCall();
+  FederationStateStoreUtils.logAndThrowRetriableException(e, LOG,
+  "Unable to insert the newly generated reservation %s to subCluster %s.",
+  reservationId, subClusterId);
+} finally {
+  // Return to the pool the CallableStatement
+  FederationStateStoreUtils.returnToPool(LOG, cstmt);
+}
+
+return AddReservationHomeSubClusterResponse.newInstance(subClusterHomeId);
   }
 
   @Override
   public GetReservationHomeSubClusterResponse getReservationHomeSubCluster(
   GetReservationHomeSubClusterRequest request) throws YarnException {
-throw new NotImplementedException("Code is not implemented");
+// validate
+FederationReservationHomeSubClusterStoreInputValidator.validate(request);
+
+CallableStatement cstmt = null;
+ReservationId reservationId = request.getReservationId();
+SubClusterId subClusterId = null;
+
+try {
+  cstmt = getCallableStatement(CALL_SP_GET_RESERVATION_HOME_SUBCLUSTER);
+
+  // Set the parameters for the stored procedure
+  cstm

[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597288#comment-17597288
 ] 

ASF GitHub Bot commented on YARN-11177:
---

goiri commented on code in PR #4764:
URL: https://github.com/apache/hadoop/pull/4764#discussion_r957553830


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java:
##
@@ -888,13 +890,88 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues(
   @Override
   public GetNewReservationResponse getNewReservation(
   GetNewReservationRequest request) throws YarnException, IOException {
-throw new NotImplementedException("Code is not implemented");
+
+if (request == null) {
+  routerMetrics.incrGetNewReservationFailedRetrieved();
+  String errMsg = "Missing getNewReservation request.";
+  RouterServerUtil.logAndThrowException(errMsg, null);
+}
+
+long startTime = clock.getTime();
+Map subClustersActive = federationFacade.getSubClusters(true);
+
+for (int i = 0; i < numSubmitRetries; ++i) {
+  SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive);
+  LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId);
+  ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId);
+  try {
+GetNewReservationResponse response = clientRMProxy.getNewReservation(request);
+if (response != null) {
+  long stopTime = clock.getTime();
+  routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime);
+  return response;
+}
+  } catch (Exception e) {
+LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e);
+subClustersActive.remove(subClusterId);
+  }
+}
+
+routerMetrics.incrGetNewReservationFailedRetrieved();
+String errMsg = "Failed to create a new reservation.";
+throw new YarnException(errMsg);
   }
 
   @Override
   public ReservationSubmissionResponse submitReservation(
   ReservationSubmissionRequest request) throws YarnException, IOException {
-throw new NotImplementedException("Code is not implemented");
+
+if (request == null || request.getReservationId() == null
+|| request.getReservationDefinition() == null || request.getQueue() == null) {
+  routerMetrics.incrSubmitReservationFailedRetrieved();
+  RouterServerUtil.logAndThrowException(
+  "Missing submitReservation request or reservationId " +
+  "or reservation definition or queue.", null);
+}
+
+long startTime = clock.getTime();
+ReservationId reservationId = request.getReservationId();
+
+for (int i = 0; i < numSubmitRetries; i++) {
+  try {
+// First, Get SubClusterId according to specific strategy.
+SubClusterId subClusterId = policyFacade.getReservationHomeSubCluster(request);
+LOG.info("submitReservation ReservationId {} try #{} on SubCluster {}.",
+reservationId, i, subClusterId);
+ReservationHomeSubCluster reservationHomeSubCluster =
+ReservationHomeSubCluster.newInstance(reservationId, subClusterId);
+
+// Second, determine whether the current ReservationId has a corresponding subCluster.
+// If it does not exist, add it. If it exists, update it.
+Boolean exists = isExistsReservationHomeSubCluster(reservationId);
+if(!exists) {

Review Comment:
   `if (!exists) {`



##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java:
##
@@ -1633,4 +1790,49 @@ public FederationStateStoreFacade getFederationFacade() {
   public Map getClientRMProxies() {
 return clientRMProxies;
   }
+
+  private Boolean isExistsReservationHomeSubCluster(ReservationId reservationId) {
+try {
+  SubClusterId subClusterId = federationFacade.getReservationHomeSubCluster(reservationId);
+  if (subClusterId != null) {
+return true;
+  }
+} catch (YarnException e) {
+  LOG.warn("get homeSubCluster by reservationId = {} error.", reservationId, e);
+}
+return false;
+  }
+
+  private void addReservationHomeSubCluster(ReservationId reservationId,
+  ReservationHomeSubCluster homeSubCluster) throws YarnException {
+try {
+  // persist the mapping of reservationId and the subClusterId which has
+  // been selected as its home
+  federationFacade.addReservationHomeSubCluster(homeSubCluster);
+} catch (YarnException e) {
+  RouterServerUtil.logAndThrowException(e,
+  "Unable to insert the ReservationId %s into the FederationStateStore.",
+   reservationId);

Review Comment:
 

[jira] [Commented] (YARN-11284) [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597285#comment-17597285
 ] 

ASF GitHub Bot commented on YARN-11284:
---

goiri commented on code in PR #4814:
URL: https://github.com/apache/hadoop/pull/4814#discussion_r957551298


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java:
##
@@ -501,4 +472,51 @@ public Map batchFinishApplicationMaster
 
 return responseMap;
   }
+
+  Runnable createForceFinishApplicationThread() {
+return () -> {
+
+  ExecutorCompletionService completionService =
+  new ExecutorCompletionService<>(threadpool);
+
+  // Save a local copy of the key set so that it won't change with the map
+  Set addressList = new HashSet<>(unmanagedAppMasterMap.keySet());
+
+  LOG.warn("Abnormal shutdown of UAMPoolManager, still {} UAMs in map", addressList.size());
+
+  for (final String uamId : addressList) {
+completionService.submit(() -> {
+  try {
+ApplicationId appId = appIdMap.get(uamId);
+LOG.info("Force-killing UAM id {} for application {}", uamId, appId);
+return unmanagedAppMasterMap.remove(uamId).forceKillApplication();
+  } catch (Exception e) {
+LOG.error("Failed to kill unmanaged application master", e);
+return null;
+  }
+});
+  }
+
+  for (int i = 0; i < addressList.size(); ++i) {
+try {
+  Future future = completionService.take();
+  future.get();

Review Comment:
   We don't check for anything?
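
   One way to act on this question is to inspect what each `Future` yields instead of discarding it. A minimal sketch, assuming a task signals failure by returning `null` as in the hunk above; the class and method names are hypothetical and the task result is simplified to a `String`.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper: run all tasks and report how many did not succeed.
class DrainSketch {
  static int runAndCountFailures(ExecutorService pool, List<Callable<String>> tasks) {
    ExecutorCompletionService<String> cs = new ExecutorCompletionService<>(pool);
    for (Callable<String> task : tasks) {
      cs.submit(task);
    }
    int failures = 0;
    for (int i = 0; i < tasks.size(); i++) {
      try {
        // take() blocks until the next task finishes, in completion order.
        Future<String> done = cs.take();
        if (done.get() == null) {
          failures++; // task swallowed its own error and returned null
        }
      } catch (ExecutionException e) {
        failures++; // task threw instead of returning
      } catch (InterruptedException e) {
        // Restore the interrupt flag and count the unfinished tasks as failed.
        Thread.currentThread().interrupt();
        failures += tasks.size() - i;
        break;
      }
    }
    return failures;
  }
}
```

   The caller can then log or surface the failure count rather than silently finishing the shutdown.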



##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/uam/TestUnmanagedApplicationManager.java:
##
@@ -58,6 +64,9 @@ public class TestUnmanagedApplicationManager {
 
   private ApplicationAttemptId attemptId;
 
+  private UnmanagedAMPoolManager uamPool;
+  private ExecutorService threadpool;

Review Comment:
   Clean this?



##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/uam/TestUnmanagedApplicationManager.java:
##
@@ -58,6 +64,9 @@ public class TestUnmanagedApplicationManager {
 
   private ApplicationAttemptId attemptId;
 
+  private UnmanagedAMPoolManager uamPool;
+  private ExecutorService threadpool;

Review Comment:
   stop()





> [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop
> 
>
> Key: YARN-11284
> URL: https://issues.apache.org/jira/browse/YARN-11284
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>
> There is a todo in UnmanagedAMPoolManager#ServiceStop
> {code:java}
> TODO: move waiting for the kill to finish into a separate thread, without 
> blocking the serviceStop. {code}
> I use a separate thread for this work, so it no longer blocks the 
> serviceStop.






[jira] [Commented] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597280#comment-17597280
 ] 

ASF GitHub Bot commented on YARN-6667:
--

goiri commented on code in PR #4810:
URL: https://github.com/apache/hadoop/pull/4810#discussion_r957544974


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java:
##
@@ -1761,4 +1794,26 @@ public static  boolean isNullOrEmpty(Collection c) {
   public static  boolean isNullOrEmpty(Map c) {
 return (c == null || c.size() == 0);
   }
+
+  @VisibleForTesting
+  protected void cacheAllocatedContainersForSubClusterId(
+  List containers, SubClusterId subClusterId) {
+cacheAllocatedContainers(containers, subClusterId);
+  }
+
+  @VisibleForTesting
+  protected Map getContainerIdToSubClusterIdMap() {
+return containerIdToSubClusterIdMap;
+  }
+
+  private boolean isSCHealth(SubClusterId subClusterId) throws YarnException {
+boolean isSCHealth = true;
+Set timeOutScs = getTimedOutSCs(true);
+SubClusterInfo subClusterInfo = federationFacade.getSubCluster(subClusterId);
+if (timeOutScs.contains(subClusterId) ||
+ subClusterInfo == null || subClusterInfo.getState().isUnusable()) {
+  isSCHealth = false;

Review Comment:
   I would just do return true and return false at the end.
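
   The suggestion amounts to dropping the mutable flag and returning as soon as the outcome is known. A minimal sketch of that shape, with the federation lookups replaced by plain parameters; every name here is a hypothetical stand-in rather than the patch's code.

```java
import java.util.Set;

// Hypothetical, simplified version of the health check with early returns.
class HealthCheckSketch {
  /** Stand-in for SubClusterInfo: only carries the "unusable" state bit. */
  static final class Info {
    final boolean unusable;
    Info(boolean unusable) { this.unusable = unusable; }
  }

  static boolean isHealthy(String id, Set<String> timedOut, Info info) {
    // Any one of these conditions makes the sub-cluster unhealthy.
    if (timedOut.contains(id) || info == null || info.unusable) {
      return false;
    }
    return true;
  }
}
```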



##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java:
##
@@ -1761,4 +1794,26 @@ public static  boolean isNullOrEmpty(Collection c) {
   public static  boolean isNullOrEmpty(Map c) {
 return (c == null || c.size() == 0);
   }
+
+  @VisibleForTesting
+  protected void cacheAllocatedContainersForSubClusterId(
+  List containers, SubClusterId subClusterId) {
+cacheAllocatedContainers(containers, subClusterId);
+  }
+
+  @VisibleForTesting
+  protected Map getContainerIdToSubClusterIdMap() {
+return containerIdToSubClusterIdMap;
+  }
+
+  private boolean isSCHealth(SubClusterId subClusterId) throws YarnException {
+boolean isSCHealth = true;
+Set timeOutScs = getTimedOutSCs(true);
+SubClusterInfo subClusterInfo = federationFacade.getSubCluster(subClusterId);
+if (timeOutScs.contains(subClusterId) ||
+ subClusterInfo == null || subClusterInfo.getState().isUnusable()) {

Review Comment:
   Indentation





> Handle containerId duplicate without failing the heartbeat in Federation 
> Interceptor
> 
>
> Key: YARN-6667
> URL: https://issues.apache.org/jira/browse/YARN-6667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
>
> In practice, the probability of this happening is very low. 
> It can only be caused by a master-slave failover of YARN and a wrong 
> Epoch parameter configuration.
> We will try to tolerate this situation and let the Application keep running 
> as far as possible, using the following measures:
> 1. Select a node whose heartbeat has not timed out for allocation, and at the 
> same time require the node to be in the RUNNING state.
> 2. If the heartbeats of both RMs have not timed out, and both are in the 
> RUNNING state, select the previously allocated RM for Container processing.
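
The two measures above could be sketched as a small decision function. This is only an illustration under assumptions: the Rm class and chooseOwner() are hypothetical names, not part of FederationInterceptor, and the fallback for the case where neither RM is usable is not specified in the description and is chosen here arbitrarily.

```java
/** Illustrative only; not the FederationInterceptor logic from the patch. */
final class DuplicateContainerSketch {
  /** Hypothetical view of one sub-cluster RM. */
  static final class Rm {
    final String id;
    final boolean heartbeatTimedOut;
    final boolean running;

    Rm(String id, boolean heartbeatTimedOut, boolean running) {
      this.id = id;
      this.heartbeatTimedOut = heartbeatTimedOut;
      this.running = running;
    }

    /** Measure 1: heartbeat has not timed out and the state is RUNNING. */
    boolean usable() {
      return !heartbeatTimedOut && running;
    }
  }

  /** Picks the RM that should own a container reported by both RMs. */
  static String chooseOwner(Rm previous, Rm incoming) {
    if (previous.usable()) {
      // Measure 2: when both are usable, the previously allocated RM wins.
      return previous.id;
    }
    if (incoming.usable()) {
      return incoming.id;
    }
    // Assumption: neither RM is usable, so keep the existing mapping.
    return previous.id;
  }
}
```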






[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597239#comment-17597239
 ] 

ASF GitHub Bot commented on YARN-9708:
--

slfan1989 commented on PR #4746:
URL: https://github.com/apache/hadoop/pull/4746#issuecomment-1230414220

   @goiri Please help review the code again. Thank you very much!




> Yarn Router Support DelegationToken
> ---
>
> Key: YARN-9708
> URL: https://issues.apache.org/jira/browse/YARN-9708
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: router
>Affects Versions: 3.1.1
>Reporter: Xie YiFan
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch, 
> RMDelegationTokenSecretManager_storeNewMasterKey.svg, 
> RouterDelegationTokenSecretManager_storeNewMasterKey.svg
>
>
> 1. We use the router as a proxy to manage multiple clusters that are 
> independent of each other, so that clients see a unified interface. Thus, we 
> implemented a customized AMRMProxyPolicy that doesn't broadcast 
> ResourceRequests to other clusters.
> 2. Our production environment needs Kerberos, but the router doesn't support 
> SecureLogin for now.
> https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so we 
> improved it.
> 3. Some frameworks, like Oozie, get a token via yarnclient#getDelegationToken, 
> which the router doesn't support. Our solution is to add homeCluster to 
> ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job is 
> submitted with a specified clusterid so that the router knows which cluster 
> to submit it to. When the client calls getDelegationToken, the router gets a 
> token from one RM according to the specified clusterid and applies some 
> mechanism to save this token in memory.
>  






[jira] [Commented] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597238#comment-17597238
 ] 

ASF GitHub Bot commented on YARN-6667:
--

slfan1989 commented on PR #4810:
URL: https://github.com/apache/hadoop/pull/4810#issuecomment-1230413309

   @goiri Please help review the code again. Thank you very much!




> Handle containerId duplicate without failing the heartbeat in Federation 
> Interceptor
> 
>
> Key: YARN-6667
> URL: https://issues.apache.org/jira/browse/YARN-6667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
>
> In practice, the probability of this happening is very low. 
> It can only be caused by a master-slave failover of YARN and a wrong 
> Epoch parameter configuration.
> We will try to tolerate this situation and let the Application keep running 
> as far as possible, using the following measures:
> 1. Select a node whose heartbeat has not timed out for allocation, and at the 
> same time require the node to be in the RUNNING state.
> 2. If the heartbeats of both RMs have not timed out, and both are in the 
> RUNNING state, select the previously allocated RM for Container processing.






[jira] [Commented] (YARN-11272) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With Zk

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597237#comment-17597237
 ] 

ASF GitHub Bot commented on YARN-11272:
---

slfan1989 commented on PR #4781:
URL: https://github.com/apache/hadoop/pull/4781#issuecomment-1230410501

   @goiri Could you help merge this PR into the trunk branch? I will then 
follow up on YARN-11273, which depends on part of this PR's JUnit tests. Thank 
you very much!




> [RESERVATION] Federation StateStore: Support storage/retrieval of 
> Reservations With Zk
> --
>
> Key: YARN-11272
> URL: https://issues.apache.org/jira/browse/YARN-11272
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, reservation system
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597233#comment-17597233
 ] 

ASF GitHub Bot commented on YARN-11273:
---

hadoop-yetus commented on PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#issuecomment-1230403358

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 57s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 30s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m 44s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   8m 34s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   2m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 10s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   4m 12s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 31s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  13m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 37s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   5m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m  6s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   9m  6s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 38s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   8m 38s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 53s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/4/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) |  hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   5m 53s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 46s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 14s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  13m 31s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 59s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  35m 58s | [/patch-unit-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/4/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn.txt) |  hadoop-yarn in the patch failed.  |
   | -1 :x: |  unit  |   3m 24s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/4/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt) |  hadoop-yarn-server-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 16s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 225m  2s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.yarn.server.federation.policies.router.TestLoadBasedRouterPolicy |
   |   | hadoop.yarn.server.federation.policies.router.TestLoadBasedRouterPolicy |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4817 |
   | Optional Tests | dupname asfl

[jira] [Commented] (YARN-11278) Ambiguous error message in mutation API

2022-08-29 Thread groot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597217#comment-17597217
 ] 

groot commented on YARN-11278:
--

[~quapaw] - I can see the problem here. What is your suggestion for dealing 
with the error message? Shall we make it generic, or do you have something 
else in mind?

> Ambiguous error message in mutation API
> ---
>
> Key: YARN-11278
> URL: https://issues.apache.org/jira/browse/YARN-11278
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: András Győri
>Assignee: groot
>Priority: Major
>
> In RMWebServices#updateSchedulerConfiguration, we are checking two 
> prerequisites:
> {code:java}
> if (scheduler instanceof MutableConfScheduler && ((MutableConfScheduler)
> scheduler).isConfigurationMutable()) { {code}
> However, the error message is misleading in the second case (namely, if the 
> configuration is not mutable, e.g. with a FILE_CONFIGURATION_STORE):
> {code:java}
> } else {
>   return Response.status(Status.BAD_REQUEST)
>   .entity("Configuration change only supported by " +
>   "MutableConfScheduler.")
>   .build(); {code}
>  
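
One possible direction, sketched under assumptions: check the two prerequisites separately so that each failure gets its own message. The nested interfaces below are simplified stand-ins so the example compiles on its own (they are not the real YARN types), precheck() is a hypothetical helper, and the wording of the second message is made up for illustration.

```java
import java.util.Optional;

/** Illustrative only; not the RMWebServices code. */
final class MutationPrecheckSketch {
  /** Simplified stand-in for a scheduler. */
  interface Scheduler {
  }

  /** Simplified stand-in for MutableConfScheduler. */
  interface MutableConfScheduler extends Scheduler {
    boolean isConfigurationMutable();
  }

  /** Returns an error message for the first failed prerequisite, if any. */
  static Optional<String> precheck(Scheduler scheduler) {
    if (!(scheduler instanceof MutableConfScheduler)) {
      return Optional.of(
          "Configuration change only supported by MutableConfScheduler.");
    }
    if (!((MutableConfScheduler) scheduler).isConfigurationMutable()) {
      // Hypothetical wording for the second case.
      return Optional.of(
          "Configuration change is not supported by the configured "
              + "configuration store.");
    }
    return Optional.empty();
  }
}
```

A caller could then turn a present message into the BAD_REQUEST response and proceed with the update otherwise.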






[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597191#comment-17597191
 ] 

ASF GitHub Bot commented on YARN-11273:
---

hadoop-yetus commented on PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#issuecomment-1230296943

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  0s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  26m 23s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  10m 34s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   8m 56s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 54s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 18s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   4m 13s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 38s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  13m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 40s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   5m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 11s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   9m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 25s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   8m 25s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 52s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/3/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) |  hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   5m 54s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 43s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 11s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  13m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 37s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  36m  6s | [/patch-unit-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/3/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn.txt) |  hadoop-yarn in the patch failed.  |
   | +1 :green_heart: |  unit  |   3m 16s |  |  hadoop-yarn-server-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 22s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 226m 50s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4817 |
   | Optional Tests | dupname asflicense codespell detsecrets compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle |
   | uname | Linux b2dcaab46f07 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / cea20aa410395076be50bedf4fe698aeba246ae0 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/j

[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597141#comment-17597141
 ] 

ASF GitHub Bot commented on YARN-11273:
---

hadoop-yetus commented on PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#issuecomment-1230121429

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 36s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 53s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 50s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 37s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/2/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt) |  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 35s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |   2m 59s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/2/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt) |  hadoop-yarn-server-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 44s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  98m 46s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.yarn.server.federation.policies.router.TestLoadBasedRouterPolicy |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4817 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c5cb07ae217d 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 536b10c9552ba3e477d54

[jira] [Commented] (YARN-11273) [RESERVATION] Federation StateStore: Support storage/retrieval of Reservations With SQL

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597021#comment-17597021
 ] 

ASF GitHub Bot commented on YARN-11273:
---

hadoop-yetus commented on PR #4817:
URL: https://github.com/apache/hadoop/pull/4817#issuecomment-1229906201

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 45s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 54s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 46s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 25s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/1/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt) |  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 52s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 54s |  |  hadoop-yarn-server-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 45s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  98m 27s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4817 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 841d2864b5ca 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7a52365fd1032905ee859f362eed66eef0219559 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4817/1/testReport/ |
   | Max