[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771348#comment-17771348
 ] 

ASF GitHub Bot commented on YARN-11582:
---

hadoop-yetus commented on PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1744289927

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 28s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 18s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 27s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 26s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6139/2/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) |  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 8 new + 40 unchanged - 0 fixed = 48 total (was 40)  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 29s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  85m 57s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 175m 18s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6139/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6139 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 3fafba7a4f52 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0b78bfd5c130df194d8b590a6bb8f8609b9a0c22 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6139/2/testReport/ |
   | Max. process+thread count | 9

[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771305#comment-17771305
 ] 

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343290820


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
##
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;
+import static org.hamcrest.CoreMatchers.containsString;
+
+public class TestFicaSchedulerAPP {
+
+  @Test
+  public void testGetActivedAppDiagnosticMessage() throws IllegalAccessException, InstantiationException {

Review Comment:
   I took a closer look at this unit test. It only exercises simple string 
concatenation, which I don't think is sufficient. Can we rewrite it as a 
more meaningful unit test?
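
One way to read this request: the test should call the production method itself rather than a copy of it pasted into the test class, so that a regression in the real code makes the test fail. The sketch below only illustrates that shape. `AmDiagnostics`, `appendAmDetails`, and its parameters are hypothetical stand-ins, not the actual `FiCaSchedulerApp` API, and a real test would presumably build the app through the scheduler's existing test fixtures.

```java
// Hypothetical stand-in for the production class; NOT the real FiCaSchedulerApp.
public class AmDiagnostics {

  /** Appends AM details, including the requested resource, to the message. */
  public static void appendAmDetails(StringBuilder message, String partition,
      long memoryMb, int vcores) {
    message.append(" Details : AM Partition = ")
        .append(partition)
        .append(" ; ")
        .append("AM Resource Request = ")
        .append("<memory:").append(memoryMb)
        .append(", vCores:").append(vcores).append(">")
        .append(" ; ");
  }

  public static void main(String[] args) {
    StringBuilder message = new StringBuilder(
        "Application is Activated, waiting for resources to be assigned for AM");
    // Exercise the production method directly, then check concrete values
    // instead of only checking that a fixed label is present.
    appendAmDetails(message, "default", 2048L, 1);
    String text = message.toString();
    if (!text.contains("AM Resource Request = <memory:2048, vCores:1>")) {
      throw new AssertionError("AM resource request missing: " + text);
    }
    System.out.println(text);
  }
}
```

Asserting on the concrete memory and vCores values means the check depends on what the method was given, which a label-only assertion on a copied method cannot do.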





> Improve WebUI diagnosticMessage to show AM Container resource request size
> --
>
>                 Key: YARN-11582
>                 URL: https://issues.apache.org/jira/browse/YARN-11582
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications, resourcemanager
>    Affects Versions: 3.3.4
>            Reporter: xiaojunxiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-10-02-00-05-34-337.png, success_ShowAMInfo.jpg
>
>
> When YARN resources are insufficient, the AM of a newly submitted job may 
> remain in the state "Application is Activated, waiting for resources to be 
> assigned for AM". This is typically because YARN does not have enough 
> resources to allocate another AM Container, so we want to know how large the 
> requested AM Container is. Unfortunately, the current diagnosticMessage on the 
> Web page does not show this data. Therefore, the resource size of the AM 
> Container should be added to the diagnosticMessage, which will be very useful 
> for troubleshooting production faults online.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771306#comment-17771306
 ] 

ASF GitHub Bot commented on YARN-11582:
---

xiaojunxiang2023 commented on PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1743994028

   Indeed, I will learn how to write test cases with context first, and I will 
catch up later







[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771304#comment-17771304
 ] 

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343286751


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
##
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;

Review Comment:
   We recommend using `org.assertj.core.api.Assertions.assertThat` instead of 
`org.junit.Assert.assertThat`.








[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771302#comment-17771302
 ] 

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343287506


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
##
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;
+import static org.hamcrest.CoreMatchers.containsString;
+
+public class TestFicaSchedulerAPP {
+
+  @Test
+  public void testGetActivedAppDiagnosticMessage() throws IllegalAccessException, InstantiationException {
+  StringBuilder diagnosticMessage = new StringBuilder(
+  "Application is Activated, waiting for resources to be assigned for AM");
+  getActivedAppDiagnosticMessage(diagnosticMessage);
+  assertThat("AM Resource Request information was not successfully displayed.",
+  diagnosticMessage.toString(), containsString("AM Resource Request ="));
+  }
+
+  // copy from FiCaSchedulerApp#getActivedAppDiagnosticMessage
+  protected void getActivedAppDiagnosticMessage(
+  StringBuilder diagnosticMessage) {
+  diagnosticMessage.append(" Details : AM Partition = ")
+  .append(" ; ")

Review Comment:
   5chars.








[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771301#comment-17771301
 ] 

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343287151


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
##
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;
+import static org.hamcrest.CoreMatchers.containsString;
+
+public class TestFicaSchedulerAPP {
+
+  @Test
+  public void testGetActivedAppDiagnosticMessage() throws IllegalAccessException, InstantiationException {
+  StringBuilder diagnosticMessage = new StringBuilder(
+  "Application is Activated, waiting for resources to be assigned for AM");

Review Comment:
   indentation 5chars.








[jira] [Created] (YARN-11584) [CS] Attempting to create Leaf Queue with empty shortname should fail without crashing RM

2023-10-02 Thread Brian Goerlitz (Jira)
Brian Goerlitz created YARN-11584:
-

             Summary: [CS] Attempting to create Leaf Queue with empty shortname should fail without crashing RM
                 Key: YARN-11584
                 URL: https://issues.apache.org/jira/browse/YARN-11584
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
            Reporter: Brian Goerlitz
            Assignee: Brian Goerlitz


If an app submission results in an attempt to auto-create a leaf queue with an 
empty short name, the submission should be rejected without crashing the RM. 
Currently, the queue is created, but the RM then hits a FATAL exception caused 
by a metrics source name collision.

For example, if an app is placed in 'root.', the RM fails with the error below.
{noformat}
2023-09-12 20:23:43,294 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type APP_ADDED to the Event Dispatcher
org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists!
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueMetrics.forQueue(CSQueueMetrics.java:309)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.<init>(AbstractCSQueue.java:147)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractLeafQueue.<init>(AbstractLeafQueue.java:148)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:42)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.createNewQueue(ParentQueue.java:495)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.addDynamicChildQueue(ParentQueue.java:563)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.addDynamicLeafQueue(ParentQueue.java:517)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createAutoQueue(CapacitySchedulerQueueManager.java:678)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:511)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:898)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:962)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1920)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:170)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
    at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
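
A guard of roughly the following shape, applied before auto-creation is attempted, is one way such a submission could be rejected up front. This is only a sketch under assumptions: `QueuePathCheck` and `hasValidLeafName` are hypothetical names, not existing CapacityScheduler APIs, and the eventual fix may validate the path somewhere else entirely.

```java
// Hypothetical helper; not part of the CapacityScheduler code base.
public final class QueuePathCheck {

  private QueuePathCheck() {
  }

  /**
   * Returns true when the last dot-separated component of the queue path
   * (the leaf queue's short name) is non-empty, so "root." is rejected.
   */
  public static boolean hasValidLeafName(String queuePath) {
    if (queuePath == null || queuePath.isEmpty()) {
      return false;
    }
    // With no dot, lastIndexOf returns -1 and the whole path is the short name.
    String shortName = queuePath.substring(queuePath.lastIndexOf('.') + 1);
    return !shortName.trim().isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(hasValidLeafName("root."));            // false
    System.out.println(hasValidLeafName("root.users.alice")); // true
  }
}
```

A caller could run this check on the placement result and fail the app submission with a clear message, so that the metrics source for an empty-named queue is never registered.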






[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771169#comment-17771169
 ] 

ASF GitHub Bot commented on YARN-11578:
---

hadoop-yetus commented on PR #6142:
URL: https://github.com/apache/hadoop/pull/6142#issuecomment-1743266807

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   5m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ branch-3.3 Compile Tests _ |
   | -1 :x: |  mvninstall  |  35m  2s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/artifact/out/branch-mvninstall-root.txt) |  root in branch-3.3 failed.  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |   1m 33s |  |  branch-3.3 passed  |
   | -1 :x: |  shadedclient  |  25m 14s |  |  branch has errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  25m 11s |  |  patch has errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   4m  1s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 105m 11s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6142 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 32110dbad7f4 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / 270c90dbaeda36d947ad9eaa2ed9d07a72ba2280 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/testReport/ |
   | Max. process+thread count | 551 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Fix performance issue of permission check in verifyAndCreateRemoteLogDir
> 
>
> Key: YARN-11578
> URL: https://issues.apache.org/jira/browse/YARN-11578
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> YARN-10901 introduced a check to avoid a warn message in NN logs in certain 
> situations (when /tmp/logs is not owned by the yarn user), but it adds 3 
> NameNode calls (create, setpermission, delete) during log aggregation 
> collection, for *every* NM. Meaning, when a YARN job completes, at the YARN 
> log aggregation phase this ch

[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771167#comment-17771167
 ] 

ASF GitHub Bot commented on YARN-11578:
---

hadoop-yetus commented on PR #6143:
URL: https://github.com/apache/hadoop/pull/6143#issuecomment-1743248142

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  12m 29s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ branch-3.2 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 39s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  spotbugs  |   1m 55s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  shadedclient  |  18m  6s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 34s | [/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) |  hadoop-yarn-common in the patch failed.  |
   | -1 :x: |  compile  |   0m 34s | [/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) |  hadoop-yarn-common in the patch failed.  |
   | -1 :x: |  javac  |   0m 34s | [/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) |  hadoop-yarn-common in the patch failed.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   0m 35s | [/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) |  hadoop-yarn-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  the patch passed  |
   | -1 :x: |  spotbugs  |   0m 33s | [/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) |  hadoop-yarn-common in the patch failed.  |
   | -1 :x: |  shadedclient  |   8m  7s |  |  patch has errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |   0m 34s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) |  hadoop-yarn-common in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  83m 57s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6143 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c83622a2c3d4 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.2 / a5faa68043b22b502603d5a8b6657099694f015e |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/testReport/ |
   | Max. process+thread count | 338 (vs. u

[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771143#comment-17771143
 ] 

ASF GitHub Bot commented on YARN-11578:
---

tomicooler commented on PR #6120:
URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1743093089

   @brumi1024 
   
- branch-3.3: https://github.com/apache/hadoop/pull/6142
- branch-3.2: https://github.com/apache/hadoop/pull/6143
   
   There were conflicts on both branches, need to wait for Yetus.




> Fix performance issue of permission check in verifyAndCreateRemoteLogDir
> 
>
> Key: YARN-11578
> URL: https://issues.apache.org/jira/browse/YARN-11578
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> YARN-10901 introduced a check to avoid a warn message in NN logs in certain 
> situations (when /tmp/logs is not owned by the yarn user), but it adds 3 
> NameNode calls (create, setpermission, delete) during log aggregation 
> collection, for *every* NM. Meaning, when a YARN job completes, at the YARN 
> log aggregation phase this check is done for every job, from every 
> NodeManager.
> In 30 minutes 4.2 % of all the NameNode calls were due to this in a cluster. 
> "write" calls need a Namesystem writeLock as well, so the impact is bigger.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771142#comment-17771142
 ] 

ASF GitHub Bot commented on YARN-11578:
---

tomicooler opened a new pull request, #6143:
URL: https://github.com/apache/hadoop/pull/6143

   Original PR: #6120
   
   ### Description of PR
   
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [x] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'YARN-11578. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







[jira] [Created] (YARN-11583) Improve Node Link for YARN Federation Web Page

2023-10-02 Thread Shilun Fan (Jira)
Shilun Fan created YARN-11583:
-

 Summary: Improve Node Link for YARN Federation Web Page
 Key: YARN-11583
 URL: https://issues.apache.org/jira/browse/YARN-11583
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: federation
Affects Versions: 3.4.0
Reporter: Shilun Fan
Assignee: Shilun Fan


When working on the YARN Federation Web Page, I noticed that the functionality 
for Node redirection is missing. In this JIRA, I will be enhancing this feature.






[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771138#comment-17771138
 ] 

ASF GitHub Bot commented on YARN-11578:
---

tomicooler opened a new pull request, #6142:
URL: https://github.com/apache/hadoop/pull/6142

   Original PR: #6120
   
   # Conflicts:
   #   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java
   #   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/filecontroller/TestLogAggregationFileController.java
   
   
   
   ### Description of PR
   
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [x] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'YARN-11578. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771135#comment-17771135
 ] 

ASF GitHub Bot commented on YARN-11582:
---

xiaojunxiang2023 commented on PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1743058539

   Good idea. Tomorrow I will check whether the fair scheduler has this 
problem.




> Improve WebUI diagnosticMessage to show AM Container resource request size
> --
>
> Key: YARN-11582
> URL: https://issues.apache.org/jira/browse/YARN-11582
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, resourcemanager
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-10-02-00-05-34-337.png, success_ShowAMInfo.jpg
>
>
> When YARN resources are insufficient, the AM of a newly submitted job may be 
> in the state "Application is Activated, waiting for resources to be assigned 
> for AM". This is because YARN does not have enough resources to 
> allocate another AM Container, so we want to know how large the requested AM 
> Container is. Unfortunately, the current diagnosticMessage on the 
> Web page does not show this data. Therefore, it is necessary to add the 
> resource size of the AM Container to the diagnosticMessage, which will be 
> very useful for troubleshooting production faults online.






[jira] [Commented] (YARN-11579) Fix 'Physical Mem Used' and 'Physical VCores Used' are not displaying data

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771132#comment-17771132
 ] 

ASF GitHub Bot commented on YARN-11579:
---

slfan1989 commented on PR #6123:
URL: https://github.com/apache/hadoop/pull/6123#issuecomment-1743056621

   @goiri Can you help review this PR? Thank you very much!




> Fix 'Physical Mem Used' and 'Physical VCores Used' are not displaying data
> --
>
> Key: YARN-11579
> URL: https://issues.apache.org/jira/browse/YARN-11579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> During the YARN Federation integration testing process, we encountered an 
> issue where the 'About' page of the Router does not display cluster resource 
> utilization. This problem arises from the omission of calculations for the 
> 'Physical Mem Used' and 'Physical VCores Used' metrics when merging metrics 
> from sub-clusters.






[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771131#comment-17771131
 ] 

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1743052847

   @xiaojunxiang2023 Can we check if fairscheduler has this issue? We need to 
fix checkstyle.







[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771099#comment-17771099
 ] 

ASF GitHub Bot commented on YARN-11578:
---

brumi1024 commented on PR #6120:
URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1743034385

   Additionally, @tomicooler can you please do the backports for 3.3 and 3.2 
branches?







[jira] [Updated] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-11578:
-
Fix Version/s: 3.4.0




[jira] [Resolved] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke resolved YARN-11578.
--
Hadoop Flags: Reviewed
  Resolution: Fixed




[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771092#comment-17771092
 ] 

ASF GitHub Bot commented on YARN-11578:
---

brumi1024 merged PR #6120:
URL: https://github.com/apache/hadoop/pull/6120




> Fix performance issue of permission check in verifyAndCreateRemoteLogDir
> 
>
> Key: YARN-11578
> URL: https://issues.apache.org/jira/browse/YARN-11578
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
>
> YARN-10901 introduced a check to avoid a warn message in NN logs in certain 
> situations (when /tmp/logs is not owned by the yarn user), but it adds 3 
> NameNode calls (create, setpermission, delete) during log aggregation 
> collection, for *every* NM. Meaning, when a YARN job completes, at the YARN 
> log aggregation phase this check is done for every job, from every 
> NodeManager.
> In 30 minutes 4.2 % of all the NameNode calls were due to this in a cluster. 
> "write" calls need a Namesystem writeLock as well, so the impact is bigger.






[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771025#comment-17771025
 ] 

ASF GitHub Bot commented on YARN-11578:
---

brumi1024 commented on PR #6120:
URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1742709497

   Thanks @tomicooler for the patch, LGTM. @slfan1989 do you have anything, or 
are you ok with this being merged?




> Fix performance issue of permission check in verifyAndCreateRemoteLogDir
> 
>
> Key: YARN-11578
> URL: https://issues.apache.org/jira/browse/YARN-11578
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
>
> YARN-10901 introduced a check to avoid a warn message in NN logs in certain 
> situations (when /tmp/logs is not owned by the yarn user), but it adds 3 
> NameNode calls (create, setpermission, delete) during log aggregation 
> collection, for *every* NM. Meaning, when a YARN job completes, at the YARN 
> log aggregation phase this check is done for every job, from every 
> NodeManager.
> In 30 minutes 4.2 % of all the NameNode calls were due to this in a cluster. 
> "write" calls need a Namesystem writeLock as well, so the impact is bigger.






[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771024#comment-17771024
 ] 

ASF GitHub Bot commented on YARN-11578:
---

brumi1024 commented on code in PR #6120:
URL: https://github.com/apache/hadoop/pull/6120#discussion_r1342481403


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java:
##
@@ -429,26 +460,34 @@ public void verifyAndCreateRemoteLogDir() {
 + remoteRootLogDir + "]", e);
   }
 } else {
-  //Check if FS has capability to set/modify permissions
-  Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check",
-  RandomStringUtils.randomAlphanumeric(8)));
+  final FsLogPathKey key = new FsLogPathKey(remoteFS.getClass(), qualified);
+  FileSystem finalRemoteFS = remoteFS;
+  fsSupportsChmod = FS_CHMOD_CACHE.computeIfAbsent(key,
+  k -> checkFsSupportsChmod(finalRemoteFS, remoteRootLogDir, qualified));
+}
+  }
+
+  private boolean checkFsSupportsChmod(FileSystem remoteFS, Path logDir, Path qualified) {
+//Check if FS has capability to set/modify permissions
+Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check",
+RandomStringUtils.randomAlphanumeric(8)));
+try {
+  remoteFS.createNewFile(permissionCheckFile);
+  remoteFS.setPermission(permissionCheckFile, new FsPermission(TLDIR_PERMISSIONS));
+  return true;
+} catch (UnsupportedOperationException use) {
+  LOG.info("Unable to set permissions for configured filesystem since"
+  + " it does not support this {}", remoteFS.getScheme());
+} catch (IOException e) {
+  LOG.warn("Failed to check if FileSystem supports permissions on "
+  + "remoteLogDir [" + logDir + "]", e);
+} finally {
   try {
-remoteFS.createNewFile(permissionCheckFile);
-remoteFS.setPermission(permissionCheckFile, new FsPermission(TLDIR_PERMISSIONS));
-  } catch (UnsupportedOperationException use) {
-LOG.info("Unable to set permissions for configured filesystem since"
-+ " it does not support this {}", remoteFS.getScheme());
-fsSupportsChmod = false;
Review Comment:
   Based on the naming and the context I think we can go ahead with the 
modified behaviour: only set it to true when everything succeeds.
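
For readers following the hunk, here is a minimal sketch of the caching idea it appears to use; it is not the actual Hadoop code. The result of an expensive capability probe is memoized per (filesystem class, qualified log directory) with `ConcurrentHashMap.computeIfAbsent`, so the probe runs once per key instead of on every call. The names `ChmodProbeCache`, `Key`, `probeCount`, and the `BooleanSupplier` parameter are hypothetical and exist only for illustration; the patch itself refers to `FsLogPathKey`, `FS_CHMOD_CACHE`, and `checkFsSupportsChmod`, whose full definitions are not shown in this thread.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of memoizing a filesystem capability probe. */
final class ChmodProbeCache {

  /** Cache key: filesystem implementation class plus qualified log directory. */
  static final class Key {
    private final Class<?> fsType;
    private final String qualifiedLogDir;

    Key(Class<?> fsType, String qualifiedLogDir) {
      this.fsType = Objects.requireNonNull(fsType);
      this.qualifiedLogDir = Objects.requireNonNull(qualifiedLogDir);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return fsType.equals(other.fsType)
          && qualifiedLogDir.equals(other.qualifiedLogDir);
    }

    @Override
    public int hashCode() {
      return Objects.hash(fsType, qualifiedLogDir);
    }
  }

  private final Map<Key, Boolean> cache = new ConcurrentHashMap<>();

  // Counts how often the probe really ran; used only to demonstrate the caching.
  private final AtomicInteger probeCount = new AtomicInteger();

  /**
   * Returns the cached answer for this key, running the probe only on a miss.
   * The probe stands in for the create/setPermission/delete round trip and
   * should return true only when every step succeeded.
   */
  boolean supportsChmod(Class<?> fsType, String qualifiedLogDir, BooleanSupplier probe) {
    return cache.computeIfAbsent(new Key(fsType, qualifiedLogDir), k -> {
      probeCount.incrementAndGet();
      return probe.getAsBoolean();
    });
  }

  int probeCount() {
    return probeCount.get();
  }
}
```

One consequence of this design, at least in the sketch: a probe that fails for a transient reason is cached as `false` just like a genuinely unsupported operation, so the answer stays negative until the cache is cleared or the process restarts.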





> Fix performance issue of permission check in verifyAndCreateRemoteLogDir
> 
>
> Key: YARN-11578
> URL: https://issues.apache.org/jira/browse/YARN-11578
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
>
> YARN-10901 introduced a check to avoid a warn message in NN logs in certain 
> situations (when /tmp/logs is not owned by the yarn user), but it adds 3 
> NameNode calls (create, setpermission, delete) during log aggregation 
> collection, for *every* NM. Meaning, when a YARN job completes, at the YARN 
> log aggregation phase this check is done for every job, from every 
> NodeManager.
> In 30 minutes 4.2 % of all the NameNode calls were due to this in a cluster. 
> "write" calls need a Namesystem writeLock as well, so the impact is bigger.






[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771009#comment-17771009
 ] 

ASF GitHub Bot commented on YARN-11578:
---

hadoop-yetus commented on PR #6120:
URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1742620975

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 29s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 15s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 31s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 25s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m  6s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   4m 39s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  92m 26s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6120 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 9b2948fff24c 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0e0b977660914a7dd64cc83a530b86aa9cc272e9 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/testReport/ |
   | Max. process+thread count | 684 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Fix performance is

[jira] [Updated] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size

2023-10-02 Thread xiaojunxiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojunxiang updated YARN-11582:

Attachment: success_ShowAMInfo.jpg

> Improve WebUI diagnosticMessage to show AM Container resource request size
> --
>
> Key: YARN-11582
> URL: https://issues.apache.org/jira/browse/YARN-11582
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, resourcemanager
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-10-02-00-05-34-337.png, success_ShowAMInfo.jpg
>
>
> When YARN resources are insufficient, the AM of a newly submitted job may 
> stay in the state "Application is Activated, waiting for resources to be 
> assigned for AM". This happens because YARN does not have enough resources 
> to allocate another AM Container, so we want to know how much resource the 
> AM Container is requesting. Unfortunately, the current diagnosticMessage on 
> the web page does not show this data. Therefore, the resource size of the 
> AM Container should be added to the diagnosticMessage, which would be very 
> useful for troubleshooting online production faults.
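
The requested improvement can be pictured with a minimal sketch. The class 
name, the method, and the exact message format below are assumptions made for 
illustration; they are not taken from the patch or from the ResourceManager 
code.

```java
import java.util.Locale;

final class AmDiagnosticsSketch {
  /**
   * Appends the AM container request size to an existing diagnostic message.
   * The output format is hypothetical, not the format used by YARN.
   */
  static String withAmResource(String baseMessage, long memoryMb, int vCores) {
    return String.format(Locale.ROOT,
        "%s. AM container request: memory=%d MB, vCores=%d",
        baseMessage, memoryMb, vCores);
  }
}
```

For instance, calling `withAmResource` with the state text quoted above, 2048 
and 1 yields "Application is Activated, waiting for resources to be assigned 
for AM. AM container request: memory=2048 MB, vCores=1".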






[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir

2023-10-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770995#comment-17770995
 ] 

ASF GitHub Bot commented on YARN-11578:
---

tomicooler commented on code in PR #6120:
URL: https://github.com/apache/hadoop/pull/6120#discussion_r1342340726


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java:
##
@@ -429,26 +460,34 @@ public void verifyAndCreateRemoteLogDir() {
 + remoteRootLogDir + "]", e);
   }
 } else {
-  //Check if FS has capability to set/modify permissions
-  Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check",
-  RandomStringUtils.randomAlphanumeric(8)));
+  final FsLogPathKey key = new FsLogPathKey(remoteFS.getClass(), qualified);
+  FileSystem finalRemoteFS = remoteFS;
+  fsSupportsChmod = FS_CHMOD_CACHE.computeIfAbsent(key,
+  k -> checkFsSupportsChmod(finalRemoteFS, remoteRootLogDir, qualified));
+}
+  }
+
+  private boolean checkFsSupportsChmod(FileSystem remoteFS, Path logDir, Path qualified) {
+//Check if FS has capability to set/modify permissions
+Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check",
+RandomStringUtils.randomAlphanumeric(8)));
+try {
+  remoteFS.createNewFile(permissionCheckFile);
+  remoteFS.setPermission(permissionCheckFile, new FsPermission(TLDIR_PERMISSIONS));
+  return true;
+} catch (UnsupportedOperationException use) {
+  LOG.info("Unable to set permissions for configured filesystem since"
+  + " it does not support this {}", remoteFS.getScheme());
+} catch (IOException e) {
+  LOG.warn("Failed to check if FileSystem supports permissions on "

Review Comment:
   Thanks for the review. Fixed it.
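
   The caching idea in the hunk above can be sketched in isolation as follows. 
This is a simplified illustration, not the PR's code: `ChmodCacheSketch` and 
its nested `Key` are hypothetical names, the path is a plain String instead of 
a Hadoop `Path`, and the probe is passed in as a `BooleanSupplier` so the 
example runs without a filesystem.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

final class ChmodCacheSketch {
  /** Hypothetical cache key: filesystem implementation class plus log dir. */
  static final class Key {
    private final Class<?> fsType;
    private final String logPath;

    Key(Class<?> fsType, String logPath) {
      this.fsType = Objects.requireNonNull(fsType);
      this.logPath = Objects.requireNonNull(logPath);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return fsType.equals(other.fsType) && logPath.equals(other.logPath);
    }

    @Override
    public int hashCode() {
      return Objects.hash(fsType, logPath);
    }
  }

  private static final Map<Key, Boolean> CACHE = new ConcurrentHashMap<>();

  /** Runs the probe at most once per key; later calls reuse the result. */
  static boolean supportsChmod(Class<?> fsType, String logPath,
      BooleanSupplier probe) {
    return CACHE.computeIfAbsent(new Key(fsType, logPath),
        k -> probe.getAsBoolean());
  }
}
```

   Note that in this sketch a `false` result is cached as well, so a probe 
that failed for a transient reason is not retried for the same key until the 
process restarts.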







