[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771348#comment-17771348 ]

ASF GitHub Bot commented on YARN-11582:
---

hadoop-yetus commented on PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1744289927

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 28s | | Docker mode activated. |
| _ Prechecks _ | | | | |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| _ trunk Compile Tests _ | | | | |
| +1 :green_heart: | mvninstall | 33m 18s | | trunk passed |
| +1 :green_heart: | compile | 0m 44s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 37s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 42s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 44s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 22s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 27s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ | | | | |
| +1 :green_heart: | mvninstall | 0m 33s | | the patch passed |
| +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 37s | | the patch passed |
| +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 26s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6139/2/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 8 new + 40 unchanged - 0 fixed = 48 total (was 40) |
| +1 :green_heart: | mvnsite | 0m 33s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 33s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 15s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 29s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ | | | | |
| +1 :green_heart: | unit | 85m 57s | | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 :green_heart: | asflicense | 0m 28s | | The patch does not generate ASF License warnings. |
| | | | 175m 18s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6139/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6139 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 3fafba7a4f52 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 0b78bfd5c130df194d8b590a6bb8f8609b9a0c22 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6139/2/testReport/ |
| Max. process+thread count | 9
[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771305#comment-17771305 ]

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343290820

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
## @@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;
+import static org.hamcrest.CoreMatchers.containsString;
+
+public class TestFicaSchedulerAPP {
+
+  @Test
+  public void testGetActivedAppDiagnosticMessage() throws IllegalAccessException, InstantiationException {

Review Comment:
   I took a closer look at this unit test: it only verifies simple string concatenation, which is not sufficient. Could we rewrite it as a more meaningful unit test?
> Improve WebUI diagnosticMessage to show AM Container resource request size
> ---------------------------------------------------------------------------
>
>                 Key: YARN-11582
>                 URL: https://issues.apache.org/jira/browse/YARN-11582
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications, resourcemanager
>    Affects Versions: 3.3.4
>            Reporter: xiaojunxiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-10-02-00-05-34-337.png, success_ShowAMInfo.jpg
>
> When Yarn resources are insufficient, a newly submitted job's AM may sit in the state "Application is Activated, waiting for resources to be assigned for AM". This happens because Yarn does not have enough resources to allocate another AM Container, so we want to know how large the requested AM Container is. Unfortunately, the current diagnosticMessage on the web page does not show this data. Therefore, it is necessary to add the resource size of the AM Container to the diagnosticMessage, which is very useful for troubleshooting online production faults.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771306#comment-17771306 ]

ASF GitHub Bot commented on YARN-11582:
---

xiaojunxiang2023 commented on PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1743994028

Indeed. I will first learn how to write test cases that use real context, and will follow up later.
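For illustration, the kind of test the reviewer is asking for can be sketched as follows. This is a minimal, self-contained example, not the actual FiCaSchedulerApp code from the PR; the method name and the resource-map representation are assumptions made for the sketch. The point is to build the diagnostic message from real values and assert on the parts that matter, rather than re-concatenating the expected string inside the test itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: names and shapes are hypothetical, not the PR's API.
public class DiagnosticMessageSketch {

  // Assumed shape of the message builder under test.
  static String buildDiagnosticMessage(String amPartition,
      Map<String, Long> amResourceRequest) {
    StringBuilder sb = new StringBuilder(
        "Application is Activated, waiting for resources to be assigned for AM.");
    sb.append(" Details : AM Partition = ").append(amPartition).append(" ; ")
        .append("AM Resource Request = ").append(amResourceRequest).append(" ; ");
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, Long> request = new LinkedHashMap<>();
    request.put("memory-mb", 2048L);
    request.put("vcores", 1L);

    String msg = buildDiagnosticMessage("default", request);

    // Assert on concrete values rather than on the concatenation itself.
    if (!msg.contains("AM Resource Request = {memory-mb=2048, vcores=1}")) {
      throw new AssertionError("AM resource request missing from: " + msg);
    }
    System.out.println("ok");
  }
}
```

In the real test this would go through the scheduler app object (e.g. via a mocked RM context) instead of a local helper, so the assertion exercises the production code path.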
[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771304#comment-17771304 ]

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343286751

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
## @@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;

Review Comment:
   We recommend using `org.assertj.core.api.Assertions.assertThat` instead of `org.junit.Assert.assertThat`.
[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771302#comment-17771302 ]

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343287506

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
## @@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;
+import static org.hamcrest.CoreMatchers.containsString;
+
+public class TestFicaSchedulerAPP {
+
+  @Test
+  public void testGetActivedAppDiagnosticMessage() throws IllegalAccessException, InstantiationException {
+    StringBuilder diagnosticMessage = new StringBuilder(
+        "Application is Activated, waiting for resources to be assigned for AM");
+    getActivedAppDiagnosticMessage(diagnosticMessage);
+    assertThat("AM Resource Request information was not successfully displayed.",
+        diagnosticMessage.toString(), containsString("AM Resource Request ="));
+  }
+
+  // copy from FiCaSchedulerApp#getActivedAppDiagnosticMessage
+  protected void getActivedAppDiagnosticMessage(
+      StringBuilder diagnosticMessage) {
+    diagnosticMessage.append(" Details : AM Partition = ")
+        .append(" ; ")

Review Comment:
   Indentation: 5 chars.
[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771301#comment-17771301 ]

ASF GitHub Bot commented on YARN-11582:
---

slfan1989 commented on code in PR #6139:
URL: https://github.com/apache/hadoop/pull/6139#discussion_r1343287151

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/TestFicaSchedulerAPP.java:
## @@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertThat;
+import static org.hamcrest.CoreMatchers.containsString;
+
+public class TestFicaSchedulerAPP {
+
+  @Test
+  public void testGetActivedAppDiagnosticMessage() throws IllegalAccessException, InstantiationException {
+    StringBuilder diagnosticMessage = new StringBuilder(
+        "Application is Activated, waiting for resources to be assigned for AM");

Review Comment:
   Indentation: 5 chars.
[jira] [Created] (YARN-11584) [CS] Attempting to create Leaf Queue with empty shortname should fail without crashing RM
Brian Goerlitz created YARN-11584:
-------------------------------------

             Summary: [CS] Attempting to create Leaf Queue with empty shortname should fail without crashing RM
                 Key: YARN-11584
                 URL: https://issues.apache.org/jira/browse/YARN-11584
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
            Reporter: Brian Goerlitz
            Assignee: Brian Goerlitz

If an app submission results in attempting to auto-create a leaf queue with an empty short name, the submission should be rejected without the RM crashing. Currently, the queue is created, but the RM then encounters a FATAL exception due to a metrics collision. For example, if an app is placed in 'root.' the RM will fail with the below.

{noformat}
2023-09-12 20:23:43,294 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type APP_ADDED to the Event Dispatcher
org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists!
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueMetrics.forQueue(CSQueueMetrics.java:309)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.<init>(AbstractCSQueue.java:147)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractLeafQueue.<init>(AbstractLeafQueue.java:148)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:42)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.createNewQueue(ParentQueue.java:495)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.addDynamicChildQueue(ParentQueue.java:563)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.addDynamicLeafQueue(ParentQueue.java:517)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createAutoQueue(CapacitySchedulerQueueManager.java:678)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:511)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:898)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:962)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1920)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:170)
        at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
        at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
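The fix the issue asks for amounts to validating the leaf short name before any QueueMetrics source is registered. A minimal sketch of that check, with hypothetical names (this is not the actual CapacitySchedulerQueueManager API):

```java
// Hedged sketch: reject an auto-created queue path whose leaf short name is
// empty (e.g. "root.") before queue/metrics creation, so the app submission
// fails instead of the RM. Names are illustrative.
public class QueuePathCheck {

  // Leaf short name is the segment after the last '.' in the queue path.
  static String leafShortName(String queuePath) {
    int lastDot = queuePath.lastIndexOf('.');
    return lastDot < 0 ? queuePath : queuePath.substring(lastDot + 1);
  }

  static void validateLeafQueuePath(String queuePath) {
    if (leafShortName(queuePath).isEmpty()) {
      // Failing here surfaces as a rejected submission rather than the FATAL
      // "Metrics source QueueMetrics,q0=root already exists!" in the dispatcher.
      throw new IllegalArgumentException(
          "Queue path '" + queuePath + "' has an empty leaf queue short name");
    }
  }

  public static void main(String[] args) {
    validateLeafQueuePath("root.users.alice"); // accepted, no exception
    try {
      validateLeafQueuePath("root.");
      System.out.println("accepted");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In the real scheduler this validation would live on the app-placement path, before `createAutoQueue` is reached.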
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771169#comment-17771169 ]

ASF GitHub Bot commented on YARN-11578:
---

hadoop-yetus commented on PR #6142:
URL: https://github.com/apache/hadoop/pull/6142#issuecomment-1743266807

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 5m 39s | | Docker mode activated. |
| _ Prechecks _ | | | | |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| _ branch-3.3 Compile Tests _ | | | | |
| -1 :x: | mvninstall | 35m 2s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/artifact/out/branch-mvninstall-root.txt) | root in branch-3.3 failed. |
| +1 :green_heart: | compile | 0m 35s | | branch-3.3 passed |
| +1 :green_heart: | checkstyle | 0m 30s | | branch-3.3 passed |
| +1 :green_heart: | mvnsite | 0m 38s | | branch-3.3 passed |
| +1 :green_heart: | javadoc | 0m 45s | | branch-3.3 passed |
| +1 :green_heart: | spotbugs | 1m 33s | | branch-3.3 passed |
| -1 :x: | shadedclient | 25m 14s | | branch has errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ | | | | |
| +1 :green_heart: | mvninstall | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 19s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 30s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 33s | | the patch passed |
| +1 :green_heart: | spotbugs | 1m 20s | | the patch passed |
| -1 :x: | shadedclient | 25m 11s | | patch has errors when building and testing our client artifacts. |
| _ Other Tests _ | | | | |
| +1 :green_heart: | unit | 4m 1s | | hadoop-yarn-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 27s | | The patch does not generate ASF License warnings. |
| | | | 105m 11s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6142 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 32110dbad7f4 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 270c90dbaeda36d947ad9eaa2ed9d07a72ba2280 |
| Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/testReport/ |
| Max. process+thread count | 551 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6142/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
> Fix performance issue of permission check in verifyAndCreateRemoteLogDir > > > Key: YARN-11578 > URL: https://issues.apache.org/jira/browse/YARN-11578 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tamas Domok >Assignee: Tamas Domok >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > YARN-10901 introduced a check to avoid a warn message in NN logs in certain > situations (when /tmp/logs is not owned by the yarn user), but it adds 3 > NameNode calls (create, setpermission, delete) during log aggregation > collection, for *every* NM. Meaning, when a YARN job completes, at the YARN > log aggregation phase this check is done for every job, from every > NodeManager. > In 30 minutes 4.2 % of all the NameNode calls were due to this in a cluster. > "write" calls need a Namesystem writeLock as well, so the impact is bigger.
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771167#comment-17771167 ] ASF GitHub Bot commented on YARN-11578: --- hadoop-yetus commented on PR #6143: URL: https://github.com/apache/hadoop/pull/6143#issuecomment-1743248142 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 12m 29s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ branch-3.2 Compile Tests _ | | +1 :green_heart: | mvninstall | 35m 39s | | branch-3.2 passed | | +1 :green_heart: | compile | 0m 38s | | branch-3.2 passed | | +1 :green_heart: | checkstyle | 0m 33s | | branch-3.2 passed | | +1 :green_heart: | mvnsite | 0m 44s | | branch-3.2 passed | | +1 :green_heart: | javadoc | 0m 54s | | branch-3.2 passed | | +1 :green_heart: | spotbugs | 1m 55s | | branch-3.2 passed | | +1 :green_heart: | shadedclient | 18m 6s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | -1 :x: | mvninstall | 0m 34s | [/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) | hadoop-yarn-common in the patch failed. 
| | -1 :x: | compile | 0m 34s | [/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) | hadoop-yarn-common in the patch failed. | | -1 :x: | javac | 0m 34s | [/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) | hadoop-yarn-common in the patch failed. | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 26s | | the patch passed | | -1 :x: | mvnsite | 0m 35s | [/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) | hadoop-yarn-common in the patch failed. | | +1 :green_heart: | javadoc | 0m 43s | | the patch passed | | -1 :x: | spotbugs | 0m 33s | [/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) | hadoop-yarn-common in the patch failed. | | -1 :x: | shadedclient | 8m 7s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 0m 34s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) | hadoop-yarn-common in the patch failed. | | +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. 
| | | | 83m 57s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6143 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c83622a2c3d4 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | branch-3.2 / a5faa68043b22b502603d5a8b6657099694f015e | | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6143/1/testReport/ | | Max. process+thread count | 338 (vs. u
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771143#comment-17771143 ] ASF GitHub Bot commented on YARN-11578: --- tomicooler commented on PR #6120: URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1743093089 @brumi1024 - branch-3.3: https://github.com/apache/hadoop/pull/6142 - branch-3.2: https://github.com/apache/hadoop/pull/6143 There were conflicts on both branches, need to wait for Yetus. > Fix performance issue of permission check in verifyAndCreateRemoteLogDir > > > Key: YARN-11578 > URL: https://issues.apache.org/jira/browse/YARN-11578 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tamas Domok >Assignee: Tamas Domok >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > YARN-10901 introduced a check to avoid a warn message in NN logs in certain > situations (when /tmp/logs is not owned by the yarn user), but it adds 3 > NameNode calls (create, setpermission, delete) during log aggregation > collection, for *every* NM. Meaning, when a YARN job completes, at the YARN > log aggregation phase this check is done for every job, from every > NodeManager. > In 30 minutes 4.2 % of all the NameNode calls were due to this in a cluster. > "write" calls need a Namesystem writeLock as well, so the impact is bigger. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
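The three NameNode round-trips described above (create, setPermission, delete of a throwaway `.permission_check` file) can be sketched against a local filesystem. This is an illustrative stand-in, not Hadoop code: the class name, paths, and the `runProbe` helper are invented for the sketch; on HDFS each of the three steps is a separate NameNode RPC that takes the namesystem write lock.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Illustrative stand-in for the YARN-10901 probe (not Hadoop code):
// one create + one setPermission + one delete per NodeManager per
// completed application.
public class ProbeCost {

    /** Runs the 3-step probe against a local temp dir and returns the
     *  number of filesystem mutations it performed. */
    public static int runProbe() throws IOException {
        Path dir = Files.createTempDirectory("remote-logs");
        Path checkFile = dir.resolve("a1b2c3d4.permission_check");
        int mutations = 0;

        Files.createFile(checkFile);                              // 1: create
        mutations++;
        try {
            Files.setPosixFilePermissions(checkFile,
                PosixFilePermissions.fromString("rwxrwxrwx"));    // 2: setPermission
            mutations++;
        } catch (UnsupportedOperationException e) {
            // Non-POSIX filesystem: the patch records fsSupportsChmod=false here.
        }
        Files.deleteIfExists(checkFile);                          // 3: delete
        mutations++;
        Files.deleteIfExists(dir);
        return mutations;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("filesystem mutations per probe: " + runProbe());
    }
}
```

With N NodeManagers and M completed applications, this multiplies out to roughly 3·N·M write RPCs against the NameNode, which is consistent with the 4.2 % share of NameNode traffic reported in the issue.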
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771142#comment-17771142 ] ASF GitHub Bot commented on YARN-11578: --- tomicooler opened a new pull request, #6143: URL: https://github.com/apache/hadoop/pull/6143 Original PR: #6120 ### Description of PR ### How was this patch tested? ### For code changes: - [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'YARN-11578. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
[jira] [Created] (YARN-11583) Improve Node Link for YARN Federation Web Page
Shilun Fan created YARN-11583: - Summary: Improve Node Link for YARN Federation Web Page Key: YARN-11583 URL: https://issues.apache.org/jira/browse/YARN-11583 Project: Hadoop YARN Issue Type: Improvement Components: federation Affects Versions: 3.4.0 Reporter: Shilun Fan Assignee: Shilun Fan When working on the YARN Federation Web Page, I noticed that the functionality for Node redirection is missing. In this JIRA, I will be enhancing this feature.
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771138#comment-17771138 ] ASF GitHub Bot commented on YARN-11578: --- tomicooler opened a new pull request, #6142: URL: https://github.com/apache/hadoop/pull/6142 Original PR: #6120 # Conflicts: # hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java # hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/filecontroller/TestLogAggregationFileController.java ### Description of PR ### How was this patch tested? ### For code changes: - [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'YARN-11578. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771135#comment-17771135 ] ASF GitHub Bot commented on YARN-11582: --- xiaojunxiang2023 commented on PR #6139: URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1743058539 Good idea, tomorrow I will try to see if the fair scheduler has this problem. > Improve WebUI diagnosticMessage to show AM Container resource request size > -- > > Key: YARN-11582 > URL: https://issues.apache.org/jira/browse/YARN-11582 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, resourcemanager >Affects Versions: 3.3.4 >Reporter: xiaojunxiang >Priority: Major > Labels: pull-request-available > Attachments: image-2023-10-02-00-05-34-337.png, success_ShowAMInfo.jpg > > > When Yarn resources are insufficient, the newly submitted job's AM may be in > the state "Application is Activated, waiting for resources to be assigned > for AM". This usually means Yarn does not have enough resources to > allocate another AM Container, so we want to know how large the requested AM > Container is. Unfortunately, the current diagnosticMessage on the > Web page does not show this data. Therefore, it is necessary to add the > resource size of the AM Container to the diagnosticMessage, which will be > very useful for troubleshooting production faults online.
[jira] [Commented] (YARN-11579) Fix 'Physical Mem Used' and 'Physical VCores Used' are not displaying data
[ https://issues.apache.org/jira/browse/YARN-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771132#comment-17771132 ] ASF GitHub Bot commented on YARN-11579: --- slfan1989 commented on PR #6123: URL: https://github.com/apache/hadoop/pull/6123#issuecomment-1743056621 @goiri Can you help review this PR? Thank you very much! > Fix 'Physical Mem Used' and 'Physical VCores Used' are not displaying data > -- > > Key: YARN-11579 > URL: https://issues.apache.org/jira/browse/YARN-11579 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > > During the YARN Federation integration testing process, we encountered an > issue where the 'About' page of the Router does not display cluster resource > utilization. This problem arises from the omission of calculations for the > 'Physical Mem Used' and 'Physical VCores Used' metrics when merging metrics > from sub-clusters.
[jira] [Commented] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771131#comment-17771131 ] ASF GitHub Bot commented on YARN-11582: --- slfan1989 commented on PR #6139: URL: https://github.com/apache/hadoop/pull/6139#issuecomment-1743052847 @xiaojunxiang2023 Can we check if fairscheduler has this issue? We need to fix checkstyle.
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771099#comment-17771099 ] ASF GitHub Bot commented on YARN-11578: --- brumi1024 commented on PR #6120: URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1743034385 Additionally, @tomicooler can you please do the backports for 3.3 and 3.2 branches?
[jira] [Updated] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-11578: - Fix Version/s: 3.4.0
[jira] [Resolved] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke resolved YARN-11578. -- Hadoop Flags: Reviewed Resolution: Fixed
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771092#comment-17771092 ] ASF GitHub Bot commented on YARN-11578: --- brumi1024 merged PR #6120: URL: https://github.com/apache/hadoop/pull/6120
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771025#comment-17771025 ] ASF GitHub Bot commented on YARN-11578: --- brumi1024 commented on PR #6120: URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1742709497 Thanks @tomicooler for the patch, LGTM. @slfan1989 do you have anything, or are you ok with this being merged?
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771024#comment-17771024 ] ASF GitHub Bot commented on YARN-11578: --- brumi1024 commented on code in PR #6120: URL: https://github.com/apache/hadoop/pull/6120#discussion_r1342481403 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java: ## @@ -429,26 +460,34 @@ public void verifyAndCreateRemoteLogDir() { + remoteRootLogDir + "]", e); } } else { - //Check if FS has capability to set/modify permissions - Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check", - RandomStringUtils.randomAlphanumeric(8))); + final FsLogPathKey key = new FsLogPathKey(remoteFS.getClass(), qualified); + FileSystem finalRemoteFS = remoteFS; + fsSupportsChmod = FS_CHMOD_CACHE.computeIfAbsent(key, + k -> checkFsSupportsChmod(finalRemoteFS, remoteRootLogDir, qualified)); +} + } + + private boolean checkFsSupportsChmod(FileSystem remoteFS, Path logDir, Path qualified) { +//Check if FS has capability to set/modify permissions +Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check", +RandomStringUtils.randomAlphanumeric(8))); +try { + remoteFS.createNewFile(permissionCheckFile); + remoteFS.setPermission(permissionCheckFile, new FsPermission(TLDIR_PERMISSIONS)); + return true; +} catch (UnsupportedOperationException use) { + LOG.info("Unable to set permissions for configured filesystem since" + + " it does not support this {}", remoteFS.getScheme()); +} catch (IOException e) { + LOG.warn("Failed to check if FileSystem supports permissions on " + + "remoteLogDir [" + logDir + "]", e); +} finally { try { -remoteFS.createNewFile(permissionCheckFile); -remoteFS.setPermission(permissionCheckFile, new FsPermission(TLDIR_PERMISSIONS)); - } catch (UnsupportedOperationException use) { -LOG.info("Unable to set 
permissions for configured filesystem since" -+ " it does not support this {}", remoteFS.getScheme()); -fsSupportsChmod = false; Review Comment: Based on the naming and the context I think we can go ahead with the modified behaviour: only set it to true when everything succeeds.
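The review thread above is about memoizing the chmod-capability probe with `computeIfAbsent`, keyed by filesystem class and qualified path, and only recording success when every step completes. The following standalone sketch shows the pattern; the key format and the probe body are simplified stand-ins, not the patch's actual `FsLogPathKey` and filesystem calls.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the caching pattern from the patch: probe a filesystem's
// chmod support once per (FS class, path) key instead of once per
// log-aggregation cycle.
public class ChmodProbeCache {

    // Key is "<fs class>#<qualified path>"; the real patch uses an
    // FsLogPathKey object, this string form is a simplification.
    private static final ConcurrentMap<String, Boolean> FS_CHMOD_CACHE =
        new ConcurrentHashMap<>();

    // Counts how often the expensive probe actually runs.
    public static final AtomicInteger PROBE_COUNT = new AtomicInteger();

    // Hypothetical stand-in for the create/setPermission/delete round-trip.
    private static boolean expensiveProbe(String key) {
        PROBE_COUNT.incrementAndGet();
        return true; // pretend the filesystem supports chmod
    }

    public static boolean fsSupportsChmod(String fsClass, String qualifiedPath) {
        return FS_CHMOD_CACHE.computeIfAbsent(
            fsClass + "#" + qualifiedPath, ChmodProbeCache::expensiveProbe);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            fsSupportsChmod("DistributedFileSystem", "hdfs://nn/tmp/logs");
        }
        // Three lookups, but the probe ran only once.
        System.out.println("probe executions: " + PROBE_COUNT.get());
    }
}
```

Note that with `computeIfAbsent` a failed probe result is cached too, so a transient failure would pin `false` until restart; the reviewer's "only set it to true when everything succeeds" trade-off accepts that in exchange for not retrying the three RPCs on every cycle.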
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771009#comment-17771009 ] ASF GitHub Bot commented on YARN-11578: --- hadoop-yetus commented on PR #6120: URL: https://github.com/apache/hadoop/pull/6120#issuecomment-1742620975 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 29s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 15s | | trunk passed | | +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 0m 32s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 0m 30s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 36s | | trunk passed | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 1m 12s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 31s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 26s | | the patch passed | | +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 0m 28s | | the patch passed | | +1 :green_heart: | compile | 0m 25s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 0m 25s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 18s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 27s | | the patch passed | | +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 1m 4s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 6s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 4m 39s | | hadoop-yarn-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 28s | | The patch does not generate ASF License warnings. 
| | | | 92m 26s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6120 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 9b2948fff24c 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 0e0b977660914a7dd64cc83a530b86aa9cc272e9 | | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/testReport/ | | Max. process+thread count | 684 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (YARN-11582) Improve WebUI diagnosticMessage to show AM Container resource request size
[ https://issues.apache.org/jira/browse/YARN-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiaojunxiang updated YARN-11582:
--------------------------------
    Attachment: success_ShowAMInfo.jpg

> Improve WebUI diagnosticMessage to show AM Container resource request size
> ---------------------------------------------------------------------------
>
>                 Key: YARN-11582
>                 URL: https://issues.apache.org/jira/browse/YARN-11582
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications, resourcemanager
>    Affects Versions: 3.3.4
>            Reporter: xiaojunxiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-10-02-00-05-34-337.png, success_ShowAMInfo.jpg
>
> When Yarn resources are insufficient, a newly submitted job's AM may be stuck
> in the state "Application is Activated, waiting for resources to be assigned
> for AM". This usually means Yarn does not have enough resources to allocate
> another AM Container, so we want to know how large the requested AM Container
> is. Unfortunately, the current diagnosticMessage on the Web page does not show
> this data. Therefore, it is useful to add the resource size of the AM
> Container to the diagnosticMessage, which helps us troubleshoot production
> faults online.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
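The improvement described above amounts to appending the AM Container's requested resource size to the diagnostic string shown in the WebUI. A minimal sketch of that idea follows; the class and method names here are illustrative and not the actual YARN-11582 patch, which works inside the ResourceManager's scheduler code.

```java
// Hypothetical sketch of the YARN-11582 idea: when an application is waiting
// for its AM container, include the requested resource size in the
// diagnosticMessage instead of only the generic "waiting" text.
// AmDiagnostics and buildDiagnosticMessage are illustrative names.
public class AmDiagnostics {

    static String buildDiagnosticMessage(long requestedMemoryMb, int requestedVcores) {
        // The generic message users see today, plus the missing resource detail.
        return "Application is Activated, waiting for resources to be assigned for AM. "
            + "Requested AM container resources: <memory:" + requestedMemoryMb
            + " MB, vCores:" + requestedVcores + ">";
    }

    public static void main(String[] args) {
        // With the resource size visible, an operator can immediately tell
        // whether the request (e.g. 2048 MB, 1 vCore) exceeds what the queue
        // or cluster can currently offer.
        System.out.println(buildDiagnosticMessage(2048, 1));
    }
}
```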
[jira] [Commented] (YARN-11578) Fix performance issue of permission check in verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770995#comment-17770995 ]

ASF GitHub Bot commented on YARN-11578:
---------------------------------------

tomicooler commented on code in PR #6120:
URL: https://github.com/apache/hadoop/pull/6120#discussion_r1342340726

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java:
## @@ -429,26 +460,34 @@ public void verifyAndCreateRemoteLogDir() {
             + remoteRootLogDir + "]", e);
       }
     } else {
-      //Check if FS has capability to set/modify permissions
-      Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check",
-          RandomStringUtils.randomAlphanumeric(8)));
+      final FsLogPathKey key = new FsLogPathKey(remoteFS.getClass(), qualified);
+      FileSystem finalRemoteFS = remoteFS;
+      fsSupportsChmod = FS_CHMOD_CACHE.computeIfAbsent(key,
+          k -> checkFsSupportsChmod(finalRemoteFS, remoteRootLogDir, qualified));
+    }
+  }
+
+  private boolean checkFsSupportsChmod(FileSystem remoteFS, Path logDir, Path qualified) {
+    //Check if FS has capability to set/modify permissions
+    Path permissionCheckFile = new Path(qualified, String.format("%s.permission_check",
+        RandomStringUtils.randomAlphanumeric(8)));
+    try {
+      remoteFS.createNewFile(permissionCheckFile);
+      remoteFS.setPermission(permissionCheckFile, new FsPermission(TLDIR_PERMISSIONS));
+      return true;
+    } catch (UnsupportedOperationException use) {
+      LOG.info("Unable to set permissions for configured filesystem since"
+          + " it does not support this {}", remoteFS.getScheme());
+    } catch (IOException e) {
+      LOG.warn("Failed to check if FileSystem supports permissions on "

Review Comment:
   Thanks for the review. Fixed it.
> Fix performance issue of permission check in verifyAndCreateRemoteLogDir
> -------------------------------------------------------------------------
>
>                 Key: YARN-11578
>                 URL: https://issues.apache.org/jira/browse/YARN-11578
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>              Labels: pull-request-available
>
> YARN-10901 introduced a check to avoid a warn message in NN logs in certain
> situations (when /tmp/logs is not owned by the yarn user), but it adds 3
> NameNode calls (create, setPermission, delete) during log aggregation
> collection, for *every* NM. Meaning, when a YARN job completes, at the YARN
> log aggregation phase this check is done for every job, from every
> NodeManager.
> In one cluster, 4.2% of all NameNode calls in a 30-minute window were due to
> this. "write" calls need a Namesystem writeLock as well, so the impact is
> bigger.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
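The patch discussed above avoids the repeated probe by caching its result per (FileSystem class, qualified path) key via `ConcurrentHashMap.computeIfAbsent`, so the create/setPermission/delete round trips hit the NameNode at most once per key instead of once per log-aggregation cycle. The sketch below illustrates that caching pattern in isolation; `ChmodSupportCache`, `probeChmodSupport`, and the string key are stand-ins, not the actual Hadoop code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of the caching idea in YARN-11578: an expensive capability
// probe (3 NameNode RPCs in the real patch) runs once per key, and every
// later caller reuses the cached answer. Names here are hypothetical.
public class ChmodSupportCache {

    // Cache keyed by a string standing in for (FileSystem class, qualified path).
    private static final Map<String, Boolean> FS_CHMOD_CACHE = new ConcurrentHashMap<>();

    // Counts how often the expensive probe actually executes.
    static final AtomicInteger PROBE_COUNT = new AtomicInteger();

    // Stand-in for checkFsSupportsChmod(): pretend this costs 3 NameNode calls.
    static boolean probeChmodSupport(String key) {
        PROBE_COUNT.incrementAndGet();
        return true; // assume the filesystem supports set/modify permissions
    }

    static boolean fsSupportsChmod(String key) {
        // computeIfAbsent runs the probe only when the key is missing,
        // atomically, so concurrent callers do not duplicate the work.
        return FS_CHMOD_CACHE.computeIfAbsent(key, ChmodSupportCache::probeChmodSupport);
    }

    public static void main(String[] args) {
        // Many log-aggregation cycles against the same filesystem and path...
        for (int i = 0; i < 100; i++) {
            fsSupportsChmod("DistributedFileSystem:/tmp/logs");
        }
        // ...but the expensive probe executed only once.
        System.out.println("probes=" + PROBE_COUNT.get());
    }
}
```

One subtlety the real patch also handles: the cache key must include the FileSystem class, not just the path, because different filesystem implementations backing the same-looking path can differ in permission support.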