[jira] [Commented] (YARN-10443) Document options of logs CLI
[ https://issues.apache.org/jira/browse/YARN-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199824#comment-17199824 ]

Ankit Kumar commented on YARN-10443:
------------------------------------

I have created a PR for this: [https://github.com/apache/hadoop/pull/2325]

> Document options of logs CLI
> ----------------------------
>
>                 Key: YARN-10443
>                 URL: https://issues.apache.org/jira/browse/YARN-10443
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.3.0
>            Reporter: Adam Antal
>            Assignee: Ankit Kumar
>            Priority: Major
>         Attachments: YARN-10443.001.patch
>
>
> It's bugging me a lot that the YARN logs CLI is poorly documented. I always
> have to type {{yarn logs -help}} to see the full list of supported commands.
> It would be nice to have it nicely documented on our website.
> The current
> [documentation|https://hadoop.apache.org/docs/r3.3.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs]
> on the website shows only 5 supported options.
> The output of the help command, however, shows more:
> {noformat}
> Retrieve logs for YARN applications.
> usage: yarn logs -applicationId [OPTIONS]
> general options are:
>  -am                          Prints the AM Container logs for this
>                               application. Specify comma-separated value
>                               to get logs for related AM Container. For
>                               example, If we specify -am 1,2, we will get
>                               the logs for the first AM Container as well
>                               as the second AM Container. To get logs for
>                               all AM Containers, use -am ALL. To get logs
>                               for the latest AM Container, use -am -1. By
>                               default, it will print all available logs.
>                               Work with -log_files to get only specific
>                               logs.
>  -appOwner                    AppOwner (assumed to be current user if not
>                               specified)
>  -client_max_retries          Set max retry number for a retry client to
>                               get the container logs for the running
>                               applications. Use a negative value to make
>                               retry forever. The default value is 30.
>  -client_retry_interval_ms    Work with --client_max_retries to create a
>                               retry client. The default value is 1000.
>  -clusterId                   ClusterId. By default, it will take default
>                               cluster id from the RM
>  -containerId                 ContainerId. By default, it will print all
>                               available logs. Work with -log_files to get
>                               only specific logs. If specified, the
>                               applicationId can be omitted
>  -help                        Displays help for all commands.
>  -list_nodes                  Show the list of nodes that successfully
>                               aggregated logs. This option can only be
>                               used with finished applications.
>  -log_files                   Specify comma-separated value to get exact
>                               matched log files. Use "ALL" or "*" to
>                               fetch all the log files for the container.
>  -log_files_pattern           Specify comma-separated
> {noformat}
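To make the options above concrete, here are two example invocations built only from options shown in the help text; the application ID, container ID, and user name are made-up placeholders:

{noformat}
# Fetch only the stderr file of the latest AM container
yarn logs -applicationId application_1600074574138_0001 -am -1 -log_files stderr

# Fetch all logs of one specific container of an app owned by another user
yarn logs -applicationId application_1600074574138_0001 \
  -containerId container_1600074574138_0001_01_000002 -appOwner dsperf
{noformat}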
[jira] [Commented] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199792#comment-17199792 ]

Hadoop QA commented on YARN-9809:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} buf {color} | {color:blue} 0m 0s{color} | {color:blue} buf was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 21 new or modified test files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 59s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 36s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 14s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 11s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 21m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s{color} | {color:green} branch-3.2 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 53s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 55s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 14m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 53s{color} | {color:orange} root: The patch generated 4 new + 1035 unchanged - 1 fixed = 1039 total (was 1036) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 14s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 37s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 13s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 8s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
[jira] [Commented] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199707#comment-17199707 ]

Eric Badger commented on YARN-9809:
-----------------------------------

[~epayne], [~Jim_Brennan], sorry for the delay. I have put up a patch for branch-3.2. However, I think this needs another round of review, because the diff was quite massive on the cherry-pick and I had to redo a lot of stuff by hand. So in a lot of ways this is a completely new patch. I think I got all of the unit tests that would've failed, but we'll see what HadoopQA says.

> NMs should supply a health status when registering with RM
> -----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-9809-branch-3.2.007.patch, YARN-9809.001.patch,
> YARN-9809.002.patch, YARN-9809.003.patch, YARN-9809.004.patch,
> YARN-9809.005.patch, YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently, if the NM registers with the RM while it is unhealthy, many
> containers can be scheduled on it before the first heartbeat. After the
> first heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
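For readers following the thread, the core idea of the change, as a minimal sketch: include the node's health status in the NM registration request so the RM can refuse to place containers on an unhealthy node immediately, instead of waiting for the first heartbeat. The types and method below are simplified stand-ins for illustration, not the actual Hadoop YARN API.

{code:java}
// Illustrative sketch of the YARN-9809 idea; the names are made up, not Hadoop's API.
public class NodeRegistrationSketch {

  // Health status the NM computes locally (e.g., from its health-check script).
  record NodeHealthStatus(boolean isHealthy, String report, long reportTimeMs) {}

  // Before the fix, registration carried no health information, so the RM
  // treated every newly registered node as healthy until its first heartbeat.
  record RegisterRequest(String nodeId, NodeHealthStatus health) {}

  // RM side: consult the health status supplied at registration time.
  static boolean acceptForScheduling(RegisterRequest req) {
    if (!req.health().isHealthy()) {
      System.out.println("Node " + req.nodeId() + " registered unhealthy ("
          + req.health().report() + "), not scheduling containers on it");
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    RegisterRequest req = new RegisterRequest("nm-1", new NodeHealthStatus(
        false, "local-dirs are bad", System.currentTimeMillis()));
    System.out.println(acceptForScheduling(req)); // false: no containers placed
  }
}
{code}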
[jira] [Reopened] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger reopened YARN-9809:
-------------------------------

> NMs should supply a health status when registering with RM
> -----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-9809-branch-3.2.007.patch, YARN-9809.001.patch,
> YARN-9809.002.patch, YARN-9809.003.patch, YARN-9809.004.patch,
> YARN-9809.005.patch, YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently, if the NM registers with the RM while it is unhealthy, many
> containers can be scheduled on it before the first heartbeat. After the
> first heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
[jira] [Updated] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-9809:
------------------------------
    Attachment: YARN-9809-branch-3.2.007.patch

> NMs should supply a health status when registering with RM
> -----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-9809-branch-3.2.007.patch, YARN-9809.001.patch,
> YARN-9809.002.patch, YARN-9809.003.patch, YARN-9809.004.patch,
> YARN-9809.005.patch, YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently, if the NM registers with the RM while it is unhealthy, many
> containers can be scheduled on it before the first heartbeat. After the
> first heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
[jira] [Commented] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199476#comment-17199476 ]

Hadoop QA commented on YARN-4783:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 49s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 26s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 12 new + 58 unchanged - 0 fixed = 70 total (was 58) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 3s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
[jira] [Issue Comment Deleted] (YARN-9192) Deletion Tasks will be picked up to delete running containers
[ https://issues.apache.org/jira/browse/YARN-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shilongfei updated YARN-9192:
-----------------------------
    Comment: was deleted

(was: Hi, [~rayman7718], thank you for the conf. I have set these, and containers no longer exit when the NM is restarted by the supervisor.
Now I have encountered a different situation. I set yarn.nodemanager.delete.debug-delay-sec=600s, and through the debug log I got the following information:
1 -> At the beginning, only container1 of app1 is running on NM1.
2 -> After a while, when container1 is finished, the app1 dir (such as /xxx/nmPrivate/app1) is scheduled to be cleaned up after 600s. (After about 10s, I restarted NM1; I don't know if this has any effect on the result.)
3 -> After about 300s, container2 of app1 is allocated on NM1, and the file /xxx/nmPrivate/app1/container2/container2.pid is created.
4 -> 600s after the end of container1, the file /xxx/nmPrivate/app1/container2/container2.pid is deleted; container2 never exits, cleanupContainer gets stuck, ContainerManagerImpl gets stuck, and the NM gets stuck.
I also need to confirm whether this problem is related to restarting the NM. In addition, should the NM check the status of app1 again when deleting nmPrivate/app1?)

> Deletion Tasks will be picked up to delete running containers
> --------------------------------------------------------------
>
>                 Key: YARN-9192
>                 URL: https://issues.apache.org/jira/browse/YARN-9192
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications
>    Affects Versions: 2.9.1
>            Reporter: Sihai Ke
>            Priority: Major
>
> I suspect there is a bug in the YARN deletion task service. Below are my
> repro steps:
> # First, let's set yarn.nodemanager.delete.debug-delay-sec=3600; that means
> that when an app finishes, its binary/container folder will be deleted
> after 3600 seconds.
> # When application App1 (a long-running service) is running on machine
> machine1, and machine1 shuts down, ContainerManagerImpl#serviceStop() will
> be called -> ContainerManagerImpl#cleanUpApplicationsOnNMShutDown, an
> ApplicationFinishEvent will be sent, and then some deletion tasks will be
> created, but stored in the DB and only picked up for execution 3600 seconds
> later.
> # 100 seconds later, machine1 comes back, and the same app is assigned to
> run on this machine; a container is created and works well.
> # Then the deletion task created in step 2 will be picked up later and will
> delete the containers created in step 3.
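A minimal sketch of the re-check shilongfei asks about at the end (all names here are hypothetical; this is not the actual NodeManager DeletionService code): before a delayed deletion task executes, verify the application has not come back to life on this node.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Set;

// Hypothetical guard for a delayed deletion task; names are made up for illustration.
public class GuardedDeletionTask {

  private final Set<String> liveApps; // apps with running containers on this NM

  public GuardedDeletionTask(Set<String> liveApps) {
    this.liveApps = liveApps;
  }

  /** Delete an app's nmPrivate dir unless the app has been re-allocated containers. */
  public void run(String appId, Path appDir) throws IOException {
    if (liveApps.contains(appId)) {
      // The app came back (e.g., after an NM restart); deleting now would remove
      // live files such as a new container's .pid file.
      System.out.println("Skipping deletion of " + appDir + ": " + appId + " is live again");
      return;
    }
    // The app is really gone; delete children before parents.
    try (var paths = Files.walk(appDir)) {
      paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    }
  }
}
{code}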
[jira] [Commented] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199383#comment-17199383 ]

Andras Gyori commented on YARN-4783:
------------------------------------

I have updated the patch to address the multiple nameservice scenario, and also covered it in the test case.

> Log aggregation failure for application when Nodemanager is restarted
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-4783.001.patch, YARN-4783.002.patch,
> YARN-4783.003.patch, YARN-4783.004.patch
>
>
> Scenario:
> =========
> 1. Start the NM with user dsperf:hadoop.
> 2. Configure the linux-execute user as dsperf.
> 3. Submit an application with the yarn user.
> 4. Wait until a few containers are allocated to NM 1.
> 5. Stop Nodemanager 1 (wait for expiry).
> 6. Start the node manager after the application is completed.
> 7. Check that log aggregation happens for the container logs in the NM
> local directory.
> Expected output:
> ================
> Log aggregation should be successful.
> Actual output:
> ==============
> Log aggregation is not successful.
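For context, the multiple-nameservice scenario the patch now covers can be pictured with a configuration like the one below. The property names are real Hadoop keys; the nameservice names and the log path are made-up examples.

{noformat}
<!-- hdfs-site.xml: two HDFS nameservices (example names) -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>

<!-- core-site.xml: the default filesystem lives on ns1 -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1</value>
</property>

<!-- yarn-site.xml: aggregated logs go to the other nameservice -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://ns2/app-logs</value>
</property>
{noformat}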
[jira] [Updated] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Gyori updated YARN-4783:
-------------------------------
    Attachment: YARN-4783.004.patch

> Log aggregation failure for application when Nodemanager is restarted
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-4783.001.patch, YARN-4783.002.patch,
> YARN-4783.003.patch, YARN-4783.004.patch
>
>
> Scenario:
> =========
> 1. Start the NM with user dsperf:hadoop.
> 2. Configure the linux-execute user as dsperf.
> 3. Submit an application with the yarn user.
> 4. Wait until a few containers are allocated to NM 1.
> 5. Stop Nodemanager 1 (wait for expiry).
> 6. Start the node manager after the application is completed.
> 7. Check that log aggregation happens for the container logs in the NM
> local directory.
> Expected output:
> ================
> Log aggregation should be successful.
> Actual output:
> ==============
> Log aggregation is not successful.
[jira] [Updated] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Gyori updated YARN-4783:
-------------------------------
    Attachment: (was: YARN-4783.004.patch)

> Log aggregation failure for application when Nodemanager is restarted
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-4783.001.patch, YARN-4783.002.patch,
> YARN-4783.003.patch
>
>
> Scenario:
> =========
> 1. Start the NM with user dsperf:hadoop.
> 2. Configure the linux-execute user as dsperf.
> 3. Submit an application with the yarn user.
> 4. Wait until a few containers are allocated to NM 1.
> 5. Stop Nodemanager 1 (wait for expiry).
> 6. Start the node manager after the application is completed.
> 7. Check that log aggregation happens for the container logs in the NM
> local directory.
> Expected output:
> ================
> Log aggregation should be successful.
> Actual output:
> ==============
> Log aggregation is not successful.
[jira] [Updated] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Gyori updated YARN-4783:
-------------------------------
    Attachment: YARN-4783.004.patch

> Log aggregation failure for application when Nodemanager is restarted
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-4783.001.patch, YARN-4783.002.patch,
> YARN-4783.003.patch, YARN-4783.004.patch
>
>
> Scenario:
> =========
> 1. Start the NM with user dsperf:hadoop.
> 2. Configure the linux-execute user as dsperf.
> 3. Submit an application with the yarn user.
> 4. Wait until a few containers are allocated to NM 1.
> 5. Stop Nodemanager 1 (wait for expiry).
> 6. Start the node manager after the application is completed.
> 7. Check that log aggregation happens for the container logs in the NM
> local directory.
> Expected output:
> ================
> Log aggregation should be successful.
> Actual output:
> ==============
> Log aggregation is not successful.
[jira] [Updated] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Gyori updated YARN-4783:
-------------------------------
    Attachment: (was: YARN-4783.004.patch)

> Log aggregation failure for application when Nodemanager is restarted
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-4783.001.patch, YARN-4783.002.patch,
> YARN-4783.003.patch
>
>
> Scenario:
> =========
> 1. Start the NM with user dsperf:hadoop.
> 2. Configure the linux-execute user as dsperf.
> 3. Submit an application with the yarn user.
> 4. Wait until a few containers are allocated to NM 1.
> 5. Stop Nodemanager 1 (wait for expiry).
> 6. Start the node manager after the application is completed.
> 7. Check that log aggregation happens for the container logs in the NM
> local directory.
> Expected output:
> ================
> Log aggregation should be successful.
> Actual output:
> ==============
> Log aggregation is not successful.
[jira] [Updated] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Gyori updated YARN-4783:
-------------------------------
    Attachment: YARN-4783.004.patch

> Log aggregation failure for application when Nodemanager is restarted
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-4783.001.patch, YARN-4783.002.patch,
> YARN-4783.003.patch, YARN-4783.004.patch
>
>
> Scenario:
> =========
> 1. Start the NM with user dsperf:hadoop.
> 2. Configure the linux-execute user as dsperf.
> 3. Submit an application with the yarn user.
> 4. Wait until a few containers are allocated to NM 1.
> 5. Stop Nodemanager 1 (wait for expiry).
> 6. Start the node manager after the application is completed.
> 7. Check that log aggregation happens for the container logs in the NM
> local directory.
> Expected output:
> ================
> Log aggregation should be successful.
> Actual output:
> ==============
> Log aggregation is not successful.
[jira] [Commented] (YARN-10413) Change fs2cs to generate mapping rules in the new format
[ https://issues.apache.org/jira/browse/YARN-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199339#comment-17199339 ]

Hadoop QA commented on YARN-10413:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 35s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 6s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}102m 22s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
[jira] [Assigned] (YARN-10443) Document options of logs CLI
[ https://issues.apache.org/jira/browse/YARN-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Antal reassigned YARN-10443:
---------------------------------
    Assignee: Ankit Kumar

> Document options of logs CLI
> ----------------------------
>
>                 Key: YARN-10443
>                 URL: https://issues.apache.org/jira/browse/YARN-10443
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.3.0
>            Reporter: Adam Antal
>            Assignee: Ankit Kumar
>            Priority: Major
>
> It's bugging me a lot that the YARN logs CLI is poorly documented. I always
> have to type {{yarn logs -help}} to see the full list of supported commands.
> It would be nice to have it nicely documented on our website.
> The current
> [documentation|https://hadoop.apache.org/docs/r3.3.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs]
> on the website shows only 5 supported options.
> The output of the help command, however, shows more:
> {noformat}
> Retrieve logs for YARN applications.
> usage: yarn logs -applicationId [OPTIONS]
> general options are:
>  -am                          Prints the AM Container logs for this
>                               application. Specify comma-separated value
>                               to get logs for related AM Container. For
>                               example, If we specify -am 1,2, we will get
>                               the logs for the first AM Container as well
>                               as the second AM Container. To get logs for
>                               all AM Containers, use -am ALL. To get logs
>                               for the latest AM Container, use -am -1. By
>                               default, it will print all available logs.
>                               Work with -log_files to get only specific
>                               logs.
>  -appOwner                    AppOwner (assumed to be current user if not
>                               specified)
>  -client_max_retries          Set max retry number for a retry client to
>                               get the container logs for the running
>                               applications. Use a negative value to make
>                               retry forever. The default value is 30.
>  -client_retry_interval_ms    Work with --client_max_retries to create a
>                               retry client. The default value is 1000.
>  -clusterId                   ClusterId. By default, it will take default
>                               cluster id from the RM
>  -containerId                 ContainerId. By default, it will print all
>                               available logs. Work with -log_files to get
>                               only specific logs. If specified, the
>                               applicationId can be omitted
>  -help                        Displays help for all commands.
>  -list_nodes                  Show the list of nodes that successfully
>                               aggregated logs. This option can only be
>                               used with finished applications.
>  -log_files                   Specify comma-separated value to get exact
>                               matched log files. Use "ALL" or "*" to
>                               fetch all the log files for the container.
>  -log_files_pattern           Specify comma-separated value to get
>                               matched log files by using java
> {noformat}
[jira] [Commented] (YARN-10443) Document options of logs CLI
[ https://issues.apache.org/jira/browse/YARN-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199289#comment-17199289 ]

Adam Antal commented on YARN-10443:
-----------------------------------

Hey [~akumar],
No I'm not; I assigned it to you. I'll be happy to review your patch when it's ready.

> Document options of logs CLI
> ----------------------------
>
>                 Key: YARN-10443
>                 URL: https://issues.apache.org/jira/browse/YARN-10443
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.3.0
>            Reporter: Adam Antal
>            Assignee: Ankit Kumar
>            Priority: Major
>
> It's bugging me a lot that the YARN logs CLI is poorly documented. I always
> have to type {{yarn logs -help}} to see the full list of supported commands.
> It would be nice to have it nicely documented on our website.
> The current
> [documentation|https://hadoop.apache.org/docs/r3.3.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs]
> on the website shows only 5 supported options.
> The output of the help command, however, shows more:
> {noformat}
> Retrieve logs for YARN applications.
> usage: yarn logs -applicationId [OPTIONS]
> general options are:
>  -am                          Prints the AM Container logs for this
>                               application. Specify comma-separated value
>                               to get logs for related AM Container. For
>                               example, If we specify -am 1,2, we will get
>                               the logs for the first AM Container as well
>                               as the second AM Container. To get logs for
>                               all AM Containers, use -am ALL. To get logs
>                               for the latest AM Container, use -am -1. By
>                               default, it will print all available logs.
>                               Work with -log_files to get only specific
>                               logs.
>  -appOwner                    AppOwner (assumed to be current user if not
>                               specified)
>  -client_max_retries          Set max retry number for a retry client to
>                               get the container logs for the running
>                               applications. Use a negative value to make
>                               retry forever. The default value is 30.
>  -client_retry_interval_ms    Work with --client_max_retries to create a
>                               retry client. The default value is 1000.
>  -clusterId                   ClusterId. By default, it will take default
>                               cluster id from the RM
>  -containerId                 ContainerId. By default, it will print all
>                               available logs. Work with -log_files to get
>                               only specific logs. If specified, the
>                               applicationId can be omitted
>  -help                        Displays help for all commands.
>  -list_nodes                  Show the list of nodes that successfully
>                               aggregated logs. This option can only be
>                               used with finished applications.
>  -log_files                   Specify comma-separated value to get exact
>                               matched log files. Use "ALL" or "*" to
>                               fetch all the log files for the container.
>  -log_files_pattern           Specify comma-separated
> {noformat}
[jira] [Resolved] (YARN-10440) Resource manager hangs, and I cannot submit any new jobs, but RM and NM processes are normal
[ https://issues.apache.org/jira/browse/YARN-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tarun Parimi resolved YARN-10440.
---------------------------------
    Resolution: Duplicate

This seems to be similar to YARN-8513. The default config change in YARN-8896 fixes it. Try setting
{noformat}
yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments=100{noformat}
Reopen with a jstack dump if the issue reoccurs with the config change.

> Resource manager hangs, and I cannot submit any new jobs, but RM and NM
> processes are normal
> ------------------------------------------------------------------------
>
>                 Key: YARN-10440
>                 URL: https://issues.apache.org/jira/browse/YARN-10440
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.1
>            Reporter: jufeng li
>            Priority: Blocker
>
> RM hangs, and I cannot submit any new jobs, but the RM and NM processes are
> normal. I can open x:8088/cluster/apps/RUNNING but not
> x:8088/cluster/scheduler. The apps already submitted cannot finish, and new
> apps cannot be submitted; everything hangs, but the RM and NM servers do
> not. How can I fix this? Help me, please!
>
> Here is the log:
> {code:java}
> ttempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang clusterResource= vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO
> {code}
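For anyone applying the suggestion above: the property belongs in capacity-scheduler.xml. A sketch with the suggested value:

{noformat}
<!-- capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments</name>
  <value>100</value>
</property>
{noformat}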
[jira] [Commented] (YARN-10413) Change fs2cs to generate mapping rules in the new format
[ https://issues.apache.org/jira/browse/YARN-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199251#comment-17199251 ]

Peter Bacsko commented on YARN-10413:
-------------------------------------

Thanks for the comments [~bteke] & [~shuzirra], I have made the suggested modifications.

> Change fs2cs to generate mapping rules in the new format
> ---------------------------------------------------------
>
>                 Key: YARN-10413
>                 URL: https://issues.apache.org/jira/browse/YARN-10413
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10413-001.patch, YARN-10413-002.patch,
> YARN-10413-003.patch, YARN-10413-004.patch, YARN-10413-005.patch
>
>
> Currently, the converter tool {{fs2cs}} can convert placement rules to
> mapping rules, but the differences are too big.
> It should be modified to generate mapping rules for the new placement
> engine and output them to a separate JSON file.
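For readers who have not seen the new placement engine yet, the generated output is a JSON file along these rough lines. This is an illustration only: the field names and values below are assumptions, not taken from this thread or the patch.

{noformat}
{
  "rules": [
    {
      "type": "user",
      "matches": "*",
      "policy": "custom",
      "customPlacement": "root.%primary_group.%user",
      "fallbackResult": "skip"
    }
  ]
}
{noformat}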
[jira] [Updated] (YARN-10413) Change fs2cs to generate mapping rules in the new format
[ https://issues.apache.org/jira/browse/YARN-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-10413:
--------------------------------
    Attachment: YARN-10413-005.patch

> Change fs2cs to generate mapping rules in the new format
> ---------------------------------------------------------
>
>                 Key: YARN-10413
>                 URL: https://issues.apache.org/jira/browse/YARN-10413
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10413-001.patch, YARN-10413-002.patch,
> YARN-10413-003.patch, YARN-10413-004.patch, YARN-10413-005.patch
>
>
> Currently, the converter tool {{fs2cs}} can convert placement rules to
> mapping rules, but the differences are too big.
> It should be modified to generate mapping rules for the new placement
> engine and output them to a separate JSON file.
[jira] [Updated] (YARN-10413) Change fs2cs to generate mapping rules in the new format
[ https://issues.apache.org/jira/browse/YARN-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-10413:
--------------------------------
    Attachment: (was: YARN-10413-005.patch)

> Change fs2cs to generate mapping rules in the new format
> ---------------------------------------------------------
>
>                 Key: YARN-10413
>                 URL: https://issues.apache.org/jira/browse/YARN-10413
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10413-001.patch, YARN-10413-002.patch,
> YARN-10413-003.patch, YARN-10413-004.patch
>
>
> Currently, the converter tool {{fs2cs}} can convert placement rules to
> mapping rules, but the differences are too big.
> It should be modified to generate mapping rules for the new placement
> engine and output them to a separate JSON file.
[jira] [Updated] (YARN-10413) Change fs2cs to generate mapping rules in the new format
[ https://issues.apache.org/jira/browse/YARN-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-10413:
--------------------------------
    Attachment: YARN-10413-005.patch

> Change fs2cs to generate mapping rules in the new format
> ---------------------------------------------------------
>
>                 Key: YARN-10413
>                 URL: https://issues.apache.org/jira/browse/YARN-10413
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10413-001.patch, YARN-10413-002.patch,
> YARN-10413-003.patch, YARN-10413-004.patch, YARN-10413-005.patch
>
>
> Currently, the converter tool {{fs2cs}} can convert placement rules to
> mapping rules, but the differences are too big.
> It should be modified to generate mapping rules for the new placement
> engine and output them to a separate JSON file.