[jira] [Assigned] (YARN-1948) Expose utility methods in Apps.java publically
[ https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel reassigned YARN-1948:
---------------------------

    Assignee: (was: nijel)

> Expose utility methods in Apps.java publically
> ----------------------------------------------
>
> Key: YARN-1948
> URL: https://issues.apache.org/jira/browse/YARN-1948
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Affects Versions: 2.4.0
> Reporter: Sandy Ryza
> Priority: Major
> Labels: newbie
> Attachments: YARN-1948-1.patch
>
> Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by MapReduce, Spark, and Tez that are currently marked private. As these are useful for any YARN app that wants to allow users to augment container environments, it would be helpful to make them public.
> It may make sense to put them in a new class with a better name.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
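The kind of utility the issue wants made public can be sketched as follows. This is a hypothetical illustration, not the actual Apps.java code: the class name `EnvUtil`, the method names, and the comma-separated `KEY=VAL` input format are assumptions for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a public env-augmentation utility in the spirit of
// Apps.setEnvFromInputString / Apps.addToEnvironment. Names and the input
// format are assumptions, not the real Hadoop implementation.
public class EnvUtil {

    // Parse "K1=V1,K2=V2" into a map; later entries override earlier ones.
    public static Map<String, String> parseEnv(String spec) {
        Map<String, String> env = new LinkedHashMap<>();
        if (spec == null || spec.isEmpty()) {
            return env;
        }
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                env.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return env;
    }

    // Append a value to an existing variable using the given separator,
    // mirroring the "addToEnvironment" idea from the issue description.
    public static void addToEnvironment(Map<String, String> env,
                                        String key, String value, String sep) {
        String current = env.get(key);
        env.put(key, current == null ? value : current + sep + value);
    }

    public static void main(String[] args) {
        Map<String, String> env = parseEnv("JAVA_HOME=/usr/java,LANG=C");
        addToEnvironment(env, "CLASSPATH", "/opt/lib/a.jar", ":");
        addToEnvironment(env, "CLASSPATH", "/opt/lib/b.jar", ":");
        System.out.println(env.get("CLASSPATH")); // /opt/lib/a.jar:/opt/lib/b.jar
    }
}
```

A YARN application submitting a container could use such helpers to merge user-supplied environment strings into the container launch context, which is why the issue argues the methods are useful beyond MapReduce.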
[jira] [Assigned] (YARN-4303) Confusing help message if AM logs cant be retrieved via yarn logs command
[ https://issues.apache.org/jira/browse/YARN-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel reassigned YARN-4303:
---------------------------

    Assignee: (was: nijel)

> Confusing help message if AM logs cant be retrieved via yarn logs command
> -------------------------------------------------------------------------
>
> Key: YARN-4303
> URL: https://issues.apache.org/jira/browse/YARN-4303
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Priority: Minor
>
> {noformat}
> yarn@BLR102525:~/test/install/hadoop/resourcemanager/bin> ./yarn logs --applicationId application_1445832014581_0028 -am ALL
> Can not get AMContainers logs for the application:application_1445832014581_0028
> This application:application_1445832014581_0028 is finished. Please enable the application history service. Or Using yarn logs -applicationId -containerId --nodeAddress to get the container logs
> {noformat}
> Part of the command output mentioned above indicates that using {{yarn logs -applicationId -containerId --nodeAddress}} will fetch the desired result. It asks you to specify nodeHttpAddress, which makes it sound like we have to connect to the nodemanager's webapp address.
> This help message should be changed to include the command as {{yarn logs -applicationId -containerId --nodeAddress <Node Address>}}
[jira] [Assigned] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
[ https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel reassigned YARN-4110:
---------------------------

    Assignee: (was: nijel)

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> --------------------------------------------------------------------
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Priority: Major
> Attachments: YARN-4110_1.patch
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashcode() and equals() implementations. These state objects should override these implementations.
> # For RMAppImpl, we can make use of ApplicationId#hashcode and ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashcode and ApplicationAttemptId#equals.
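The delegation the issue proposes looks roughly like the sketch below. `RMAppSketch` and `AppId` are stand-ins for RMAppImpl and ApplicationId, since the real classes are not reproducible here; the point is only that equals()/hashCode() on the state object are backed by the immutable id.

```java
import java.util.Objects;

// Sketch of id-backed equality for an RM state object. "AppId" is a
// hypothetical placeholder for ApplicationId; the real ApplicationId already
// implements equals()/hashCode() over cluster timestamp + id.
public class RMAppSketch {

    static final class AppId {
        final long clusterTimestamp;
        final int id;
        AppId(long ts, int id) { this.clusterTimestamp = ts; this.id = id; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof AppId)) return false;
            AppId other = (AppId) o;
            return clusterTimestamp == other.clusterTimestamp && id == other.id;
        }
        @Override public int hashCode() {
            return Objects.hash(clusterTimestamp, id);
        }
    }

    final AppId applicationId;
    RMAppSketch(AppId applicationId) { this.applicationId = applicationId; }

    // Delegate to the id, so two state objects for the same application
    // compare equal regardless of mutable internal state.
    @Override public boolean equals(Object o) {
        return o instanceof RMAppSketch
            && applicationId.equals(((RMAppSketch) o).applicationId);
    }
    @Override public int hashCode() {
        return applicationId.hashCode();
    }

    public static void main(String[] args) {
        RMAppSketch a = new RMAppSketch(new AppId(100L, 1));
        RMAppSketch b = new RMAppSketch(new AppId(100L, 1));
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```

Delegating to the id keeps the equals/hashCode contract stable even while the app transitions between states, which is what makes the objects safe to use as map keys.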
[jira] [Assigned] (YARN-4249) Many options in "yarn application" command is not documented
[ https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel reassigned YARN-4249:
---------------------------

    Assignee: (was: nijel)

> Many options in "yarn application" command is not documented
> -------------------------------------------------------------
>
> Key: YARN-4249
> URL: https://issues.apache.org/jira/browse/YARN-4249
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: nijel
>
> In the document only a few options are specified.
> {code}
> Usage: `yarn application [options] `
> | COMMAND\_OPTIONS | Description |
> |:---- |:---- |
> | -appStates \ | Works with -list to filter applications based on input comma-separated list of application states. The valid application state can be one of the following: ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED |
> | -appTypes \ | Works with -list to filter applications based on input comma-separated list of application types. |
> | -list | Lists applications from the RM. Supports optional use of -appTypes to filter applications based on application type, and -appStates to filter applications based on application state. |
> | -kill \ | Kills the application. |
> | -status \ | Prints the status of the application. |
> {code}
> Some options are missing, like:
> -appId            Specify the Application Id to be operated on
> -help             Displays help for all commands.
> -movetoqueue      Moves the application to a different queue.
> -queue            Works with the movetoqueue command to specify which queue to move an application to.
> -updatePriority   Update the priority of an application. The ApplicationId can be passed using the 'appId' option.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991374#comment-14991374 ]

nijel commented on YARN-2934:
-----------------------------

Thanks [~Naganarasimha] for the patch. A few minor comments/doubts:

1.
{code}
FileStatus[] listStatus = fileSystem.listStatus(containerLogDir,
    new PathFilter() {
      @Override
      public boolean accept(Path path) {
        return FilenameUtils.wildcardMatch(path.getName(),
            errorFileNamePattern, IOCase.INSENSITIVE);
      }
    });
{code}
What if this gives multiple error files?

2.
{code}
} catch (IOException e) {
  LOG.warn("Failed while trying to read container's error log", e);
}
{code}
Can this be logged at error level? I think there should not be any exception in reading the file; if there is one, it is better to log it at error level.

> Improve handling of container's stderr
> ---------------------------------------
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Gera Shegalov
> Assignee: Naganarasimha G R
> Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, YARN-2934.v1.003.patch
>
> Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
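One possible answer to the "multiple error files" question above is to pick the most recently modified match. The sketch below is an assumption about a reasonable policy, not what the patch does, and it uses plain `java.io.File` instead of the Hadoop `FileSystem`/`PathFilter` API so it runs standalone.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Comparator;

// Sketch: when a wildcard match for the container's error file yields several
// candidates, prefer the one written to last. The case-insensitive
// substring match stands in for FilenameUtils.wildcardMatch.
public class PickErrorFile {

    static File newestMatch(File dir, String lowerCasePattern) {
        File[] matches = dir.listFiles(
            f -> f.getName().toLowerCase().contains(lowerCasePattern));
        if (matches == null || matches.length == 0) {
            return null; // no error file found (or dir missing)
        }
        // Several candidates: the latest-modified one most likely holds the
        // diagnostics for the failed launch.
        return Arrays.stream(matches)
            .max(Comparator.comparingLong(File::lastModified))
            .get();
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("containerlogs").toFile();
        File older = new File(dir, "stderr.old");
        File newer = new File(dir, "stderr");
        older.createNewFile();
        newer.createNewFile();
        older.setLastModified(1_000_000_000L);
        newer.setLastModified(2_000_000_000L);
        System.out.println(newestMatch(dir, "stderr").getName());
    }
}
```

Other policies (e.g. concatenating a bounded tail of every match) are also defensible; the point is that the filter's multi-match case needs an explicit decision.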
[jira] [Commented] (YARN-4246) NPE while listing app attempt
[ https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978231#comment-14978231 ]

nijel commented on YARN-4246:
-----------------------------

Thanks [~varun_saxena] and [~rohithsharma] for the review and commit.

> NPE while listing app attempt
> ------------------------------
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Assignee: nijel
> Fix For: 2.8.0
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been allocated. In ApplicationCLI#listApplicationAttempts we should check whether the AM container ID is null instead of directly calling toString.
> {code}
> writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>     .getApplicationAttemptId(), appAttemptReport
>     .getYarnApplicationAttemptState(), appAttemptReport
>     .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}
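The null-safe formatting the description asks for can be sketched like this. The `"N/A"` placeholder and the method shape are assumptions for illustration; the committed patch's exact output may differ.

```java
// Sketch of the fix idea for the NPE above: guard the AM container id before
// calling toString(), printing a placeholder while the AM container is
// unallocated. "N/A" is an assumed placeholder, not necessarily what the
// real ApplicationCLI prints.
public class AttemptFormatter {

    static String format(String attemptId, String state,
                         Object amContainerId, String trackingUrl) {
        String containerText =
            (amContainerId == null) ? "N/A" : amContainerId.toString();
        return String.format("%-40s %-12s %-36s %s",
            attemptId, state, containerText, trackingUrl);
    }

    public static void main(String[] args) {
        // AM container not yet allocated: previously an NPE, now a placeholder.
        System.out.println(format("appattempt_1444389134985_0001_000001",
            "SCHEDULED", null, "http://rm:8088/proxy/..."));
    }
}
```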
[jira] [Assigned] (YARN-4303) Confusing help message if AM logs cant be retrieved via yarn logs command
[ https://issues.apache.org/jira/browse/YARN-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel reassigned YARN-4303:
---------------------------

    Assignee: nijel

> Confusing help message if AM logs cant be retrieved via yarn logs command
> -------------------------------------------------------------------------
>
> Key: YARN-4303
> URL: https://issues.apache.org/jira/browse/YARN-4303
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Assignee: nijel
> Priority: Minor
[jira] [Commented] (YARN-4246) NPE while listing app attempt
[ https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955054#comment-14955054 ]

nijel commented on YARN-4246:
-----------------------------

bq. -1 yarn tests
As per my analysis, the test failures are not related to this patch. Please review.

> NPE while listing app attempt
> ------------------------------
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Assignee: nijel
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
[jira] [Updated] (YARN-4246) NPE while listing app attempt
[ https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel updated YARN-4246:
------------------------
    Attachment: YARN-4246_2.patch

Thanks [~steve_l] for pointing out the mistake. Updated the patch with the comment fix.

> NPE while listing app attempt
> ------------------------------
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Assignee: nijel
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
[jira] [Updated] (YARN-4246) NPE while listing app attempt
[ https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel updated YARN-4246:
------------------------
    Attachment: YARN-4246_1.patch

Attached the patch. Please review.

> NPE while listing app attempt
> ------------------------------
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Assignee: nijel
> Attachments: YARN-4246_1.patch
[jira] [Updated] (YARN-4249) Many options in "yarn application" command is not documented
[ https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel updated YARN-4249:
------------------------
    Summary: Many options in "yarn application" command is not documented  (was: Many options in "yarn application" command is not documents)

> Many options in "yarn application" command is not documented
> -------------------------------------------------------------
>
> Key: YARN-4249
> URL: https://issues.apache.org/jira/browse/YARN-4249
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: nijel
> Assignee: nijel
[jira] [Created] (YARN-4249) Many options in "yarn application" command is not documents
nijel created YARN-4249:
---------------------------

             Summary: Many options in "yarn application" command is not documents
                 Key: YARN-4249
                 URL: https://issues.apache.org/jira/browse/YARN-4249
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: nijel
            Assignee: nijel

In the document only a few options are specified.
{code}
Usage: `yarn application [options] `
| COMMAND\_OPTIONS | Description |
|:---- |:---- |
| -appStates \ | Works with -list to filter applications based on input comma-separated list of application states. The valid application state can be one of the following: ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED |
| -appTypes \ | Works with -list to filter applications based on input comma-separated list of application types. |
| -list | Lists applications from the RM. Supports optional use of -appTypes to filter applications based on application type, and -appStates to filter applications based on application state. |
| -kill \ | Kills the application. |
| -status \ | Prints the status of the application. |
{code}
Some options are missing, like:
-appId            Specify the Application Id to be operated on
-help             Displays help for all commands.
-movetoqueue      Moves the application to a different queue.
-queue            Works with the movetoqueue command to specify which queue to move an application to.
-updatePriority   Update the priority of an application. The ApplicationId can be passed using the 'appId' option.
[jira] [Commented] (YARN-4246) NPE while listing app attempt
[ https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950215#comment-14950215 ]

nijel commented on YARN-4246:
-----------------------------

Thanks [~varun_saxena] for reporting. The same issue is there in the applicationattempt status also:
{noformat}
./yarn applicationattempt -status appattempt_1444389134985_0001_01
15/10/09 16:53:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/09 16:53:20 INFO impl.TimelineClientImpl: Timeline service address: http://10.18.130.110:55033/ws/v1/timeline/
15/10/09 16:53:20 INFO client.RMProxy: Connecting to ResourceManager at host-10-18-130-110/10.18.130.110:8032
15/10/09 16:53:21 INFO client.AHSProxy: Connecting to Application History server at /10.18.130.110:55034
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationAttemptReport(ApplicationCLI.java:352)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:182)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
{noformat}

> NPE while listing app attempt
> ------------------------------
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Assignee: nijel
[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel updated YARN-4205:
------------------------
    Attachment: YARN-4205_03.patch

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch, YARN-4205_03.patch
>
> This JIRA intends to provide a lifetime monitor service.
> The service will monitor the applications for which a lifetime is configured. If an application runs beyond its lifetime, it will be killed. The lifetime is counted from the submit time.
> The thread monitoring interval is configurable.
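The service described above can be sketched as a registry of (submit time, lifetime) pairs with a periodic expiry check. Class and method names below are illustrative, not the patch's actual API; the clock and kill action are injected so the expiry logic is testable without a real RM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Sketch of a lifetime monitor: applications register with a lifetime counted
// from submit time, and a periodic pass kills whatever has run past it.
// Names are hypothetical, not the real YARN-4205 service.
public class LifetimeMonitorSketch {

    static final class Entry {
        final long submitTimeMs;
        final long lifetimeMs;
        Entry(long submitTimeMs, long lifetimeMs) {
            this.submitTimeMs = submitTimeMs;
            this.lifetimeMs = lifetimeMs;
        }
    }

    private final Map<String, Entry> monitoredApps = new ConcurrentHashMap<>();
    private final LongSupplier clock;          // injectable for testing
    private final Consumer<String> killAction; // what to do on expiry

    LifetimeMonitorSketch(LongSupplier clock, Consumer<String> killAction) {
        this.clock = clock;
        this.killAction = killAction;
    }

    void register(String appId, long submitTimeMs, long lifetimeMs) {
        monitoredApps.put(appId, new Entry(submitTimeMs, lifetimeMs));
    }

    void unregister(String appId) {
        monitoredApps.remove(appId);
    }

    // One pass of the monitoring thread; the real service would run this at
    // the configurable interval mentioned in the description.
    void checkOnce() {
        long now = clock.getAsLong();
        monitoredApps.forEach((appId, e) -> {
            if (now - e.submitTimeMs > e.lifetimeMs) {
                killAction.accept(appId);
                monitoredApps.remove(appId);
            }
        });
    }

    public static void main(String[] args) {
        java.util.List<String> killed = new java.util.ArrayList<>();
        LifetimeMonitorSketch m =
            new LifetimeMonitorSketch(() -> 10_000L, killed::add);
        m.register("application_1", 0L, 5_000L);   // already past its lifetime
        m.register("application_2", 0L, 60_000L);  // still within its lifetime
        m.checkOnce();
        System.out.println(killed); // [application_1]
    }
}
```

Only applications with a configured lifetime ever enter the map, so apps without a timeout pay no cost, matching the opt-in behavior the description implies.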
[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933513#comment-14933513 ]

nijel commented on YARN-4205:
-----------------------------

Thanks [~rohithsharma] for the comments. Updated the patch.

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909121#comment-14909121 ]

nijel commented on YARN-4205:
-----------------------------

Test cases are failing with "method not found" for the method added in the api project. These tests pass locally! I am not getting the reason for this failure. Could an issue with the build cause this?

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909120#comment-14909120 ]

nijel commented on YARN-4111:
-----------------------------

Thanks [~rohithsharma] and [~sunilg] for the comments.
If we add the new constructor to carry the message, can other event classes like RMAppRejectedEvent and RMAppFinishedAttemptEvent be removed? These were also added to handle the message. Or these classes can be kept as they are, as event separation for future updates. What do you say?

> Killed application diagnostics message should be set rather having static mesage
> ---------------------------------------------------------------------------------
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, YARN-4111_4.patch
>
> An application can be killed either by the *user via ClientRMService* OR *by the scheduler*. Currently the diagnostic message is set statically, i.e. {{Application killed by user.}}, regardless of whether the application was killed by the scheduler. This confuses the user after the application is killed: he did not kill the application at all, but the diagnostic message says 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of KILL.
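The constructor-based alternative debated above can be sketched as a single kill event type carrying its diagnostics, instead of one event subclass per message. The class and field names here are hypothetical, for illustrating the design choice only.

```java
// Sketch of the "one event, message in the constructor" option: the kill
// cause travels with the event, so user-initiated and scheduler-initiated
// kills produce distinct diagnostics without extra event subclasses.
// Names are hypothetical, not the real RM event classes.
public class AppKillEvent {
    private final String appId;
    private final String diagnostics;

    public AppKillEvent(String appId, String diagnostics) {
        this.appId = appId;
        this.diagnostics = diagnostics;
    }

    public String getAppId() { return appId; }
    public String getDiagnostics() { return diagnostics; }

    public static void main(String[] args) {
        AppKillEvent byUser = new AppKillEvent("application_1",
            "Application killed by user.");
        AppKillEvent byScheduler = new AppKillEvent("application_1",
            "Application killed by the scheduler.");
        // Same event type, different causes in the diagnostics.
        System.out.println(!byUser.getDiagnostics()
            .equals(byScheduler.getDiagnostics())); // true
    }
}
```

The trade-off the comment raises is real: a shared constructor removes duplicate subclasses, while keeping separate event classes preserves type-level separation if the events later diverge.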
[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel updated YARN-4205:
------------------------
    Description:
This JIRA intends to provide a lifetime monitor service.
The service will monitor the applications for which a lifetime is configured. If an application runs beyond its lifetime, it will be killed. The lifetime is counted from the submit time.
The thread monitoring interval is configurable.

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908204#comment-14908204 ]

nijel commented on YARN-4205:
-----------------------------

Thanks [~leftnoteasy] for the comments. Sorry for the small mistakes. Updated the patch.

bq. RMAppLifeTimeMonitorService.rmApps -> applicationIdToLifetime? (or shorter name if you prefer), it's not rmApps actually
Changed to monitoredapps.

bq. public synchronized void unregister, synchronized could be removed?
Done.

The other comments are fixed as well.

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel updated YARN-4205:
------------------------
    Attachment: YARN-4205_02.patch

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nijel updated YARN-4205:
------------------------
    Attachment: YARN-4205_01.patch

Uploading the initial version.

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: YARN-4205_01.patch
[jira] [Created] (YARN-4206) Add life time value in Application report and web UI
nijel created YARN-4206:
---------------------------

             Summary: Add life time value in Application report and web UI
                 Key: YARN-4206
                 URL: https://issues.apache.org/jira/browse/YARN-4206
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: nijel
            Assignee: nijel
[jira] [Created] (YARN-4205) Add a service for monitoring application life time out
nijel created YARN-4205:
---------------------------

             Summary: Add a service for monitoring application life time out
                 Key: YARN-4205
                 URL: https://issues.apache.org/jira/browse/YARN-4205
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: nijel
            Assignee: nijel
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906252#comment-14906252 ]

nijel commented on YARN-3813:
-----------------------------

Thanks [~leftnoteasy] for the comments. I will update the patch with the code comments.

bq. Do you plan to support updating lifetime when the application is running?
As per our understanding, the following two are the use cases for this:
1. The user can increase the lifetime after some time, seeing the progress of an application that is already being monitored by the timeout monitor.
2. The user can add a timeout for a running application so that it also will be monitored.
In both these cases the updated timeout will be the total lifetime (from the submit time).
Please let us know if you are thinking of any other scenario so that we can plan the interfaces accordingly.

bq. Do you plan to get lifetime via ApplicationReport, CLI, REST API?
Yes. As of now we plan for ApplicationReport. Based on the dynamic update, the interfaces can be defined and handled as a subtask.

We had some offline chat as well. A few subtasks have been raised for the planned work. Please give your opinion.

> Support Application timeout feature in YARN.
> ---------------------------------------------
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: nijel
> Assignee: nijel
> Attachments: 0001-YARN-3813.patch, 0002_YARN-3813.patch, YARN Application Timeout .pdf
>
> It will be useful to support Application Timeout in YARN. Some use cases are not worried about the output of the applications if the application is not completed in a specific time.
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say for every 5 minutes. The same job will run continuously with different datasets, so one job will be started every 5 minutes. The estimated time for this task is 2 minutes or less.
> If the application does not complete in the given time, the output is not useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is given while submitting the job. Here, the user expects the application to finish (complete or be killed) in the given time.
> One option for us is to move this logic to the application client (who submits the job). But it would be nice if it can be generic logic, which would make it more robust.
> Kindly provide your suggestions/opinion on this feature. If it sounds good, I will update the design doc and prototype patch.
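The update semantics stated in the comment above ("the updated timeout will be the total lifetime, from the submit time") reduce to a tiny calculation, sketched here with illustrative names:

```java
// Sketch of the lifetime-update semantics discussed above: an updated timeout
// is the total lifetime counted from the original submit time, so the expiry
// is simply submitTime + lifetime, whether set at submission or updated later.
public class LifetimeUpdate {

    static long expiryTime(long submitTimeMs, long lifetimeMs) {
        return submitTimeMs + lifetimeMs;
    }

    public static void main(String[] args) {
        long submit = 1_000L;
        long expiry = expiryTime(submit, 120_000L);   // initial: 2 minutes
        // The user extends the lifetime while the app is running; the new
        // value still counts from the original submit time, not from "now".
        long extended = expiryTime(submit, 300_000L); // updated: 5 minutes
        System.out.println(extended - expiry);        // 180000
    }
}
```

Anchoring the expiry to submit time keeps updates idempotent: re-sending the same lifetime value never moves the deadline.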
[jira] [Assigned] (YARN-4202) TestYarnClient#testReservationAPIs fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-4202: --- Assignee: nijel > TestYarnClient#testReservationAPIs fails intermittently > --- > > Key: YARN-4202 > URL: https://issues.apache.org/jira/browse/YARN-4202 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: nijel > Fix For: 3.0.0 > > > Found this failure while looking at the Pre-run on one of my Jiras. > {noformat} > org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException: > The planning algorithm could not find a valid allocation for your request > at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1149) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:428) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:465) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224) > Caused by: > org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException: > The planning algorithm could not find a valid allocation for your request > at > 
org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:69) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1132) > ... 10 more > {noformat} > TestReport Link: > https://builds.apache.org/job/PreCommit-YARN-Build/9243/testReport/ > When I ran this on my local box branch-2, it succeeds. > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.yarn.client.api.impl.TestYarnClient > Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.999 sec - > in org.apache.hadoop.yarn.client.api.impl.TestYarnClient > Results : > Tests run: 21, Failures: 0, Errors: 0, Skipped: 0 > [INFO] > > [INFO] BUILD SUCCESS > [INFO] > > [INFO] Total time: 52.029 s > [INFO] Finished at: 2015-09-23T11:25:04-06:00 > [INFO] Final Memory: 31M/391M > [INFO] > > {noformat} > Haven't tried if it is a problem in branch-2.7 or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3813: Attachment: 0002_YARN-3813.patch > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3813.patch, 0002_YARN-3813.patch, YARN > Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904688#comment-14904688 ] nijel commented on YARN-3813: - Thanks [~rohithsharma] and [~sunilg] for the comments. Updated the patch with the comment fixes and a test case for recovery. bq. we are starting the monitor thread always regardless whether application demands for applicationtimeout or not. I feel we can have a configuration to enable this feature in RM level. Thoughts? As I pinged you offline, this service considers only apps that are configured with a timeout, so it is left as a default service. bq.RMAppTimeOutMonitor : When InterruptedException is thrown in the below code, thread should break or throw back exception. So, thread will die else thread will be alive forever The while loop is guarded by the interrupted-state check. > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. 
> One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
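The review exchange above (InterruptedException handling in the monitor thread) can be illustrated with a minimal sketch. This is not the actual patch; the class name and sleep interval are made up for illustration. The point is that a while-guard on the interrupt flag only keeps its liveness guarantee if the flag is restored, because Thread.sleep() clears it when it throws InterruptedException.

```java
// Sketch (not the actual patch) of the monitor-loop pattern discussed above:
// the while guard only works if the interrupt flag is restored, because
// Thread.sleep() clears it when throwing InterruptedException.
class TimeoutMonitorLoop implements Runnable {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // ...scan registered apps here, then wait for the next interval...
                Thread.sleep(50);
            } catch (InterruptedException e) {
                // Restore the flag so the while-guard sees it and the loop exits.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With the flag restored, interrupting the thread makes the loop terminate instead of spinning forever, which is exactly the liveness concern raised in the review.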
[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.
[ https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877467#comment-14877467 ] nijel commented on YARN-4135: - thanks [~rohithsharma] and [~adhoot] > Improve the assertion message in MockRM while failing after waiting for the > state. > -- > > Key: YARN-4135 > URL: https://issues.apache.org/jira/browse/YARN-4135 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: nijel >Assignee: nijel >Priority: Trivial > Labels: test > Fix For: 2.8.0 > > Attachments: YARN-4135_1.patch, YARN-4135_2.patch > > > In MockRM when the test is failed after waiting for the given state, the > application id or the attempt id can be printed for easy debug > As of now if it hard to track the test fail in log since there is no relation > with test case and the application id. > Any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4192) Add YARN metric logging periodically to a separate file
nijel created YARN-4192: --- Summary: Add YARN metric logging periodically to a separate file Key: YARN-4192 URL: https://issues.apache.org/jira/browse/YARN-4192 Project: Hadoop YARN Issue Type: Improvement Reporter: nijel Assignee: nijel Priority: Minor HDFS-8880 added a framework for logging metrics in a given interval. This can be added to YARN as well. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.
[ https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747106#comment-14747106 ] nijel commented on YARN-4135: - bq.-1 yarn tests 54m 15s Tests failed in hadoop-yarn-server-resourcemanager. Test skip is not related to this change Thanks > Improve the assertion message in MockRM while failing after waiting for the > state. > -- > > Key: YARN-4135 > URL: https://issues.apache.org/jira/browse/YARN-4135 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: nijel >Assignee: nijel >Priority: Minor > Labels: test > Attachments: YARN-4135_1.patch, YARN-4135_2.patch > > > In MockRM when the test is failed after waiting for the given state, the > application id or the attempt id can be printed for easy debug > As of now if it hard to track the test fail in log since there is no relation > with test case and the application id. > Any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.
[ https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4135: Attachment: YARN-4135_2.patch > Improve the assertion message in MockRM while failing after waiting for the > state. > -- > > Key: YARN-4135 > URL: https://issues.apache.org/jira/browse/YARN-4135 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: nijel >Assignee: nijel >Priority: Minor > Labels: test > Attachments: YARN-4135_1.patch, YARN-4135_2.patch > > > In MockRM when the test is failed after waiting for the given state, the > application id or the attempt id can be printed for easy debug > As of now if it hard to track the test fail in log since there is no relation > with test case and the application id. > Any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.
[ https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746870#comment-14746870 ] nijel commented on YARN-4135: - thanks [~adhoot] for the comment Updated the patch. Please review > Improve the assertion message in MockRM while failing after waiting for the > state. > -- > > Key: YARN-4135 > URL: https://issues.apache.org/jira/browse/YARN-4135 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: nijel >Assignee: nijel >Priority: Minor > Labels: test > Attachments: YARN-4135_1.patch > > > In MockRM when the test is failed after waiting for the given state, the > application id or the attempt id can be printed for easy debug > As of now if it hard to track the test fail in log since there is no relation > with test case and the application id. > Any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4146) getServiceState command is missing in yarnadmin command help
[ https://issues.apache.org/jira/browse/YARN-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-4146: --- Assignee: (was: nijel) > getServiceState command is missing in yarnadmin command help > - > > Key: YARN-4146 > URL: https://issues.apache.org/jira/browse/YARN-4146 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Priority: Minor > Labels: help, script > > In yarnadmin command help getServiceState command is not mentioned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4146) getServiceState command is missing in yarnadmin command help
[ https://issues.apache.org/jira/browse/YARN-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel resolved YARN-4146. - Resolution: Invalid Sorry. My env was in non HA mode ! > getServiceState command is missing in yarnadmin command help > - > > Key: YARN-4146 > URL: https://issues.apache.org/jira/browse/YARN-4146 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel >Priority: Minor > Labels: help, script > > In yarnadmin command help getServiceState command is not mentioned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4111: Attachment: YARN-4111_4.patch updated the javadoc for missing "." Test skip is not related to patch > Killed application diagnostics message should be set rather having static > mesage > > > Key: YARN-4111 > URL: https://issues.apache.org/jira/browse/YARN-4111 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, > YARN-4111_4.patch > > > Application can be killed either by *user via ClientRMService* OR *from > scheduler*. Currently diagnostic message is set statically i.e {{Application > killed by user.}} neverthless of application killed by scheduler. This brings > the confusion to the user after application is Killed that he did not kill > application at all but diagnostic message depicts that 'application is killed > by user'. > It would be useful if the diagnostic message are different for each cause of > KILL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4146) getServiceState command is missing in yarnadmin command help
nijel created YARN-4146: --- Summary: getServiceState command is missing in yarnadmin command help Key: YARN-4146 URL: https://issues.apache.org/jira/browse/YARN-4146 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Minor In yarnadmin command help getServiceState command is not mentioned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4111: Attachment: YARN-4111_3.patch Updated javadoc comments > Killed application diagnostics message should be set rather having static > mesage > > > Key: YARN-4111 > URL: https://issues.apache.org/jira/browse/YARN-4111 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch > > > Application can be killed either by *user via ClientRMService* OR *from > scheduler*. Currently diagnostic message is set statically i.e {{Application > killed by user.}} neverthless of application killed by scheduler. This brings > the confusion to the user after application is Killed that he did not kill > application at all but diagnostic message depicts that 'application is killed > by user'. > It would be useful if the diagnostic message are different for each cause of > KILL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740313#comment-14740313 ] nijel commented on YARN-4111: - Thanks [~sunilg] for the comments. bq. RMAppKilledAttemptEvent is used for both RMApp and RMAppAttempt. Name is slightly confusing. I think we can use this only for RMApp. This follows the same pattern as the failed and finished events, so I think this is OK. bq. Also in RMAppAttempt, RMAppFailedAttemptEvent is changed to RMAppKilledAttemptEvent. Could we generalize RMAppFailedAttemptEvent for both Failed and Killed, and it can also take diagnostics. Before this fix, the failed event was raised with KILLED as the state. Since the new event for kill is now available, it has been changed. > Killed application diagnostics message should be set rather having static > mesage > > > Key: YARN-4111 > URL: https://issues.apache.org/jira/browse/YARN-4111 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4111_1.patch, YARN-4111_2.patch > > > Application can be killed either by *user via ClientRMService* OR *from > scheduler*. Currently diagnostic message is set statically i.e {{Application > killed by user.}} neverthless of application killed by scheduler. This brings > the confusion to the user after application is Killed that he did not kill > application at all but diagnostic message depicts that 'application is killed > by user'. > It would be useful if the diagnostic message are different for each cause of > KILL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4111: Attachment: YARN-4111_2.patch Updated the patch with checkstyle fix. The test failures are not related. Tried executing locally and are passing. > Killed application diagnostics message should be set rather having static > mesage > > > Key: YARN-4111 > URL: https://issues.apache.org/jira/browse/YARN-4111 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4111_1.patch, YARN-4111_2.patch > > > Application can be killed either by *user via ClientRMService* OR *from > scheduler*. Currently diagnostic message is set statically i.e {{Application > killed by user.}} neverthless of application killed by scheduler. This brings > the confusion to the user after application is Killed that he did not kill > application at all but diagnostic message depicts that 'application is killed > by user'. > It would be useful if the diagnostic message are different for each cause of > KILL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4111: Attachment: YARN-4111_1.patch Attaching the patch Please review > Killed application diagnostics message should be set rather having static > mesage > > > Key: YARN-4111 > URL: https://issues.apache.org/jira/browse/YARN-4111 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4111_1.patch > > > Application can be killed either by *user via ClientRMService* OR *from > scheduler*. Currently diagnostic message is set statically i.e {{Application > killed by user.}} neverthless of application killed by scheduler. This brings > the confusion to the user after application is Killed that he did not kill > application at all but diagnostic message depicts that 'application is killed > by user'. > It would be useful if the diagnostic message are different for each cause of > KILL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.
[ https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4135: Attachment: YARN-4135_1.patch attached the patch please review > Improve the assertion message in MockRM while failing after waiting for the > state. > -- > > Key: YARN-4135 > URL: https://issues.apache.org/jira/browse/YARN-4135 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: nijel >Assignee: nijel >Priority: Minor > Labels: test > Attachments: YARN-4135_1.patch > > > In MockRM when the test is failed after waiting for the given state, the > application id or the attempt id can be printed for easy debug > As of now if it hard to track the test fail in log since there is no relation > with test case and the application id. > Any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.
nijel created YARN-4135: --- Summary: Improve the assertion message in MockRM while failing after waiting for the state. Key: YARN-4135 URL: https://issues.apache.org/jira/browse/YARN-4135 Project: Hadoop YARN Issue Type: Improvement Reporter: nijel Assignee: nijel Priority: Minor In MockRM when the test is failed after waiting for the given state, the application id or the attempt id can be printed for easy debug As of now if it hard to track the test fail in log since there is no relation with test case and the application id. Any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734602#comment-14734602 ] nijel commented on YARN-3771: - Hi all, any comments on this change? > "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3771.patch > > > I was going through some FindBugs rules. One issue reported there is that > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH= > do not honor the final qualifier. The string array contents can be > reassigned! > Simple test > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > String[] t1 = {"u"}; > //t = t1; // this will show a compilation error > t[1] = t1[0]; // But this works > } > } > {code} > One option is to use Collections.unmodifiableList > any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
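The FindBugs point above, and the Collections.unmodifiableList option mentioned in the issue, can be shown in a minimal self-contained sketch (field names here are illustrative, not YarnConfiguration's actual constants): final freezes the reference, not the array contents.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FinalArrayDemo {
    // "final" freezes the reference, not the contents: elements stay writable.
    static final String[] CLASSPATH = { "1", "2" };

    // The option mentioned in the issue: publish an unmodifiable List instead.
    static final List<String> SAFE_CLASSPATH =
            Collections.unmodifiableList(Arrays.asList("1", "2"));

    public static void main(String[] args) {
        // CLASSPATH = new String[0];  // would not compile: the reference is final
        CLASSPATH[1] = "mutated";      // compiles and runs: contents are unprotected
        try {
            SAFE_CLASSPATH.set(1, "mutated");
        } catch (UnsupportedOperationException e) {
            // the unmodifiable view rejects writes at runtime
        }
    }
}
```

The unmodifiable wrapper moves the protection from compile time to runtime: a write no longer silently succeeds, it throws.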
[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
[ https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4110: Attachment: YARN-4110_1.patch Attached the patch Please review > RMappImpl and RmAppAttemptImpl should override hashcode() & equals() > > > Key: YARN-4110 > URL: https://issues.apache.org/jira/browse/YARN-4110 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4110_1.patch > > > It is observed that RMAppImpl and RMAppAttemptImpl does not have hashcode() > and equals() implementations. These state objects should override these > implementations. > # For RMAppImpl, we can use of ApplicationId#hashcode and > ApplicationId#equals. > # Similarly, RMAppAttemptImpl, ApplicationAttemptId#hashcode and > ApplicationAttemptId#equals -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
[ https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729078#comment-14729078 ] nijel commented on YARN-4110: - Sorry attached wrong patch so deleting the same > RMappImpl and RmAppAttemptImpl should override hashcode() & equals() > > > Key: YARN-4110 > URL: https://issues.apache.org/jira/browse/YARN-4110 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > > It is observed that RMAppImpl and RMAppAttemptImpl does not have hashcode() > and equals() implementations. These state objects should override these > implementations. > # For RMAppImpl, we can use of ApplicationId#hashcode and > ApplicationId#equals. > # Similarly, RMAppAttemptImpl, ApplicationAttemptId#hashcode and > ApplicationAttemptId#equals -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
[ https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4110: Attachment: (was: 01-YARN-4110.patch) > RMappImpl and RmAppAttemptImpl should override hashcode() & equals() > > > Key: YARN-4110 > URL: https://issues.apache.org/jira/browse/YARN-4110 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > > It is observed that RMAppImpl and RMAppAttemptImpl does not have hashcode() > and equals() implementations. These state objects should override these > implementations. > # For RMAppImpl, we can use of ApplicationId#hashcode and > ApplicationId#equals. > # Similarly, RMAppAttemptImpl, ApplicationAttemptId#hashcode and > ApplicationAttemptId#equals -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
[ https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-4110: Attachment: 01-YARN-4110.patch Thanks [~rohithsharma] for reporting Attached the patch > RMappImpl and RmAppAttemptImpl should override hashcode() & equals() > > > Key: YARN-4110 > URL: https://issues.apache.org/jira/browse/YARN-4110 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: 01-YARN-4110.patch > > > It is observed that RMAppImpl and RMAppAttemptImpl does not have hashcode() > and equals() implementations. These state objects should override these > implementations. > # For RMAppImpl, we can use of ApplicationId#hashcode and > ApplicationId#equals. > # Similarly, RMAppAttemptImpl, ApplicationAttemptId#hashcode and > ApplicationAttemptId#equals -- This message was sent by Atlassian JIRA (v6.3.4#6332)
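The YARN-4110 proposal, delegating RMAppImpl's identity to ApplicationId#hashCode and ApplicationId#equals, can be sketched as below. AppId is a hypothetical stand-in for YARN's ApplicationId and RMAppSketch for RMAppImpl; this illustrates the delegation pattern only, not the committed patch.

```java
import java.util.Objects;

// Hypothetical stand-in for YARN's ApplicationId, which already provides
// value-based hashCode()/equals().
final class AppId {
    private final long clusterTimestamp;
    private final int id;

    AppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(clusterTimestamp, id);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AppId)) return false;
        AppId other = (AppId) o;
        return clusterTimestamp == other.clusterTimestamp && id == other.id;
    }
}

// Sketch of the proposed change: the state object delegates identity to its id.
class RMAppSketch {
    private final AppId applicationId;

    RMAppSketch(AppId applicationId) {
        this.applicationId = applicationId;
    }

    @Override
    public int hashCode() {
        return applicationId.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RMAppSketch)) return false;
        return applicationId.equals(((RMAppSketch) o).applicationId);
    }
}
```

Delegating both methods keeps the equals/hashCode contract intact: two app objects wrapping equal ids compare equal and land in the same hash bucket.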
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728898#comment-14728898 ] nijel commented on YARN-3813: - This patch addresses the initial issue. But it will kill the application even if it is in the RUNNING state. As I understand it, the idea is to configure the states the monitor should consider when killing the application, correct? One doubt I have is whether the user will be aware of all the intermediate states of an app. > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3813: Attachment: 0001-YARN-3813.patch Sorry for the long delay.. Adding an initial patch. The action on timeout is considered as KILL. Please have a look. I will update the patch with more test cases after initial review. Thanks > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-3813: --- Assignee: nijel > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel >Assignee: nijel > Attachments: YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-3869: --- Assignee: (was: nijel) Keeping it open for further comments and opinion > Add app name to RM audit log > > > Key: YARN-3869 > URL: https://issues.apache.org/jira/browse/YARN-3869 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shay Rojansky >Priority: Minor > > The YARN resource manager audit log currently includes useful info such as > APPID, USER, etc. One crucial piece of information missing is the > user-supplied application name. > Users are familiar with their application name as shown in the YARN UI, etc. > It's vital for something like logstash to be able to associate logs with the > application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619984#comment-14619984 ] nijel commented on YARN-3813: - Thanks [~sunilg] and [~devaraj.k] for the comments. bq.How frequently are you going to check this condition for each application? The plan is to have a configurable interval defaulting to 30 sec (yarn.app.timeout.monitor.interval) bq.Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we may not need a flag. bq.I feel having a TIMEOUT state for RMAppImpl would be proper here. OK. We will add a TIMEOUT state and handle the changes. Due to this there will be a few changes in the app transitions, the client package and the web UI. bq.I have a suggestion here. We can have a BasicAppMonitoringManager which can keep an entry of . bq. when the application gets submitted to RM then we can register the application with RMAppTimeOutMonitor using the user specified timeout. Yes, good suggestion. We will implement this as a registration mechanism. But since each application can have its own timeout period, the scope for code reuse looks minimal. {code} RMAppTimeOutMonitor: local map (appid, timeout); add/register(appid, timeout) --> called from RMAppImpl; run -> if the app is running/submitted and its time has elapsed, kill it; if it has already completed, remove it from the map. No delete/unregister method --> the application will be removed from the map by the run method {code} > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. 
> *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
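The register-and-scan monitor discussed in the comment above could be sketched roughly as follows. This is a hypothetical illustration only: the class and method names (AppTimeoutMonitor, register, scan) are stand-ins and are not taken from the actual YARN-3813 patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the timeout-monitor registration mechanism;
// names are illustrative, not from the real YARN code.
public class AppTimeoutMonitor {

  // appId -> absolute deadline in milliseconds
  private final Map<String, Long> deadlines = new ConcurrentHashMap<>();

  // Called when an application is submitted with a user-specified timeout.
  public void register(String appId, long submitTimeMillis, long timeoutMillis) {
    deadlines.put(appId, submitTimeMillis + timeoutMillis);
  }

  // Called when an application completes normally; no separate unregister
  // path is required because the scan below also prunes expired entries.
  public void remove(String appId) {
    deadlines.remove(appId);
  }

  // One pass of the periodic run() method: collect the ids whose deadline
  // has elapsed and drop them from the map. In the RM this is where a KILL
  // (or TIMEOUT event) would be fired for each expired application.
  public List<String> scan(long nowMillis) {
    List<String> expired = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (e.getValue() <= nowMillis) {
        expired.add(e.getKey());
        it.remove();
      }
    }
    return expired;
  }

  public boolean isTracked(String appId) {
    return deadlines.containsKey(appId);
  }
}
```

A periodic timer (e.g. every 30 seconds, per the proposed yarn.app.timeout.monitor.interval) would call scan(System.currentTimeMillis()) and kill each returned application.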
[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3813: Attachment: YARN Application Timeout .pdf > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3813: Attachment: (was: YARN Application Timeout -3.pdf) > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618432#comment-14618432 ] nijel commented on YARN-3813: - Attached initial draft for work details. Please share your comments and thoughts > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout -3.pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3813: Attachment: YARN Application Timeout -3.pdf > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout -3.pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. > *Background:* > The requirement is to show the CDR statistics of last few minutes, say for > every 5 minutes. The same Job will run continuously with different dataset. > So one job will be started in every 5 minutes. The estimate time for this > task is 2 minutes or lesser time. > If the application is not completing in the given time the output is not > useful. > *Proposal* > So idea is to support application timeout, with which timeout parameter is > given while submitting the job. > Here, user is expecting to finish (complete or kill) the application in the > given time. > One option for us is to move this logic to Application client (who submit the > job). > But it will be nice if it can be generic logic and can make more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, i > will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616363#comment-14616363 ] nijel commented on YARN-3869: - In the web UI it is shown truncated. But if the names are similar, will it serve the purpose? Let us wait for a few other opinions as well :) > Add app name to RM audit log > > > Key: YARN-3869 > URL: https://issues.apache.org/jira/browse/YARN-3869 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shay Rojansky >Assignee: nijel >Priority: Minor > > The YARN resource manager audit log currently includes useful info such as > APPID, USER, etc. One crucial piece of information missing is the > user-supplied application name. > Users are familiar with their application name as shown in the YARN UI, etc. > It's vital for something like logstash to be able to associate logs with the > application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616282#comment-14616282 ] nijel commented on YARN-3869: - Hi, I started working on this. One observation is that in some cases the application name will not be in a readable format. For example, in Hive the name will be the complete query string. In this case it will not be good to print this information in the logs! Any thoughts? > Add app name to RM audit log > > > Key: YARN-3869 > URL: https://issues.apache.org/jira/browse/YARN-3869 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shay Rojansky >Assignee: nijel >Priority: Minor > > The YARN resource manager audit log currently includes useful info such as > APPID, USER, etc. One crucial piece of information missing is the > user-supplied application name. > Users are familiar with their application name as shown in the YARN UI, etc. > It's vital for something like logstash to be able to associate logs with the > application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611374#comment-14611374 ] nijel commented on YARN-3830: - Thanks [~devaraj.k] for the review and committing the patch. > AbstractYarnScheduler.createReleaseCache may try to clean a null attempt > > > Key: YARN-3830 > URL: https://issues.apache.org/jira/browse/YARN-3830 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: nijel >Assignee: nijel > Fix For: 2.8.0 > > Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, > YARN-3830_4.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() > {code} > protected void createReleaseCache() { > // Cleanup the cache after nm expire interval. > new Timer().schedule(new TimerTask() { > @Override > public void run() { > for (SchedulerApplication app : applications.values()) { > T attempt = app.getCurrentAppAttempt(); > synchronized (attempt) { > for (ContainerId containerId : attempt.getPendingRelease()) { > RMAuditLogger.logFailure( > {code} > Here the attempt can be null since the attempt is created later. So null > pointer exception will come > {code} > 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] > threw an Exception. | YarnUncaughtExceptionHandler.java:68 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code} > This will skip the other applications in this run. > Can add a null check and continue with other applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-3869: --- Assignee: nijel > Add app name to RM audit log > > > Key: YARN-3869 > URL: https://issues.apache.org/jira/browse/YARN-3869 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shay Rojansky >Assignee: nijel >Priority: Minor > > The YARN resource manager audit log currently includes useful info such as > APPID, USER, etc. One crucial piece of information missing is the > user-supplied application name. > Users are familiar with their application name as shown in the YARN UI, etc. > It's vital for something like logstash to be able to associate logs with the > application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609812#comment-14609812 ] nijel commented on YARN-3869: - Hi [~roji], I would like to work on this improvement. Please let me know if you have already started the work. > Add app name to RM audit log > > > Key: YARN-3869 > URL: https://issues.apache.org/jira/browse/YARN-3869 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shay Rojansky >Priority: Minor > > The YARN resource manager audit log currently includes useful info such as > APPID, USER, etc. One crucial piece of information missing is the > user-supplied application name. > Users are familiar with their application name as shown in the YARN UI, etc. > It's vital for something like logstash to be able to associate logs with the > application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3830: Attachment: YARN-3830_4.patch Thanks [~devaraj.k] for the suggestion Updated patch with test case Please review > AbstractYarnScheduler.createReleaseCache may try to clean a null attempt > > > Key: YARN-3830 > URL: https://issues.apache.org/jira/browse/YARN-3830 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, > YARN-3830_4.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() > {code} > protected void createReleaseCache() { > // Cleanup the cache after nm expire interval. > new Timer().schedule(new TimerTask() { > @Override > public void run() { > for (SchedulerApplication app : applications.values()) { > T attempt = app.getCurrentAppAttempt(); > synchronized (attempt) { > for (ContainerId containerId : attempt.getPendingRelease()) { > RMAuditLogger.logFailure( > {code} > Here the attempt can be null since the attempt is created later. So null > pointer exception will come > {code} > 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] > threw an Exception. | YarnUncaughtExceptionHandler.java:68 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code} > This will skip the other applications in this run. > Can add a null check and continue with other applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2953) TestWorkPreservingRMRestart fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-2953: --- Assignee: nijel > TestWorkPreservingRMRestart fails on trunk > -- > > Key: YARN-2953 > URL: https://issues.apache.org/jira/browse/YARN-2953 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > > Running > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) > Time elapsed: 30.031 sec <<< ERROR! > java.lang.Exception: test timed out after 3 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2953) TestWorkPreservingRMRestart fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609771#comment-14609771 ] nijel commented on YARN-2953: - Hi [~rohithsharma], this test case is passing in recent code, and I see the timeout was increased ( @Test (timeout = 5)). This happened in the following check-in {code} Revision: 5f57b904f550515693d93a2959e663b0d0260696 Author: Jian He Date: 31-12-2014 05:05:45 Message: YARN-2492. Added node-labels page on RM web UI. Contributed by Wangda Tan {code} Can you please validate this issue? > TestWorkPreservingRMRestart fails on trunk > -- > > Key: YARN-2953 > URL: https://issues.apache.org/jira/browse/YARN-2953 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith Sharma K S > > Running > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) > Time elapsed: 30.031 sec <<< ERROR! > java.lang.Exception: test timed out after 3 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608093#comment-14608093 ] nijel commented on YARN-3830: - Thanks [~devaraj.k] for the review. The test case looks a bit tricky :) I will update the patch soon. > AbstractYarnScheduler.createReleaseCache may try to clean a null attempt > > > Key: YARN-3830 > URL: https://issues.apache.org/jira/browse/YARN-3830 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() > {code} > protected void createReleaseCache() { > // Cleanup the cache after nm expire interval. > new Timer().schedule(new TimerTask() { > @Override > public void run() { > for (SchedulerApplication app : applications.values()) { > T attempt = app.getCurrentAppAttempt(); > synchronized (attempt) { > for (ContainerId containerId : attempt.getPendingRelease()) { > RMAuditLogger.logFailure( > {code} > Here the attempt can be null since the attempt is created later. So null > pointer exception will come > {code} > 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] > threw an Exception. | YarnUncaughtExceptionHandler.java:68 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code} > This will skip the other applications in this run. > Can add a null check and continue with other applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3830: Attachment: YARN-3830_3.patch Sorry for the small mistake; the line-length limit is corrected. The test failure is not related to this patch; verified locally, it is passing. > AbstractYarnScheduler.createReleaseCache may try to clean a null attempt > > > Key: YARN-3830 > URL: https://issues.apache.org/jira/browse/YARN-3830 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() > {code} > protected void createReleaseCache() { > // Cleanup the cache after nm expire interval. > new Timer().schedule(new TimerTask() { > @Override > public void run() { > for (SchedulerApplication app : applications.values()) { > T attempt = app.getCurrentAppAttempt(); > synchronized (attempt) { > for (ContainerId containerId : attempt.getPendingRelease()) { > RMAuditLogger.logFailure( > {code} > Here the attempt can be null since the attempt is created later. So null > pointer exception will come > {code} > 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] > threw an Exception. | YarnUncaughtExceptionHandler.java:68 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code} > This will skip the other applications in this run. > Can add a null check and continue with other applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3830: Attachment: YARN-3830_2.patch Thanks [~xgong] for the comment. Updated the patch Please review > AbstractYarnScheduler.createReleaseCache may try to clean a null attempt > > > Key: YARN-3830 > URL: https://issues.apache.org/jira/browse/YARN-3830 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3830_1.patch, YARN-3830_2.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() > {code} > protected void createReleaseCache() { > // Cleanup the cache after nm expire interval. > new Timer().schedule(new TimerTask() { > @Override > public void run() { > for (SchedulerApplication app : applications.values()) { > T attempt = app.getCurrentAppAttempt(); > synchronized (attempt) { > for (ContainerId containerId : attempt.getPendingRelease()) { > RMAuditLogger.logFailure( > {code} > Here the attempt can be null since the attempt is created later. So null > pointer exception will come > {code} > 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] > threw an Exception. | YarnUncaughtExceptionHandler.java:68 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code} > This will skip the other applications in this run. > Can add a null check and continue with other applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3830: Attachment: YARN-3830_1.patch Updated the patch. Please review > AbstractYarnScheduler.createReleaseCache may try to clean a null attempt > > > Key: YARN-3830 > URL: https://issues.apache.org/jira/browse/YARN-3830 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3830_1.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() > {code} > protected void createReleaseCache() { > // Cleanup the cache after nm expire interval. > new Timer().schedule(new TimerTask() { > @Override > public void run() { > for (SchedulerApplication app : applications.values()) { > T attempt = app.getCurrentAppAttempt(); > synchronized (attempt) { > for (ContainerId containerId : attempt.getPendingRelease()) { > RMAuditLogger.logFailure( > {code} > Here the attempt can be null since the attempt is created later. So null > pointer exception will come > {code} > 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] > threw an Exception. | YarnUncaughtExceptionHandler.java:68 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code} > This will skip the other applications in this run. > Can add a null check and continue with other applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
nijel created YARN-3830: --- Summary: AbstractYarnScheduler.createReleaseCache may try to clean a null attempt Key: YARN-3830 URL: https://issues.apache.org/jira/browse/YARN-3830 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() {code} protected void createReleaseCache() { // Cleanup the cache after nm expire interval. new Timer().schedule(new TimerTask() { @Override public void run() { for (SchedulerApplication app : applications.values()) { T attempt = app.getCurrentAppAttempt(); synchronized (attempt) { for (ContainerId containerId : attempt.getPendingRelease()) { RMAuditLogger.logFailure( {code} Here the attempt can be null since the attempt is created later. So null pointer exception will come {code} 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw an Exception. | YarnUncaughtExceptionHandler.java:68 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} This will skip the other applications in this run. Can add a null check and continue with other applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
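The proposed fix ("add a null check and continue with other applications") can be sketched as below. This is a simplified, hypothetical illustration with stand-in types, not the actual AbstractYarnScheduler code or the committed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for SchedulerApplication / RMAppAttempt; this only
// illustrates the null-check-and-continue idea, not the real YARN code.
public class ReleaseCacheCleaner {

  interface Attempt {
    List<String> getPendingRelease();
  }

  interface App {
    Attempt getCurrentAppAttempt();
  }

  // Walks all applications and gathers the container ids whose pending
  // release would be logged, skipping any app whose current attempt has
  // not been created yet instead of dereferencing null.
  static List<String> cleanPendingReleases(Map<String, App> applications) {
    List<String> processed = new ArrayList<>();
    for (App app : applications.values()) {
      Attempt attempt = app.getCurrentAppAttempt();
      if (attempt == null) {
        // The attempt is created later; continue with the remaining
        // applications rather than letting an NPE abort the whole pass.
        continue;
      }
      synchronized (attempt) {
        processed.addAll(attempt.getPendingRelease());
      }
    }
    return processed;
  }
}
```

With the check in place, one app with a missing attempt no longer causes the timer thread to die and skip every other application in the same run.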
[jira] [Commented] (YARN-1948) Expose utility methods in Apps.java publically
[ https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589782#comment-14589782 ] nijel commented on YARN-1948: - Thanks [~vinodkv] for the comment. I am thinking of changing both method names to "*updateEnv*". Not getting any better name :( public static void updateEnv( Another option is to drop the env-related naming and name the method from a map perspective, since the env is represented as a map in this function. Any thoughts? > Expose utility methods in Apps.java publically > -- > > Key: YARN-1948 > URL: https://issues.apache.org/jira/browse/YARN-1948 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.4.0 >Reporter: Sandy Ryza >Assignee: nijel > Labels: newbie > Attachments: YARN-1948-1.patch > > > Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by > MapReduce, Spark, and Tez that are currently marked private. As these are > useful for any YARN app that wants to allow users to augment container > environments, it would be helpful to make them public. > It may make sense to put them in a new class with a better name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
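For context, the kind of helper being discussed merges a delimiter-separated string of KEY=VALUE pairs into an environment map. The sketch below is a heavily simplified, hypothetical illustration of that idea only; the real Apps.setEnvFromInputString also handles variable expansion and platform-specific separators, which are omitted here.

```java
import java.util.Map;

// Hypothetical, stripped-down illustration of an "updateEnv"-style helper;
// not the actual Apps.java implementation (variable expansion omitted).
public class EnvUtil {

  // Merge "K1=V1,K2=V2" style input into the given environment map.
  public static void updateEnv(Map<String, String> env, String envString) {
    if (envString == null || envString.isEmpty()) {
      return;
    }
    for (String pair : envString.split(",")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        continue; // skip malformed entries instead of failing
      }
      env.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }
  }
}
```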
[jira] [Created] (YARN-3813) Support Application timeout feature in YARN.
nijel created YARN-3813: --- Summary: Support Application timeout feature in YARN. Key: YARN-3813 URL: https://issues.apache.org/jira/browse/YARN-3813 Project: Hadoop YARN Issue Type: New Feature Reporter: nijel Fix For: 2.8.0 It will be useful to support Application Timeout in YARN. Some use cases are not worried about the output of the applications if the application is not completed in a specific time. *Background:* The requirement is to show the CDR statistics of last few minutes, say for every 5 minutes. The same Job will run continuously with different dataset. So one job will be started in every 5 minutes. The estimate time for this task is 2 minutes or lesser time. If the application is not completing in the given time the output is not useful. *Proposal* So idea is to support application timeout, with which timeout parameter is given while submitting the job. Here, user is expecting to finish (complete or kill) the application in the given time. One option for us is to move this logic to Application client (who submit the job). But it will be nice if it can be generic logic and can make more robust. Kindly provide your suggestions/opinion on this feature. If it sounds good, i will update the design doc and prototype patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1948) Expose utility methods in Apps.java publically
[ https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-1948: Attachment: YARN-1948-1.patch Attached the file with modification Please review > Expose utility methods in Apps.java publically > -- > > Key: YARN-1948 > URL: https://issues.apache.org/jira/browse/YARN-1948 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.4.0 >Reporter: Sandy Ryza >Assignee: nijel > Labels: newbie > Attachments: YARN-1948-1.patch > > > Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by > MapReduce, Spark, and Tez that are currently marked private. As these are > useful for any YARN app that wants to allow users to augment container > environments, it would be helpful to make them public. > It may make sense to put them in a new class with a better name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3796) Support User level Quota for space and Name (count)
nijel created YARN-3796: --- Summary: Support User level Quota for space and Name (count) Key: YARN-3796 URL: https://issues.apache.org/jira/browse/YARN-3796 Project: Hadoop YARN Issue Type: New Feature Reporter: nijel Assignee: nijel I would like to have a feature in HDFS for quota management at the user level. Background: When a customer uses a multi-tenant solution it will have many Hadoop ecosystem components like Hive, HBase, YARN etc. The base folders of these components are different, like /hive for Hive and /hbase for HBase. Now if a user creates a file or table, it will be under the folder specific to that component. If the user name is taken into account it looks like {code} /hive/user1/table1 /hive/user2/table1 /hbase/user1/Htable1 /hbase/user2/Htable1 Same for yarn/map-reduce data and logs {code} In this case restricting the user to a certain amount of disk/files is very difficult since the current quota management is at folder level. Requirement: User level quota for space and name (count). Say user1 can have 100G irrespective of the folder or location used. Here the idea is to consider the file owner as the key and attribute the quota to it. So the current quota system can do an initial check on the user quota, if defined, before validating the folder quota. Note: This needs a change in the fsimage to store the user and quota information. Please have a look at this scenario. If it sounds good, I will create the tasks and update the design and prototype. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
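[Editorial sketch] The "initial check on the user quota before the folder quota" idea above can be illustrated with a small accounting sketch. The class, the in-memory maps, and the method names are all invented for the example; as the proposal itself notes, a real implementation would need fsimage changes rather than transient maps:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a per-user space-quota check layered in front of the
// existing folder-level quota validation. Illustrative only.
public class UserQuotaChecker {
    private final Map<String, Long> limitByUser = new HashMap<>();
    private final Map<String, Long> usedByUser = new HashMap<>();

    public void setSpaceQuota(String user, long bytes) {
        limitByUser.put(user, bytes);
    }

    // Returns true if the owner may consume the requested bytes. Users
    // without a configured quota are unrestricted here; folder quotas
    // would still be validated afterwards.
    public boolean tryConsume(String owner, long bytes) {
        Long limit = limitByUser.get(owner);
        long used = usedByUser.getOrDefault(owner, 0L);
        if (limit != null && used + bytes > limit) {
            return false; // user quota exceeded, regardless of folder
        }
        usedByUser.put(owner, used + bytes);
        return true;
    }
}
```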
[jira] [Updated] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3771: Attachment: 0001-YARN-3771.patch Attached the patch. Please review > "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3771.patch > > > I was going through some FindBugs rules. One issue reported there is that > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH = > do not honor the final qualifier. The string array contents can be > reassigned! > Simple test > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > String[] t1 = { "u" }; > // t = t1; // this will show a compilation error > t[1] = t1[0]; // But this works > } > } > {code} > One option is to use Collections.unmodifiableList. > Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
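[Editorial sketch] The Collections.unmodifiableList option mentioned in the report can be demonstrated directly: `final` on a `String[]` only pins the reference, so element writes still succeed, whereas the unmodifiable-list wrapper rejects writes at runtime. Class and method names below are invented for the demonstration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Contrasts a final String[] (elements still mutable) with
// Collections.unmodifiableList (writes throw at runtime).
public class FinalArrayDemo {
    static final String[] ARRAY = { "1", "2" };
    static final List<String> LIST =
        Collections.unmodifiableList(Arrays.asList("1", "2"));

    public static boolean canMutateArray() {
        ARRAY[1] = "u"; // compiles and succeeds: final only pins the reference
        return ARRAY[1].equals("u");
    }

    public static boolean canMutateList() {
        try {
            LIST.set(1, "u");
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // the wrapper rejects element writes
        }
    }
}
```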
[jira] [Moved] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel moved HDFS-8526 to YARN-3771: --- Key: YARN-3771 (was: HDFS-8526) Project: Hadoop YARN (was: Hadoop HDFS) > "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > > I was going through some FindBugs rules. One issue reported there is that > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH = > do not honor the final qualifier. The string array contents can be > reassigned! > Simple test > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > String[] t1 = { "u" }; > // t = t1; // this will show a compilation error > t[1] = t1[0]; // But this works > } > } > {code} > One option is to use Collections.unmodifiableList. > Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1948) Expose utility methods in Apps.java publically
[ https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-1948: --- Assignee: nijel > Expose utility methods in Apps.java publically > -- > > Key: YARN-1948 > URL: https://issues.apache.org/jira/browse/YARN-1948 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.4.0 >Reporter: Sandy Ryza >Assignee: nijel > Labels: newbie > > Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by > MapReduce, Spark, and Tez that are currently marked private. As these are > useful for any YARN app that wants to allow users to augment container > environments, it would be helpful to make them public. > It may make sense to put them in a new class with a better name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3693) Duplicate parameters on service start for NM and RM
[ https://issues.apache.org/jira/browse/YARN-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-3693: --- Assignee: nijel > Duplicate parameters on service start for NM and RM > --- > > Key: YARN-3693 > URL: https://issues.apache.org/jira/browse/YARN-3693 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: nijel >Priority: Minor > > Steps to reproduce > = > 1.Install HA cluster with NM RM > 2.Check process id for the same > 3.ps -ef | grep > Actual result > = > Multiple parameters are duplicate like log file name, Logger type , log > directory etc. > Same is observed in RM process also > *Please find the logs below* > {quote} > dsperf 26076 1 0 12:43 ?00:00:26 > /opt/dsperf/jdk1.8.0_40//bin/java -Dproc_nodemanager -Xmx1000m > -Dhadoop.log.dir=/nodemanager/logs > -Dyarn.log.dir=/nodemanager/logs -Dhadoop.log.file=yarn.log > -Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str= > -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console > -Dyarn.policy.file=hadoop-policy.xml -Dlog4j.configuration.watch=true > -Dhadoop.log.dir=/nodemanager/logs > -Dyarn.log.dir=/nodemanager/logs > -Dhadoop.log.file=yarn-dsperf-nodemanager-host-name.log > -Dyarn.log.file=yarn-dsperf-nodemanager-host.log -Dyarn.home.dir= > -Dyarn.id.str=dsperf {color:red}-Dhadoop.root.logger=INFO,RFA > {color}-Dyarn.root.logger=INFO,RFA > -Djava.library.path=/nodemanager/lib/native > -Dyarn.policy.file=hadoop-policy.xml -server > -Dhadoop.log.dir=/nodemanager/logs > -Dyarn.log.dir=/nodemanager/logs > -Dhadoop.log.file=yarn-dsperf-nodemanager-host-name.log > -Dyarn.log.file=yarn-dsperf-nodemanager-host-name.log > -Dyarn.home.dir=/nodemanager -Dhadoop.home.dir=/nodemanager > {color:red}-Dhadoop.root.logger=INFO,RFA {color}-Dyarn.root.logger=INFO,RFA > -Dlog4j.configuration=log4j.properties > -Djava.library.path=/nodemanager/lib/native -classpath XXX > org.apache.hadoop.yarn.server.nodemanager.NodeManager > {quote} -- This 
message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and namenode is deployed on the same node.
[ https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541697#comment-14541697 ] nijel commented on YARN-3639: - hi [~xinxianyin] Thanks for reporting this issue. Can you attach the logs of this issue ? > It takes too long time for RM to recover all apps if the original active RM > and namenode is deployed on the same node. > -- > > Key: YARN-3639 > URL: https://issues.apache.org/jira/browse/YARN-3639 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Xianyin Xin > > If the node on which the active RM runs dies and if the active namenode is > running on the same node, the new RM will take long time to recover all apps. > After analysis, we found the root cause is renewing HDFS tokens in the > recovering process. The HDFS client created by the renewer would firstly try > to connect to the original namenode, the result of which is time-out after > 10~20s, and then the client tries to connect to the new namenode. The entire > recovery cost 15*#apps seconds according our test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541306#comment-14541306 ] nijel commented on YARN-3614: - One possible cause is discussed in YARN-868 Can you try the solution given in this issue. > FileSystemRMStateStore throw exception when failed to remove application, > that cause resourcemanager to crash > - > > Key: YARN-3614 > URL: https://issues.apache.org/jira/browse/YARN-3614 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: lachisis >Priority: Critical > > FileSystemRMStateStore is only a accessorial plug-in of rmstore. > When it failed to remove application, I think warning is enough, but now > resourcemanager crashed. > Recently, I configure > "yarn.resourcemanager.state-store.max-completed-applications" to limit > applications number in rmstore. when applications number exceed the limit, > some old applications will be removed. If failed to remove, resourcemanager > will crash. 
> The following is log: > 2015-05-11 06:58:43,815 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing > info for app: application_1430994493305_0053 > 2015-05-11 06:58:43,815 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Removing info for app: application_1430994493305_0053 at: > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > 2015-05-11 06:58:43,816 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > removing app: application_1430994493305_0053 > java.lang.Exception: Failed to delete > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > 2015-05-11 06:58:43,819 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > java.lang.Exception: Failed to delete > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMach
[jira] [Commented] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541225#comment-14541225 ] nijel commented on YARN-3629: - Thanks [~devaraj.k] for the reviewing and committing the patch > NodeID is always printed as "null" in node manager initialization log. > -- > > Key: YARN-3629 > URL: https://issues.apache.org/jira/browse/YARN-3629 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Fix For: 2.8.0 > > Attachments: YARN-3629-1.patch > > > In Node manager log during startup the following logs is printed > 2015-05-12 11:20:02,347 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized > nodemanager for *null* : physical-memory=4096 virtual-memory=8602 > virtual-cores=8 > This line is printed from NodeStatusUpdaterImpl.serviceInit. > But the nodeid assignment is happening only in > NodeStatusUpdaterImpl.serviceStart > {code} > protected void serviceStart() throws Exception { > // NodeManager is the last service to start, so NodeId is available. > this.nodeId = this.context.getNodeId(); > {code} > Assigning the node id in serviceinit is not feasible since it is generated by > ContainerManagerImpl.serviceStart. > The log can be moved to service start to give right information to user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541223#comment-14541223 ] nijel commented on YARN-3613: - Thanks [~kasha] for reviewing and committing the patch > TestContainerManagerSecurity should init and start Yarn cluster in setup > instead of individual methods > -- > > Key: YARN-3613 > URL: https://issues.apache.org/jira/browse/YARN-3613 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: nijel >Priority: Minor > Labels: newbie > Fix For: 2.8.0 > > Attachments: YARN-3613-1.patch, yarn-3613-2.patch > > > In TestContainerManagerSecurity, individual tests init and start Yarn > cluster. This duplication can be avoided by moving that to setup. > Further, one could merge the two @Test methods to avoid bringing up another > mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539788#comment-14539788 ] nijel commented on YARN-3629: - bq.-1 tests included This is a log message change. So tests are not required > NodeID is always printed as "null" in node manager initialization log. > -- > > Key: YARN-3629 > URL: https://issues.apache.org/jira/browse/YARN-3629 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3629-1.patch > > > In Node manager log during startup the following logs is printed > 2015-05-12 11:20:02,347 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized > nodemanager for *null* : physical-memory=4096 virtual-memory=8602 > virtual-cores=8 > This line is printed from NodeStatusUpdaterImpl.serviceInit. > But the nodeid assignment is happening only in > NodeStatusUpdaterImpl.serviceStart > {code} > protected void serviceStart() throws Exception { > // NodeManager is the last service to start, so NodeId is available. > this.nodeId = this.context.getNodeId(); > {code} > Assigning the node id in serviceinit is not feasible since it is generated by > ContainerManagerImpl.serviceStart. > The log can be moved to service start to give right information to user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539495#comment-14539495 ] nijel commented on YARN-3629: - Moving the log message is a bit tricky since it logs some parameters which are not available in serviceStart. So I am keeping this log as it is and adding a new log message to print the nodeid, for information purposes. Any different thoughts? > NodeID is always printed as "null" in node manager initialization log. > -- > > Key: YARN-3629 > URL: https://issues.apache.org/jira/browse/YARN-3629 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3629-1.patch > > > In Node manager log during startup the following logs is printed > 2015-05-12 11:20:02,347 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized > nodemanager for *null* : physical-memory=4096 virtual-memory=8602 > virtual-cores=8 > This line is printed from NodeStatusUpdaterImpl.serviceInit. > But the nodeid assignment is happening only in > NodeStatusUpdaterImpl.serviceStart > {code} > protected void serviceStart() throws Exception { > // NodeManager is the last service to start, so NodeId is available. > this.nodeId = this.context.getNodeId(); > {code} > Assigning the node id in serviceinit is not feasible since it is generated by > ContainerManagerImpl.serviceStart. > The log can be moved to service start to give right information to user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3629: Attachment: YARN-3629-1.patch Please review > NodeID is always printed as "null" in node manager initialization log. > -- > > Key: YARN-3629 > URL: https://issues.apache.org/jira/browse/YARN-3629 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: YARN-3629-1.patch > > > In Node manager log during startup the following logs is printed > 2015-05-12 11:20:02,347 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized > nodemanager for *null* : physical-memory=4096 virtual-memory=8602 > virtual-cores=8 > This line is printed from NodeStatusUpdaterImpl.serviceInit. > But the nodeid assignment is happening only in > NodeStatusUpdaterImpl.serviceStart > {code} > protected void serviceStart() throws Exception { > // NodeManager is the last service to start, so NodeId is available. > this.nodeId = this.context.getNodeId(); > {code} > Assigning the node id in serviceinit is not feasible since it is generated by > ContainerManagerImpl.serviceStart. > The log can be moved to service start to give right information to user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.
nijel created YARN-3629: --- Summary: NodeID is always printed as "null" in node manager initialization log. Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel In the Node Manager log during startup, the following line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeid assignment happens only in NodeStatusUpdaterImpl.serviceStart: {code} protected void serviceStart() throws Exception { // NodeManager is the last service to start, so NodeId is available. this.nodeId = this.context.getNodeId(); {code} Assigning the node id in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
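[Editorial sketch] The timing problem in the report above can be shown with simplified stand-in classes (these are not the actual NodeStatusUpdaterImpl code): a message built in serviceInit sees a null nodeId, while one built in serviceStart sees the real id. The node id value below is a placeholder:

```java
// Simplified illustration of why the serviceInit log prints "null":
// nodeId is only assigned once serviceStart runs.
public class NodeStatusUpdaterSketch {
    private String nodeId; // still null during serviceInit()

    // Mirrors the serviceInit-time log: nodeId renders as "null".
    public String serviceInitMessage(int physMemMb, int virtMemMb, int vcores) {
        return "Initialized nodemanager for " + nodeId
            + " : physical-memory=" + physMemMb
            + " virtual-memory=" + virtMemMb
            + " virtual-cores=" + vcores;
    }

    // Mirrors a serviceStart-time log, after the id has been assigned.
    public String serviceStartMessage() {
        this.nodeId = "host-1:45454"; // placeholder; YARN uses context.getNodeId()
        return "Node ID assigned is : " + nodeId;
    }
}
```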
[jira] [Updated] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3613: Attachment: YARN-3613-1.patch Please review the patch. Removed 2 unused imports. Test time reduced from ~130 to ~80 sec > TestContainerManagerSecurity should init and start Yarn cluster in setup > instead of individual methods > -- > > Key: YARN-3613 > URL: https://issues.apache.org/jira/browse/YARN-3613 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: nijel >Priority: Minor > Labels: newbie > Attachments: YARN-3613-1.patch > > > In TestContainerManagerSecurity, individual tests init and start Yarn > cluster. This duplication can be avoided by moving that to setup. > Further, one could merge the two @Test methods to avoid bringing up another > mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537837#comment-14537837 ] nijel commented on YARN-3614: - hi @lachisis bq.when standby resourcemanager try to transitiontoActive, it will cost more than ten minutes to load applications Is this a secure cluster ? > FileSystemRMStateStore throw exception when failed to remove application, > that cause resourcemanager to crash > - > > Key: YARN-3614 > URL: https://issues.apache.org/jira/browse/YARN-3614 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: lachisis >Priority: Critical > > FileSystemRMStateStore is only a accessorial plug-in of rmstore. > When it failed to remove application, I think warning is enough, but now > resourcemanager crashed. > Recently, I configure > "yarn.resourcemanager.state-store.max-completed-applications" to limit > applications number in rmstore. when applications number exceed the limit, > some old applications will be removed. If failed to remove, resourcemanager > will crash. 
> The following is log: > 2015-05-11 06:58:43,815 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing > info for app: application_1430994493305_0053 > 2015-05-11 06:58:43,815 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Removing info for app: application_1430994493305_0053 at: > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > 2015-05-11 06:58:43,816 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > removing app: application_1430994493305_0053 > java.lang.Exception: Failed to delete > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > 2015-05-11 06:58:43,819 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > java.lang.Exception: Failed to delete > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at >
[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537514#comment-14537514 ] nijel commented on YARN-3613: - I will update the patch. > TestContainerManagerSecurity should init and start Yarn cluster in setup > instead of individual methods > -- > > Key: YARN-3613 > URL: https://issues.apache.org/jira/browse/YARN-3613 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: nijel >Priority: Minor > Labels: newbie > > In TestContainerManagerSecurity, individual tests init and start Yarn > cluster. This duplication can be avoided by moving that to setup. > Further, one could merge the two @Test methods to avoid bringing up another > mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-3613: --- Assignee: nijel > TestContainerManagerSecurity should init and start Yarn cluster in setup > instead of individual methods > -- > > Key: YARN-3613 > URL: https://issues.apache.org/jira/browse/YARN-3613 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: nijel >Priority: Minor > Labels: newbie > > In TestContainerManagerSecurity, individual tests init and start Yarn > cluster. This duplication can be avoided by moving that to setup. > Further, one could merge the two @Test methods to avoid bringing up another > mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
[ https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3584: Attachment: YARN-3584-2.patch Thanks [~jianhe]. Updated the patch to fix the comment. > [Log mesage correction] : MIssing space in Diagnostics message > -- > > Key: YARN-3584 > URL: https://issues.apache.org/jira/browse/YARN-3584 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel >Priority: Trivial > Labels: newbie > Attachments: YARN-3584-1.patch, YARN-3584-2.patch > > > For more detailed output, check application tracking page: > https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, > click on links to logs of each attempt. > Here "Then" is not part of the URL. Better to use a space in between so that > the URL can be copied directly for analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
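[Editorial sketch] The one-character fix in YARN-3584 is easy to illustrate: without a separator, "Then, click on links..." fuses onto the tracking URL and breaks copy-paste of the link. The class name and URL below are placeholders, not the actual diagnostics code:

```java
// Shows how the missing space fuses the next sentence onto the tracking
// URL, and how a single leading space fixes it.
public class DiagnosticsMessageDemo {
    public static String buggy(String url) {
        return "For more detailed output, check application tracking page: "
            + url + "Then, click on links to logs of each attempt.";
    }

    public static String fixed(String url) {
        return "For more detailed output, check application tracking page: "
            + url + " Then, click on links to logs of each attempt.";
    }
}
```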
[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file
[ https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3018: Attachment: YARN-3018-5.patch Thanks [~vinodkv]. I completely missed this JIRA. Updated the patch to keep the value as 40. > Unify the default value for yarn.scheduler.capacity.node-locality-delay in > code and default xml file > > > Key: YARN-3018 > URL: https://issues.apache.org/jira/browse/YARN-3018 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, > YARN-3018-4.patch, YARN-3018-5.patch > > > For the configuration item "yarn.scheduler.capacity.node-locality-delay" the > default value given in code is "-1" > public static final int DEFAULT_NODE_LOCALITY_DELAY = -1; > In the default capacity-scheduler.xml file in the resource manager config > directory it is 40. > Can it be unified to avoid confusion when the user creates the file without > this configuration. IF he expects the values in the file to be default > values, then it will be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file
[ https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530699#comment-14530699 ] nijel commented on YARN-3018: - The test failure can be solved by changing the following lines in TestLeafQueue.testLocalityConstraints():
{code}
// line number 2394
-verify(app_0, never()).allocate(eq(NodeType.RACK_LOCAL), eq(node_1_1),
+verify(app_0, never()).allocate(eq(NodeType.NODE_LOCAL), eq(node_1_1),
     any(Priority.class), any(ResourceRequest.class), any(Container.class));
 assertEquals(0, app_0.getSchedulingOpportunities(priority));
// line number 2397
-assertEquals(1, app_0.getTotalRequiredResources(priority));
+assertEquals(0, app_0.getTotalRequiredResources(priority));
{code}
But I am not sure about the impact. Can anyone help me with this? > Unify the default value for yarn.scheduler.capacity.node-locality-delay in > code and default xml file > > > Key: YARN-3018 > URL: https://issues.apache.org/jira/browse/YARN-3018 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, > YARN-3018-4.patch > > > For the configuration item "yarn.scheduler.capacity.node-locality-delay" the > default value given in code is "-1" > public static final int DEFAULT_NODE_LOCALITY_DELAY = -1; > In the default capacity-scheduler.xml file in the resource manager config > directory it is 40. > Can it be unified to avoid confusion when the user creates the file without > this configuration. IF he expects the values in the file to be default > values, then it will be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability
[ https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3271: Attachment: YARN-3271.4.patch Thanks [~kasha] for the comments bq. tearDown need not explicitly stop the scheduler. Stopping the RM should take care of the scheduler as well. Done bq. testNotUserAsDefaultQueue and testDontAllowUndeclaredPools need not stop the RM and re-instantiate it. We could just call scheduler.reinitialize I tried this. As per my analysis, the reinitialize call does not use the conf object:
{code}
FairScheduler.java
@Override
public void reinitialize(Configuration conf, RMContext rmContext) throws IOException {
  try {
    allocsLoader.reloadAllocations();
  } catch (Exception e) {
    LOG.error("Failed to reload allocations file", e);
  }
}
{code}
Here conf is not used. bq. testMoveRunnableApp - Remove commented out scheduler.init and start Done > FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler > to TestAppRunnability > --- > > Key: YARN-3271 > URL: https://issues.apache.org/jira/browse/YARN-3271 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Karthik Kambatla >Assignee: nijel > Labels: BB2015-05-TBR > Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch, > YARN-3271.4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file
[ https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3018: Attachment: YARN-3018-4.patch Thanks [~jianhe] for the comment Agree with you. Updated the patch with the changes > Unify the default value for yarn.scheduler.capacity.node-locality-delay in > code and default xml file > > > Key: YARN-3018 > URL: https://issues.apache.org/jira/browse/YARN-3018 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, > YARN-3018-4.patch > > > For the configuration item "yarn.scheduler.capacity.node-locality-delay" the > default value given in code is "-1" > public static final int DEFAULT_NODE_LOCALITY_DELAY = -1; > In the default capacity-scheduler.xml file in the resource manager config > directory it is 40. > Can it be unified to avoid confusion when the user creates the file without > this configuration. IF he expects the values in the file to be default > values, then it will be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
[ https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530149#comment-14530149 ] nijel commented on YARN-3584: - Test failures are not related to this patch. Checkstyle is showing a wrong warning, I think; the lines start at indent 10. > [Log mesage correction] : MIssing space in Diagnostics message > -- > > Key: YARN-3584 > URL: https://issues.apache.org/jira/browse/YARN-3584 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3584-1.patch > > > For more detailed output, check application tracking page: > https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, > click on links to logs of each attempt. > In this Then is not part of thr URL. Better to use a space in between so that > the URL can be copied directly for analysis -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
[ https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3584: Attachment: (was: YARN-3584-1.patch) > [Log mesage correction] : MIssing space in Diagnostics message > -- > > Key: YARN-3584 > URL: https://issues.apache.org/jira/browse/YARN-3584 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3584-1.patch > > > For more detailed output, check application tracking page: > https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, > click on links to logs of each attempt. > In this Then is not part of thr URL. Better to use a space in between so that > the URL can be copied directly for analysis -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
[ https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3584: Attachment: YARN-3584-1.patch updated patch > [Log mesage correction] : MIssing space in Diagnostics message > -- > > Key: YARN-3584 > URL: https://issues.apache.org/jira/browse/YARN-3584 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3584-1.patch > > > For more detailed output, check application tracking page: > https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, > click on links to logs of each attempt. > In this Then is not part of thr URL. Better to use a space in between so that > the URL can be copied directly for analysis -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
[ https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3584: Attachment: YARN-3584-1.patch Attached the change. Please review > [Log mesage correction] : MIssing space in Diagnostics message > -- > > Key: YARN-3584 > URL: https://issues.apache.org/jira/browse/YARN-3584 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3584-1.patch > > > For more detailed output, check application tracking page: > https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, > click on links to logs of each attempt. > In this Then is not part of thr URL. Better to use a space in between so that > the URL can be copied directly for analysis -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
nijel created YARN-3584: --- Summary: [Log mesage correction] : MIssing space in Diagnostics message Key: YARN-3584 URL: https://issues.apache.org/jira/browse/YARN-3584 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial For more detailed output, check application tracking page: https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, click on links to logs of each attempt. In this Then is not part of thr URL. Better to use a space in between so that the URL can be copied directly for analysis -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file
[ https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3018: Attachment: YARN-3018-3.patch Re-triggering the CI. The patch was wrongly generated; sorry for the noise. > Unify the default value for yarn.scheduler.capacity.node-locality-delay in > code and default xml file > > > Key: YARN-3018 > URL: https://issues.apache.org/jira/browse/YARN-3018 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: nijel >Assignee: nijel >Priority: Trivial > Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch > > > For the configuration item "yarn.scheduler.capacity.node-locality-delay" the > default value given in code is "-1" > public static final int DEFAULT_NODE_LOCALITY_DELAY = -1; > In the default capacity-scheduler.xml file in the resource manager config > directory it is 40. > Can it be unified to avoid confusion when the user creates the file without > this configuration. IF he expects the values in the file to be default > values, then it will be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)