[jira] [Assigned] (YARN-1948) Expose utility methods in Apps.java publically

2021-06-26 Thread nijel (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-1948:
---

Assignee: (was: nijel)

> Expose utility methods in Apps.java publically
> --
>
> Key: YARN-1948
> URL: https://issues.apache.org/jira/browse/YARN-1948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Sandy Ryza
>Priority: Major
>  Labels: newbie
> Attachments: YARN-1948-1.patch
>
>
> Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
> MapReduce, Spark, and Tez that are currently marked private.  As these are 
> useful for any YARN app that wants to allow users to augment container 
> environments, it would be helpful to make them public.
> It may make sense to put them in a new class with a better name.
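For illustration, a minimal sketch of how a YARN client could call these helpers once they are public, assuming the Hadoop 2.x variants that take an explicit separator argument (the class name and the environment values here are invented):

{code}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.util.Apps;

public class ContainerEnvSketch {
  public static void main(String[] args) {
    Map<String, String> env = new HashMap<String, String>();
    // Append a single variable, merging with any existing value.
    Apps.addToEnvironment(env, "CLASSPATH", "/opt/app/lib/*",
        File.pathSeparator);
    // Parse a user-supplied "K1=V1,K2=V2" string into the same map.
    Apps.setEnvFromInputString(env, "JAVA_OPTS=-Xmx512m,LOG_DIR=/tmp/logs",
        File.pathSeparator);
    System.out.println(env);
  }
}
{code}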



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Assigned] (YARN-4303) Confusing help message if AM logs cant be retrieved via yarn logs command

2021-06-26 Thread nijel (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4303:
---

Assignee: (was: nijel)

> Confusing help message if AM logs cant be retrieved via yarn logs command
> -
>
> Key: YARN-4303
> URL: https://issues.apache.org/jira/browse/YARN-4303
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Priority: Minor
>
> {noformat}
> yarn@BLR102525:~/test/install/hadoop/resourcemanager/bin> ./yarn logs 
> --applicationId application_1445832014581_0028 -am ALL
> Can not get AMContainers logs for the 
> application:application_1445832014581_0028
> This application:application_1445832014581_0028 is finished. Please enable 
> the application history service. Or Using yarn logs -applicationId <appId> 
> -containerId <containerId> --nodeAddress <nodeHttpAddress> to get the 
> container logs
> {noformat}
> Part of the command output mentioned above indicates that using {{yarn logs 
> -applicationId <Application ID> -containerId <Container ID> --nodeAddress 
> <nodeHttpAddress>}} will fetch the desired result. It asks you to specify 
> nodeHttpAddress, which makes it sound like we have to connect to the 
> nodemanager's webapp address.
> This help message should be changed to include the command as {{yarn logs 
> -applicationId <Application ID> -containerId <Container ID> --nodeAddress 
> <Node Address>}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Assigned] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2021-06-26 Thread nijel (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4110:
---

Assignee: (was: nijel)

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-4110_1.patch
>
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashCode() 
> and equals() implementations. These state objects should override them.
> # For RMAppImpl, we can make use of ApplicationId#hashCode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashCode and 
> ApplicationAttemptId#equals.
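A minimal sketch of the proposed delegation, with an abbreviated class standing in for the real state objects (the field name and constructor are assumptions for illustration):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustrative only; the same pattern applies to RMAppAttemptImpl with
// ApplicationAttemptId.
public class RMAppImplSketch {
  private final ApplicationId applicationId;

  public RMAppImplSketch(ApplicationId applicationId) {
    this.applicationId = applicationId;
  }

  @Override
  public int hashCode() {
    // Delegate to ApplicationId#hashCode.
    return applicationId.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof RMAppImplSketch)) {
      return false;
    }
    // Delegate to ApplicationId#equals.
    return applicationId.equals(((RMAppImplSketch) obj).applicationId);
  }
}
{code}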



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Assigned] (YARN-4249) Many options in "yarn application" command is not documented

2017-12-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4249:
---

Assignee: (was: nijel)

> Many options in "yarn application" command is not documented
> 
>
> Key: YARN-4249
> URL: https://issues.apache.org/jira/browse/YARN-4249
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>
> In the document only a few options are specified.
> {code}
> Usage: `yarn application [options]`
> | COMMAND\_OPTIONS | Description |
> |: |: |
> | -appStates \<States\> | Works with -list to filter applications based on an 
> input comma-separated list of application states. The valid application 
> state can be one of the following: ALL, NEW, NEW\_SAVING, SUBMITTED, 
> ACCEPTED, RUNNING, FINISHED, FAILED, KILLED |
> | -appTypes \<Types\> | Works with -list to filter applications based on an 
> input comma-separated list of application types. |
> | -list | Lists applications from the RM. Supports optional use of -appTypes 
> to filter applications based on application type, and -appStates to filter 
> applications based on application state. |
> | -kill \<ApplicationId\> | Kills the application. |
> | -status \<ApplicationId\> | Prints the status of the application. |
> {code}
> Some options are missing, like:
> -appId <Application ID>  Specify the Application Id to be operated on.
> -help  Displays help for all commands.
> -movetoqueue <Application ID>  Moves the application to a different queue.
> -queue <Queue Name>  Works with the movetoqueue command to specify which 
> queue to move an application to.
> -updatePriority <Priority>  Updates the priority of an application. The 
> ApplicationId can be passed using the 'appId' option.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-11-05 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991374#comment-14991374
 ] 

nijel commented on YARN-2934:
-

Thanks [~Naganarasimha] for the patch.

A few minor comments/doubts:

1.
{code}
FileStatus[] listStatus =
    fileSystem.listStatus(containerLogDir, new PathFilter() {
      @Override
      public boolean accept(Path path) {
        return FilenameUtils.wildcardMatch(path.getName(),
            errorFileNamePattern, IOCase.INSENSITIVE);
      }
    });
{code}
What if this gives multiple error files?

2.
{code}
} catch (IOException e) {
  LOG.warn("Failed while trying to read container's error log", e);
}
{code}
Can this be logged at error level? I think there should not normally be any 
exception while reading the file, so if one does occur it is better to log it 
at error level.
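On point 1, one possible way to handle multiple matches, sketched with the variable names from the snippet above (picking the most recently modified file is an assumption about the desired behavior, not what the patch does):

{code}
// Hypothetical tie-break if the wildcard matches several error files:
// keep only the most recently modified one.
FileStatus newest = null;
for (FileStatus status : listStatus) {
  if (newest == null
      || status.getModificationTime() > newest.getModificationTime()) {
    newest = status;
  }
}
{code}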



> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch
>
>
> Most YARN applications redirect stderr to some file. That's why, when a 
> container launch fails with an {{ExitCodeException}}, the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-28 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978231#comment-14978231
 ] 

nijel commented on YARN-4246:
-

thanks [~varun_saxena] and [~rohithsharma] for the review and commit.

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Fix For: 2.8.0
>
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't 
> been allocated. In ApplicationCLI#listApplicationAttempts we should check 
> whether the AM container ID is null instead of directly calling toString.
> {code}
> writer.printf(APPLICATION_ATTEMPTS_PATTERN,
>     appAttemptReport.getApplicationAttemptId(),
>     appAttemptReport.getYarnApplicationAttemptState(),
>     appAttemptReport.getAMContainerId().toString(),
>     appAttemptReport.getTrackingUrl());
> {code}
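For comparison, a minimal sketch of the null-safe variant the description suggests; the "N/A" placeholder is an assumption for illustration, not necessarily what the committed patch prints:

{code}
// Guard against an AM container that has not been allocated yet.
ContainerId amContainerId = appAttemptReport.getAMContainerId();
writer.printf(APPLICATION_ATTEMPTS_PATTERN,
    appAttemptReport.getApplicationAttemptId(),
    appAttemptReport.getYarnApplicationAttemptState(),
    amContainerId == null ? "N/A" : amContainerId.toString(),
    appAttemptReport.getTrackingUrl());
{code}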



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4303) Confusing help message if AM logs cant be retrieved via yarn logs command

2015-10-27 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4303:
---

Assignee: nijel

> Confusing help message if AM logs cant be retrieved via yarn logs command
> -
>
> Key: YARN-4303
> URL: https://issues.apache.org/jira/browse/YARN-4303
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
>Priority: Minor
>
> {noformat}
> yarn@BLR102525:~/test/install/hadoop/resourcemanager/bin> ./yarn logs 
> --applicationId application_1445832014581_0028 -am ALL
> Can not get AMContainers logs for the 
> application:application_1445832014581_0028
> This application:application_1445832014581_0028 is finished. Please enable 
> the application history service. Or Using yarn logs -applicationId <appId> 
> -containerId <containerId> --nodeAddress <nodeHttpAddress> to get the 
> container logs
> {noformat}
> Part of the command output mentioned above indicates that using {{yarn logs 
> -applicationId <Application ID> -containerId <Container ID> --nodeAddress 
> <nodeHttpAddress>}} will fetch the desired result. It asks you to specify 
> nodeHttpAddress, which makes it sound like we have to connect to the 
> nodemanager's webapp address.
> This help message should be changed to include the command as {{yarn logs 
> -applicationId <Application ID> -containerId <Container ID> --nodeAddress 
> <Node Address>}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-13 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955054#comment-14955054
 ] 

nijel commented on YARN-4246:
-

bq. -1  yarn tests

As per my analysis, the test failures are not related to this patch.
Please review.

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't 
> been allocated. In ApplicationCLI#listApplicationAttempts we should check 
> whether the AM container ID is null instead of directly calling toString.
> {code}
> writer.printf(APPLICATION_ATTEMPTS_PATTERN,
>     appAttemptReport.getApplicationAttemptId(),
>     appAttemptReport.getYarnApplicationAttemptState(),
>     appAttemptReport.getAMContainerId().toString(),
>     appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4246) NPE while listing app attempt

2015-10-12 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4246:

Attachment: YARN-4246_2.patch

Thanks [~steve_l] for pointing out the mistake.
Updated the patch with the comment fix.

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't 
> been allocated. In ApplicationCLI#listApplicationAttempts we should check 
> whether the AM container ID is null instead of directly calling toString.
> {code}
> writer.printf(APPLICATION_ATTEMPTS_PATTERN,
>     appAttemptReport.getApplicationAttemptId(),
>     appAttemptReport.getYarnApplicationAttemptState(),
>     appAttemptReport.getAMContainerId().toString(),
>     appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4246) NPE while listing app attempt

2015-10-10 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4246:

Attachment: YARN-4246_1.patch

Attaching the patch.

Please review.

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Attachments: YARN-4246_1.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't 
> been allocated. In ApplicationCLI#listApplicationAttempts we should check 
> whether the AM container ID is null instead of directly calling toString.
> {code}
> writer.printf(APPLICATION_ATTEMPTS_PATTERN,
>     appAttemptReport.getApplicationAttemptId(),
>     appAttemptReport.getYarnApplicationAttemptState(),
>     appAttemptReport.getAMContainerId().toString(),
>     appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4249) Many options in "yarn application" command is not documented

2015-10-09 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4249:

Summary: Many options in "yarn application" command is not documented  
(was: Many options in "yarn application" command is not documents)

> Many options in "yarn application" command is not documented
> 
>
> Key: YARN-4249
> URL: https://issues.apache.org/jira/browse/YARN-4249
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>
> In the document only a few options are specified.
> {code}
> Usage: `yarn application [options]`
> | COMMAND\_OPTIONS | Description |
> |: |: |
> | -appStates \<States\> | Works with -list to filter applications based on an 
> input comma-separated list of application states. The valid application 
> state can be one of the following: ALL, NEW, NEW\_SAVING, SUBMITTED, 
> ACCEPTED, RUNNING, FINISHED, FAILED, KILLED |
> | -appTypes \<Types\> | Works with -list to filter applications based on an 
> input comma-separated list of application types. |
> | -list | Lists applications from the RM. Supports optional use of -appTypes 
> to filter applications based on application type, and -appStates to filter 
> applications based on application state. |
> | -kill \<ApplicationId\> | Kills the application. |
> | -status \<ApplicationId\> | Prints the status of the application. |
> {code}
> Some options are missing, like:
> -appId <Application ID>  Specify the Application Id to be operated on.
> -help  Displays help for all commands.
> -movetoqueue <Application ID>  Moves the application to a different queue.
> -queue <Queue Name>  Works with the movetoqueue command to specify which 
> queue to move an application to.
> -updatePriority <Priority>  Updates the priority of an application. The 
> ApplicationId can be passed using the 'appId' option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4249) Many options in "yarn application" command is not documents

2015-10-09 Thread nijel (JIRA)
nijel created YARN-4249:
---

 Summary: Many options in "yarn application" command is not 
documents
 Key: YARN-4249
 URL: https://issues.apache.org/jira/browse/YARN-4249
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel


In the document only a few options are specified.
{code}
Usage: `yarn application [options]`

| COMMAND\_OPTIONS | Description |
|: |: |
| -appStates \<States\> | Works with -list to filter applications based on an 
input comma-separated list of application states. The valid application state 
can be one of the following: ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, 
RUNNING, FINISHED, FAILED, KILLED |
| -appTypes \<Types\> | Works with -list to filter applications based on an 
input comma-separated list of application types. |
| -list | Lists applications from the RM. Supports optional use of -appTypes to 
filter applications based on application type, and -appStates to filter 
applications based on application state. |
| -kill \<ApplicationId\> | Kills the application. |
| -status \<ApplicationId\> | Prints the status of the application. |
{code}

Some options are missing, like:
-appId <Application ID>  Specify the Application Id to be operated on.
-help  Displays help for all commands.
-movetoqueue <Application ID>  Moves the application to a different queue.
-queue <Queue Name>  Works with the movetoqueue command to specify which 
queue to move an application to.
-updatePriority <Priority>  Updates the priority of an application. The 
ApplicationId can be passed using the 'appId' option.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-09 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950215#comment-14950215
 ] 

nijel commented on YARN-4246:
-

Thanks [~varun_saxena] for reporting.
The same issue is there in {{applicationattempt -status}} also:

{noformat}
 ./yarn applicationattempt -status appattempt_1444389134985_0001_01
15/10/09 16:53:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/10/09 16:53:20 INFO impl.TimelineClientImpl: Timeline service address: 
http://10.18.130.110:55033/ws/v1/timeline/
15/10/09 16:53:20 INFO client.RMProxy: Connecting to ResourceManager at 
host-10-18-130-110/10.18.130.110:8032
15/10/09 16:53:21 INFO client.AHSProxy: Connecting to Application History 
server at /10.18.130.110:55034
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationAttemptReport(ApplicationCLI.java:352)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:182)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
{noformat}

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't 
> been allocated. In ApplicationCLI#listApplicationAttempts we should check 
> whether the AM container ID is null instead of directly calling toString.
> {code}
> writer.printf(APPLICATION_ATTEMPTS_PATTERN,
>     appAttemptReport.getApplicationAttemptId(),
>     appAttemptReport.getYarnApplicationAttemptState(),
>     appAttemptReport.getAMContainerId().toString(),
>     appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4205:

Attachment: YARN-4205_03.patch

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch, 
> YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is 
> configured. If an application runs beyond its lifetime, it will be killed. 
> The lifetime is counted from the submit time.
> The thread monitoring interval is configurable.
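A minimal sketch of such a monitor, with every name (LifetimeMonitor, register, killApplication) invented for illustration rather than taken from the attached patch:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;

class LifetimeMonitor implements Runnable {
  private final Map<ApplicationId, Long> deadlines =
      new ConcurrentHashMap<ApplicationId, Long>();
  private final long intervalMs; // configurable monitoring interval

  LifetimeMonitor(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  void register(ApplicationId appId, long submitTimeMs, long lifetimeMs) {
    // The lifetime is counted from the submit time.
    deadlines.put(appId, submitTimeMs + lifetimeMs);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long now = System.currentTimeMillis();
      for (Map.Entry<ApplicationId, Long> e : deadlines.entrySet()) {
        if (now > e.getValue()) {
          killApplication(e.getKey()); // hypothetical kill hook
          deadlines.remove(e.getKey());
        }
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // preserve the interrupt and exit
      }
    }
  }

  private void killApplication(ApplicationId appId) {
    // In the RM this would dispatch a kill event; omitted in this sketch.
  }
}
{code}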



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2015-09-28 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933513#comment-14933513
 ] 

nijel commented on YARN-4205:
-

Thanks [~rohithsharma] for the comments.
Updated the patch.


> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is 
> configured. If an application runs beyond its lifetime, it will be killed. 
> The lifetime is counted from the submit time.
> The thread monitoring interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909121#comment-14909121
 ] 

nijel commented on YARN-4205:
-

Test cases are failing with "method not found" for the method added in the 
api project. These tests pass locally!

I am not able to find the reason for this failure. Could a build issue cause 
this?

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is 
> configured. If an application runs beyond its lifetime, it will be killed. 
> The lifetime is counted from the submit time.
> The thread monitoring interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-25 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909120#comment-14909120
 ] 

nijel commented on YARN-4111:
-

Thanks [~rohithsharma] and [~sunilg] for the comments.
If we add the new constructor to carry the message, can other event classes 
like RMAppRejectedEvent and RMAppFinishedAttemptEvent be removed? They were 
also added to handle the message.

Or these classes can be kept as they are, as event separation for future 
updates.

What do you say?

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, 
> YARN-4111_4.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR 
> *from the scheduler*. Currently the diagnostic message is set statically, 
> i.e. {{Application killed by user.}}, regardless of whether the application 
> was killed by the scheduler. This confuses the user after an application is 
> killed: he did not kill the application at all, but the diagnostic message 
> says that the application was killed by the user.
> It would be useful if the diagnostic messages were different for each cause 
> of KILL.
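For illustration, the kind of kill event that carries a per-cause diagnostic might look like the sketch below; the class and accessor names are assumptions, not the classes the patch actually introduces:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Sketch of an app event that carries a per-cause diagnostic message.
public class RMAppKillEventSketch {
  private final ApplicationId applicationId;
  private final String diagnostics;

  public RMAppKillEventSketch(ApplicationId applicationId,
      String diagnostics) {
    this.applicationId = applicationId;
    // e.g. "Application killed by user." vs "Application killed by scheduler."
    this.diagnostics = diagnostics;
  }

  public ApplicationId getApplicationId() {
    return applicationId;
  }

  public String getDiagnostics() {
    return diagnostics;
  }
}
{code}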



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4205:

Description: 
This JIRA intends to provide a lifetime monitor service. 
The service will monitor the applications for which a lifetime is configured. 
If an application runs beyond its lifetime, it will be killed. 
The lifetime is counted from the submit time.

The thread monitoring interval is configurable.


> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is 
> configured. If an application runs beyond its lifetime, it will be killed. 
> The lifetime is counted from the submit time.
> The thread monitoring interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908204#comment-14908204
 ] 

nijel commented on YARN-4205:
-

Thanks [~leftnoteasy] for the comments, and sorry for the small mistakes.

Updated the patch.
bq. RMAppLifeTimeMonitorService.rmApps -> applicationIdToLifetime? (or shorter 
name if you prefer), it's not rmApps actually
Changed to {{monitoredapps}}.

bq. public synchronized void unregister, synchronized could be removed?
Done.

The other comments are fixed.

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4205:

Attachment: YARN-4205_02.patch

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2015-09-24 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4205:

Attachment: YARN-4205_01.patch

Uploading initial version.

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4206) Add life time value in Application report and web UI

2015-09-24 Thread nijel (JIRA)
nijel created YARN-4206:
---

 Summary: Add life time value in Application report and web UI
 Key: YARN-4206
 URL: https://issues.apache.org/jira/browse/YARN-4206
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: nijel
Assignee: nijel






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4205) Add a service for monitoring application life time out

2015-09-24 Thread nijel (JIRA)
nijel created YARN-4205:
---

 Summary: Add a service for monitoring application life time out
 Key: YARN-4205
 URL: https://issues.apache.org/jira/browse/YARN-4205
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: nijel
Assignee: nijel






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-09-24 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906252#comment-14906252
 ] 

nijel commented on YARN-3813:
-

Thanks [~leftnoteasy] for the comments.
I will update the patch with the code comments.

bq. Do you plan to support updating lifetime when the application is running?
As per our understanding, the following two use cases apply:
1. The user can increase the lifetime after some time, on seeing the progress 
of an application that is already being monitored by the timeout monitor.
2. The user can add a timeout for a running application so that it will also 
be monitored.

In both cases the updated timeout will be the total lifetime (from the submit 
time).

Please let us know if you are thinking of any other scenario, so that we can 
plan the interfaces accordingly.

bq. Do you plan to get lifetime via ApplicationReport, CLI, REST API?
Yes. As of now we plan for ApplicationReport. Based on the dynamic update, 
the interfaces can be defined and handled as a subtask.

We had some offline chats as well. A few subtasks have been raised for the 
planned work. Please give your opinion.

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, 0002_YARN-3813.patch, YARN 
> Application Timeout .pdf
>
>
> It will be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time.
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job will run continuously with a different 
> dataset, so one job will be started every 5 minutes. The estimated time for 
> this task is 2 minutes or less.
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. Here, the user expects the application to be 
> finished (completed or killed) within the given time.
> One option is to move this logic to the application client (which submits 
> the job), but it would be nice to have it as generic logic, making it more 
> robust.
> Kindly provide your suggestions/opinion on this feature. If it sounds good, 
> I will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4202) TestYarnClient#testReservationAPIs fails intermittently

2015-09-23 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4202:
---

Assignee: nijel

> TestYarnClient#testReservationAPIs fails intermittently
> ---
>
> Key: YARN-4202
> URL: https://issues.apache.org/jira/browse/YARN-4202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: nijel
> Fix For: 3.0.0
>
>
> Found this failure while looking at the Pre-run on one of my Jiras.
> {noformat}
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1149)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:428)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:465)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224)
> Caused by: 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:69)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1132)
>  ... 10 more
> {noformat}
> TestReport Link: 
> https://builds.apache.org/job/PreCommit-YARN-Build/9243/testReport/
> When I ran this on branch-2 on my local box, it succeeded.
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.999 sec - 
> in org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Results :
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 52.029 s
> [INFO] Finished at: 2015-09-23T11:25:04-06:00
> [INFO] Final Memory: 31M/391M
> [INFO] 
> 
> {noformat}
> I haven't checked whether it is also a problem in branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-09-23 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: 0002_YARN-3813.patch

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, 0002_YARN-3813.patch, YARN 
> Application Timeout .pdf
>
>
> It will be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time.
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job will run continuously with a different 
> dataset, so one job will be started every 5 minutes. The estimated time for 
> this task is 2 minutes or less.
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. Here, the user expects the application to be 
> finished (completed or killed) within the given time.
> One option is to move this logic to the application client (which submits 
> the job), but it would be nice to have it as generic logic, making it more 
> robust.
> Kindly provide your suggestions/opinion on this feature. If it sounds good, 
> I will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-09-23 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904688#comment-14904688
 ] 

nijel commented on YARN-3813:
-


Thanks [~rohithsharma] and [~sunilg] for the comments.
Updated the patch with the comment fixes and a test case for recovery.

bq. we are starting the monitor thread always regardless whether application 
demands for applicationtimeout or not. I feel we can have a configuration to 
enable this feature in RM level. Thoughts?
As I pinged you offline, this service will consider only apps which are 
configured with a timeout, so I am leaving it as a default service.

bq. RMAppTimeOutMonitor : When InterruptedException is thrown in the below 
code, thread should break or throw back exception. So, thread will die else 
thread will be alive for ever
The while loop is guarded by the interrupted state.
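For reference, a compact sketch of the guard pattern being described; {{monitorIntervalMs}} is a stand-in name:

{code}
// The loop exits once the interrupt flag is set, instead of swallowing the
// interrupt inside sleep() and spinning forever.
while (!Thread.currentThread().isInterrupted()) {
  // ... check the registered applications ...
  try {
    Thread.sleep(monitorIntervalMs);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the flag; the condition ends the loop
  }
}
{code}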



> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf
>
>
> It will be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time.
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job will run continuously with a different 
> dataset, so one job will be started every 5 minutes. The estimated time for 
> this task is 2 minutes or less.
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. Here, the user expects the application to be 
> finished (completed or killed) within the given time.
> One option is to move this logic to the application client (which submits 
> the job), but it would be nice to have it as generic logic, making it more 
> robust.
> Kindly provide your suggestions/opinion on this feature. If it sounds good, 
> I will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-20 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877467#comment-14877467
 ] 

nijel commented on YARN-4135:
-

Thanks [~rohithsharma] and [~adhoot].

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log, since there is 
> no relation between the test case and the application id.
> Any thoughts?
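A hedged sketch of what such an assertion could look like; the message wording and surrounding method shape are illustrative, not the committed change:

{code}
// Tie a failed wait back to the test case via the application id.
Assert.assertEquals("State of application " + app.getApplicationId()
    + " is not correct after waiting: expected " + finalState
    + " but was " + app.getState(), finalState, app.getState());
{code}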



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4192) Add YARN metric logging periodically to a seperate file

2015-09-20 Thread nijel (JIRA)
nijel created YARN-4192:
---

 Summary: Add YARN metric logging periodically to a seperate file
 Key: YARN-4192
 URL: https://issues.apache.org/jira/browse/YARN-4192
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: nijel
Assignee: nijel
Priority: Minor


HDFS-8880 added a framework for logging metrics in a given interval.
This can be added to YARN as well.

Any thoughts?






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-16 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747106#comment-14747106
 ] 

nijel commented on YARN-4135:
-

bq. -1 yarn tests 54m 15s Tests failed in hadoop-yarn-server-resourcemanager.
The test failures are not related to this change.

Thanks

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log, since there is 
> no relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-15 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4135:

Attachment: YARN-4135_2.patch

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log, since there is 
> no relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-15 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746870#comment-14746870
 ] 

nijel commented on YARN-4135:
-

Thanks [~adhoot] for the comment.
Updated the patch. Please review.

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log, since there is 
> no relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4146) getServiceState command is missing in yarnadmin command help

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4146:
---

Assignee: (was: nijel)

> getServiceState command  is missing in yarnadmin command help
> -
>
> Key: YARN-4146
> URL: https://issues.apache.org/jira/browse/YARN-4146
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Priority: Minor
>  Labels: help, script
>
> In the yarnadmin command help, the getServiceState command is not mentioned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4146) getServiceState command is missing in yarnadmin command help

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel resolved YARN-4146.
-
Resolution: Invalid

Sorry, my env was in non-HA mode!

> getServiceState command  is missing in yarnadmin command help
> -
>
> Key: YARN-4146
> URL: https://issues.apache.org/jira/browse/YARN-4146
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: help, script
>
> In the yarnadmin command help, the getServiceState command is not mentioned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_4.patch

Updated the javadoc for the missing ".".

The test failure is not related to the patch.

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, 
> YARN-4111_4.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR 
> *from the scheduler*. Currently the diagnostic message is set statically, 
> i.e. {{Application killed by user.}}, regardless of whether the application 
> was killed by the scheduler. This confuses the user after an application is 
> killed: he did not kill the application at all, but the diagnostic message 
> says that the application was killed by the user.
> It would be useful if the diagnostic messages were different for each cause 
> of KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4146) getServiceState command is missing in yarnadmin command help

2015-09-11 Thread nijel (JIRA)
nijel created YARN-4146:
---

 Summary: getServiceState command  is missing in yarnadmin command 
help
 Key: YARN-4146
 URL: https://issues.apache.org/jira/browse/YARN-4146
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Minor


In the yarnadmin command help, the getServiceState command is not mentioned.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_3.patch

Updated javadoc comments

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR 
> *from the scheduler*. Currently the diagnostic message is set statically, 
> i.e. {{Application killed by user.}}, regardless of whether the application 
> was killed by the scheduler. This confuses the user after an application is 
> killed: he did not kill the application at all, but the diagnostic message 
> says that the application was killed by the user.
> It would be useful if the diagnostic messages were different for each cause 
> of KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-11 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740313#comment-14740313
 ] 

nijel commented on YARN-4111:
-

Thanks [~sunilg] for the comments.
bq. RMAppKilledAttemptEvent is used for both RMApp and RMAppAttempt. Name is 
slightly confusing. I think we can use this only for RMApp.
This is the same as the failed and finished events, so I think this is OK.

bq. Also in RMAppAttempt, RMAppFailedAttemptEvent is changed to 
RMAppKilledAttemptEvent. Could we generalize RMAppFailedAttemptEvent for both 
Failed and Killed, and it can also take diagnostics.
Before this fix, the failed event was raised with KILLED as the state. Since 
the new event for kill is now available, it has been changed.

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR 
> *from the scheduler*. Currently the diagnostic message is set statically, 
> i.e. {{Application killed by user.}}, regardless of whether the application 
> was killed by the scheduler. This confuses the user after an application is 
> killed: he did not kill the application at all, but the diagnostic message 
> says that the application was killed by the user.
> It would be useful if the diagnostic messages were different for each cause 
> of KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-09 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_2.patch

Updated the patch with the checkstyle fix. 
The test failures are not related; I tried executing the tests locally and 
they pass.

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR 
> *from the scheduler*. Currently the diagnostic message is set statically, 
> i.e. {{Application killed by user.}}, regardless of whether the application 
> was killed by the scheduler. This confuses the user after an application is 
> killed: he did not kill the application at all, but the diagnostic message 
> says that the application was killed by the user.
> It would be useful if the diagnostic messages were different for each cause 
> of KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-09 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_1.patch

Attaching the patch
Please review

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR 
> *from the scheduler*. Currently the diagnostic message is set statically, 
> i.e. {{Application killed by user.}}, regardless of whether the application 
> was killed by the scheduler. This confuses the user after an application is 
> killed: he did not kill the application at all, but the diagnostic message 
> says that the application was killed by the user.
> It would be useful if the diagnostic messages were different for each cause 
> of KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-09 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4135:

Attachment: YARN-4135_1.patch

Attached the patch.
Please review.

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log, since there is 
> no relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-09 Thread nijel (JIRA)
nijel created YARN-4135:
---

 Summary: Improve the assertion message in MockRM while failing 
after waiting for the state.
 Key: YARN-4135
 URL: https://issues.apache.org/jira/browse/YARN-4135
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: nijel
Assignee: nijel
Priority: Minor


In MockRM, when a test fails after waiting for the given state, the 
application id or the attempt id can be printed for easier debugging.

As of now it is hard to track the test failure in the log, since there is no 
relation between the test case and the application id.

Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-09-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734602#comment-14734602
 ] 

nijel commented on YARN-3771:
-

Hi all,
any comments on this change ? 
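
To make the Collections.unmodifiableList option concrete, a minimal sketch (illustrative, not the attached patch; entries are placeholders):

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: an unmodifiable List protects both the reference and the contents.
public class ClasspathDefaultsSketch {
  public static final List<String> DEFAULT_CLASSPATH =
      Collections.unmodifiableList(Arrays.asList("1", "2")); // placeholder entries

  public static void main(String[] args) {
    try {
      DEFAULT_CLASSPATH.set(0, "u"); // the element write is now rejected
    } catch (UnsupportedOperationException expected) {
      System.out.println("contents are protected: " + expected);
    }
  }
}
{code}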

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3771.patch
>
>
> I was going through some FindBugs rules. One issue reported there is that 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH=
> do not honor the final qualifier: the string array contents can be 
> reassigned!
> Simple test:
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>
>   public static void main(String[] args) {
>     String[] t1 = { "u" };
>     // t = t1;    // reassigning the reference fails to compile
>     t[0] = t1[0]; // but reassigning an element works
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList.
> Any thoughts ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4110:

Attachment: YARN-4110_1.patch

Attached the patch
Please review
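
For reviewers, a minimal sketch of the delegation the description suggests, not the attached patch (ApplicationId is stood in for by a String):

{code}
// Illustrative sketch: delegate hashCode()/equals() to the application id.
public class RMAppSketch {
  private final String applicationId; // stand-in for ApplicationId

  public RMAppSketch(String applicationId) {
    this.applicationId = applicationId;
  }

  @Override
  public int hashCode() {
    return applicationId.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof RMAppSketch)) {
      return false;
    }
    return applicationId.equals(((RMAppSketch) obj).applicationId);
  }
}
{code}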

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4110_1.patch
>
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashCode() 
> and equals() implementations. These state objects should override these 
> implementations.
> # For RMAppImpl, we can make use of ApplicationId#hashCode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashCode and 
> ApplicationAttemptId#equals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-03 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729078#comment-14729078
 ] 

nijel commented on YARN-4110:
-

Sorry, I attached the wrong patch, so deleting it

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashCode() 
> and equals() implementations. These state objects should override these 
> implementations.
> # For RMAppImpl, we can make use of ApplicationId#hashCode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashCode and 
> ApplicationAttemptId#equals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4110:

Attachment: (was: 01-YARN-4110.patch)

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashCode() 
> and equals() implementations. These state objects should override these 
> implementations.
> # For RMAppImpl, we can make use of ApplicationId#hashCode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashCode and 
> ApplicationAttemptId#equals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4110:

Attachment: 01-YARN-4110.patch

Thanks [~rohithsharma] for reporting
Attached the patch


> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: 01-YARN-4110.patch
>
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashCode() 
> and equals() implementations. These state objects should override these 
> implementations.
> # For RMAppImpl, we can make use of ApplicationId#hashCode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashCode and 
> ApplicationAttemptId#equals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-09-03 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728898#comment-14728898
 ] 

nijel commented on YARN-3813:
-

This patch will address the initial issue. But this will kill the application 
even if it is in the RUNNING state.

As I understand it, the idea is to configure the states which the monitor needs 
to consider when killing the application, correct ?

But one doubt I have is whether the user will be aware of all the intermediate 
states of an app ?

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: 0001-YARN-3813.patch

Sorry for the long delay.

Adding an initial patch.
The action on timeout is considered as KILL.
Please have a look. I will update the patch with more test cases after initial 
review.

Thanks

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3813) Support Application timeout feature in YARN.

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3813:
---

Assignee: nijel

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3869) Add app name to RM audit log

2015-09-02 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3869:
---

Assignee: (was: nijel)

Keeping it open for further comments and opinions

> Add app name to RM audit log
> 
>
> Key: YARN-3869
> URL: https://issues.apache.org/jira/browse/YARN-3869
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shay Rojansky
>Priority: Minor
>
> The YARN resource manager audit log currently includes useful info such as 
> APPID, USER, etc. One crucial piece of information missing is the 
> user-supplied application name.
> Users are familiar with their application name as shown in the YARN UI, etc. 
> It's vital for something like logstash to be able to associate logs with the 
> application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619984#comment-14619984
 ] 

nijel commented on YARN-3813:
-

Thanks [~sunilg] and [~devaraj.k] for the comments

bq.How frequently are you going to check this condition for each application?
The plan is to have a configurable interval defaulting to 30 sec 
(yarn.app.timeout.monitor.interval).

bq.Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we 
may not need a flag.
bq.I feel having a TIMEOUT state for RMAppImpl would be proper here. 

OK. We will add a TIMEOUT state and handle the changes.
Due to this there will be a few changes in the app transitions, the client 
package, and the web UI.

bq.I have a suggestion here.We can have a BasicAppMonitoringManager which can 
keep an entry of .
bq. when the application gets submitted to RM then we can register the 
application with RMAppTimeOutMonitor using the user specified timeout.

Yes, good suggestion. We will update this as a registration mechanism. But 
since each application can have its own timeout period, the code reusability 
looks minimal.

{code}
RMAppTimeOutMonitor
  local map (appId -> timeout)
  add/register(appId, timeout)  --> called from RMAppImpl
  run -> if the app is RUNNING/SUBMITTED and has elapsed its time, kill it;
         if it has already completed, remove it from the map.
  no delete/unregister method   --> the application will be removed from the
                                    map inside the run method
{code}
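
As a self-contained Java sketch of that registration idea (all names hypothetical; the RMAppImpl wiring and the real kill path are omitted):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the monitor outlined above.
public class AppTimeoutMonitorSketch implements Runnable {
  private final Map<String, Long> deadlines = new ConcurrentHashMap<>();

  // Called on submission: register(appId, System.currentTimeMillis() + timeout).
  public void register(String appId, long deadlineMillis) {
    deadlines.put(appId, deadlineMillis);
  }

  @Override
  public void run() {
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Long> e : deadlines.entrySet()) {
      if (now >= e.getValue()) {
        // Kill the app here if it is still RUNNING/SUBMITTED; either way the
        // entry is dropped, so no separate unregister method is needed.
        deadlines.remove(e.getKey());
      }
    }
  }
}
{code}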

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: YARN Application Timeout .pdf

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: (was: YARN Application Timeout -3.pdf)

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618432#comment-14618432
 ] 

nijel commented on YARN-3813:
-

Attached initial draft for work details.
Please share your comments and thoughts

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout -3.pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: YARN Application Timeout -3.pdf

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout -3.pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if it does not complete 
> within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job runs continuously with a different dataset, 
> so one job is started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where the timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job), 
> but it would be nice to have this as generic, more robust logic in YARN.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3869) Add app name to RM audit log

2015-07-07 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616363#comment-14616363
 ] 

nijel commented on YARN-3869:
-

In the web UI it is shown truncated.
But if the names are similar, will it serve the purpose ? 

Let us wait for a few other opinions as well :) 
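
For illustration, the truncation being discussed might look like this (hypothetical helper and limit, not committed code):

{code}
// Hypothetical helper: cap a user-supplied app name (e.g. a full Hive query
// string) before writing it to the RM audit log.
public class AuditNameSketch {
  static final int MAX_NAME_LENGTH = 64; // assumed limit, for illustration

  static String truncateForAudit(String appName) {
    if (appName == null || appName.length() <= MAX_NAME_LENGTH) {
      return appName;
    }
    return appName.substring(0, MAX_NAME_LENGTH) + "...";
  }
}
{code}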

> Add app name to RM audit log
> 
>
> Key: YARN-3869
> URL: https://issues.apache.org/jira/browse/YARN-3869
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shay Rojansky
>Assignee: nijel
>Priority: Minor
>
> The YARN resource manager audit log currently includes useful info such as 
> APPID, USER, etc. One crucial piece of information missing is the 
> user-supplied application name.
> Users are familiar with their application name as shown in the YARN UI, etc. 
> It's vital for something like logstash to be able to associate logs with the 
> application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3869) Add app name to RM audit log

2015-07-06 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616282#comment-14616282
 ] 

nijel commented on YARN-3869:
-

Hi,
I started working on this. One observation is that in some cases the application 
name will not be in a readable format.
For example in Hive, the name will be the complete query string. In this case it 
will not be good to print this information in the logs!
Any thoughts ? 

> Add app name to RM audit log
> 
>
> Key: YARN-3869
> URL: https://issues.apache.org/jira/browse/YARN-3869
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shay Rojansky
>Assignee: nijel
>Priority: Minor
>
> The YARN resource manager audit log currently includes useful info such as 
> APPID, USER, etc. One crucial piece of information missing is the 
> user-supplied application name.
> Users are familiar with their application name as shown in the YARN UI, etc. 
> It's vital for something like logstash to be able to associate logs with the 
> application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-07-01 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611374#comment-14611374
 ] 

nijel commented on YARN-3830:
-

Thanks [~devaraj.k] for the review and committing the patch.

> AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
> 
>
> Key: YARN-3830
> URL: https://issues.apache.org/jira/browse/YARN-3830
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Fix For: 2.8.0
>
> Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, 
> YARN-3830_4.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
> {code}
> protected void createReleaseCache() {
> // Cleanup the cache after nm expire interval.
> new Timer().schedule(new TimerTask() {
>   @Override
>   public void run() {
> for (SchedulerApplication app : applications.values()) {
>   T attempt = app.getCurrentAppAttempt();
>   synchronized (attempt) {
> for (ContainerId containerId : attempt.getPendingRelease()) {
>   RMAuditLogger.logFailure(
> {code}
> Here the attempt can be null since the attempt is created later, so a 
> NullPointerException will occur:
> {code}
> 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
> threw an Exception. | YarnUncaughtExceptionHandler.java:68
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
> {code}
> This skips the other applications in this run.
> We can add a null check and continue with the other applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3869) Add app name to RM audit log

2015-07-01 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3869:
---

Assignee: nijel

> Add app name to RM audit log
> 
>
> Key: YARN-3869
> URL: https://issues.apache.org/jira/browse/YARN-3869
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shay Rojansky
>Assignee: nijel
>Priority: Minor
>
> The YARN resource manager audit log currently includes useful info such as 
> APPID, USER, etc. One crucial piece of information missing is the 
> user-supplied application name.
> Users are familiar with their application name as shown in the YARN UI, etc. 
> It's vital for something like logstash to be able to associate logs with the 
> application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3869) Add app name to RM audit log

2015-07-01 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609812#comment-14609812
 ] 

nijel commented on YARN-3869:
-

Hi [~roji],
I would like to work on this improvement.
Please let me know if you have already started the work.

> Add app name to RM audit log
> 
>
> Key: YARN-3869
> URL: https://issues.apache.org/jira/browse/YARN-3869
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shay Rojansky
>Priority: Minor
>
> The YARN resource manager audit log currently includes useful info such as 
> APPID, USER, etc. One crucial piece of information missing is the 
> user-supplied application name.
> Users are familiar with their application name as shown in the YARN UI, etc. 
> It's vital for something like logstash to be able to associate logs with the 
> application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-07-01 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_4.patch

Thanks [~devaraj.k] for the suggestion
Updated patch with test case
Please review

> AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
> 
>
> Key: YARN-3830
> URL: https://issues.apache.org/jira/browse/YARN-3830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, 
> YARN-3830_4.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
> {code}
> protected void createReleaseCache() {
> // Cleanup the cache after nm expire interval.
> new Timer().schedule(new TimerTask() {
>   @Override
>   public void run() {
> for (SchedulerApplication app : applications.values()) {
>   T attempt = app.getCurrentAppAttempt();
>   synchronized (attempt) {
> for (ContainerId containerId : attempt.getPendingRelease()) {
>   RMAuditLogger.logFailure(
> {code}
> Here the attempt can be null since the attempt is created later, so a 
> NullPointerException will occur:
> {code}
> 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
> threw an Exception. | YarnUncaughtExceptionHandler.java:68
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
> {code}
> This skips the other applications in this run.
> We can add a null check and continue with the other applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2953) TestWorkPreservingRMRestart fails on trunk

2015-07-01 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-2953:
---

Assignee: nijel

> TestWorkPreservingRMRestart fails on trunk
> --
>
> Key: YARN-2953
> URL: https://issues.apache.org/jira/browse/YARN-2953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
>
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
>   Time elapsed: 30.031 sec  <<< ERROR!
> java.lang.Exception: test timed out after 3 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2953) TestWorkPreservingRMRestart fails on trunk

2015-07-01 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609771#comment-14609771
 ] 

nijel commented on YARN-2953:
-

Hi [~rohithsharma]
This test case is passing in recent code, and I see the timeout was increased 
(@Test (timeout = 5)). This happened in the following check-in:
{code}
Revision: 5f57b904f550515693d93a2959e663b0d0260696
Author: Jian He 
Date: 31-12-2014 05:05:45
Message:
YARN-2492. Added node-labels page on RM web UI. Contributed by Wangda Tan
{code}
Can you please validate this issue ? 


> TestWorkPreservingRMRestart fails on trunk
> --
>
> Key: YARN-2953
> URL: https://issues.apache.org/jira/browse/YARN-2953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
>   Time elapsed: 30.031 sec  <<< ERROR!
> java.lang.Exception: test timed out after 3 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-30 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608093#comment-14608093
 ] 

nijel commented on YARN-3830:
-

Thanks [~devaraj.k] for the review
The test case looks a bit tricky :)
I will update the patch soon.

> AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
> 
>
> Key: YARN-3830
> URL: https://issues.apache.org/jira/browse/YARN-3830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
> {code}
> protected void createReleaseCache() {
> // Cleanup the cache after nm expire interval.
> new Timer().schedule(new TimerTask() {
>   @Override
>   public void run() {
> for (SchedulerApplication app : applications.values()) {
>   T attempt = app.getCurrentAppAttempt();
>   synchronized (attempt) {
> for (ContainerId containerId : attempt.getPendingRelease()) {
>   RMAuditLogger.logFailure(
> {code}
> Here the attempt can be null since the attempt is created later, so a 
> NullPointerException will occur:
> {code}
> 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
> threw an Exception. | YarnUncaughtExceptionHandler.java:68
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
> {code}
> This skips the other applications in this run.
> We can add a null check and continue with the other applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_3.patch

Sorry for the small mistake.
The line limit is corrected.

The test failure is not related to this patch. Verified locally; it is passing.

> AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
> 
>
> Key: YARN-3830
> URL: https://issues.apache.org/jira/browse/YARN-3830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
> {code}
> protected void createReleaseCache() {
> // Cleanup the cache after nm expire interval.
> new Timer().schedule(new TimerTask() {
>   @Override
>   public void run() {
> for (SchedulerApplication app : applications.values()) {
>   T attempt = app.getCurrentAppAttempt();
>   synchronized (attempt) {
> for (ContainerId containerId : attempt.getPendingRelease()) {
>   RMAuditLogger.logFailure(
> {code}
> Here the attempt can be null since the attempt is created later, so a 
> NullPointerException will occur:
> {code}
> 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
> threw an Exception. | YarnUncaughtExceptionHandler.java:68
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
> {code}
> This skips the other applications in this run.
> We can add a null check and continue with the other applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_2.patch

Thanks [~xgong] for the comment.
Updated the patch
Please review

> AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
> 
>
> Key: YARN-3830
> URL: https://issues.apache.org/jira/browse/YARN-3830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3830_1.patch, YARN-3830_2.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
> {code}
> protected void createReleaseCache() {
> // Cleanup the cache after nm expire interval.
> new Timer().schedule(new TimerTask() {
>   @Override
>   public void run() {
> for (SchedulerApplication app : applications.values()) {
>   T attempt = app.getCurrentAppAttempt();
>   synchronized (attempt) {
> for (ContainerId containerId : attempt.getPendingRelease()) {
>   RMAuditLogger.logFailure(
> {code}
> Here the attempt can be null since the attempt is created later, so a 
> NullPointerException will occur:
> {code}
> 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
> threw an Exception. | YarnUncaughtExceptionHandler.java:68
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
> {code}
> This skips the other applications in this run.
> We can add a null check and continue with the other applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-18 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_1.patch

Updated the patch.
Please review

> AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
> 
>
> Key: YARN-3830
> URL: https://issues.apache.org/jira/browse/YARN-3830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3830_1.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
> {code}
> protected void createReleaseCache() {
> // Cleanup the cache after nm expire interval.
> new Timer().schedule(new TimerTask() {
>   @Override
>   public void run() {
> for (SchedulerApplication app : applications.values()) {
>   T attempt = app.getCurrentAppAttempt();
>   synchronized (attempt) {
> for (ContainerId containerId : attempt.getPendingRelease()) {
>   RMAuditLogger.logFailure(
> {code}
> Here the attempt can be null since the attempt is created later, so a 
> NullPointerException will occur:
> {code}
> 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
> threw an Exception. | YarnUncaughtExceptionHandler.java:68
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
> {code}
> This skips the other applications in this run.
> We can add a null check and continue with the other applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-18 Thread nijel (JIRA)
nijel created YARN-3830:
---

 Summary: AbstractYarnScheduler.createReleaseCache may try to clean 
a null attempt
 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel


org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
{code}
protected void createReleaseCache() {
  // Cleanup the cache after nm expire interval.
  new Timer().schedule(new TimerTask() {
    @Override
    public void run() {
      for (SchedulerApplication app : applications.values()) {
        T attempt = app.getCurrentAppAttempt();
        synchronized (attempt) {
          for (ContainerId containerId : attempt.getPendingRelease()) {
            RMAuditLogger.logFailure(
{code}

Here the attempt can be null since the attempt is created later, so a 
NullPointerException will occur:
{code}
2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw 
an Exception. | YarnUncaughtExceptionHandler.java:68
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
{code}

This skips the other applications in this run.
We can add a null check and continue with the other applications.
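
A self-contained sketch of that guard (types simplified; the real SchedulerApplication/RMAuditLogger plumbing is omitted):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins: skip apps whose current attempt is still null instead
// of letting the NPE kill the timer thread and skip the remaining apps.
public class ReleaseCacheSketch {
  static class App {
    Object attempt; // stays null until the attempt is created
    Object getCurrentAppAttempt() { return attempt; }
  }

  public static void main(String[] args) {
    Map<String, App> applications = new ConcurrentHashMap<>();
    applications.put("app_1", new App()); // attempt not created yet

    for (App app : applications.values()) {
      Object attempt = app.getCurrentAppAttempt();
      if (attempt == null) {
        continue; // proposed null check: move on to the other applications
      }
      synchronized (attempt) {
        // clean up the attempt's pending releases here
      }
    }
  }
}
{code}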



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1948) Expose utility methods in Apps.java publically

2015-06-17 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589782#comment-14589782
 ] 

nijel commented on YARN-1948:
-

Thanks [~vinodkv] for the comment.
I am thinking of changing both method names to "*updateEnv*". Not getting any 
better name :(
public static void updateEnv(

Another option is to drop the env-related naming and name it from the map 
perspective, since the env is represented as a map in this function.

Any thoughts ? 
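
To make the "map perspective" option concrete, a hypothetical sketch (the real Apps.setEnvFromInputString parsing rules are not reproduced here):

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical naming sketch only; not the proposed API.
public class EnvNamingSketch {
  public static void updateEnv(Map<String, String> env, String key, String value) {
    // Append with a ':' separator, as classpath-style container vars do.
    env.merge(key, value, (old, add) -> old + ":" + add);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    updateEnv(env, "CLASSPATH", "/a.jar");
    updateEnv(env, "CLASSPATH", "/b.jar");
    System.out.println(env); // {CLASSPATH=/a.jar:/b.jar}
  }
}
{code}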

> Expose utility methods in Apps.java publically
> --
>
> Key: YARN-1948
> URL: https://issues.apache.org/jira/browse/YARN-1948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Sandy Ryza
>Assignee: nijel
>  Labels: newbie
> Attachments: YARN-1948-1.patch
>
>
> Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
> MapReduce, Spark, and Tez that are currently marked private.  As these are 
> useful for any YARN app that wants to allow users to augment container 
> environments, it would be helpful to make them public.
> It may make sense to put them in a new class with a better name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3813) Support Application timeout feature in YARN.

2015-06-17 Thread nijel (JIRA)
nijel created YARN-3813:
---

 Summary: Support Application timeout feature in YARN. 
 Key: YARN-3813
 URL: https://issues.apache.org/jira/browse/YARN-3813
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: nijel
 Fix For: 2.8.0


It would be useful to support an application timeout in YARN. Some use cases 
do not care about the output of an application if it does not complete 
within a specific time. 

*Background:*
The requirement is to show the CDR statistics of the last few minutes, say 
every 5 minutes. The same job runs continuously with a different dataset,
so one job is started every 5 minutes. The estimated time for this task 
is 2 minutes or less. 
If the application does not complete in the given time, the output is not useful.

*Proposal*
The idea is to support an application timeout, where the timeout parameter is 
given while submitting the job. 
Here, the user expects the application to finish (complete or be killed) 
within the given time.


One option for us is to move this logic to the application client (which 
submits the job), 
but it would be nice to have this as generic, more robust logic in YARN.

Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1948) Expose utility methods in Apps.java publically

2015-06-16 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-1948:

Attachment: YARN-1948-1.patch

Attached the file with modification
Please review

> Expose utility methods in Apps.java publically
> --
>
> Key: YARN-1948
> URL: https://issues.apache.org/jira/browse/YARN-1948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Sandy Ryza
>Assignee: nijel
>  Labels: newbie
> Attachments: YARN-1948-1.patch
>
>
> Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
> MapReduce, Spark, and Tez that are currently marked private.  As these are 
> useful for any YARN app that wants to allow users to augment container 
> environments, it would be helpful to make them public.
> It may make sense to put them in a new class with a better name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3796) Support User level Quota for space and Name (count)

2015-06-11 Thread nijel (JIRA)
nijel created YARN-3796:
---

 Summary: Support User level Quota for space and Name (count)
 Key: YARN-3796
 URL: https://issues.apache.org/jira/browse/YARN-3796
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: nijel
Assignee: nijel


I would like to have a feature in HDFS for quota management at the user 
level. 

Background:
When a customer uses a multi-tenant solution, it will have many Hadoop 
ecosystem components like Hive, HBase, YARN, etc. The base folders of these 
components are different, like /hive for Hive and /hbase for HBase. 
Now if a user creates some file or table, these will be under the folder 
specific to the component. If the user name is taken into account, it looks like
{code}
/hive/user1/table1
/hive/user2/table1
/hbase/user1/Htable1
/hbase/user2/Htable1
 
Same for yarn/map-reduce data and logs
{code}
 
In this case restricting the user to a certain amount of disk/files is very 
difficult, since the current quota management is at the folder level.
 
Requirement: User level Quota for space and Name (count). Say user1 can have 
100G irrespective of the folder or location used.
 
Here the idea is to consider the file owner as the key and attribute the quota to 
it. So the current quota system can do an initial check of the user quota, if 
defined, before validating the folder quota.

Note:
This needs a change in fsimage to store the user and quota information.
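
A tiny sketch of the proposed check order (hypothetical types; the fsimage change itself is out of scope here):

{code}
import java.util.Map;

// Hypothetical: check the owner's user-level quota, if defined, before the
// existing folder-level quota check.
public class UserQuotaSketch {
  public static boolean allowed(Map<String, Long> userQuota,
      Map<String, Long> userUsage, String owner, long requestedBytes,
      boolean folderQuotaOk) {
    Long quota = userQuota.get(owner);
    if (quota != null) {
      long used = userUsage.getOrDefault(owner, 0L);
      if (used + requestedBytes > quota) {
        return false; // user quota exceeded, e.g. user1 over 100G
      }
    }
    return folderQuotaOk; // fall through to the existing folder quota result
  }
}
{code}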


Please have a look at this scenario. If it sounds good, I will create the tasks 
and update the design and prototype.

Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-06-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3771:

Attachment: 0001-YARN-3771.patch

Attached the patch. Please review

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3771.patch
>
>
> I was going through some FindBugs rules. One issue reported there is that 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH=
> do not honor the final qualifier: the string array contents can be 
> reassigned!
> Simple test:
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>
>   public static void main(String[] args) {
>     String[] t1 = { "u" };
>     // t = t1;    // reassigning the reference fails to compile
>     t[0] = t1[0]; // but reassigning an element works
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList.
> Any thoughts ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-06-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel moved HDFS-8526 to YARN-3771:
---

Key: YARN-3771  (was: HDFS-8526)
Project: Hadoop YARN  (was: Hadoop HDFS)

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>
> I was going through some FindBugs rules. One issue reported there is that 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH=
> do not honor the final qualifier: the string array contents can be 
> reassigned!
> Simple test:
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>
>   public static void main(String[] args) {
>     String[] t1 = { "u" };
>     // t = t1;    // reassigning the reference fails to compile
>     t[0] = t1[0]; // but reassigning an element works
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList.
> Any thoughts ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1948) Expose utility methods in Apps.java publically

2015-05-21 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-1948:
---

Assignee: nijel

> Expose utility methods in Apps.java publically
> --
>
> Key: YARN-1948
> URL: https://issues.apache.org/jira/browse/YARN-1948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Sandy Ryza
>Assignee: nijel
>  Labels: newbie
>
> Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
> MapReduce, Spark, and Tez that are currently marked private.  As these are 
> useful for any YARN app that wants to allow users to augment container 
> environments, it would be helpful to make them public.
> It may make sense to put them in a new class with a better name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3693) Duplicate parameters on service start for NM and RM

2015-05-21 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3693:
---

Assignee: nijel

> Duplicate parameters on service start for NM and RM
> ---
>
> Key: YARN-3693
> URL: https://issues.apache.org/jira/browse/YARN-3693
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: nijel
>Priority: Minor
>
> Steps to reproduce
> =
> 1.Install HA cluster with NM RM
> 2.Check process id for the same
> 3.ps -ef | grep 
> Actual result
> =
> Multiple parameters are duplicated, like the log file name, logger type, log 
> directory, etc.
> The same is observed in the RM process also.
> *Please find the logs below*
> {quote}
> dsperf   26076 1  0 12:43 ?00:00:26 
> /opt/dsperf/jdk1.8.0_40//bin/java -Dproc_nodemanager -Xmx1000m 
> -Dhadoop.log.dir=/nodemanager/logs 
> -Dyarn.log.dir=/nodemanager/logs -Dhadoop.log.file=yarn.log 
> -Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str= 
> -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console 
> -Dyarn.policy.file=hadoop-policy.xml -Dlog4j.configuration.watch=true 
> -Dhadoop.log.dir=/nodemanager/logs 
> -Dyarn.log.dir=/nodemanager/logs 
> -Dhadoop.log.file=yarn-dsperf-nodemanager-host-name.log 
> -Dyarn.log.file=yarn-dsperf-nodemanager-host.log -Dyarn.home.dir= 
> -Dyarn.id.str=dsperf {color:red}-Dhadoop.root.logger=INFO,RFA 
> {color}-Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/nodemanager/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -server 
> -Dhadoop.log.dir=/nodemanager/logs 
> -Dyarn.log.dir=/nodemanager/logs 
> -Dhadoop.log.file=yarn-dsperf-nodemanager-host-name.log 
> -Dyarn.log.file=yarn-dsperf-nodemanager-host-name.log 
> -Dyarn.home.dir=/nodemanager -Dhadoop.home.dir=/nodemanager 
> {color:red}-Dhadoop.root.logger=INFO,RFA {color}-Dyarn.root.logger=INFO,RFA 
> -Dlog4j.configuration=log4j.properties 
> -Djava.library.path=/nodemanager/lib/native -classpath XXX 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and namenode is deployed on the same node.

2015-05-13 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541697#comment-14541697
 ] 

nijel commented on YARN-3639:
-

Hi [~xinxianyin],
Thanks for reporting this issue.
Can you attach the logs for this issue ? 

> It takes too long time for RM to recover all apps if the original active RM 
> and namenode is deployed on the same node.
> --
>
> Key: YARN-3639
> URL: https://issues.apache.org/jira/browse/YARN-3639
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Xianyin Xin
>
> If the node on which the active RM runs dies and the active namenode is 
> running on the same node, the new RM will take a long time to recover all apps. 
> After analysis, we found the root cause is renewing HDFS tokens in the 
> recovery process. The HDFS client created by the renewer first tries 
> to connect to the original namenode, which times out after 
> 10~20s, and then the client tries to connect to the new namenode. The entire 
> recovery costs 15*#apps seconds according to our test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541306#comment-14541306
 ] 

nijel commented on YARN-3614:
-

One possible cause is discussed in YARN-868.
Can you try the solution given in that issue?

> FileSystemRMStateStore throw exception when failed to remove application, 
> that cause resourcemanager to crash
> -
>
> Key: YARN-3614
> URL: https://issues.apache.org/jira/browse/YARN-3614
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: lachisis
>Priority: Critical
>
> FileSystemRMStateStore is only an auxiliary plug-in of the rmstore. 
> When it fails to remove an application, I think a warning is enough, but 
> currently the resourcemanager crashes.
> Recently, I configured 
> "yarn.resourcemanager.state-store.max-completed-applications" to limit the 
> number of applications in the rmstore. When the number of applications 
> exceeds the limit, some old applications are removed. If a removal fails, 
> the resourcemanager crashes.
> The following is the log: 
> 2015-05-11 06:58:43,815 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing 
> info for app: application_1430994493305_0053
> 2015-05-11 06:58:43,815 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
>  Removing info for app: application_1430994493305_0053 at: 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> 2015-05-11 06:58:43,816 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> removing app: application_1430994493305_0053
> java.lang.Exception: Failed to delete 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> 2015-05-11 06:58:43,819 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> java.lang.Exception: Failed to delete 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMach
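
A minimal, self-contained sketch (hypothetical class, not the actual FileSystemRMStateStore code) of the behaviour the reporter asks for above: treat a failed state-store delete as a warning and continue, instead of escalating to a fatal event.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: best-effort removal that warns and continues on failure, so a
// stale entry in the store does not bring down the ResourceManager.
public class TolerantAppRemover {
    public static void removeAppState(Path appDir) {
        try {
            Files.deleteIfExists(appDir);
        } catch (IOException e) {
            // Warn instead of crashing; the entry can be cleaned up later.
            System.err.println("WARN: failed to remove " + appDir + ": " + e);
        }
    }

    public static void main(String[] args) {
        removeAppState(Paths.get("/tmp/rmstore/app_0001"));
    }
}
{code}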

[jira] [Commented] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541225#comment-14541225
 ] 

nijel commented on YARN-3629:
-

Thanks [~devaraj.k] for reviewing and committing the patch

> NodeID is always printed as "null" in node manager initialization log.
> --
>
> Key: YARN-3629
> URL: https://issues.apache.org/jira/browse/YARN-3629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Fix For: 2.8.0
>
> Attachments: YARN-3629-1.patch
>
>
> In the NodeManager log during startup, the following line is printed:
> 2015-05-12 11:20:02,347 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
> nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
> virtual-cores=8
> This line is printed from NodeStatusUpdaterImpl.serviceInit,
> but the nodeId assignment happens only in 
> NodeStatusUpdaterImpl.serviceStart:
> {code}
>   protected void serviceStart() throws Exception {
>     // NodeManager is the last service to start, so NodeId is available.
>     this.nodeId = this.context.getNodeId();
> {code}
> Assigning the nodeId in serviceInit is not feasible since it is generated by 
> ContainerManagerImpl.serviceStart.
> The log can be moved to serviceStart to give the right information to the user. 
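
A minimal self-contained sketch (hypothetical class, not the actual NodeStatusUpdaterImpl) of the lifecycle ordering described above: a field assigned only in serviceStart() is still null while serviceInit() runs, so logging it there prints "null", while logging after the assignment gives the real id.
{code}
// Sketch only: reproduces the init/start ordering issue in plain Java.
public class LifecycleDemo {
    static class StatusUpdater {
        private String nodeId; // assigned only in serviceStart(), as in YARN

        void serviceInit() {
            // Logging here reproduces the bug: nodeId is not set yet.
            System.out.println("Initialized nodemanager for " + nodeId);
        }

        void serviceStart() {
            nodeId = "host-1:45454"; // illustrative value
            // Logging here (the proposed fix) prints the real id.
            System.out.println("Node id assigned is: " + nodeId);
        }
    }

    public static void main(String[] args) {
        StatusUpdater s = new StatusUpdater();
        s.serviceInit();  // prints "... for null"
        s.serviceStart(); // prints the real node id
    }
}
{code}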



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541223#comment-14541223
 ] 

nijel commented on YARN-3613:
-

Thanks [~kasha] for reviewing and committing the patch

> TestContainerManagerSecurity should init and start Yarn cluster in setup 
> instead of individual methods
> --
>
> Key: YARN-3613
> URL: https://issues.apache.org/jira/browse/YARN-3613
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: nijel
>Priority: Minor
>  Labels: newbie
> Fix For: 2.8.0
>
> Attachments: YARN-3613-1.patch, yarn-3613-2.patch
>
>
> In TestContainerManagerSecurity, individual tests init and start the YARN 
> cluster. This duplication can be avoided by moving that into setup. 
> Further, one could merge the two @Test methods to avoid bringing up another 
> mini-cluster. 
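
A minimal JUnit 4 sketch (hypothetical test class with a stand-in cluster object, not the actual patch) of the refactoring described above: the shared cluster is initialized and started once in setup and stopped in teardown, so the test bodies carry no duplicated lifecycle code.
{code}
import static org.junit.Assert.assertNotNull;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

// Sketch only: setup/teardown own the cluster lifecycle.
public class MiniClusterLifecycleExample {
    private StringBuilder cluster; // stand-in for a MiniYARNCluster-like object

    @Before
    public void setUp() {
        cluster = new StringBuilder("started"); // init + start once, here
    }

    @After
    public void tearDown() {
        cluster = null; // stop + clean up, here
    }

    @Test
    public void testContainerTokenA() {
        // Uses the already-started cluster; no lifecycle code in the test body.
        assertNotNull(cluster);
    }

    @Test
    public void testContainerTokenB() {
        assertNotNull(cluster);
    }
}
{code}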



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539788#comment-14539788
 ] 

nijel commented on YARN-3629:
-

bq. -1 tests included
This is a log-message-only change, so tests are not required.

> NodeID is always printed as "null" in node manager initialization log.
> --
>
> Key: YARN-3629
> URL: https://issues.apache.org/jira/browse/YARN-3629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3629-1.patch
>
>
> In the NodeManager log during startup, the following line is printed:
> 2015-05-12 11:20:02,347 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
> nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
> virtual-cores=8
> This line is printed from NodeStatusUpdaterImpl.serviceInit,
> but the nodeId assignment happens only in 
> NodeStatusUpdaterImpl.serviceStart:
> {code}
>   protected void serviceStart() throws Exception {
>     // NodeManager is the last service to start, so NodeId is available.
>     this.nodeId = this.context.getNodeId();
> {code}
> Assigning the nodeId in serviceInit is not feasible since it is generated by 
> ContainerManagerImpl.serviceStart.
> The log can be moved to serviceStart to give the right information to the user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539495#comment-14539495
 ] 

nijel commented on YARN-3629:
-

Moving the log message is a bit tricky since it logs some parameters that are 
not available in serviceStart. So I am keeping this log as it is and 
adding a new log message that prints the nodeId for information purposes.

Any different thoughts? 

> NodeID is always printed as "null" in node manager initialization log.
> --
>
> Key: YARN-3629
> URL: https://issues.apache.org/jira/browse/YARN-3629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3629-1.patch
>
>
> In the NodeManager log during startup, the following line is printed:
> 2015-05-12 11:20:02,347 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
> nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
> virtual-cores=8
> This line is printed from NodeStatusUpdaterImpl.serviceInit,
> but the nodeId assignment happens only in 
> NodeStatusUpdaterImpl.serviceStart:
> {code}
>   protected void serviceStart() throws Exception {
>     // NodeManager is the last service to start, so NodeId is available.
>     this.nodeId = this.context.getNodeId();
> {code}
> Assigning the nodeId in serviceInit is not feasible since it is generated by 
> ContainerManagerImpl.serviceStart.
> The log can be moved to serviceStart to give the right information to the user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3629:

Attachment: YARN-3629-1.patch

Please review.

> NodeID is always printed as "null" in node manager initialization log.
> --
>
> Key: YARN-3629
> URL: https://issues.apache.org/jira/browse/YARN-3629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-3629-1.patch
>
>
> In the NodeManager log during startup, the following line is printed:
> 2015-05-12 11:20:02,347 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
> nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
> virtual-cores=8
> This line is printed from NodeStatusUpdaterImpl.serviceInit,
> but the nodeId assignment happens only in 
> NodeStatusUpdaterImpl.serviceStart:
> {code}
>   protected void serviceStart() throws Exception {
>     // NodeManager is the last service to start, so NodeId is available.
>     this.nodeId = this.context.getNodeId();
> {code}
> Assigning the nodeId in serviceInit is not feasible since it is generated by 
> ContainerManagerImpl.serviceStart.
> The log can be moved to serviceStart to give the right information to the user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3629) NodeID is always printed as "null" in node manager initialization log.

2015-05-11 Thread nijel (JIRA)
nijel created YARN-3629:
---

 Summary: NodeID is always printed as "null" in node manager 
initialization log.
 Key: YARN-3629
 URL: https://issues.apache.org/jira/browse/YARN-3629
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel


In the NodeManager log during startup, the following line is printed:

2015-05-12 11:20:02,347 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
virtual-cores=8

This line is printed from NodeStatusUpdaterImpl.serviceInit,
but the nodeId assignment happens only in 
NodeStatusUpdaterImpl.serviceStart:
{code}
  protected void serviceStart() throws Exception {
    // NodeManager is the last service to start, so NodeId is available.
    this.nodeId = this.context.getNodeId();
{code}

Assigning the nodeId in serviceInit is not feasible since it is generated by 
ContainerManagerImpl.serviceStart.

The log can be moved to serviceStart to give the right information to the user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3613:

Attachment: YARN-3613-1.patch

Please review the patch.
Removed 2 unused imports.
Test time reduced from ~130 to ~80 seconds.

> TestContainerManagerSecurity should init and start Yarn cluster in setup 
> instead of individual methods
> --
>
> Key: YARN-3613
> URL: https://issues.apache.org/jira/browse/YARN-3613
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: nijel
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-3613-1.patch
>
>
> In TestContainerManagerSecurity, individual tests init and start the YARN 
> cluster. This duplication can be avoided by moving that into setup. 
> Further, one could merge the two @Test methods to avoid bringing up another 
> mini-cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws exception when failing to remove application, causing resourcemanager to crash

2015-05-11 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537837#comment-14537837
 ] 

nijel commented on YARN-3614:
-

Hi @lachisis,
bq. when standby resourcemanager try to transitiontoActive, it will cost more 
than ten minutes to load applications
Is this a secure cluster?

> FileSystemRMStateStore throws exception when failing to remove application, 
> causing resourcemanager to crash
> -
>
> Key: YARN-3614
> URL: https://issues.apache.org/jira/browse/YARN-3614
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: lachisis
>Priority: Critical
>
> FileSystemRMStateStore is only an auxiliary plug-in of the rmstore. 
> When it fails to remove an application, I think a warning is enough, but 
> currently the resourcemanager crashes.
> Recently, I configured 
> "yarn.resourcemanager.state-store.max-completed-applications" to limit the 
> number of applications in the rmstore. When the number of applications 
> exceeds the limit, some old applications are removed. If a removal fails, 
> the resourcemanager crashes.
> The following is the log: 
> 2015-05-11 06:58:43,815 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing 
> info for app: application_1430994493305_0053
> 2015-05-11 06:58:43,815 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
>  Removing info for app: application_1430994493305_0053 at: 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> 2015-05-11 06:58:43,816 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> removing app: application_1430994493305_0053
> java.lang.Exception: Failed to delete 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> 2015-05-11 06:58:43,819 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> java.lang.Exception: Failed to delete 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
>

[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-10 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537514#comment-14537514
 ] 

nijel commented on YARN-3613:
-

I will update the patch.

> TestContainerManagerSecurity should init and start Yarn cluster in setup 
> instead of individual methods
> --
>
> Key: YARN-3613
> URL: https://issues.apache.org/jira/browse/YARN-3613
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: nijel
>Priority: Minor
>  Labels: newbie
>
> In TestContainerManagerSecurity, individual tests init and start the YARN 
> cluster. This duplication can be avoided by moving that into setup. 
> Further, one could merge the two @Test methods to avoid bringing up another 
> mini-cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-10 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3613:
---

Assignee: nijel

> TestContainerManagerSecurity should init and start Yarn cluster in setup 
> instead of individual methods
> --
>
> Key: YARN-3613
> URL: https://issues.apache.org/jira/browse/YARN-3613
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: nijel
>Priority: Minor
>  Labels: newbie
>
> In TestContainerManagerSecurity, individual tests init and start the YARN 
> cluster. This duplication can be avoided by moving that into setup. 
> Further, one could merge the two @Test methods to avoid bringing up another 
> mini-cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log message correction] : Missing space in Diagnostics message

2015-05-07 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: YARN-3584-2.patch

Thanks [~jianhe].
Updated the patch to address the comment.

> [Log message correction] : Missing space in Diagnostics message
> --
>
> Key: YARN-3584
> URL: https://issues.apache.org/jira/browse/YARN-3584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-3584-1.patch, YARN-3584-2.patch
>
>
> For more detailed output, check application tracking page: 
> https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
>  click on links to logs of each attempt.
> Here, "Then" is not part of the URL. Better to use a space in between so that 
> the URL can be copied directly for analysis.
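
A tiny self-contained demo (hypothetical class and strings, not the actual diagnostics code) of the fix: a single separating space after the tracking URL keeps it copyable.
{code}
// Sketch only: why the missing space matters for copy-paste.
public class DiagnosticsSpaceDemo {
    public static void main(String[] args) {
        String url = "https://host:26001/cluster/app/application_1430810985970_0020";
        // Before the fix: "Then" is fused onto the URL.
        String before = "check application tracking page: " + url
                + "Then, click on links to logs of each attempt.";
        // After the fix: one space separates the URL from the sentence.
        String after = "check application tracking page: " + url
                + " Then, click on links to logs of each attempt.";
        System.out.println(before);
        System.out.println(after);
    }
}
{code}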



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-07 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-5.patch

Thanks [~vinodkv].
I completely missed this JIRA.

Updated the patch to keep the value as 40.

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, 
> YARN-3018-4.patch, YARN-3018-5.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1":
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If the user expects the values in the file to be the 
> defaults, they will be wrong.
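
A short hedged sketch of why the two defaults must agree: the code-side default is only consulted when the key is absent from every loaded resource, so a user who omits the entry from capacity-scheduler.xml silently falls back to the in-code value. Configuration and getInt are the standard hadoop-common API; the demo class itself is hypothetical.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: the code-side default takes over when the XML entry is missing.
public class DefaultMismatchDemo {
    static final String KEY = "yarn.scheduler.capacity.node-locality-delay";
    static final int DEFAULT_NODE_LOCALITY_DELAY = 40; // unified with the xml value

    public static void main(String[] args) {
        Configuration conf = new Configuration(false); // no *-site.xml loaded
        // Key absent -> the in-code default wins. Had the code kept -1 while
        // the shipped xml said 40, this would print -1, surprising the user.
        System.out.println(conf.getInt(KEY, DEFAULT_NODE_LOCALITY_DELAY));
    }
}
{code}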



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-06 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530699#comment-14530699
 ] 

nijel commented on YARN-3018:
-

The test failure can be solved by changing the following lines in 
TestLeafQueue.testLocalityConstraints():

Line 2394:
-verify(app_0, never()).allocate(eq(NodeType.RACK_LOCAL), eq(node_1_1),
+verify(app_0, never()).allocate(eq(NodeType.NODE_LOCAL), eq(node_1_1),
     any(Priority.class), any(ResourceRequest.class), any(Container.class));
 assertEquals(0, app_0.getSchedulingOpportunities(priority));

Line 2397:
-assertEquals(1, app_0.getTotalRequiredResources(priority));
+assertEquals(0, app_0.getTotalRequiredResources(priority));

But I am not sure about the impact. Can anyone help me with this? 

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, 
> YARN-3018-4.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1":
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If the user expects the values in the file to be the 
> defaults, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-05-06 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.4.patch

Thanks [~kasha] for the comments
 bq.tearDown need not explicitly stop the scheduler. Stopping the RM should 
take care of the scheduler as well.
Done
bq.testNotUserAsDefaultQueue and testDontAllowUndeclaredPools need not stop the 
RM and re-instantiate it. We could just call scheduler.reinitialize
I tried this. As per my analysis, the reinitialize call does not consider the 
conf object (see the small demo after this list):
{code}
// FairScheduler.java

  @Override
  public void reinitialize(Configuration conf, RMContext rmContext)
      throws IOException {
    try {
      allocsLoader.reloadAllocations();
    } catch (Exception e) {
      LOG.error("Failed to reload allocations file", e);
    }
  }
{code}
Here conf is not used.
bq.testMoveRunnableApp - Remove commented out scheduler.init and start
Done
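
A tiny self-contained demo of the point above (class names are hypothetical stand-ins, not FairScheduler itself): when reinitialize() ignores its conf argument, passing a modified conf from a test has no effect, which is why the tests stop and re-instantiate the RM instead.
{code}
import java.util.Properties;

// Sketch only: a reinitialize() that drops its argument on the floor.
public class ReinitDemo {
    static class MiniScheduler {
        Properties active = new Properties();

        void reinitialize(Properties conf) {
            // Mirrors the quoted FairScheduler code: the argument is unused;
            // only internal allocation state would be reloaded here.
        }
    }

    public static void main(String[] args) {
        MiniScheduler s = new MiniScheduler();
        Properties newConf = new Properties();
        newConf.setProperty("user.as.default.queue", "false");
        s.reinitialize(newConf);
        // Prints "null": the new setting never took effect, so the test must
        // stop and re-create the scheduler to pick up a changed conf.
        System.out.println(s.active.getProperty("user.as.default.queue"));
    }
}
{code}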

> FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
> to TestAppRunnability
> ---
>
> Key: YARN-3271
> URL: https://issues.apache.org/jira/browse/YARN-3271
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: nijel
>  Labels: BB2015-05-TBR
> Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch, 
> YARN-3271.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-06 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-4.patch

Thanks [~jianhe] for the comment.
Agree with you. Updated the patch with the changes.

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, 
> YARN-3018-4.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1":
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If the user expects the values in the file to be the 
> defaults, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3584) [Log message correction] : Missing space in Diagnostics message

2015-05-06 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530149#comment-14530149
 ] 

nijel commented on YARN-3584:
-

Test failures are not related to this patch.
Checkstyle is showing a wrong warning, I think; the lines start at indent 10.

> [Log message correction] : Missing space in Diagnostics message
> --
>
> Key: YARN-3584
> URL: https://issues.apache.org/jira/browse/YARN-3584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3584-1.patch
>
>
> For more detailed output, check application tracking page: 
> https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
>  click on links to logs of each attempt.
> Here, "Then" is not part of the URL. Better to use a space in between so that 
> the URL can be copied directly for analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log message correction] : Missing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: (was: YARN-3584-1.patch)

> [Log message correction] : Missing space in Diagnostics message
> --
>
> Key: YARN-3584
> URL: https://issues.apache.org/jira/browse/YARN-3584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3584-1.patch
>
>
> For more detailed output, check application tracking page: 
> https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
>  click on links to logs of each attempt.
> Here, "Then" is not part of the URL. Better to use a space in between so that 
> the URL can be copied directly for analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log message correction] : Missing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: YARN-3584-1.patch

Updated the patch.

> [Log message correction] : Missing space in Diagnostics message
> --
>
> Key: YARN-3584
> URL: https://issues.apache.org/jira/browse/YARN-3584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3584-1.patch
>
>
> For more detailed output, check application tracking page: 
> https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
>  click on links to logs of each attempt.
> Here, "Then" is not part of the URL. Better to use a space in between so that 
> the URL can be copied directly for analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log message correction] : Missing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: YARN-3584-1.patch

Attached the change. Please review.

> [Log message correction] : Missing space in Diagnostics message
> --
>
> Key: YARN-3584
> URL: https://issues.apache.org/jira/browse/YARN-3584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3584-1.patch
>
>
> For more detailed output, check application tracking page: 
> https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
>  click on links to logs of each attempt.
> Here, "Then" is not part of the URL. Better to use a space in between so that 
> the URL can be copied directly for analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3584) [Log message correction] : Missing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)
nijel created YARN-3584:
---

 Summary: [Log message correction] : Missing space in Diagnostics 
message
 Key: YARN-3584
 URL: https://issues.apache.org/jira/browse/YARN-3584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial



For more detailed output, check application tracking page: 
https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
 click on links to logs of each attempt.


Here, "Then" is not part of the URL. Better to use a space in between so that 
the URL can be copied directly for analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-3.patch

Re-triggering the CI; the previous patch was wrongly generated.
Sorry for the noise.

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1":
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If the user expects the values in the file to be the 
> defaults, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

