[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2013-09-24 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776113#comment-13776113
 ] 

nijel commented on YARN-90:
---

To handle this, we can check the failed dirs first in 
DirectoryCollection.checkDirs() and add them back to localDirs if the 
directories have recovered from the error. A rough sketch of the idea follows.
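
A minimal sketch, assuming DirectoryCollection keeps the good dirs in a localDirs list and the failed ones in a failedDirs list (the helper name and structure below are illustrative, not the actual DirectoryCollection code):

{code}
import java.io.File;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.util.DiskChecker;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;

// Hypothetical helper: re-test the previously failed dirs and move any that
// have recovered back into localDirs before the normal good-dir check runs.
class RecoveredDirHelper {
  static void reAddRecoveredDirs(List<String> localDirs, List<String> failedDirs) {
    for (Iterator<String> it = failedDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      try {
        DiskChecker.checkDir(new File(dir)); // throws if the dir is still unusable
        it.remove();
        localDirs.add(dir);                  // disk is good again, reuse it
      } catch (DiskErrorException e) {
        // still bad; keep it in failedDirs
      }
    }
  }
}
{code}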


 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi

 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs restart. This JIRA is to improve NodeManager to 
 reuse good disks(which could be bad some time back).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2013-09-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-90:
--

Attachment: YARN-90.1.patch

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
 Attachments: YARN-90.1.patch, YARN-90.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs restart. This JIRA is to improve NodeManager to 
 reuse good disks(which could be bad some time back).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-01-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3018:
---

Assignee: nijel

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial

 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If he expects the values in the file to be the default 
 values, then they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-01-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269058#comment-14269058
 ] 

nijel commented on YARN-3018:
-

Please give your opinion.
I prefer to have the value as -1 in the file also.

If it sounds good, I can upload a patch.

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Priority: Trivial

 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If he expects the values in the file to be the default 
 values, then they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-01-08 Thread nijel (JIRA)
nijel created YARN-3018:
---

 Summary: Unify the default value for 
yarn.scheduler.capacity.node-locality-delay in code and default xml file
 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Priority: Trivial


For the configuration item yarn.scheduler.capacity.node-locality-delay the 
default value given in code is -1

public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;

In the default capacity-scheduler.xml file in the resource manager config 
directory it is 40.

Can it be unified to avoid confusion when the user creates the file without 
this configuration? If he expects the values in the file to be the default values, 
then they will be wrong.
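
For context, a sketch of how the two defaults interact (not the exact CapacitySchedulerConfiguration code; Configuration.getInt only falls back to the code default when the property is absent from the loaded xml):

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: whichever default "wins" depends on whether the
// property is present in capacity-scheduler.xml.
public class NodeLocalityDelayDefault {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.addResource("capacity-scheduler.xml");
    int delay = conf.getInt(
        "yarn.scheduler.capacity.node-locality-delay", -1 /* code default */);
    // Prints 40 when the shipped xml sets it, -1 when the property is missing.
    System.out.println("node-locality-delay = " + delay);
  }
}
{code}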



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-03-04 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346588#comment-14346588
 ] 

nijel commented on YARN-3271:
-

I would like to work on this task.
As per initial analysis, the following test cases use the concept of runnable 
apps.

testUserAsDefaultQueue
testNotUserAsDefaultQueue
testAppAdditionAndRemoval
testPreemptionVariablesForQueueCreatedRuntime
testDontAllowUndeclaredPools
testMoveRunnableApp
testMoveNonRunnableApp
testMoveMakesAppRunnable

Can I move these tests to the new class? Correct me if I misunderstood the task.

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-03-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3271:
---

Assignee: nijel

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.1.patch

Attaching the patch.
Kindly review.

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-30 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.3.patch

Updated patch for whitespace fix and test fix

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-1.patch

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3018-1.patch


 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If he expects the values in the file to be the default 
 values, then they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526307#comment-14526307
 ] 

nijel commented on YARN-3018:
-

Thanks [~leftnoteasy]
Uploaded the patch

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3018-1.patch


 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If he expects the values in the file to be the default 
 values, then they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-2.patch

Updated the patch to remove the white spaces

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3018-1.patch, YARN-3018-2.patch


 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If he expects the values in the file to be the default 
 values, then they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-3.patch

Re-triggering the CI; the patch was wrongly generated.
Sorry for the noise.

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch


 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If he expects the values in the file to be the default 
 values, then they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.2.patch

Thanks [~kasha] for the comments.
Updated the patch as per your comment.
Please review.


 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch, YARN-3271.2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)
nijel created YARN-3584:
---

 Summary: [Log mesage correction] : MIssing space in Diagnostics 
message
 Key: YARN-3584
 URL: https://issues.apache.org/jira/browse/YARN-3584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial



For more detailed output, check application tracking page: 
https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
 click on links to logs of each attempt.


In this, Then is not part of the URL. Better to use a space in between so that 
the URL can be copied directly for analysis
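
A hypothetical sketch of the kind of change intended (the real diagnostics string is built in the RM code; the class and method below are only illustrative). The only point is the separator after the tracking URL:

{code}
// Illustrative only, not the actual RM code: note the space before "Then".
public class DiagnosticsMessageSketch {
  static String buildDiagnostics(String trackingUrl) {
    return new StringBuilder()
        .append("For more detailed output, check application tracking page: ")
        .append(trackingUrl)
        .append(" Then, click on links to logs of each attempt.")
        .toString();
  }
}
{code}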



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message

2015-05-06 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530149#comment-14530149
 ] 

nijel commented on YARN-3584:
-

Test failures are not related to this patch.
Checkstyle is showing a wrong warning, I think; the lines start at indent 10.

 [Log mesage correction] : MIssing space in Diagnostics message
 --

 Key: YARN-3584
 URL: https://issues.apache.org/jira/browse/YARN-3584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3584-1.patch


 For more detailed output, check application tracking page: 
 https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
  click on links to logs of each attempt.
 In this, Then is not part of the URL. Better to use a space in between so that 
 the URL can be copied directly for analysis



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: YARN-3584-1.patch

Attached the change. Please review

 [Log mesage correction] : MIssing space in Diagnostics message
 --

 Key: YARN-3584
 URL: https://issues.apache.org/jira/browse/YARN-3584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3584-1.patch


 For more detailed output, check application tracking page: 
 https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
  click on links to logs of each attempt.
 In this, Then is not part of the URL. Better to use a space in between so that 
 the URL can be copied directly for analysis



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: YARN-3584-1.patch

updated patch

 [Log mesage correction] : MIssing space in Diagnostics message
 --

 Key: YARN-3584
 URL: https://issues.apache.org/jira/browse/YARN-3584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3584-1.patch


 For more detailed output, check application tracking page: 
 https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
  click on links to logs of each attempt.
 In this, Then is not part of the URL. Better to use a space in between so that 
 the URL can be copied directly for analysis



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message

2015-05-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: (was: YARN-3584-1.patch)

 [Log mesage correction] : MIssing space in Diagnostics message
 --

 Key: YARN-3584
 URL: https://issues.apache.org/jira/browse/YARN-3584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3584-1.patch


 For more detailed output, check application tracking page: 
 https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
  click on links to logs of each attempt.
 In this, Then is not part of the URL. Better to use a space in between so that 
 the URL can be copied directly for analysis



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-06 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-4.patch

Thanks [~jianhe] for the comment
Agree with you. Updated the patch with the changes

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, 
 YARN-3018-4.patch


 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration. IF he expects the values in the file to be default 
 values, then it will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-05-06 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.4.patch

Thanks [~kasha] for the comments
 bq.tearDown need not explicitly stop the scheduler. Stopping the RM should 
take care of the scheduler as well.
Done
bq.testNotUserAsDefaultQueue and testDontAllowUndeclaredPools need not stop the 
RM and re-instantiate it. We could just call scheduler.reinitialize
I tried this. As per my analysis, the reinitialize call does not use the conf 
object.
{code}
FairScheduler.java

  @Override
  public void reinitialize(Configuration conf, RMContext rmContext)
      throws IOException {
    try {
      allocsLoader.reloadAllocations();
    } catch (Exception e) {
      LOG.error("Failed to reload allocations file", e);
    }
  }
{code}
Here conf is not used.
bq.testMoveRunnableApp - Remove commented out scheduler.init and start
Done

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
  Labels: BB2015-05-TBR
 Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch, 
 YARN-3271.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-07 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-5.patch

Thanks [~vinodkv].
I completely missed this JIRA.

Updated the patch to keep the value as 40.

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, 
 YARN-3018-4.patch, YARN-3018-5.patch


 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration. IF he expects the values in the file to be default 
 values, then it will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message

2015-05-07 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3584:

Attachment: YARN-3584-2.patch

Thanks [~jianhe].
Updated the patch to address the comment.

 [Log mesage correction] : MIssing space in Diagnostics message
 --

 Key: YARN-3584
 URL: https://issues.apache.org/jira/browse/YARN-3584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
  Labels: newbie
 Attachments: YARN-3584-1.patch, YARN-3584-2.patch


 For more detailed output, check application tracking page: 
 https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color},
  click on links to logs of each attempt.
 In this, Then is not part of the URL. Better to use a space in between so that 
 the URL can be copied directly for analysis



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-10 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537514#comment-14537514
 ] 

nijel commented on YARN-3613:
-

I will update the patch.

 TestContainerManagerSecurity should init and start Yarn cluster in setup 
 instead of individual methods
 --

 Key: YARN-3613
 URL: https://issues.apache.org/jira/browse/YARN-3613
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: nijel
Priority: Minor
  Labels: newbie

 In TestContainerManagerSecurity, individual tests init and start Yarn 
 cluster. This duplication can be avoided by moving that to setup. 
 Further, one could merge the two @Test methods to avoid bringing up another 
 mini-cluster. 
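
A rough JUnit 4 sketch of the direction (assuming the test keeps using MiniYARNCluster; this is not the committed patch): start the cluster once in @Before and stop it in @After instead of inside each @Test method.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;
import org.junit.After;
import org.junit.Before;

// Sketch only: shared cluster lifecycle moved to setup/teardown.
public class TestContainerManagerSecuritySketch {
  private MiniYARNCluster yarnCluster;
  private Configuration conf;

  @Before
  public void setUp() throws Exception {
    conf = new YarnConfiguration();
    yarnCluster = new MiniYARNCluster("TestContainerManagerSecurity", 1, 1, 1);
    yarnCluster.init(conf);
    yarnCluster.start();
  }

  @After
  public void tearDown() {
    if (yarnCluster != null) {
      yarnCluster.stop();
    }
  }
}
{code}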



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-10 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3613:
---

Assignee: nijel

 TestContainerManagerSecurity should init and start Yarn cluster in setup 
 instead of individual methods
 --

 Key: YARN-3613
 URL: https://issues.apache.org/jira/browse/YARN-3613
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: nijel
Priority: Minor
  Labels: newbie

 In TestContainerManagerSecurity, individual tests init and start Yarn 
 cluster. This duplication can be avoided by moving that to setup. 
 Further, one could merge the two @Test methods to avoid bringing up another 
 mini-cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash

2015-05-11 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537837#comment-14537837
 ] 

nijel commented on YARN-3614:
-

Hi @lachisis
bq. when standby resourcemanager try to transitiontoActive, it will cost more 
than ten minutes to load applications
Is this a secure cluster?

 FileSystemRMStateStore throw exception when failed to remove application, 
 that cause resourcemanager to crash
 -

 Key: YARN-3614
 URL: https://issues.apache.org/jira/browse/YARN-3614
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: lachisis
Priority: Critical

 FileSystemRMStateStore is only an accessorial plug-in of rmstore. 
 When it fails to remove an application, I think a warning is enough, but now 
 the resourcemanager crashes.
 Recently, I configured 
 yarn.resourcemanager.state-store.max-completed-applications to limit the 
 number of applications in rmstore. When the number of applications exceeds the 
 limit, some old applications will be removed. If the removal fails, the 
 resourcemanager will crash.
 The following is log: 
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing 
 info for app: application_1430994493305_0053
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
  Removing info for app: application_1430994493305_0053 at: 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 2015-05-11 06:58:43,816 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 removing app: application_1430994493305_0053
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:745)
 2015-05-11 06:58:43,819 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 

[jira] [Updated] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3613:

Attachment: YARN-3613-1.patch

Please review the patch.
Removed 2 unused imports.
Test time reduced from ~130 to ~80 sec

 TestContainerManagerSecurity should init and start Yarn cluster in setup 
 instead of individual methods
 --

 Key: YARN-3613
 URL: https://issues.apache.org/jira/browse/YARN-3613
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: nijel
Priority: Minor
  Labels: newbie
 Attachments: YARN-3613-1.patch


 In TestContainerManagerSecurity, individual tests init and start Yarn 
 cluster. This duplication can be avoided by moving that to setup. 
 Further, one could merge the two @Test methods to avoid bringing up another 
 mini-cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3629) NodeID is always printed as null in node manager initialization log.

2015-05-12 Thread nijel (JIRA)
nijel created YARN-3629:
---

 Summary: NodeID is always printed as null in node manager 
initialization log.
 Key: YARN-3629
 URL: https://issues.apache.org/jira/browse/YARN-3629
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel


In the NodeManager log during startup the following log is printed:

2015-05-12 11:20:02,347 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
virtual-cores=8

This line is printed from NodeStatusUpdaterImpl.serviceInit.
But the nodeid assignment is happening only in 
NodeStatusUpdaterImpl.serviceStart
{code}
  protected void serviceStart() throws Exception {

    // NodeManager is the last service to start, so NodeId is available.
    this.nodeId = this.context.getNodeId();
{code}

Assigning the node id in serviceInit is not feasible since it is generated by 
ContainerManagerImpl.serviceStart.

The log can be moved to serviceStart to give the right information to the user.
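
As a sketch of that suggestion (not the committed change), the node id could be logged where it is actually available, i.e. in serviceStart, alongside the existing assignment quoted above:

{code}
  // Illustrative sketch only: log the node id after it has been set.
  @Override
  protected void serviceStart() throws Exception {
    // NodeManager is the last service to start, so NodeId is available.
    this.nodeId = this.context.getNodeId();
    LOG.info("Node ID assigned is : " + this.nodeId);
    // ... rest of the existing serviceStart logic ...
    super.serviceStart();
  }
{code}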



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-06 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530699#comment-14530699
 ] 

nijel commented on YARN-3018:
-

The test failure can be solved by changing the following lines in 
TestLeafQueue.testLocalityConstraints():

-verify(app_0, never()).allocate(eq(NodeType.RACK_LOCAL), eq(node_1_1),    (line number 2394)
+verify(app_0, never()).allocate(eq(NodeType.NODE_LOCAL), eq(node_1_1),
     any(Priority.class), any(ResourceRequest.class), any(Container.class));
 assertEquals(0, app_0.getSchedulingOpportunities(priority));
-assertEquals(1, app_0.getTotalRequiredResources(priority));    (line number 2397)
+assertEquals(0, app_0.getTotalRequiredResources(priority));

But I am not sure about the impact. Can anyone help me with this?

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch, 
 YARN-3018-4.patch


 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If he expects the values in the file to be the default 
 values, then they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539495#comment-14539495
 ] 

nijel commented on YARN-3629:
-

Moving the log message is a bit tricky since it logs some parameters which are 
not available in serviceStart, so keeping this log as it is.
Adding a new log message to print the nodeId for information purposes.

Any different thoughts?

 NodeID is always printed as null in node manager initialization log.
 --

 Key: YARN-3629
 URL: https://issues.apache.org/jira/browse/YARN-3629
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3629-1.patch


 In Node manager log during startup the following logs is printed
 2015-05-12 11:20:02,347 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
 nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
 virtual-cores=8
 This line is printed from NodeStatusUpdaterImpl.serviceInit.
 But the nodeid assignment is happening only in 
 NodeStatusUpdaterImpl.serviceStart
 {code}
   protected void serviceStart() throws Exception {
 // NodeManager is the last service to start, so NodeId is available.
 this.nodeId = this.context.getNodeId();
 {code}
 Assigning the node id in serviceInit is not feasible since it is generated by 
  ContainerManagerImpl.serviceStart.
 The log can be moved to serviceStart to give the right information to the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3629) NodeID is always printed as null in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3629:

Attachment: YARN-3629-1.patch

Please review

 NodeID is always printed as null in node manager initialization log.
 --

 Key: YARN-3629
 URL: https://issues.apache.org/jira/browse/YARN-3629
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3629-1.patch


 In Node manager log during startup the following logs is printed
 2015-05-12 11:20:02,347 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
 nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
 virtual-cores=8
 This line is printed from NodeStatusUpdaterImpl.serviceInit.
 But the nodeid assignment is happening only in 
 NodeStatusUpdaterImpl.serviceStart
 {code}
   protected void serviceStart() throws Exception {
 // NodeManager is the last service to start, so NodeId is available.
 this.nodeId = this.context.getNodeId();
 {code}
 Assigning the node id in serviceInit is not feasible since it is generated by 
  ContainerManagerImpl.serviceStart.
 The log can be moved to serviceStart to give the right information to the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and namenode is deployed on the same node.

2015-05-13 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541697#comment-14541697
 ] 

nijel commented on YARN-3639:
-

Hi [~xinxianyin],
Thanks for reporting this issue.
Can you attach the logs for this issue?

 It takes too long time for RM to recover all apps if the original active RM 
 and namenode is deployed on the same node.
 --

 Key: YARN-3639
 URL: https://issues.apache.org/jira/browse/YARN-3639
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Xianyin Xin

 If the node on which the active RM runs dies and if the active namenode is 
 running on the same node, the new RM will take long time to recover all apps. 
 After analysis, we found the root cause is renewing HDFS tokens in the 
 recovering process. The HDFS client created by the renewer would firstly try 
 to connect to the original namenode, the result of which is time-out after 
 10~20s, and then the client tries to connect to the new namenode. The entire 
 recovery costs 15*#apps seconds according to our test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541225#comment-14541225
 ] 

nijel commented on YARN-3629:
-

Thanks [~devaraj.k] for reviewing and committing the patch.

 NodeID is always printed as null in node manager initialization log.
 --

 Key: YARN-3629
 URL: https://issues.apache.org/jira/browse/YARN-3629
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Fix For: 2.8.0

 Attachments: YARN-3629-1.patch


 In Node manager log during startup the following logs is printed
 2015-05-12 11:20:02,347 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
 nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
 virtual-cores=8
 This line is printed from NodeStatusUpdaterImpl.serviceInit.
 But the nodeid assignment is happening only in 
 NodeStatusUpdaterImpl.serviceStart
 {code}
   protected void serviceStart() throws Exception {
 // NodeManager is the last service to start, so NodeId is available.
 this.nodeId = this.context.getNodeId();
 {code}
 Assigning the node id in serviceInit is not feasible since it is generated by 
  ContainerManagerImpl.serviceStart.
 The log can be moved to serviceStart to give the right information to the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541223#comment-14541223
 ] 

nijel commented on YARN-3613:
-

Thanks [~kasha] for reviewing and committing the patch

 TestContainerManagerSecurity should init and start Yarn cluster in setup 
 instead of individual methods
 --

 Key: YARN-3613
 URL: https://issues.apache.org/jira/browse/YARN-3613
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: nijel
Priority: Minor
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3613-1.patch, yarn-3613-2.patch


 In TestContainerManagerSecurity, individual tests init and start Yarn 
 cluster. This duplication can be avoided by moving that to setup. 
 Further, one could merge the two @Test methods to avoid bringing up another 
 mini-cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539788#comment-14539788
 ] 

nijel commented on YARN-3629:
-

bq. -1 tests included
This is a log message change, so tests are not required.

 NodeID is always printed as null in node manager initialization log.
 --

 Key: YARN-3629
 URL: https://issues.apache.org/jira/browse/YARN-3629
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3629-1.patch


 In Node manager log during startup the following logs is printed
 2015-05-12 11:20:02,347 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized 
 nodemanager for *null* : physical-memory=4096 virtual-memory=8602 
 virtual-cores=8
 This line is printed from NodeStatusUpdaterImpl.serviceInit.
 But the nodeid assignment is happening only in 
 NodeStatusUpdaterImpl.serviceStart
 {code}
   protected void serviceStart() throws Exception {
 // NodeManager is the last service to start, so NodeId is available.
 this.nodeId = this.context.getNodeId();
 {code}
 Assigning the node id in serviceInit is not feasible since it is generated by 
  ContainerManagerImpl.serviceStart.
 The log can be moved to serviceStart to give the right information to the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash

2015-05-12 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541306#comment-14541306
 ] 

nijel commented on YARN-3614:
-

One possible cause is discussed in YARN-868.
Can you try the solution given in that issue?

 FileSystemRMStateStore throw exception when failed to remove application, 
 that cause resourcemanager to crash
 -

 Key: YARN-3614
 URL: https://issues.apache.org/jira/browse/YARN-3614
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: lachisis
Priority: Critical

 FileSystemRMStateStore is only an accessorial plug-in of rmstore. 
 When it fails to remove an application, I think a warning is enough, but now 
 the resourcemanager crashes.
 Recently, I configured 
 yarn.resourcemanager.state-store.max-completed-applications to limit the 
 number of applications in rmstore. When the number of applications exceeds the 
 limit, some old applications will be removed. If the removal fails, the 
 resourcemanager will crash.
 The following is log: 
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing 
 info for app: application_1430994493305_0053
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
  Removing info for app: application_1430994493305_0053 at: 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 2015-05-11 06:58:43,816 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 removing app: application_1430994493305_0053
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:745)
 2015-05-11 06:58:43,819 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 

[jira] [Assigned] (YARN-3693) Duplicate parameters on service start for NM and RM

2015-05-21 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3693:
---

Assignee: nijel

 Duplicate parameters on service start for NM and RM
 ---

 Key: YARN-3693
 URL: https://issues.apache.org/jira/browse/YARN-3693
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: nijel
Priority: Minor

 Steps to reproduce
 =
 1.Install HA cluster with NM RM
 2.Check process id for the same
 3.ps -ef | grep pid nm
 Actual result
 =
 Multiple parameters are duplicated, like the log file name, logger type, log 
 directory, etc.
 The same is observed in the RM process also.
 *Please find the logs below*
 {quote}
 dsperf   26076 1  0 12:43 ?00:00:26 
 /opt/dsperf/jdk1.8.0_40//bin/java -Dproc_nodemanager -Xmx1000m 
 -Dhadoop.log.dir=install/nodemanager/logs 
 -Dyarn.log.dir=install/nodemanager/logs -Dhadoop.log.file=yarn.log 
 -Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str= 
 -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console 
 -Dyarn.policy.file=hadoop-policy.xml -Dlog4j.configuration.watch=true 
 -Dhadoop.log.dir=install/nodemanager/logs 
 -Dyarn.log.dir=install/nodemanager/logs 
 -Dhadoop.log.file=yarn-dsperf-nodemanager-host-name.log 
 -Dyarn.log.file=yarn-dsperf-nodemanager-host.log -Dyarn.home.dir= 
 -Dyarn.id.str=dsperf {color:red}-Dhadoop.root.logger=INFO,RFA 
 {color}-Dyarn.root.logger=INFO,RFA 
 -Djava.library.path=install/nodemanager/lib/native 
 -Dyarn.policy.file=hadoop-policy.xml -server 
 -Dhadoop.log.dir=install/nodemanager/logs 
 -Dyarn.log.dir=install/nodemanager/logs 
 -Dhadoop.log.file=yarn-dsperf-nodemanager-host-name.log 
 -Dyarn.log.file=yarn-dsperf-nodemanager-host-name.log 
 -Dyarn.home.dir=install/nodemanager -Dhadoop.home.dir=install/nodemanager 
 {color:red}-Dhadoop.root.logger=INFO,RFA {color}-Dyarn.root.logger=INFO,RFA 
 -Dlog4j.configuration=log4j.properties 
 -Djava.library.path=install/nodemanager/lib/native -classpath XXX 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3771) final behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-06-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel moved HDFS-8526 to YARN-3771:
---

Key: YARN-3771  (was: HDFS-8526)
Project: Hadoop YARN  (was: Hadoop HDFS)

 final behavior is not honored for 
 YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
 

 Key: YARN-3771
 URL: https://issues.apache.org/jira/browse/YARN-3771
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel

 I was going through some FindBugs rules. One issue reported there is that 
  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
 and 
   public static final String[] DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH =
 do not honor the final qualifier. The string array contents can be reassigned!
 Simple test:
 {code}
 public class TestClass {
   static final String[] t = { "1", "2" };

   public static void main(String[] args) {
     System.out.println(12 > 10);
     String[] t1 = { "u" };
     // t = t1; // this will show a compilation error
     t[1] = t1[1]; // But this works
   }
 }
 {code}
 One option is to use Collections.unmodifiableList.
 Any thoughts?
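
A sketch of the Collections.unmodifiableList option (illustrative only, with made-up classpath entries; it would change the public type from String[] to List<String>, which is an API change to weigh):

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch, not the actual YarnConfiguration change: an unmodifiable
// List cannot have its elements reassigned the way a final String[] can.
public class ClasspathDefaults {
  public static final List<String> DEFAULT_YARN_APPLICATION_CLASSPATH =
      Collections.unmodifiableList(Arrays.asList(
          "$HADOOP_CONF_DIR",
          "$HADOOP_COMMON_HOME/share/hadoop/common/*"));

  public static void main(String[] args) {
    // Throws UnsupportedOperationException instead of silently mutating:
    DEFAULT_YARN_APPLICATION_CLASSPATH.set(0, "something-else");
  }
}
{code}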



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3771) final behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-06-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3771:

Attachment: 0001-YARN-3771.patch

Attached the patch. Please review

 final behavior is not honored for 
 YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
 

 Key: YARN-3771
 URL: https://issues.apache.org/jira/browse/YARN-3771
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: 0001-YARN-3771.patch


 I was going through some FindBugs rules. One issue reported there is that 
  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
 and 
   public static final String[] DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH =
 do not honor the final qualifier. The string array contents can be reassigned!
 Simple test:
 {code}
 public class TestClass {
   static final String[] t = { "1", "2" };

   public static void main(String[] args) {
     System.out.println(12 > 10);
     String[] t1 = { "u" };
     // t = t1; // this will show a compilation error
     t[1] = t1[1]; // But this works
   }
 }
 {code}
 One option is to use Collections.unmodifiableList.
 Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_2.patch

Thanks [~xgong] for the comment.
Updated the patch
Please review

 AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
 

 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3830_1.patch, YARN-3830_2.patch


 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
 {code}
 protected void createReleaseCache() {
 // Cleanup the cache after nm expire interval.
 new Timer().schedule(new TimerTask() {
   @Override
   public void run() {
  for (SchedulerApplication<T> app : applications.values()) {
   T attempt = app.getCurrentAppAttempt();
   synchronized (attempt) {
 for (ContainerId containerId : attempt.getPendingRelease()) {
   RMAuditLogger.logFailure(
 {code}
 Here the attempt can be null since the attempt is created later. So null 
 pointer exception  will come
 {code}
 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
 threw an Exception. | YarnUncaughtExceptionHandler.java:68
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {code}
 This will skip the other applications in this run.
 Can add a null check and continue with other applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_3.patch

Sorry for the small mistake.
The line limit is corrected.

The test failure is not related to this patch; verified locally, it passes.

 AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
 

 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch


 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
 {code}
 protected void createReleaseCache() {
 // Cleanup the cache after nm expire interval.
 new Timer().schedule(new TimerTask() {
   @Override
   public void run() {
 for (SchedulerApplication<T> app : applications.values()) {
   T attempt = app.getCurrentAppAttempt();
   synchronized (attempt) {
 for (ContainerId containerId : attempt.getPendingRelease()) {
   RMAuditLogger.logFailure(
 {code}
 Here the attempt can be null since the attempt is created later. So null 
 pointer exception  will come
 {code}
 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
 threw an Exception. | YarnUncaughtExceptionHandler.java:68
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {code}
 This will skip the other applications in this run.
 Can add a null check and continue with other applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-19 Thread nijel (JIRA)
nijel created YARN-3830:
---

 Summary: AbstractYarnScheduler.createReleaseCache may try to clean 
a null attempt
 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel


org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
{code}
protected void createReleaseCache() {
// Cleanup the cache after nm expire interval.
new Timer().schedule(new TimerTask() {
  @Override
  public void run() {
for (SchedulerApplication<T> app : applications.values()) {

  T attempt = app.getCurrentAppAttempt();
  synchronized (attempt) {
for (ContainerId containerId : attempt.getPendingRelease()) {
  RMAuditLogger.logFailure(
{code}

Here the attempt can be null since the attempt is created later, so a null 
pointer exception will occur
{code}
2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw 
an Exception. | YarnUncaughtExceptionHandler.java:68
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
{code}

This will skip the other applications in this run.
We can add a null check and continue with the other applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-19 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_1.patch

Updated the patch.
Please review

 AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
 

 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3830_1.patch


 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
 {code}
 protected void createReleaseCache() {
 // Cleanup the cache after nm expire interval.
 new Timer().schedule(new TimerTask() {
   @Override
   public void run() {
 for (SchedulerApplication<T> app : applications.values()) {
   T attempt = app.getCurrentAppAttempt();
   synchronized (attempt) {
 for (ContainerId containerId : attempt.getPendingRelease()) {
   RMAuditLogger.logFailure(
 {code}
 Here the attempt can be null since the attempt is created later. So null 
 pointer exception  will come
 {code}
 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
 threw an Exception. | YarnUncaughtExceptionHandler.java:68
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {code}
 This will skip the other applications in this run.
 Can add a null check and continue with other applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1948) Expose utility methods in Apps.java publically

2015-06-16 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-1948:

Attachment: YARN-1948-1.patch

Attached the file with modification
Please review

 Expose utility methods in Apps.java publically
 --

 Key: YARN-1948
 URL: https://issues.apache.org/jira/browse/YARN-1948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: nijel
  Labels: newbie
 Attachments: YARN-1948-1.patch


 Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
 MapReduce, Spark, and Tez that are currently marked private.  As these are 
 useful for any YARN app that wants to allow users to augment container 
 environments, it would be helpful to make them public.
 It may make sense to put them in a new class with a better name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3813) Support Application timeout feature in YARN.

2015-06-17 Thread nijel (JIRA)
nijel created YARN-3813:
---

 Summary: Support Application timeout feature in YARN. 
 Key: YARN-3813
 URL: https://issues.apache.org/jira/browse/YARN-3813
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: nijel
 Fix For: 2.8.0


It will be useful to support an Application Timeout in YARN. Some use cases do 
not care about the output of an application if it does not complete within a 
specific time. 

*Background:*
The requirement is to show the CDR statistics of the last few minutes, say for 
every 5 minutes. The same job will run continuously with different datasets, 
so one job will be started every 5 minutes. The estimated time for this task 
is 2 minutes or less. 
If the application does not complete in the given time, its output is not useful.

*Proposal*
The idea is to support an application timeout, where a timeout parameter is 
given while submitting the job. 
Here, the user expects the application to be finished (completed or killed) 
within the given time.


One option for us is to move this logic to the application client (which 
submits the job). 
But it would be nicer if this could be generic logic, which would also make it 
more robust.

Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
will update the design doc and a prototype patch
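
For the client-side option mentioned above, here is a rough sketch of what a submitting client could already do today (illustrative only; the class and method names are not from any patch):
{code}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ClientSideTimeoutSketch {
  public static void enforceTimeout(Configuration conf, ApplicationId appId,
      long timeoutMs) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      long deadline = System.currentTimeMillis() + timeoutMs;
      EnumSet<YarnApplicationState> done = EnumSet.of(
          YarnApplicationState.FINISHED,
          YarnApplicationState.FAILED,
          YarnApplicationState.KILLED);
      while (System.currentTimeMillis() < deadline) {
        YarnApplicationState state =
            client.getApplicationReport(appId).getYarnApplicationState();
        if (done.contains(state)) {
          return; // the application finished within the allowed time
        }
        Thread.sleep(5000);
      }
      // Timed out: the output is no longer useful, so kill the application.
      client.killApplication(appId);
    } finally {
      client.stop();
    }
  }
}
{code}
The drawback, as noted above, is that every client has to implement this itself, which is why a generic RM-side mechanism is proposed.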



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1948) Expose utility methods in Apps.java publically

2015-06-17 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589782#comment-14589782
 ] 

nijel commented on YARN-1948:
-

thanks [~vinodkv] for the comment
I am thinking of changing both method names to *updateEnv*. Not getting any 
better name :(
public static void updateEnv(

Another option is to drop the env-related wording and name it from a map 
perspective, since the env is represented as a map in these functions.

Any thoughts ? 
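
Just to make the naming discussion concrete, a rough sketch of one possible shape for the renamed helper (illustrative only, not the actual Apps.java change):
{code}
import java.util.HashMap;
import java.util.Map;

public class EnvUtilsSketch {
  // Appends a value to an environment variable in the given map,
  // creating the variable if it does not exist yet.
  public static void updateEnv(Map<String, String> env, String variable,
      String value, String separator) {
    String existing = env.get(variable);
    env.put(variable, existing == null ? value : existing + separator + value);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    updateEnv(env, "CLASSPATH", "/opt/app/lib/*", ":");
    updateEnv(env, "CLASSPATH", "/opt/app/conf", ":");
    System.out.println(env.get("CLASSPATH")); // /opt/app/lib/*:/opt/app/conf
  }
}
{code}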

 Expose utility methods in Apps.java publically
 --

 Key: YARN-1948
 URL: https://issues.apache.org/jira/browse/YARN-1948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: nijel
  Labels: newbie
 Attachments: YARN-1948-1.patch


 Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
 MapReduce, Spark, and Tez that are currently marked private.  As these are 
 useful for any YARN app that wants to allow users to augment container 
 environments, it would be helpful to make them public.
 It may make sense to put them in a new class with a better name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2953) TestWorkPreservingRMRestart fails on trunk

2015-07-01 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-2953:
---

Assignee: nijel

 TestWorkPreservingRMRestart fails on trunk
 --

 Key: YARN-2953
 URL: https://issues.apache.org/jira/browse/YARN-2953
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: nijel

 Running 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec 
 <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 30.031 sec  <<< ERROR!
 java.lang.Exception: test timed out after 30000 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2953) TestWorkPreservingRMRestart fails on trunk

2015-07-01 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609771#comment-14609771
 ] 

nijel commented on YARN-2953:
-

Hi [~rohithsharma]
This test case is passing in recent code, and I see the timeout has been increased 
(@Test (timeout = 5)). This happened in the following check-in
{code}
Revision: 5f57b904f550515693d93a2959e663b0d0260696
Author: Jian He jia...@apache.org
Date: 31-12-2014 05:05:45
Message:
YARN-2492. Added node-labels page on RM web UI. Contributed by Wangda Tan
{code}
Can you please validate this issue ? 


 TestWorkPreservingRMRestart fails on trunk
 --

 Key: YARN-2953
 URL: https://issues.apache.org/jira/browse/YARN-2953
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S

 Running 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec 
 <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 30.031 sec  <<< ERROR!
 java.lang.Exception: test timed out after 30000 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-07-01 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3830:

Attachment: YARN-3830_4.patch

Thanks [~devaraj.k] for the suggestion
Updated patch with test case
Please review

 AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
 

 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, 
 YARN-3830_4.patch


 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
 {code}
 protected void createReleaseCache() {
 // Cleanup the cache after nm expire interval.
 new Timer().schedule(new TimerTask() {
   @Override
   public void run() {
 for (SchedulerApplication<T> app : applications.values()) {
   T attempt = app.getCurrentAppAttempt();
   synchronized (attempt) {
 for (ContainerId containerId : attempt.getPendingRelease()) {
   RMAuditLogger.logFailure(
 {code}
 Here the attempt can be null since the attempt is created later. So null 
 pointer exception  will come
 {code}
 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
 threw an Exception. | YarnUncaughtExceptionHandler.java:68
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {code}
 This will skip the other applications in this run.
 Can add a null check and continue with other applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3869) Add app name to RM audit log

2015-07-01 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609812#comment-14609812
 ] 

nijel commented on YARN-3869:
-

hi [~roji]
i would like to work on this improvement.
Please let me know if you have already started the work

 Add app name to RM audit log
 

 Key: YARN-3869
 URL: https://issues.apache.org/jira/browse/YARN-3869
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shay Rojansky
Priority: Minor

 The YARN resource manager audit log currently includes useful info such as 
 APPID, USER, etc. One crucial piece of information missing is the 
 user-supplied application name.
 Users are familiar with their application name as shown in the YARN UI, etc. 
 It's vital for something like logstash to be able to associated logs with the 
 application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3869) Add app name to RM audit log

2015-07-01 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3869:
---

Assignee: nijel

 Add app name to RM audit log
 

 Key: YARN-3869
 URL: https://issues.apache.org/jira/browse/YARN-3869
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shay Rojansky
Assignee: nijel
Priority: Minor

 The YARN resource manager audit log currently includes useful info such as 
 APPID, USER, etc. One crucial piece of information missing is the 
 user-supplied application name.
 Users are familiar with their application name as shown in the YARN UI, etc. 
 It's vital for something like logstash to be able to associate logs with the 
 application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-07-01 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611374#comment-14611374
 ] 

nijel commented on YARN-3830:
-

Thanks [~devaraj.k] for the review and committing the patch.

 AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
 

 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: nijel
Assignee: nijel
 Fix For: 2.8.0

 Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, 
 YARN-3830_4.patch


 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
 {code}
 protected void createReleaseCache() {
 // Cleanup the cache after nm expire interval.
 new Timer().schedule(new TimerTask() {
   @Override
   public void run() {
 for (SchedulerApplication<T> app : applications.values()) {
   T attempt = app.getCurrentAppAttempt();
   synchronized (attempt) {
 for (ContainerId containerId : attempt.getPendingRelease()) {
   RMAuditLogger.logFailure(
 {code}
 Here the attempt can be null since the attempt is created later. So null 
 pointer exception  will come
 {code}
 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
 threw an Exception. | YarnUncaughtExceptionHandler.java:68
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {code}
 This will skip the other applications in this run.
 Can add a null check and continue with other applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt

2015-06-30 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608093#comment-14608093
 ] 

nijel commented on YARN-3830:
-

Thanks [~devaraj.k] for the review
The test case looks a bit tricky :)
i will update the patch soon.

 AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
 

 Key: YARN-3830
 URL: https://issues.apache.org/jira/browse/YARN-3830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
 Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch


 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache()
 {code}
 protected void createReleaseCache() {
 // Cleanup the cache after nm expire interval.
 new Timer().schedule(new TimerTask() {
   @Override
   public void run() {
 for (SchedulerApplication<T> app : applications.values()) {
   T attempt = app.getCurrentAppAttempt();
   synchronized (attempt) {
 for (ContainerId containerId : attempt.getPendingRelease()) {
   RMAuditLogger.logFailure(
 {code}
 Here the attempt can be null since the attempt is created later. So null 
 pointer exception  will come
 {code}
 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] 
 threw an Exception. | YarnUncaughtExceptionHandler.java:68
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {code}
 This will skip the other applications in this run.
 Can add a null check and continue with other applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3796) Support User level Quota for space and Name (count)

2015-06-11 Thread nijel (JIRA)
nijel created YARN-3796:
---

 Summary: Support User level Quota for space and Name (count)
 Key: YARN-3796
 URL: https://issues.apache.org/jira/browse/YARN-3796
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: nijel
Assignee: nijel


I would like to have one feature in HDFS: quota management at the user 
level. 

Background :
When the customer uses a multi-tenant solution it will have many Hadoop 
ecosystem components like Hive, HBase, YARN etc. The base folders of these 
components are different, like /hive for Hive and /hbase for HBase. 
Now if a user creates some file or table, it will be under the folder 
specific to that component. If the user name is taken into account it looks like
{code}
/hive/user1/table1
/hive/user2/table1
/hbase/user1/Htable1
/hbase/user2/Htable1
 
Same for yarn/map-reduce data and logs
{code}
 
In this case restricting the user to use a certain amount of disk/file is very 
difficult since the current quota management is at folder level.
 
Requirement: User level Quota for space and Name (count). Say user1 can have 
100G irrespective of the folder or location used.
 
Here the idea is to consider the file owner as the key and attribute the quota 
to it. So the current quota system can do an initial check against the user 
quota, if defined, before validating the folder quota.

Note:
This needs a change in the fsimage to store the user and quota information.


Please have a look at this scenario. If it sounds good, I will create the tasks 
and update the design and a prototype.

Thanks
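
A purely hypothetical illustration of the proposed check order (none of these classes or methods exist in HDFS today; the point is only that a user-level check could run before the existing folder quota check):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UserQuotaSketch {
  private final Map<String, Long> userSpaceQuota = new ConcurrentHashMap<>();
  private final Map<String, Long> userSpaceUsed = new ConcurrentHashMap<>();

  // Called before the normal directory quota validation.
  void checkUserQuota(String owner, long requestedBytes) {
    Long quota = userSpaceQuota.get(owner);
    if (quota == null) {
      return; // no user-level quota defined; fall through to the folder quota check
    }
    long used = userSpaceUsed.getOrDefault(owner, 0L);
    if (used + requestedBytes > quota) {
      throw new IllegalStateException("User space quota exceeded for " + owner);
    }
  }
}
{code}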



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1948) Expose utility methods in Apps.java publically

2015-05-21 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-1948:
---

Assignee: nijel

 Expose utility methods in Apps.java publically
 --

 Key: YARN-1948
 URL: https://issues.apache.org/jira/browse/YARN-1948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: nijel
  Labels: newbie

 Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
 MapReduce, Spark, and Tez that are currently marked private.  As these are 
 useful for any YARN app that wants to allow users to augment container 
 environments, it would be helpful to make them public.
 It may make sense to put them in a new class with a better name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3869) Add app name to RM audit log

2015-07-07 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616363#comment-14616363
 ] 

nijel commented on YARN-3869:
-

In the web UI it is shown truncated.
But if the names are similar, will it serve the purpose ? 

Let us wait for a few more opinions as well :) 

 Add app name to RM audit log
 

 Key: YARN-3869
 URL: https://issues.apache.org/jira/browse/YARN-3869
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shay Rojansky
Assignee: nijel
Priority: Minor

 The YARN resource manager audit log currently includes useful info such as 
 APPID, USER, etc. One crucial piece of information missing is the 
 user-supplied application name.
 Users are familiar with their application name as shown in the YARN UI, etc. 
 It's vital for something like logstash to be able to associate logs with the 
 application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3869) Add app name to RM audit log

2015-07-07 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616282#comment-14616282
 ] 

nijel commented on YARN-3869:
-

hi
I started working on this. One observation is that in some cases the application 
name will not be in a readable format.
For example in Hive, the name will be the complete query string. In this case it 
will not be good to print this information in the logs!
Any thoughts ? 
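
One simple option, sketched here only for discussion (the length limit is an arbitrary example, not an agreed value): truncate long names before writing them to the audit line.
{code}
public class AuditNameSketch {
  static final int MAX_AUDIT_NAME_LEN = 100;

  // Returns a log-friendly version of the user-supplied application name.
  static String auditName(String appName) {
    if (appName == null) {
      return "";
    }
    return appName.length() <= MAX_AUDIT_NAME_LEN
        ? appName
        : appName.substring(0, MAX_AUDIT_NAME_LEN) + "...";
  }
}
{code}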

 Add app name to RM audit log
 

 Key: YARN-3869
 URL: https://issues.apache.org/jira/browse/YARN-3869
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shay Rojansky
Assignee: nijel
Priority: Minor

 The YARN resource manager audit log currently includes useful info such as 
 APPID, USER, etc. One crucial piece of information missing is the 
 user-supplied application name.
 Users are familiar with their application name as shown in the YARN UI, etc. 
 It's vital for something like logstash to be able to associate logs with the 
 application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4303) Confusing help message if AM logs cant be retrieved via yarn logs command

2015-10-27 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4303:
---

Assignee: nijel

> Confusing help message if AM logs cant be retrieved via yarn logs command
> -
>
> Key: YARN-4303
> URL: https://issues.apache.org/jira/browse/YARN-4303
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
>Priority: Minor
>
> {noformat}
> yarn@BLR102525:~/test/install/hadoop/resourcemanager/bin> ./yarn logs 
> --applicationId application_1445832014581_0028 -am ALL
> Can not get AMContainers logs for the 
> application:application_1445832014581_0028
> This application:application_1445832014581_0028 is finished. Please enable 
> the application history service. Or Using yarn logs -applicationId <appId> 
> -containerId <containerId> --nodeAddress <nodeHttpAddress> to get the 
> container logs
> {noformat}
> Part of the command output mentioned above indicates that using {{yarn logs 
> -applicationId <appId> -containerId <containerId> --nodeAddress 
> <nodeHttpAddress>}} will fetch the desired result. It asks you to specify 
> nodeHttpAddress, which makes it sound like we have to connect to the 
> nodemanager's webapp address.
> This help message should be changed to include the command as {{yarn logs 
> -applicationId <appId> -containerId <containerId> --nodeAddress <NM Address>}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-28 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978231#comment-14978231
 ] 

nijel commented on YARN-4246:
-

thanks [~varun_saxena] and [~rohithsharma] for the review and commit.

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Fix For: 2.8.0
>
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated. In ApplicationCLI#listApplicationAttempts we should check whether 
> AM container ID is null instead of directly calling toString.
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}
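
A minimal sketch of the null-safe rendering (not necessarily the committed fix): print a placeholder when the AM container has not been allocated, and pass the result to the printf quoted above.
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

public final class AmContainerIdSketch {
  // Used in place of appAttemptReport.getAMContainerId().toString().
  static String render(ContainerId amContainerId) {
    return amContainerId == null ? "N/A" : amContainerId.toString();
  }
}
{code}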



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-11-05 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991374#comment-14991374
 ] 

nijel commented on YARN-2934:
-

thanks [~Naganarasimha] for the patch

Few minor comments/doubts

1.
{code}
FileStatus[] listStatus =
   fileSystem.listStatus(containerLogDir, new PathFilter() {
 @Override
 public boolean accept(Path path) {
   return FilenameUtils.wildcardMatch(path.getName(),
   errorFileNamePattern, IOCase.INSENSITIVE);
 }
   });
{code}
What if this gives multiple error files ? (A rough sketch of one way to handle that is included after these comments.)


2. 
{code}
 } catch (IOException e) {
LOG.warn("Failed while trying to read container's error log", e);
  }
{code}
Can this be logged at error level? I think there should not normally be any exception 
while reading the file, so if one does occur it is better to log it as an error.
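
Regarding point 1, a small sketch (not from the patch) of one way to handle multiple matches: pick the most recently modified error file.
{code}
import org.apache.hadoop.fs.FileStatus;

public final class LatestErrorFileSketch {
  // Returns the most recently modified candidate, or null if none matched.
  static FileStatus pickLatest(FileStatus[] candidates) {
    FileStatus latest = null;
    for (FileStatus fs : candidates) {
      if (latest == null
          || fs.getModificationTime() > latest.getModificationTime()) {
        latest = fs;
      }
    }
    return latest;
  }
}
{code}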



> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4246) NPE while listing app attempt

2015-10-13 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4246:

Attachment: YARN-4246_2.patch

Thanks [~steve_l] for pointing out the mistake.
Updated the patch with the comment fix.

thanks

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated. In ApplicationCLI#listApplicationAttempts we should check whether 
> AM container ID is null instead of directly calling toString.
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-13 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955054#comment-14955054
 ] 

nijel commented on YARN-4246:
-

bq. -1  yarn tests

As per my analysis the test failures are not related to this patch.
Please review

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated. In ApplicationCLI#listApplicationAttempts we should check whether 
> AM container ID is null instead of directly calling toString.
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4246) NPE while listing app attempt

2015-10-10 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4246:

Attachment: YARN-4246_1.patch

Attached the patch.

Please review

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Attachments: YARN-4246_1.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated. In ApplicationCLI#listApplicationAttempts we should check whether 
> AM container ID is null instead of directly calling toString.
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4249) Many options in "yarn application" command is not documented

2015-10-10 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4249:

Summary: Many options in "yarn application" command is not documented  
(was: Many options in "yarn application" command is not documents)

> Many options in "yarn application" command is not documented
> 
>
> Key: YARN-4249
> URL: https://issues.apache.org/jira/browse/YARN-4249
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>
> In the document only a few options are specified.
> {code}
> Usage: `yarn application [options] `
> | COMMAND\_OPTIONS | Description |
> |: |: |
> | -appStates \ | Works with -list to filter applications based on 
> input comma-separated list of application states. The valid application state 
> can be one of the following:  ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, 
> RUNNING, FINISHED, FAILED, KILLED |
> | -appTypes \ | Works with -list to filter applications based on 
> input comma-separated list of application types. |
> | -list | Lists applications from the RM. Supports optional use of -appTypes 
> to filter applications based on application type, and -appStates to filter 
> applications based on application state. |
> | -kill \ | Kills the application. |
> | -status \ | Prints the status of the application. |
> {code}
> some options are missing like
> -appId  Specify Application Id to be operated
> -help   Displays help for all commands.
> -movetoqueueMoves the application to a different queue.
> -queue  Works with the movetoqueue command to specify 
> which queue to move an application to.
> -updatePriority   update priority of an 
> application.ApplicationId can be passed using 'appId' option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4249) Many options in "yarn application" command is not documents

2015-10-10 Thread nijel (JIRA)
nijel created YARN-4249:
---

 Summary: Many options in "yarn application" command is not 
documents
 Key: YARN-4249
 URL: https://issues.apache.org/jira/browse/YARN-4249
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel


In the document only a few options are specified.
{code}
Usage: `yarn application [options] `

| COMMAND\_OPTIONS | Description |
|: |: |
| -appStates \ | Works with -list to filter applications based on 
input comma-separated list of application states. The valid application state 
can be one of the following:  ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, 
RUNNING, FINISHED, FAILED, KILLED |
| -appTypes \ | Works with -list to filter applications based on input 
comma-separated list of application types. |
| -list | Lists applications from the RM. Supports optional use of -appTypes to 
filter applications based on application type, and -appStates to filter 
applications based on application state. |
| -kill \ | Kills the application. |
| -status \ | Prints the status of the application. |
{code}


some options are missing like
-appId  Specify Application Id to be operated
-help   Displays help for all commands.
-movetoqueueMoves the application to a different queue.
-queue  Works with the movetoqueue command to specify 
which queue to move an application to.
-updatePriority   update priority of an application.ApplicationId 
can be passed using 'appId' option.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-09 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950215#comment-14950215
 ] 

nijel commented on YARN-4246:
-

thanks [~varun_saxena] for reporting
 The same issue is there in the applicationattempt -status command also:

{noformat}
 ./yarn applicationattempt -status appattempt_1444389134985_0001_01
15/10/09 16:53:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/10/09 16:53:20 INFO impl.TimelineClientImpl: Timeline service address: 
http://10.18.130.110:55033/ws/v1/timeline/
15/10/09 16:53:20 INFO client.RMProxy: Connecting to ResourceManager at 
host-10-18-130-110/10.18.130.110:8032
15/10/09 16:53:21 INFO client.AHSProxy: Connecting to Application History 
server at /10.18.130.110:55034
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationAttemptReport(ApplicationCLI.java:352)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:182)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
{noformat}

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated. In ApplicationCLI#listApplicationAttempts we should check whether 
> AM container ID is null instead of directly calling toString.
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618432#comment-14618432
 ] 

nijel commented on YARN-3813:
-

Attached initial draft for work details.
Please share your comments and thoughts

 Support Application timeout feature in YARN. 
 -

 Key: YARN-3813
 URL: https://issues.apache.org/jira/browse/YARN-3813
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: nijel
 Attachments: YARN Application Timeout -3.pdf


 It will be useful to support Application Timeout in YARN. Some use cases are 
 not worried about the output of the applications if the application is not 
 completed in a specific time. 
 *Background:*
 The requirement is to show the CDR statistics of last few  minutes, say for 
 every 5 minutes. The same Job will run continuously with different dataset.
 So one job will be started in every 5 minutes. The estimate time for this 
 task is 2 minutes or lesser time. 
 If the application is not completing in the given time the output is not 
 useful.
 *Proposal*
 So idea is to support application timeout, with which timeout parameter is 
 given while submitting the job. 
 Here, user is expecting to finish (complete or kill) the application in the 
 given time.
 One option for us is to move this logic to Application client (who submit the 
 job). 
 But it will be nice if it can be generic logic and can make more robust.
 Kindly provide your suggestions/opinion on this feature. If it sounds good, i 
 will update the design doc and prototype patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: (was: YARN Application Timeout -3.pdf)

 Support Application timeout feature in YARN. 
 -

 Key: YARN-3813
 URL: https://issues.apache.org/jira/browse/YARN-3813
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: nijel

 It will be useful to support Application Timeout in YARN. Some use cases are 
 not worried about the output of the applications if the application is not 
 completed in a specific time. 
 *Background:*
 The requirement is to show the CDR statistics of last few  minutes, say for 
 every 5 minutes. The same Job will run continuously with different dataset.
 So one job will be started in every 5 minutes. The estimate time for this 
 task is 2 minutes or lesser time. 
 If the application is not completing in the given time the output is not 
 useful.
 *Proposal*
 So idea is to support application timeout, with which timeout parameter is 
 given while submitting the job. 
 Here, user is expecting to finish (complete or kill) the application in the 
 given time.
 One option for us is to move this logic to Application client (who submit the 
 job). 
 But it will be nice if it can be generic logic and can make more robust.
 Kindly provide your suggestions/opinion on this feature. If it sounds good, i 
 will update the design doc and prototype patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: YARN Application Timeout -3.pdf

 Support Application timeout feature in YARN. 
 -

 Key: YARN-3813
 URL: https://issues.apache.org/jira/browse/YARN-3813
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: nijel
 Attachments: YARN Application Timeout -3.pdf


 It will be useful to support Application Timeout in YARN. Some use cases are 
 not worried about the output of the applications if the application is not 
 completed in a specific time. 
 *Background:*
 The requirement is to show the CDR statistics of last few  minutes, say for 
 every 5 minutes. The same Job will run continuously with different dataset.
 So one job will be started in every 5 minutes. The estimate time for this 
 task is 2 minutes or lesser time. 
 If the application is not completing in the given time the output is not 
 useful.
 *Proposal*
 So idea is to support application timeout, with which timeout parameter is 
 given while submitting the job. 
 Here, user is expecting to finish (complete or kill) the application in the 
 given time.
 One option for us is to move this logic to Application client (who submit the 
 job). 
 But it will be nice if it can be generic logic and can make more robust.
 Kindly provide your suggestions/opinion on this feature. If it sounds good, i 
 will update the design doc and prototype patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: YARN Application Timeout .pdf

 Support Application timeout feature in YARN. 
 -

 Key: YARN-3813
 URL: https://issues.apache.org/jira/browse/YARN-3813
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: nijel
 Attachments: YARN Application Timeout .pdf


 It will be useful to support Application Timeout in YARN. Some use cases are 
 not worried about the output of the applications if the application is not 
 completed in a specific time. 
 *Background:*
 The requirement is to show the CDR statistics of last few  minutes, say for 
 every 5 minutes. The same Job will run continuously with different dataset.
 So one job will be started in every 5 minutes. The estimate time for this 
 task is 2 minutes or lesser time. 
 If the application is not completing in the given time the output is not 
 useful.
 *Proposal*
 So idea is to support application timeout, with which timeout parameter is 
 given while submitting the job. 
 Here, user is expecting to finish (complete or kill) the application in the 
 given time.
 One option for us is to move this logic to Application client (who submit the 
 job). 
 But it will be nice if it can be generic logic and can make more robust.
 Kindly provide your suggestions/opinion on this feature. If it sounds good, i 
 will update the design doc and prototype patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-09 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619984#comment-14619984
 ] 

nijel commented on YARN-3813:
-

Thanks [~sunilg] and [~devaraj.k] for the comments

bq.How frequently are you going to check this condition for each application?
The plan is to have a configurable interval, defaulting to 30 sec 
(yarn.app.timeout.monitor.interval).

bq.Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we 
may not need a flag.
bq.I feel having a TIMEOUT state for RMAppImpl would be proper here. 

OK. We will add a TIMEOUT state and handle the changes.
Due to this there will be a few changes in the app transitions, the client 
package and the web UI.

bq.I have a suggestion here.We can have a BasicAppMonitoringManager which can 
keep an entry of appId, app.getSubmissionTime.
bq. when the application gets submitted to RM then we can register the 
application with RMAppTimeOutMonitor using the user specified timeout.

Yes, good suggestion. We will implement this as a registration mechanism. But 
since each application can have its own timeout period, the code reusability 
looks minimal.

{code}
RMAppTimeOutMonitor 
    local map (appid, timeout)
    add/register(appid, timeout)  -- from RMAppImpl
    run -- if the app is running/submitted and has exceeded the time, kill it.
           If already completed, remove it from the map.
    no delete/unregister method -- the application is removed from the map
           by the run method
{code}
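
A rough Java illustration of the pseudo-code above (stand-in names only; the real monitor would live in the RM and trigger RMAppImpl events rather than a plain callback):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AppTimeoutMonitorSketch {
  public interface Killer { void kill(String appId); }

  private final Map<String, Long> deadlines = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scanner =
      Executors.newSingleThreadScheduledExecutor();

  public AppTimeoutMonitorSketch(Killer killer, long scanIntervalMs) {
    scanner.scheduleWithFixedDelay(() -> {
      long now = System.currentTimeMillis();
      for (Map.Entry<String, Long> e : deadlines.entrySet()) {
        if (now >= e.getValue()) {
          killer.kill(e.getKey());      // app exceeded its timeout
          deadlines.remove(e.getKey()); // no explicit unregister needed
        }
      }
    }, scanIntervalMs, scanIntervalMs, TimeUnit.MILLISECONDS);
  }

  // Called when an application is submitted with a timeout.
  public void register(String appId, long timeoutMs) {
    deadlines.put(appId, System.currentTimeMillis() + timeoutMs);
  }
}
{code}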

 Support Application timeout feature in YARN. 
 -

 Key: YARN-3813
 URL: https://issues.apache.org/jira/browse/YARN-3813
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: nijel
 Attachments: YARN Application Timeout .pdf


 It will be useful to support Application Timeout in YARN. Some use cases are 
 not worried about the output of the applications if the application is not 
 completed in a specific time. 
 *Background:*
 The requirement is to show the CDR statistics of last few  minutes, say for 
 every 5 minutes. The same Job will run continuously with different dataset.
 So one job will be started in every 5 minutes. The estimate time for this 
 task is 2 minutes or lesser time. 
 If the application is not completing in the given time the output is not 
 useful.
 *Proposal*
 So idea is to support application timeout, with which timeout parameter is 
 given while submitting the job. 
 Here, user is expecting to finish (complete or kill) the application in the 
 given time.
 One option for us is to move this logic to Application client (who submit the 
 job). 
 But it will be nice if it can be generic logic and can make more robust.
 Kindly provide your suggestions/opinion on this feature. If it sounds good, i 
 will update the design doc and prototype patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-03 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729078#comment-14729078
 ] 

nijel commented on YARN-4110:
-

Sorry, I attached the wrong patch, so deleting it.

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashcode() 
> and equals() implementations. These state objects should override these 
> methods.
> # For RMAppImpl, we can make use of ApplicationId#hashcode and 
> ApplicationId#equals.
> # Similarly, RMAppAttemptImpl, ApplicationAttemptId#hashcode and 
> ApplicationAttemptId#equals
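
A sketch of the suggestion above (not the actual RMAppImpl change): delegate equality and hashing to the application id.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class RMAppKeySketch {
  private final ApplicationId applicationId;

  public RMAppKeySketch(ApplicationId applicationId) {
    this.applicationId = applicationId;
  }

  @Override
  public int hashCode() {
    return applicationId.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof RMAppKeySketch)) {
      return false;
    }
    return applicationId.equals(((RMAppKeySketch) obj).applicationId);
  }
}
{code}
The same pattern would apply to RMAppAttemptImpl using ApplicationAttemptId.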



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4110:

Attachment: (was: 01-YARN-4110.patch)

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
>
> It is observed that RMAppImpl and RMAppAttemptImpl does not have hashcode() 
> and equals() implementations. These state objects should override these 
> implementations.
> # For RMAppImpl, we can use of ApplicationId#hashcode and 
> ApplicationId#equals.
> # Similarly, RMAppAttemptImpl, ApplicationAttemptId#hashcode and 
> ApplicationAttemptId#equals



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-09-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734602#comment-14734602
 ] 

nijel commented on YARN-3771:
-

hi all,
any comment on this change ? 

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3771.patch
>
>
> i was going through some find bugs rules. One issue reported in that is 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH=
> are not honoring the final qualifier. The string array contents can be 
> re-assigned!
> Simple test
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>   public static void main(String[] args) {
> System.out.println(12 < 10);
> String[] t1={"u"};
> //t = t1; // this will show compilation  error
> t[1] = t1[1]; // But this works
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList
> any thoughts ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3869) Add app name to RM audit log

2015-09-02 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3869:
---

Assignee: (was: nijel)

Keeping it open for further comments and opinion

> Add app name to RM audit log
> 
>
> Key: YARN-3869
> URL: https://issues.apache.org/jira/browse/YARN-3869
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shay Rojansky
>Priority: Minor
>
> The YARN resource manager audit log currently includes useful info such as 
> APPID, USER, etc. One crucial piece of information missing is the 
> user-supplied application name.
> Users are familiar with their application name as shown in the YARN UI, etc. 
> It's vital for something like logstash to be able to associate logs with the 
> application name for later searching in something like kibana.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3813) Support Application timeout feature in YARN.

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3813:
---

Assignee: nijel

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It will be useful to support Application Timeout in YARN. Some use cases are 
> not worried about the output of the applications if the application is not 
> completed in a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of last few  minutes, say for 
> every 5 minutes. The same Job will run continuously with different dataset.
> So one job will be started in every 5 minutes. The estimate time for this 
> task is 2 minutes or lesser time. 
> If the application is not completing in the given time the output is not 
> useful.
> *Proposal*
> So idea is to support application timeout, with which timeout parameter is 
> given while submitting the job. 
> Here, user is expecting to finish (complete or kill) the application in the 
> given time.
> One option for us is to move this logic to Application client (who submit the 
> job). 
> But it will be nice if it can be generic logic and can make more robust.
> Kindly provide your suggestions/opinion on this feature. If it sounds good, i 
> will update the design doc and prototype patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3813) Support Application timeout feature in YARN.

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3813:

Attachment: 0001-YARN-3813.patch

Sorry for the long delay..

Adding an initial patch.
The action on timeout is considered as KILL.
Please have a look. I will update the patch with more test cases after initial 
review.

Thanks

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf
>
>
> It will be useful to support an Application Timeout in YARN. Some use cases do 
> not care about the output of an application if it does not complete within a 
> specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say for 
> every 5 minutes. The same job will run continuously with different datasets, 
> so one job is started every 5 minutes. The estimated time for this task is 2 
> minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> So the idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job). 
> But it would be nicer if this could be generic logic, which would also make it 
> more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-09-03 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728898#comment-14728898
 ] 

nijel commented on YARN-3813:
-

This patch addresses the initial issue, but it will kill the application even 
if it is in the RUNNING state.

As I understand it, the idea is to configure the states that the monitor needs 
to consider when killing the application. Correct?

One doubt I have is whether the user will be aware of all the intermediate 
states of an app.
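
For the first point, the kind of configuration I have in mind is sketched 
below; the property name and its default are assumptions for discussion, not 
anything in the current patch.

{code:java}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Sketch: read the set of states the timeout monitor should act on from a
// hypothetical configuration key, defaulting to RUNNING only.
public class TimeoutStatesSketch {
  // hypothetical key, not an existing YARN property
  static final String TIMEOUT_STATES_KEY =
      "yarn.resourcemanager.application-timeout.states";

  public static EnumSet<YarnApplicationState> load(Configuration conf) {
    EnumSet<YarnApplicationState> states =
        EnumSet.noneOf(YarnApplicationState.class);
    for (String name : conf.getTrimmedStrings(TIMEOUT_STATES_KEY, "RUNNING")) {
      states.add(YarnApplicationState.valueOf(name.toUpperCase()));
    }
    return states;
  }
}
{code}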

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf
>
>
> It will be useful to support an Application Timeout in YARN. Some use cases do 
> not care about the output of an application if it does not complete within a 
> specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say for 
> every 5 minutes. The same job will run continuously with different datasets, 
> so one job is started every 5 minutes. The estimated time for this task is 2 
> minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> So the idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job). 
> But it would be nicer if this could be generic logic, which would also make it 
> more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-03 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4110:

Attachment: 01-YARN-4110.patch

Thanks [~rohithsharma] for reporting.
Attached the patch.
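
The change is roughly of the following shape: a minimal standalone sketch of 
the delegation idea (the class name here is illustrative, this is not the 
attached patch).

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Sketch: the identity of an RMAppImpl-like object is its ApplicationId, so
// hashCode() and equals() simply delegate to it.
public class AppIdentitySketch {
  private final ApplicationId applicationId;

  public AppIdentitySketch(ApplicationId applicationId) {
    this.applicationId = applicationId;
  }

  @Override
  public int hashCode() {
    return applicationId.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof AppIdentitySketch)) {
      return false;
    }
    return applicationId.equals(((AppIdentitySketch) obj).applicationId);
  }

  public static void main(String[] args) {
    ApplicationId a = ApplicationId.newInstance(1234L, 1);
    ApplicationId b = ApplicationId.newInstance(1234L, 1);
    // prints true: two wrappers around equal ApplicationIds are equal
    System.out.println(new AppIdentitySketch(a).equals(new AppIdentitySketch(b)));
  }
}
{code}

For RMAppAttemptImpl the same pattern applies, delegating to 
ApplicationAttemptId instead.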


> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: 01-YARN-4110.patch
>
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashCode() 
> and equals() implementations. These state objects should override these 
> methods.
> # For RMAppImpl, we can use ApplicationId#hashCode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashCode and 
> ApplicationAttemptId#equals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4110:

Attachment: YARN-4110_1.patch

Attached the patch
Please review

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4110_1.patch
>
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashCode() 
> and equals() implementations. These state objects should override these 
> methods.
> # For RMAppImpl, we can use ApplicationId#hashCode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashCode and 
> ApplicationAttemptId#equals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-09 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4135:

Attachment: YARN-4135_1.patch

Attached the patch.
Please review.

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch
>
>
> In MockRM, when a test fails after waiting for a given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to trace a test failure in the log, since there is no 
> link between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-09 Thread nijel (JIRA)
nijel created YARN-4135:
---

 Summary: Improve the assertion message in MockRM while failing 
after waiting for the state.
 Key: YARN-4135
 URL: https://issues.apache.org/jira/browse/YARN-4135
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: nijel
Assignee: nijel
Priority: Minor


In MockRM, when a test fails after waiting for a given state, the application 
id or the attempt id can be printed for easier debugging.

As of now it is hard to trace a test failure in the log, since there is no 
link between the test case and the application id.

Any thoughts?
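
The kind of message I have in mind is sketched below; the helper and its names 
are illustrative only, not necessarily what the eventual patch will look like.

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;

// Sketch: build an assertion message that ties the failure to a concrete
// application id and the expected/actual states, so the test log is traceable.
final class WaitForStateMessageSketch {
  private WaitForStateMessageSketch() {
  }

  static String timedOut(ApplicationId appId, RMAppState expected,
      RMAppState actual) {
    return "Application " + appId + " did not reach state " + expected
        + " before the wait timed out; current state is " + actual;
  }
}
{code}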



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-09 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_1.patch

Attaching the patch
Please review

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR *from 
> the scheduler*. Currently the diagnostic message is set statically, i.e. 
> {{Application killed by user.}}, even when the application is killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, yet the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-15 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4135:

Attachment: YARN-4135_2.patch

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for a given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to trace a test failure in the log, since there is no 
> link between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-15 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746870#comment-14746870
 ] 

nijel commented on YARN-4135:
-

Thanks [~adhoot] for the comment.
Updated the patch. Please review.

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch
>
>
> In MockRM, when a test fails after waiting for a given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to trace a test failure in the log, since there is no 
> link between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-16 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14747106#comment-14747106
 ] 

nijel commented on YARN-4135:
-

bq.-1   yarn tests  54m 15s Tests failed in 
hadoop-yarn-server-resourcemanager.
The test failure is not related to this change.

Thanks

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for a given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to trace a test failure in the log, since there is no 
> link between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-11 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740313#comment-14740313
 ] 

nijel commented on YARN-4111:
-

Thanks [~sunilg] for the comments
bq. RMAppKilledAttemptEvent is used for both RMApp and RMAppAttempt. Name is 
slightly confusing. I think we can use this only for RMApp.
This follows the same pattern as the failed and finished events, so I think it is OK.

bq. Also in RMAppAttempt, RMAppFailedAttemptEvent is changed to 
RMAppKilledAttemptEvent. Could we generalize RMAppFailedAttemptEvent for both 
Failed and Killed, and it can also take diagnostics.
Before this fix, the failed event was raised with KILLED as the state. Since 
the new event for kill is now available, it has been changed.

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR *from 
> the scheduler*. Currently the diagnostic message is set statically, i.e. 
> {{Application killed by user.}}, even when the application is killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, yet the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_3.patch

Updated javadoc comments

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR *from 
> the scheduler*. Currently the diagnostic message is set statically, i.e. 
> {{Application killed by user.}}, even when the application is killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, yet the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4146) getServiceState command is missing in yarnadmin command help

2015-09-11 Thread nijel (JIRA)
nijel created YARN-4146:
---

 Summary: getServiceState command  is missing in yarnadmin command 
help
 Key: YARN-4146
 URL: https://issues.apache.org/jira/browse/YARN-4146
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Minor


In the yarnadmin command help, the getServiceState command is not mentioned.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_4.patch

Updated the javadoc to add the missing "."

The test failure is not related to this patch.

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, 
> YARN-4111_4.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR *from 
> the scheduler*. Currently the diagnostic message is set statically, i.e. 
> {{Application killed by user.}}, even when the application is killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, yet the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4146) getServiceState command is missing in yarnadmin command help

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel resolved YARN-4146.
-
Resolution: Invalid

Sorry, my environment was in non-HA mode!

> getServiceState command  is missing in yarnadmin command help
> -
>
> Key: YARN-4146
> URL: https://issues.apache.org/jira/browse/YARN-4146
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: help, script
>
> In the yarnadmin command help, the getServiceState command is not mentioned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4146) getServiceState command is missing in yarnadmin command help

2015-09-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-4146:
---

Assignee: (was: nijel)

> getServiceState command  is missing in yarnadmin command help
> -
>
> Key: YARN-4146
> URL: https://issues.apache.org/jira/browse/YARN-4146
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Priority: Minor
>  Labels: help, script
>
> In the yarnadmin command help, the getServiceState command is not mentioned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-10 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4111:

Attachment: YARN-4111_2.patch

Updated the patch with the checkstyle fix.
The test failures are not related: I tried executing them locally and they 
pass.

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR *from 
> the scheduler*. Currently the diagnostic message is set statically, i.e. 
> {{Application killed by user.}}, even when the application is killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, yet the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-26 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909120#comment-14909120
 ] 

nijel commented on YARN-4111:
-

Thanks [~rohithsharma] and [~sunilg] for the comments.
If we add the new constructor to carry the message, can the other event 
classes like RMAppRejectedEvent and RMAppFinishedAttemptEvent be removed? They 
were also added only to handle the message.

Or these classes can be kept as they are, as event separation for future 
updates.

What do you say?
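
To make the constructor option concrete, it would be roughly the following 
shape (a sketch only; this stand-in class is not the real event base class and 
not the patch):

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Sketch: one app-event type whose constructor can carry a cause-specific
// diagnostic, instead of a separate subclass per message.
class AppEventWithDiagnosticsSketch {
  private final ApplicationId appId;
  private final String diagnostics;

  AppEventWithDiagnosticsSketch(ApplicationId appId) {
    this(appId, "");
  }

  AppEventWithDiagnosticsSketch(ApplicationId appId, String diagnostics) {
    this.appId = appId;
    this.diagnostics = diagnostics;
  }

  ApplicationId getApplicationId() {
    return appId;
  }

  String getDiagnostics() {
    return diagnostics;
  }
}
{code}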

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, 
> YARN-4111_4.patch
>
>
> An application can be killed either by the *user via ClientRMService* OR *from 
> the scheduler*. Currently the diagnostic message is set statically, i.e. 
> {{Application killed by user.}}, even when the application is killed by the 
> scheduler. This confuses the user after an application is killed: he did not 
> kill the application at all, yet the diagnostic message says that the 
> 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2015-09-26 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909121#comment-14909121
 ] 

nijel commented on YARN-4205:
-

Test cases are failing with "method not found" for the method added in the api 
project. These tests pass locally!

I am not able to find the reason for this failure. Could a build issue cause 
this?

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If an application runs beyond its lifetime, it will be killed. 
> The lifetime is measured from the submit time.
> The monitoring thread's interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4205:

Attachment: YARN-4205_03.patch

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch, 
> YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If an application runs beyond its lifetime, it will be killed. 
> The lifetime is measured from the submit time.
> The monitoring thread's interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2015-09-28 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933513#comment-14933513
 ] 

nijel commented on YARN-4205:
-

Thanks [~rohithsharma] for the comments
Updated the patch.


> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If an application runs beyond its lifetime, it will be killed. 
> The lifetime is measured from the submit time.
> The monitoring thread's interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4192) Add YARN metric logging periodically to a seperate file

2015-09-20 Thread nijel (JIRA)
nijel created YARN-4192:
---

 Summary: Add YARN metric logging periodically to a seperate file
 Key: YARN-4192
 URL: https://issues.apache.org/jira/browse/YARN-4192
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: nijel
Assignee: nijel
Priority: Minor


HDFS-8880 added a framework for logging metrics at a given interval.
This can be added to YARN as well.

Any thoughts?
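
What I have in mind is roughly the following shape (a generic sketch only; the 
logger name, interval, and metrics supplier are placeholders, not HDFS-8880's 
actual classes). Routing the dedicated logger to its own file is then purely a 
log4j configuration concern.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Sketch: periodically write a metrics snapshot through a dedicated logger.
public class PeriodicMetricsLoggerSketch {
  private static final Log METRICS_LOG = LogFactory.getLog("RMMetricsLog");

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(Supplier<String> metricsSnapshot, long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(
        () -> METRICS_LOG.info(metricsSnapshot.get()),
        intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}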






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-20 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877467#comment-14877467
 ] 

nijel commented on YARN-4135:
-

thanks [~rohithsharma] and [~adhoot]

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for a given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to trace a test failure in the log, since there is no 
> link between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-09-23 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904688#comment-14904688
 ] 

nijel commented on YARN-3813:
-


Thanks [~rohithsharma] and [~sunilg] for the comments.
Updated the patch with the comment fixes and a test case for recovery.

bq. we are starting the monitor thread always regardless whether application 
demands for applicationtimeout or not. I feel we can have a configuration to 
enable this feature in RM level. Thoughts?
As I pinged you offline, this service will consider only apps that are 
configured with a timeout, so I am leaving it as a default service.

bq.RMAppTimeOutMonitor : When InterruptedException is thrown in the below code, 
thread should break or throw back exception. So, thread will die else thread 
wil be alive for ever
The while loop is guarded by a check of the interrupted state.
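
To be explicit about the loop shape being discussed, here is a simplified 
sketch (class and field names are assumed, not the attached patch); the guard 
on the interrupted flag is what lets the thread exit cleanly when the service 
is stopped.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

import org.apache.hadoop.yarn.api.records.ApplicationId;

// Sketch: only apps registered with a timeout are tracked, the sleep interval
// is configurable, and an interrupt ends the loop cleanly.
public class AppTimeoutMonitorSketch implements Runnable {
  private final Map<ApplicationId, Long> deadlines = new ConcurrentHashMap<>();
  private final long monitorIntervalMs;
  private final Consumer<ApplicationId> killAction;

  public AppTimeoutMonitorSketch(long monitorIntervalMs,
      Consumer<ApplicationId> killAction) {
    this.monitorIntervalMs = monitorIntervalMs;
    this.killAction = killAction;
  }

  public void register(ApplicationId appId, long submitTimeMs, long timeoutMs) {
    deadlines.put(appId, submitTimeMs + timeoutMs);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long now = System.currentTimeMillis();
      for (Map.Entry<ApplicationId, Long> entry : deadlines.entrySet()) {
        if (now > entry.getValue()) {
          killAction.accept(entry.getKey()); // timed out: request a kill
          deadlines.remove(entry.getKey());
        }
      }
      try {
        Thread.sleep(monitorIntervalMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // restore flag; the guard exits
      }
    }
  }
}
{code}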



> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3813.patch, YARN Application Timeout .pdf
>
>
> It will be useful to support an Application Timeout in YARN. Some use cases do 
> not care about the output of an application if it does not complete within a 
> specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say for 
> every 5 minutes. The same job will run continuously with different datasets, 
> so one job is started every 5 minutes. The estimated time for this task is 2 
> minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> So the idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option for us is to move this logic to the application client (which 
> submits the job). 
> But it would be nicer if this could be generic logic, which would also make it 
> more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

