[jira] [Updated] (YARN-2192) TestRMHA fails when run with a mix of Schedulers

2014-06-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2192:


Description: 
If the test is run with FairScheduler, some of the tests fail because the 
metrics system objects are shared across tests and not destroyed completely.
{code}
Error Message

Metrics source QueueMetrics,q0=root already exists!
Stacktrace

org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427)
{code}

  was:
Some TestRMHA tests assume CapacityScheduler. If the test is run with multiple 
schedulers, some of the tests fail because the metrics system objects are 
shared across tests, and fail as below.

{code}
Error Message

Metrics source QueueMetrics,q0=root already exists!
Stacktrace

org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427)
{code}


 TestRMHA fails when run with a mix of Schedulers
 

 Key: YARN-2192
 URL: https://issues.apache.org/jira/browse/YARN-2192
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 If the test is run with FairScheduler, some of the tests fail because the 
 metrics system objects are shared across tests and not destroyed completely.
 {code}
 Error Message
 Metrics source QueueMetrics,q0=root already exists!
 Stacktrace
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2192) TestRMHA fails when run with a mix of Schedulers

2014-06-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2192:


Attachment: YARN-2192.patch

Fix the cleanup of the metrics by removing the conditional that would not work 
in FairScheduler.
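For context, the usual way to avoid the "Metrics source ... already exists" 
collision is to reset the statically shared metrics state between tests. A 
minimal JUnit sketch of that pattern (illustrative only, not the attached 
YARN-2192.patch; the class and test names are made up):
{code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.After;
import org.junit.Test;

public class SharedMetricsCleanupExample {

  @Test
  public void placeholderTest() {
    // A real test would start a ResourceManager / MockRM here.
  }

  // Reset the metrics state kept in static registries so a later test
  // (for example one configured with FairScheduler) does not collide with
  // sources such as "QueueMetrics,q0=root" registered by an earlier test.
  @After
  public void cleanupMetrics() {
    QueueMetrics.clearQueueMetrics();   // drop the cached per-queue metrics
    DefaultMetricsSystem.shutdown();    // unregister all metrics sources
  }
}
{code}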

 TestRMHA fails when run with a mix of Schedulers
 

 Key: YARN-2192
 URL: https://issues.apache.org/jira/browse/YARN-2192
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2192.patch


 If the test is run with FairScheduler, some of the tests fail because the 
 metrics system objects are shared across tests and not destroyed completely.
 {code}
 Error Message
 Metrics source QueueMetrics,q0=root already exists!
 Stacktrace
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2192) TestRMHA fails when run with a mix of Schedulers

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040476#comment-14040476
 ] 

Hadoop QA commented on YARN-2192:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651933/YARN-2192.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4045//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4045//console

This message is automatically generated.

 TestRMHA fails when run with a mix of Schedulers
 

 Key: YARN-2192
 URL: https://issues.apache.org/jira/browse/YARN-2192
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2192.patch


 If the test is run with FairScheduler, some of the tests fail because the 
 metrics system objects are shared across tests and not destroyed completely.
 {code}
 Error Message
 Metrics source QueueMetrics,q0=root already exists!
 Stacktrace
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-23 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2191:
-

Attachment: YARN-2191.patch

Uploaded a simplified patch and re-kicked Jenkins.

 Add a test to make sure NM will do application cleanup even if RM restarting 
 happens before application completed
 -

 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch


 In YARN-1885, there's a test in 
 TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, 
 we need one more test to make sure the NM will do app cleanup when the restart 
 happens before the app finishes. The sequence is:
 1. Submit app1 to RM1
 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
 3. Restart RM1
 4. Before RM1 finishes restarting, container-0 completes on NM1
 5. RM1 finishes restarting, NM1 reports container-0 completed, and app1 
 completes
 6. RM1 should be able to notify NM1/NM2 to clean up app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040551#comment-14040551
 ] 

Remus Rusanu commented on YARN-1972:


[~vinodkv] I see there is no container executor topic at src/site/apt. I'm 
thinking of writing the WCE up as part of a 'secure container' topic, which 
would describe the LCE as well. Is this OK?

 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1972.1.patch, YARN-1972.2.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as OS processes running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read in that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the effect of:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of the winutils task instead of 'create'
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The approach to the WCE came from practical trial and error. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched 
 as a separate container, the same issue had to be resolved, and I used the 
 same 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set 
 `yarn.nodemanager.container-executor.class` to 
 `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
 and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
 Windows security group name that the nodemanager service principal is a 
 member of (the equivalent of the LCE's 
 `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
 does not require any configuration outside of Hadoop's own yarn-site.xml.
 For the WCE to work the nodemanager must run as a service principal that is a 
 member of the local Administrators group or as LocalSystem. This is derived 
 from the need to invoke the LoadUserProfile API, whose specification mentions 
 these requirements. This is in addition to the SE_TCB privilege mentioned in 
 YARN-1063, but this requirement automatically implies that the SE_TCB 
 privilege is held by the nodemanager. For the Linux speakers in the audience, 
 the requirement is basically to run the NM as root.
 h2. Dedicated high privilege Service
 Due to the high privileges required by the WCE, we had discussed the need to 
 isolate the high-privilege operations into a separate process: an 'executor' 
 service that is solely responsible for starting the containers (including the 
 localizer). The NM would have to authenticate, authorize and communicate with 
 this service via an IPC mechanism and use this service to launch the 
 containers. I still believe we'll end up deploying such a service, but the 
 effort to onboard such a new platform-specific service onto the project is 
 not trivial.
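A minimal sketch of the configuration described under Deployment Requirements 
above, expressed programmatically with YarnConfiguration (in a real deployment 
these keys would normally be set in yarn-site.xml, and the group name 
"hadoop_nm_group" is just a placeholder):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WceConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Select the Windows Secure Container Executor as the NM container executor.
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
    // Windows security group the NM service principal belongs to
    // (placeholder value; the counterpart of the LCE group setting).
    conf.set("yarn.nodemanager.windows-secure-container-executor.group",
        "hadoop_nm_group");
    System.out.println(conf.get("yarn.nodemanager.container-executor.class"));
  }
}
{code}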



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI

2014-06-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040567#comment-14040567
 ] 

Wangda Tan commented on YARN-2181:
--

Discussed offline with [~tassapola]; the requirements for this JIRA are:
*App page:*
1) Total number of task containers preempted in this app
2) Total number of am containers preempted in this app
3) Total resource preempted in this app
4) Total number of task containers preempted in latest attempt
5) Total number of am containers preempted in latest attempt
6) Total resource preempted in latest attempt

*Queue page:*
1) Total number of task containers preempted in this queue
2) Total number of am containers preempted in this queue
3) Total resource preempted in this queue

Please let me know if you have any comments.
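As a rough illustration of how the counts listed above could be accumulated for 
the UI, here is a hypothetical holder class (the class, field and method names 
are made up, not from any patch on this JIRA):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical per-app holder for the preemption numbers listed above.
public class AppPreemptionInfo {
  private int taskContainersPreempted;
  private int amContainersPreempted;
  private final Resource resourcePreempted = Resources.createResource(0, 0);

  public synchronized void recordPreemption(boolean amContainer, Resource released) {
    if (amContainer) {
      amContainersPreempted++;
    } else {
      taskContainersPreempted++;
    }
    Resources.addTo(resourcePreempted, released);
  }

  public synchronized int getTaskContainersPreempted() { return taskContainersPreempted; }
  public synchronized int getAMContainersPreempted() { return amContainersPreempted; }
  public synchronized Resource getResourcePreempted() { return resourcePreempted; }
}
{code}
A queue-level holder would look the same, aggregated over the apps in the queue.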

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan

 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app/queue, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040589#comment-14040589
 ] 

Hadoop QA commented on YARN-2191:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651949/YARN-2191.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4046//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4046//console

This message is automatically generated.

 Add a test to make sure NM will do application cleanup even if RM restarting 
 happens before application completed
 -

 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch


 In YARN-1885, there's a test in 
 TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, 
 we need one more test to make sure the NM will do app cleanup when the restart 
 happens before the app finishes. The sequence is:
 1. Submit app1 to RM1
 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
 3. Restart RM1
 4. Before RM1 finishes restarting, container-0 completes on NM1
 5. RM1 finishes restarting, NM1 reports container-0 completed, and app1 
 completes
 6. RM1 should be able to notify NM1/NM2 to clean up app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2193) Job history UI value are wrongly rendered

2014-06-23 Thread Ashutosh Jindal (JIRA)
Ashutosh Jindal created YARN-2193:
-

 Summary: Job history UI value are wrongly rendered
 Key: YARN-2193
 URL: https://issues.apache.org/jira/browse/YARN-2193
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ashutosh Jindal


Job history UI values are wrongly rendered because some fields are missing in 
the jhist file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2193) Job history UI value are wrongly rendered

2014-06-23 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated YARN-2193:
--

Attachment: issue.jpg

 Job history UI value are wrongly rendered
 -

 Key: YARN-2193
 URL: https://issues.apache.org/jira/browse/YARN-2193
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ashutosh Jindal
 Attachments: issue.jpg


 Job history UI values are wrongly rendered because some fields are missing 
 in the jhist file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI

2014-06-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040726#comment-14040726
 ] 

Zhijie Shen commented on YARN-2181:
---

Wangda, is it good to do something similar to the job page of the JHS? Say we 
show the total number of task containers preempted in this app; this number is 
associated with a link, which redirects users to the list of all the preempted 
containers. Similarly for the other numbers.

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan

 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app/queue, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2193) Job history UI value are wrongly rendered

2014-06-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040735#comment-14040735
 ] 

Zhijie Shen commented on YARN-2193:
---

It seems that the data in jobsDataTable has been corrupted.

bq.  because some fields are missing in jhist file

[~ashutosh_jindal], would you please share what was missing in the jhist file?

 Job history UI value are wrongly rendered
 -

 Key: YARN-2193
 URL: https://issues.apache.org/jira/browse/YARN-2193
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ashutosh Jindal
 Attachments: issue.jpg


 Job history UI values are wrongly rendered because some fields are missing 
 in the jhist file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler

2014-06-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040745#comment-14040745
 ] 

Chen He commented on YARN-2109:
---

Done

 TestRM fails some tests when some tests run with CapacityScheduler and some 
 with FairScheduler
 --

 Key: YARN-2109
 URL: https://issues.apache.org/jira/browse/YARN-2109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
  Labels: test

 testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in 
 [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set 
 it to be CapacityScheduler. But if the default scheduler is set to 
 FairScheduler then the rest of the tests that execute after this will fail 
 with invalid cast exceptions when getting queuemetrics. This is based on test 
 execution order as only the tests that execute after this test will fail. 
 This is because the queuemetrics will be initialized by this test to 
 QueueMetrics and shared by the subsequent tests. 
 We can explicitly clear the metrics at the end of this test to fix this.
 For example
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040841#comment-14040841
 ] 

Sunil G commented on YARN-2022:
---

Thank you [~leftnoteasy] for the comments.

I will update the patch to handle the changes from YARN-1368. 

bq.With this condition, container preemption will be interrupted when we have 
am-capacity reached maxAMCapacity or less, is it what the original design?
As per the discussion with Mayank and Carlo, it was decided to upload a simple 
patch that respects the AM Resource percent only. I had an offline discussion 
earlier with [~curino] regarding the Max Capacity and AM Resource percent. The 
AM Resource percent considers the max capacity of a Queue. There is scope for 
improving this solution in that aspect, which I feel we can do in another JIRA. 
I will raise a separate JIRA for the same.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2 NMs]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which have taken the full 
 cluster capacity:
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps].
 Currently in this scenario, job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.
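To illustrate the "AM last" ordering this issue asks for, here is a small, 
self-contained sketch; the Candidate class and the IDs are made up for 
illustration, and the real policy would order RMContainers inside 
ProportionalCapacityPreemptionPolicy:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class AmLastPreemptionOrder {
  // Minimal stand-in for a preemption candidate; not the RM's internal types.
  static class Candidate {
    final String id;
    final boolean amContainer;
    Candidate(String id, boolean amContainer) {
      this.id = id;
      this.amContainer = amContainer;
    }
  }

  // Order candidates so task containers come before AM containers, i.e. an AM
  // is only preempted once no task containers are left to take.
  static final Comparator<Candidate> AM_LAST = new Comparator<Candidate>() {
    @Override
    public int compare(Candidate a, Candidate b) {
      return Boolean.compare(a.amContainer, b.amContainer); // false (task) sorts first
    }
  };

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<Candidate>();
    candidates.add(new Candidate("J3-AM", true));
    candidates.add(new Candidate("J3-map-1", false));
    candidates.add(new Candidate("J2-map-4", false));
    Collections.sort(candidates, AM_LAST);
    for (Candidate c : candidates) {
      System.out.println(c.id);   // prints the task containers before the AM
    }
  }
}
{code}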



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2072) RM/NM UIs and webservices are missing vcore information

2014-06-23 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-2072:
-

Attachment: YARN-2072.patch

Thanks for the review, Tom!
I fixed the getReservedVirtualCores() bug and the typo.

I will file a follow-up JIRA for displaying the vcores the user would use (as 
opposed to today's default of 1) in the Capacity and FIFO schedulers.

 RM/NM UIs and webservices are missing vcore information
 ---

 Key: YARN-2072
 URL: https://issues.apache.org/jira/browse/YARN-2072
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 3.0.0, 2.4.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: YARN-2072.patch, YARN-2072.patch


 Change RM and NM UIs and webservices to include virtual cores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-23 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2130:
-

Attachment: YARN-2130.5.patch

 Cleanup: Adding getRMAppManager, getQueueACLsManager, 
 getApplicationACLsManager to RMContext
 

 Key: YARN-2130
 URL: https://issues.apache.org/jira/browse/YARN-2130
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
 YARN-2130.4.patch, YARN-2130.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2072) RM/NM UIs and webservices are missing vcore information

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041012#comment-14041012
 ] 

Hadoop QA commented on YARN-2072:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651991/YARN-2072.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4047//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4047//console

This message is automatically generated.

 RM/NM UIs and webservices are missing vcore information
 ---

 Key: YARN-2072
 URL: https://issues.apache.org/jira/browse/YARN-2072
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 3.0.0, 2.4.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: YARN-2072.patch, YARN-2072.patch


 Change RM and NM UIs and webservices to include virtual cores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041013#comment-14041013
 ] 

Tsuyoshi OZAWA commented on YARN-2130:
--

[~kkambatl], thank you for the review. Updated the patch to address the comments:
1. Made RMAppManager's and ResourceTrackerService's constructors minimal.
2. Changed to leave the fields in ClientRMService.
3. Fixed the tests to pass, including the initialization order of mocks and 
pointing mocks at the correct objects. TestClientRMService#mockResourceScheduler 
is one of them.

 Cleanup: Adding getRMAppManager, getQueueACLsManager, 
 getApplicationACLsManager to RMContext
 

 Key: YARN-2130
 URL: https://issues.apache.org/jira/browse/YARN-2130
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
 YARN-2130.4.patch, YARN-2130.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler

2014-06-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2109:


Attachment: YARN-2109.001.patch

 TestRM fails some tests when some tests run with CapacityScheduler and some 
 with FairScheduler
 --

 Key: YARN-2109
 URL: https://issues.apache.org/jira/browse/YARN-2109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
  Labels: test
 Attachments: YARN-2109.001.patch


 testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in 
 [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set 
 it to be CapacityScheduler. But if the default scheduler is set to 
 FairScheduler then the rest of the tests that execute after this will fail 
 with invalid cast exceptions when getting queuemetrics. This is based on test 
 execution order as only the tests that execute after this test will fail. 
 This is because the queuemetrics will be initialized by this test to 
 QueueMetrics and shared by the subsequent tests. 
 We can explicitly clear the metrics at the end of this test to fix this.
 For example
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler

2014-06-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2109:


Attachment: YARN-2109.001.patch

Submitting the patch

 TestRM fails some tests when some tests run with CapacityScheduler and some 
 with FairScheduler
 --

 Key: YARN-2109
 URL: https://issues.apache.org/jira/browse/YARN-2109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
  Labels: test
 Attachments: YARN-2109.001.patch, YARN-2109.001.patch


 testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in 
 [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set 
 it to be CapacityScheduler. But if the default scheduler is set to 
 FairScheduler then the rest of the tests that execute after this will fail 
 with invalid cast exceptions when getting queuemetrics. This is based on test 
 execution order as only the tests that execute after this test will fail. 
 This is because the queuemetrics will be initialized by this test to 
 QueueMetrics and shared by the subsequent tests. 
 We can explicitly clear the metrics at the end of this test to fix this.
 For example
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2144) Add logs when preemption occurs

2014-06-23 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2144:
--

Attachment: YARN-2144.patch

Looks good overall, did some minor edits myself.

 Add logs when preemption occurs
 ---

 Key: YARN-2144
 URL: https://issues.apache.org/jira/browse/YARN-2144
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
 Attachments: AM-page-preemption-info.png, YARN-2144.patch, 
 YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, 
 YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, 
 YARN-2144.patch


 There should be easy-to-read logs when preemption does occur. 
 RM logs should have the following properties:
 * Logs are retrievable while an application is still running and are flushed 
 often.
 * They distinguish between AM container preemption and task container 
 preemption, with the container ID shown.
 * They should be INFO-level logs.
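As a rough sketch of the kind of INFO-level line described above (the helper, 
its arguments and the example IDs are hypothetical, not taken from the attached 
patches):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class PreemptionLogSketch {
  private static final Log LOG = LogFactory.getLog(PreemptionLogSketch.class);

  // One easy-to-grep INFO line per preempted container, distinguishing AM
  // containers from task containers and showing the container ID.
  static void logPreemption(String containerId, String appAttemptId, boolean amContainer) {
    LOG.info("Preempting " + (amContainer ? "AM" : "task") + " container "
        + containerId + " of attempt " + appAttemptId);
  }

  public static void main(String[] args) {
    logPreemption("container_1403550000000_0001_01_000002",
        "appattempt_1403550000000_0001_000001", false);
  }
}
{code}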



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041063#comment-14041063
 ] 

Jian He commented on YARN-2191:
---

looks good, +1

 Add a test to make sure NM will do application cleanup even if RM restarting 
 happens before application completed
 -

 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch


 In YARN-1885, there's a test in 
 TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, 
 we need one more test to make sure the NM will do app cleanup when the restart 
 happens before the app finishes. The sequence is:
 1. Submit app1 to RM1
 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
 3. Restart RM1
 4. Before RM1 finishes restarting, container-0 completes on NM1
 5. RM1 finishes restarting, NM1 reports container-0 completed, and app1 
 completes
 6. RM1 should be able to notify NM1/NM2 to clean up app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041077#comment-14041077
 ] 

Junping Du commented on YARN-1341:
--

bq. Yes, applications should be like containers. If we fail to store an 
application start in the state store then we should fail the container launch 
that triggered the application to be added. This already happens in the current 
patch for YARN-1354. If we fail to store the completion of an application then 
worst-case we will report an application to the RM on restart that isn't 
active, and the RM will correct the NM when it re-registers.
That makes sense. I guess we should do additional work to check whether the 
behavior is as we expect.

bq. I wasn't planning on persisting metrics during restart, as there are quite 
a few (e.g.: RPC metrics, etc.), and I'm not sure it's critical that they be 
preserved across a restart. Does RM restart do this or are there plans to do so?
I think these metrics are important, especially for users' monitoring tools, 
and we should keep this info consistent across a restart. As far as I know, RM 
restart didn't track this because these metrics are recovered during event 
recovery in RM restart. In the current NM restart, some metrics could be lost, 
e.g. allocatedContainers, etc. I think we should either count them back as part 
of the events during recovery or persist them. Thoughts?

bq. Therefore I don't believe the effort to maintain a stale tag is going to be 
worth it. Also if we refuse to load a state store that's stale then we are 
going to leak containers because we won't try to recover anything from a stale 
state store.
If so, how about we don't apply these changes until they can be persisted? That 
way we still keep the state store consistent with the NM's current state. Even 
if we choose to fail the NM, we can still load the state and recover the work.

bq. Instead I think we should decide in the various store failure cases whether 
the error should be fatal to the operation (which may lead to it being fatal to 
the NM overall) or if we feel the recovery with stale information is a better 
outcome than taking the NM down. In the latter case we should just log the 
error and move on.
Do we expect that some operations can fail while other operations succeed? If 
this means the persistence layer is unavailable for a short time, we can just 
handle it by adding a retry. If not, we should expect the other, fatal 
operations to fail soon enough, and in that case logging an error and moving on 
for the non-fatal operations doesn't make much difference. No? 



 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-23 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: YARN-2022.7.patch

I have updated the patch w.r.t. YARN-1368. Also added a test case to verify 
whether the RMContainer is marked as an AM container even after RM 
restart/failover. Thank you [~leftnoteasy] for pointing this out. Please review.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2 NMs]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which have taken the full 
 cluster capacity:
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps].
 Currently in this scenario, job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041086#comment-14041086
 ] 

Craig Welch commented on YARN-1039:
---

[~ste...@apache.org] wrt the need for a container level flag / a way for the 
application master to launch long lived containers - definitely, but the idea 
was for that to come as a later step - although that may be short-sighted, as 
it may be better to come up with a common way to do this for the application 
master container and the containers it later launches now instead of ending up 
with unmatched approaches later...

This first step is to provide a way for the application master to be launched 
in a long lived container (generally, an application master for a long lived 
application will need to itself be launched in a long lived container - at 
least, it needs to be possible to do so) - which is why there needs to be some 
way to indicate the need for a long lived container during application 
submission (necessary but not sufficient overall...)

[~zjshen] I was also wondering about using the tags, but after talking with 
[~xgong] we are not thinking that is the way to go, because tags don't seem to 
be about changing behavior but only a freeform way to enable 
search/display/etc.

After this discussion and some looking around, it really seems that what we are 
after is a way to communicate a quality of the needed container to the resource 
manager, both at application submission (for the application master container) 
and also for later container launches by the master, kind of like the 
ResourceProto, which is also already present in both cases for the same reason. 
(I suggested adding it there, actually, as something necessary for the 
container, but [~xgong] objected, thinking it is really specific to metric 
qualities (cpu, memory, ...).)

I'm going to take a look at adding something alongside/similar to the 
ResourceProto to indicate constraints/requirements for the container, starting 
with long-lived, that can be common to application submission and to when the 
containers are started later by the application - not necessarily a long field 
for bit manipulation, but something which is also extensible.


 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor
 Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch


 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-23 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2191:
--

Attachment: YARN-2191.patch

Changed the test name to be more accurate

 Add a test to make sure NM will do application cleanup even if RM restarting 
 happens before application completed
 -

 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, 
 YARN-2191.patch


 In YARN-1885, there's a test in 
 TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, 
 we need one more test to make sure the NM will do app cleanup when the restart 
 happens before the app finishes. The sequence is:
 1. Submit app1 to RM1
 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
 3. Restart RM1
 4. Before RM1 finishes restarting, container-0 completes on NM1
 5. RM1 finishes restarting, NM1 reports container-0 completed, and app1 
 completes
 6. RM1 should be able to notify NM1/NM2 to clean up app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041096#comment-14041096
 ] 

Hadoop QA commented on YARN-2109:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652006/YARN-2109.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4049//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4049//console

This message is automatically generated.

 TestRM fails some tests when some tests run with CapacityScheduler and some 
 with FairScheduler
 --

 Key: YARN-2109
 URL: https://issues.apache.org/jira/browse/YARN-2109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
  Labels: test
 Attachments: YARN-2109.001.patch, YARN-2109.001.patch


 testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in 
 [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set 
 it to be CapacityScheduler. But if the default scheduler is set to 
 FairScheduler then the rest of the tests that execute after this will fail 
 with invalid cast exceptions when getting queuemetrics. This is based on test 
 execution order as only the tests that execute after this test will fail. 
 This is because the queuemetrics will be initialized by this test to 
 QueueMetrics and shared by the subsequent tests. 
 We can explicitly clear the metrics at the end of this test to fix this.
 For example
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041095#comment-14041095
 ] 

Hadoop QA commented on YARN-2130:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652002/YARN-2130.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4048//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4048//console

This message is automatically generated.

 Cleanup: Adding getRMAppManager, getQueueACLsManager, 
 getApplicationACLsManager to RMContext
 

 Key: YARN-2130
 URL: https://issues.apache.org/jira/browse/YARN-2130
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
 YARN-2130.4.patch, YARN-2130.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041105#comment-14041105
 ] 

Steve Loughran commented on YARN-1039:
--

I see. I'd assume that the service flag would imply long-lived, but maybe 
they could be separated.

I'd like to see a {{long}} enum of flags here as it's easier to be forwards 
compatible.
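For illustration only, one possible shape of such a forwards-compatible flag 
set encoded in a long bit field (the enum and flag names are made up, not from 
any patch on this JIRA):
{code}
// Illustrative sketch of a long-backed flag set; unknown future bits are simply ignored.
public enum ContainerRequestFlag {
  LONG_LIVED(1L << 0),
  SERVICE(1L << 1);   // kept separate, in case 'service' should not imply long-lived

  private final long mask;

  ContainerRequestFlag(long mask) {
    this.mask = mask;
  }

  /** Combine flags into a single long that could travel in a proto field. */
  public static long encode(ContainerRequestFlag... flags) {
    long bits = 0L;
    for (ContainerRequestFlag f : flags) {
      bits |= f.mask;
    }
    return bits;
  }

  /** Check a flag without breaking when unknown future bits are present. */
  public static boolean isSet(long bits, ContainerRequestFlag flag) {
    return (bits & flag.mask) != 0L;
  }
}
{code}
Usage would be e.g. {{long bits = ContainerRequestFlag.encode(ContainerRequestFlag.LONG_LIVED);}} 
and then {{ContainerRequestFlag.isSet(bits, ContainerRequestFlag.LONG_LIVED)}} on 
the receiving side.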

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor
 Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch


 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041110#comment-14041110
 ] 

Craig Welch commented on YARN-1039:
---

The more I look around, the better I like the idea of adding it to the resource 
proto.  It is the same kind of thing as the items already in there - it's a 
characteristic required for the container (it isn't a metric style quality, but 
still, it's a characteristic of the resource needed) and it is already present 
everywhere the information is needed (at application submission and when 
containers are requested).  Adding something so similar alongside the resource 
proto seems unnecessary.  Do you agree with [~xgong]'s concerns or do you think 
it makes sense to add it there?

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor
 Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch


 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-06-23 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041128#comment-14041128
 ] 

Lohit Vijayarenu commented on YARN-796:
---

As [~tucu00] mentioned, a label sounds closely related to affinity and should be 
treated less as a resource. It becomes closely related to resources when it 
comes to exposing labels on scheduler queues and to users who wish to schedule 
their jobs on a certain set of labeled nodes. This is definitely a very useful 
feature to have. Looking forward to the design document. 

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041161#comment-14041161
 ] 

Hadoop QA commented on YARN-2109:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652007/YARN-2109.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4050//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4050//console

This message is automatically generated.

 TestRM fails some tests when some tests run with CapacityScheduler and some 
 with FairScheduler
 --

 Key: YARN-2109
 URL: https://issues.apache.org/jira/browse/YARN-2109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
  Labels: test
 Attachments: YARN-2109.001.patch, YARN-2109.001.patch


 testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in 
 [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set 
 it to be CapacityScheduler. But if the default scheduler is set to 
 FairScheduler then the rest of the tests that execute after this will fail 
 with invalid cast exceptions when getting queue metrics. This depends on test 
 execution order: only the tests that run after this one fail, because this 
 test initializes the queue metrics as QueueMetrics and the subsequent tests 
 share that instance. 
 We can explicitly clear the metrics at the end of this test to fix this (a 
 teardown sketch follows the example stack trace below).
 For example
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232)
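
 A minimal teardown sketch for the fix described above, assuming the existing 
 helpers QueueMetrics.clearQueueMetrics() and DefaultMetricsSystem.shutdown() 
 are usable from the test (a sketch only, not the attached patch):
 {code}
 import org.junit.After;
 import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
 import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;

 public class MetricsCleanupSketch {
   @After
   public void tearDown() {
     // Drop the registered QueueMetrics sources and shut the metrics system
     // down so the next test can register QueueMetrics/FSQueueMetrics afresh.
     QueueMetrics.clearQueueMetrics();
     DefaultMetricsSystem.shutdown();
   }
 }
 {code}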



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1365:


Attachment: YARN-1365.008.patch

Addressed all comments 

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2144) Add logs when preemption occurs

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041172#comment-14041172
 ] 

Hadoop QA commented on YARN-2144:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652015/YARN-2144.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4051//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4051//console

This message is automatically generated.

 Add logs when preemption occurs
 ---

 Key: YARN-2144
 URL: https://issues.apache.org/jira/browse/YARN-2144
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
 Attachments: AM-page-preemption-info.png, YARN-2144.patch, 
 YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, 
 YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, 
 YARN-2144.patch


 There should be easy-to-read logs when preemption does occur. 
 RM logs should have the following properties:
 * Logs are retrievable while an application is still running and are flushed often.
 * Can distinguish between AM container preemption and task container 
 preemption, with the container ID shown.
 * Should be INFO-level logs (an illustrative sketch follows).
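
 For illustration only, the kind of INFO line these requirements describe might 
 look like the following sketch; the message format and helper class are 
 assumptions, not the actual patch output:
 {code}
 // Illustrative sketch, not the committed logging code.
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.yarn.api.records.ContainerId;

 class PreemptionLogSketch {
   private static final Log LOG = LogFactory.getLog(PreemptionLogSketch.class);

   static void logPreemption(ContainerId containerId, boolean isAMContainer) {
     // One INFO line per preempted container, so it is readable while the
     // application is still running and distinguishes AM from task containers.
     LOG.info("Preempting " + (isAMContainer ? "AM" : "task")
         + " container " + containerId);
   }
 }
 {code}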



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2194) Add Cgroup support for RedHat 7

2014-06-23 Thread Wei Yan (JIRA)
Wei Yan created YARN-2194:
-

 Summary: Add Cgroup support for RedHat 7
 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan


In previous versions of RedHat, we could build custom cgroup hierarchies using 
the cgconfig command from the libcgroup package. Starting with RedHat 7, the 
libcgroup package is deprecated and its use is discouraged because it can easily 
conflict with the default cgroup hierarchy. systemd is now provided and 
recommended for cgroup management, so we need to add support for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041228#comment-14041228
 ] 

Hadoop QA commented on YARN-2022:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652024/YARN-2022.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4052//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4052//console

This message is automatically generated.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, job J3 will get killed, including its AM.
 It is better if AM containers can be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be preempted 
 instead (a small ordering sketch follows). Later, when the cluster is free, 
 maps can be allocated to these jobs again.
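
 A hedged illustration of the ordering idea above, when choosing preemption 
 victims, is sketched below; the "isAM" attribute and class names are 
 hypothetical, not a claim about the RM API:
 {code}
 // Sketch only: task containers are considered for preemption before AM containers.
 import java.util.Comparator;

 class PreemptionOrderSketch {
   static class Candidate {
     final String containerId;
     final boolean isAM;

     Candidate(String containerId, boolean isAM) {
       this.containerId = containerId;
       this.isAM = isAM;
     }
   }

   // Task containers (isAM == false) sort before AM containers (isAM == true),
   // so AM containers are only preempted when nothing else is left.
   static final Comparator<Candidate> AM_LAST = new Comparator<Candidate>() {
     @Override
     public int compare(Candidate a, Candidate b) {
       return (a.isAM == b.isAM) ? 0 : (a.isAM ? 1 : -1);
     }
   };
 }
 {code}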



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041239#comment-14041239
 ] 

Hadoop QA commented on YARN-2191:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652025/YARN-2191.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4053//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4053//console

This message is automatically generated.

 Add a test to make sure NM will do application cleanup even if RM restarting 
 happens before application completed
 -

 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, 
 YARN-2191.patch


 In YARN-1885, there's a test in 
 TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, 
 we need one more test to make sure NM will do app cleanup when restart 
 happens before app finished. The sequence is,
 1. Submit app1 to RM1
 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
 3. Restart RM1
 4. Before RM1 finishes restarting, container-0 completed in NM1
 5. RM1 finishes restarting, NM1 reports container-0 completed, and app1 will 
 be completed
 6. RM1 should be able to notify NM1/NM2 to cleanup app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-06-23 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041255#comment-14041255
 ] 

Jon Bringhurst commented on YARN-2194:
--

It might also be useful to have a SystemdNspawnContainerExectuor for 
yarn.nodemanager.container-executor.class. I don't know how many people would 
be interested in using it, however.

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan

 In previous versions of RedHat, we could build custom cgroup hierarchies using 
 the cgconfig command from the libcgroup package. Starting with RedHat 7, the 
 libcgroup package is deprecated and its use is discouraged because it can 
 easily conflict with the default cgroup hierarchy. systemd is now provided and 
 recommended for cgroup management, so we need to add support for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-06-23 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041263#comment-14041263
 ] 

Wei Yan commented on YARN-2194:
---

SystemdNspawnContainerExectuor is a good idea. We could add one for systemd 
alongside the standard CgroupsLCEHandler.

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan

 In previous versions of RedHat, we could build custom cgroup hierarchies using 
 the cgconfig command from the libcgroup package. Starting with RedHat 7, the 
 libcgroup package is deprecated and its use is discouraged because it can 
 easily conflict with the default cgroup hierarchy. systemd is now provided and 
 recommended for cgroup management, so we need to add support for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041267#comment-14041267
 ] 

Hadoop QA commented on YARN-1365:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652034/YARN-1365.008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4054//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4054//console

This message is automatically generated.

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2195) Clean a piece of code in ResourceRequest

2014-06-23 Thread Wei Yan (JIRA)
Wei Yan created YARN-2195:
-

 Summary: Clean a piece of code in ResourceRequest
 Key: YARN-2195
 URL: https://issues.apache.org/jira/browse/YARN-2195
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor


{code}
if (numContainersComparison == 0) {
  return 0;
} else {
  return numContainersComparison;
}
{code}

This code should be cleaned as 
{code}
return numContainersComparison;
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2195) Clean a piece of code in ResourceRequest

2014-06-23 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2195:
--

Attachment: YARN-2195.patch

 Clean a piece of code in ResourceRequest
 

 Key: YARN-2195
 URL: https://issues.apache.org/jira/browse/YARN-2195
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2195.patch


 {code}
 if (numContainersComparison == 0) {
   return 0;
 } else {
   return numContainersComparison;
 }
 {code}
 This code should be cleaned as 
 {code}
 return numContainersComparison;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2195) Clean a piece of code in ResourceRequest

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041331#comment-14041331
 ] 

Hadoop QA commented on YARN-2195:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652055/YARN-2195.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4055//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4055//console

This message is automatically generated.

 Clean a piece of code in ResourceRequest
 

 Key: YARN-2195
 URL: https://issues.apache.org/jira/browse/YARN-2195
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2195.patch


 {code}
 if (numContainersComparison == 0) {
   return 0;
 } else {
   return numContainersComparison;
 }
 {code}
 This code should be cleaned as 
 {code}
 return numContainersComparison;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041355#comment-14041355
 ] 

Anubhav Dhoot commented on YARN-1365:
-

The changes for addApplication caused the failures. I am going to open a 
separate jira to fix that, as per Jian's suggestion, and undo those changes here. 

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1365:


Attachment: YARN-1365.008.patch

Without the addApplication changes; those will be covered in YARN-2196.

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2196) Add Application duplicate APP_ACCEPTED events can be prevented with a flag

2014-06-23 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2196:
---

 Summary: Add Application duplicate APP_ACCEPTED events can be 
prevented with a flag 
 Key: YARN-2196
 URL: https://issues.apache.org/jira/browse/YARN-2196
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot


YARN-1365 adds a flag to AddApplicationAttemptSchedulerEvent that prevents a 
duplicate ATTEMPT_ADDED event in recovery. We can do something similar for 
AddApplicationSchedulerEvent to avoid the following transition.
{code}
// ACCEPTED state can once again receive the APP_ACCEPTED event, because on
// recovery the app returns to ACCEPTED state and once again goes through
// the scheduler, triggering one more APP_ACCEPTED event at ACCEPTED state.
.addTransition(RMAppState.ACCEPTED, RMAppState.ACCEPTED,
    RMAppEventType.APP_ACCEPTED)
{code}
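
A hypothetical sketch of what such a flag could look like; the class and 
accessor names mirror the description above rather than the committed scheduler 
event API:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Sketch only: carry a "recovering" flag on the app-added event so the RMApp
// ACCEPTED state can skip the duplicate APP_ACCEPTED during recovery.
public class AddApplicationSchedulerEvent {
  private final ApplicationId applicationId;
  private final boolean isAppRecovering;

  public AddApplicationSchedulerEvent(ApplicationId applicationId,
      boolean isAppRecovering) {
    this.applicationId = applicationId;
    this.isAppRecovering = isAppRecovering;
  }

  public ApplicationId getApplicationId() {
    return applicationId;
  }

  public boolean isAppRecovering() {
    return isAppRecovering;
  }
}
{code}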



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041433#comment-14041433
 ] 

Hadoop QA commented on YARN-1365:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652074/YARN-1365.008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4056//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4056//console

This message is automatically generated.

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-23 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041435#comment-14041435
 ] 

Mayank Bansal commented on YARN-2022:
-

Hi [~vinodkv]
Is it ok with you if we commit this patch, given the concerns you raised before?
I think we still need to avoid killing AMs, even if we have a patch for not 
killing applications when the AM gets killed.
Please suggest.

Thanks,
Mayank



 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, job J3 will get killed, including its AM.
 It is better if AM containers can be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be preempted 
 instead. Later, when the cluster is free, maps can be allocated to these jobs 
 again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented

2014-06-23 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2078:
-

Description: 
We should document the condition under which uber mode is enabled. Currently, 
users need to read the following code to understand the condition.

{code}
boolean smallMemory =
    ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
        conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
        <= sysMemSizeForUberSlot)
      || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
boolean smallCpu =
    Math.max(
        conf.getInt(
            MRJobConfig.MAP_CPU_VCORES,
            MRJobConfig.DEFAULT_MAP_CPU_VCORES),
        conf.getInt(
            MRJobConfig.REDUCE_CPU_VCORES,
            MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
    <= sysCPUSizeForUberSlot;
{code}

  was:
We should document the condition when uber mode is enabled. If not, users need 
to read code.

{code}
boolean smallMemory =
    ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
        conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
        <= sysMemSizeForUberSlot)
      || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
boolean smallCpu =
    Math.max(
        conf.getInt(
            MRJobConfig.MAP_CPU_VCORES,
            MRJobConfig.DEFAULT_MAP_CPU_VCORES),
        conf.getInt(
            MRJobConfig.REDUCE_CPU_VCORES,
            MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
    <= sysCPUSizeForUberSlot;
{code}


 yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
 --

 Key: YARN-2078
 URL: https://issues.apache.org/jira/browse/YARN-2078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2078.1.patch


 We should document the condition under which uber mode is enabled. Currently, 
 users need to read the following code to understand the condition.
 {code}
 boolean smallMemory =
     ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
         conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
         <= sysMemSizeForUberSlot)
       || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
 boolean smallCpu =
     Math.max(
         conf.getInt(
             MRJobConfig.MAP_CPU_VCORES,
             MRJobConfig.DEFAULT_MAP_CPU_VCORES),
         conf.getInt(
             MRJobConfig.REDUCE_CPU_VCORES,
             MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
     <= sysCPUSizeForUberSlot;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041494#comment-14041494
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

Jian, thank you for clarifying. I'm working to address the comments. Please 
wait a moment.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041506#comment-14041506
 ] 

Jian He commented on YARN-1365:
---

Can we revert the RMAppImpl changes also?

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc

2014-06-23 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-2197:
---

 Summary: Add a link to YARN CHANGES.txt in the left side of doc
 Key: YARN-2197
 URL: https://issues.apache.org/jira/browse/YARN-2197
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Priority: Minor


Currently there are links to the Common, HDFS and MapReduce CHANGES.txt on the 
left side of the documentation (hadoop-project/src/site/site.xml), but no YARN 
entry exists.
{code}
  <item name="Common CHANGES.txt"
        href="hadoop-project-dist/hadoop-common/CHANGES.txt"/>
  <item name="HDFS CHANGES.txt"
        href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/>
  <item name="MapReduce CHANGES.txt"
        href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/>
  <item name="Metrics"
        href="hadoop-project-dist/hadoop-common/Metrics.html"/>
{code}
A link to YARN CHANGES.txt should be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2144) Add logs when preemption occurs

2014-06-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041543#comment-14041543
 ] 

Wangda Tan commented on YARN-2144:
--

Thanks, Jian, for the review!

 Add logs when preemption occurs
 ---

 Key: YARN-2144
 URL: https://issues.apache.org/jira/browse/YARN-2144
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
 Attachments: AM-page-preemption-info.png, YARN-2144.patch, 
 YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, 
 YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, 
 YARN-2144.patch


 There should be easy-to-read logs when preemption does occur. 
 RM logs should have the following properties:
 * Logs are retrievable while an application is still running and are flushed often.
 * Can distinguish between AM container preemption and task container 
 preemption, with the container ID shown.
 * Should be INFO-level logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041537#comment-14041537
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

The brief design is as follows (a rough sketch is included below):

1. Add a getter method for the epoch, such as {{getEpoch}}, to RMContext.
2. Add {{loadEpoch}} to RMStateStore and set the epoch value on RMContext in 
{{ResourceManager#serviceStart}}.

One discussion point is how to serialize the epoch. Can we add an epoch 
definition to yarn_server_resourcemanager_service_protos.proto? [~jianhe], what 
do you think?
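
A rough sketch of the two additions, with illustrative signatures only (the 
real RMContext and RMStateStore have many more members):
{code}
// Sketch only; names and signatures are illustrative, not the committed API.
interface RMContextEpochSketch {
  // 1. Expose the restart epoch so container-id generation can include it.
  long getEpoch();
}

abstract class RMStateStoreEpochSketch {
  // 2. Load (and persist an incremented) epoch; ResourceManager#serviceStart
  //    would then hand the value to the RMContext implementation.
  public abstract long loadEpoch() throws Exception;
}
{code}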

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041540#comment-14041540
 ] 

Hudson commented on YARN-2191:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5756 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5756/])
YARN-2191. Added a new test to ensure NM will clean up completed applications 
in the case of RM restart. Contributed by Wangda Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1604949)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java


 Add a test to make sure NM will do application cleanup even if RM restarting 
 happens before application completed
 -

 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, 
 YARN-2191.patch


 In YARN-1885, there's a test in 
 TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, 
 we need one more test to make sure NM will do app cleanup when restart 
 happens before app finished. The sequence is,
 1. Submit app1 to RM1
 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
 3. Restart RM1
 4. Before RM1 finishes restarting, container-0 completed in NM1
 5. RM1 finishes restarting, NM1 reports container-0 completed, and app1 will 
 be completed
 6. RM1 should be able to notify NM1/NM2 to cleanup app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041544#comment-14041544
 ] 

Wangda Tan commented on YARN-2191:
--

Thanks to Jian for the review and commit!

 Add a test to make sure NM will do application cleanup even if RM restarting 
 happens before application completed
 -

 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, 
 YARN-2191.patch


 In YARN-1885, there's a test in 
 TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, 
 we need one more test to make sure NM will do app cleanup when restart 
 happens before app finished. The sequence is,
 1. Submit app1 to RM1
 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
 3. Restart RM1
 4. Before RM1 finishes restarting, container-0 completed in NM1
 5. RM1 finishes restarting, NM1 reports container-0 completed, and app1 will 
 be completed
 6. RM1 should be able to notify NM1/NM2 to cleanup app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041548#comment-14041548
 ] 

Jian He commented on YARN-2052:
---

sounds good

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1365:


Attachment: YARN-1365.009.patch

Without the RMAppImpl changes.

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.009.patch, 
 YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc

2014-06-23 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-2197:


Attachment: YARN-2197.patch

Attaching a patch.

 Add a link to YARN CHANGES.txt in the left side of doc
 --

 Key: YARN-2197
 URL: https://issues.apache.org/jira/browse/YARN-2197
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Attachments: YARN-2197.patch


 Currently there are links to the Common, HDFS and MapReduce CHANGES.txt on the 
 left side of the documentation (hadoop-project/src/site/site.xml), but no YARN 
 entry exists.
 {code}
   <item name="Common CHANGES.txt"
         href="hadoop-project-dist/hadoop-common/CHANGES.txt"/>
   <item name="HDFS CHANGES.txt"
         href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/>
   <item name="MapReduce CHANGES.txt"
         href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/>
   <item name="Metrics"
         href="hadoop-project-dist/hadoop-common/Metrics.html"/>
 {code}
 A link to YARN CHANGES.txt should be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041565#comment-14041565
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

OK!

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041587#comment-14041587
 ] 

Hadoop QA commented on YARN-2197:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652107/YARN-2197.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4058//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4058//console

This message is automatically generated.

 Add a link to YARN CHANGES.txt in the left side of doc
 --

 Key: YARN-2197
 URL: https://issues.apache.org/jira/browse/YARN-2197
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Attachments: YARN-2197.patch


 Currently there are links to the Common, HDFS and MapReduce CHANGES.txt on the 
 left side of the documentation (hadoop-project/src/site/site.xml), but no YARN 
 entry exists.
 {code}
   <item name="Common CHANGES.txt"
         href="hadoop-project-dist/hadoop-common/CHANGES.txt"/>
   <item name="HDFS CHANGES.txt"
         href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/>
   <item name="MapReduce CHANGES.txt"
         href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/>
   <item name="Metrics"
         href="hadoop-project-dist/hadoop-common/Metrics.html"/>
 {code}
 A link to YARN CHANGES.txt should be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041600#comment-14041600
 ] 

Hadoop QA commented on YARN-1365:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652104/YARN-1365.009.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4057//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4057//console

This message is automatically generated.

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
 YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, 
 YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.009.patch, 
 YARN-1365.initial.patch


 For an application that was running before the restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These calls should succeed, 
 and the RMApp state machine should transition to completed as normal. 
 Unregistration should also succeed for an app that the RM considers complete, 
 since the RM may have died after saving the completion in the store but before 
 notifying the AM that it is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-06-23 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reassigned YARN-2069:
---

Assignee: Mayank Bansal  (was: Vinod Kumar Vavilapalli)

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal

 Preemption today only works across queues: it moves resources between queues 
 based on demand and usage. We should also have user-level preemption within a 
 queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-06-23 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041617#comment-14041617
 ] 

Mayank Bansal commented on YARN-2069:
-

Taking it over.

Thanks,
Mayank

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch


 Preemption today only works across queues: it moves resources between queues 
 based on demand and usage. We should also have user-level preemption within a 
 queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-06-23 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2069:


Attachment: YARN-2069-trunk-1.patch

Attaching patch

Thanks,
Mayank

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch


 Preemption today only works across queues: it moves resources between queues 
 based on demand and usage. We should also have user-level preemption within a 
 queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041650#comment-14041650
 ] 

Hadoop QA commented on YARN-2069:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12652121/YARN-2069-trunk-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4059//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4059//console

This message is automatically generated.

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch


 Preemption today only works across queues: it moves resources between queues 
 based on demand and usage. We should also have user-level preemption within a 
 queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041652#comment-14041652
 ] 

Vinod Kumar Vavilapalli commented on YARN-1039:
---

I am not against a container/resource-level definition of whether a given 
container is long lived, but I think it is equally important to mark at the 
application level whether _at least_ one container in the application is 
considered long lived. So, to summarize, how about
 - an app-level isLongRunning() that indicates _whether at least one container 
of this application will be long-running_, and
 - a resource-request-level isLongRunning() that indicates _whether this 
container is long running or not_.

The app-level flag can help UIs, enable quick scheduling distinctions, etc.

Thoughts?
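As a purely hypothetical sketch of the shape of that proposal (the classes and 
methods below are invented for illustration and are not the real YARN API):

{code}
// Hypothetical sketch only: these classes and methods are stand-ins invented
// for illustration, not the actual YARN records.
public class LongRunningFlagsSketch {

  // App-level flag: true if at least one container of this application is
  // expected to be long-running (useful for UIs and quick scheduling
  // distinctions).
  static class AppSubmissionInfo {
    private final boolean longRunning;
    AppSubmissionInfo(boolean longRunning) { this.longRunning = longRunning; }
    boolean isLongRunning() { return longRunning; }
  }

  // Resource-request-level flag: true if the containers asked for by this
  // particular request are long-running.
  static class ContainerRequestInfo {
    private final boolean longRunning;
    ContainerRequestInfo(boolean longRunning) { this.longRunning = longRunning; }
    boolean isLongRunning() { return longRunning; }
  }

  public static void main(String[] args) {
    AppSubmissionInfo app = new AppSubmissionInfo(true);
    ContainerRequestInfo service = new ContainerRequestInfo(true);
    ContainerRequestInfo task = new ContainerRequestInfo(false);
    System.out.println("app long-running: " + app.isLongRunning());
    System.out.println("requests long-running: " + service.isLongRunning()
        + ", " + task.isLongRunning());
  }
}
{code}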

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor
 Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch


 A container request could support a new parameter, long-lived. This could be 
 used by a scheduler, which would then know not to host the service on a 
 transient (cloud: spot-priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1773) ShuffleHeader should have a format that can inform about errors

2014-06-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated YARN-1773:
--

 Target Version/s: 2.5.0  (was: 2.4.0)
Affects Version/s: 2.4.0

 ShuffleHeader should have a format that can inform about errors
 ---

 Key: YARN-1773
 URL: https://issues.apache.org/jira/browse/YARN-1773
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0, 2.4.0
Reporter: Bikas Saha
Priority: Critical

 Currently, the ShuffleHeader (which is a Writable) simply tries to read a 
 successful header (mapid, reduceid, etc.). If there is an error, the input 
 contains an error message instead of the (mapid, reduceid, etc.) fields. 
 Parsing the ShuffleHeader therefore fails, and since we don't know where the 
 error message ends, we cannot consume the rest of the input stream, which may 
 still contain good data from the remaining map outputs. Being able to encode 
 the error in the ShuffleHeader would let us parse the error correctly and move 
 on to the remaining data.
 The shuffle handler response should also say which maps are in error, which 
 are fine, and what the error was for the erroneous maps. This will help report 
 diagnostics for easier upstream reporting.
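 A minimal, hypothetical sketch of the idea follows; the class, fields, and 
 status byte below are assumptions for illustration, not the actual 
 ShuffleHeader implementation:
 {code}
 // Hypothetical sketch only (not the actual MapReduce ShuffleHeader): a status
 // byte distinguishes a good header from an error record, and the error text
 // is length-prefixed so the reader knows exactly how much to consume before
 // moving on to the next map output in the stream.
 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
 
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.io.WritableUtils;
 
 public class ErrorAwareShuffleHeaderSketch implements Writable {
   public static final byte OK = 0;
   public static final byte ERROR = 1;
 
   private byte status = OK;
   private String mapId = "";
   private int forReduce;
   private long compressedLength;
   private long uncompressedLength;
   private String errorMessage = "";
 
   @Override
   public void write(DataOutput out) throws IOException {
     out.writeByte(status);
     if (status == OK) {
       WritableUtils.writeString(out, mapId);
       WritableUtils.writeVLong(out, compressedLength);
       WritableUtils.writeVLong(out, uncompressedLength);
       WritableUtils.writeVInt(out, forReduce);
     } else {
       // Length-prefixed error text: the reader can skip past it cleanly.
       WritableUtils.writeString(out, errorMessage);
     }
   }
 
   @Override
   public void readFields(DataInput in) throws IOException {
     status = in.readByte();
     if (status == OK) {
       mapId = WritableUtils.readString(in);
       compressedLength = WritableUtils.readVLong(in);
       uncompressedLength = WritableUtils.readVLong(in);
       forReduce = WritableUtils.readVInt(in);
     } else {
       errorMessage = WritableUtils.readString(in);
     }
   }
 }
 {code}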



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041684#comment-14041684
 ] 

Hadoop QA commented on YARN-1327:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12609276/nodemgr-portability.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4060//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4060//console

This message is automatically generated.

 Fix nodemgr native compilation problems on FreeBSD9
 ---

 Key: YARN-1327
 URL: https://issues.apache.org/jira/browse/YARN-1327
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Radim Kolar
Assignee: Radim Kolar
 Fix For: 3.0.0, 2.5.0

 Attachments: nodemgr-portability.txt


 There are several portability problems that prevent the native component from 
 compiling on FreeBSD:
 1. libgen.h is not included. The correct function prototype is there, but 
 Linux glibc has a workaround that defines it for the user if libgen.h is not 
 directly included. Include this file directly.
 2. Query the maximum login-name size using sysconf; this follows the same code 
 style as the rest of the code, which already uses sysconf.
 3. cgroups are a Linux-only feature; make their compilation conditional and 
 return an error if mount_cgroup is attempted on a non-Linux OS.
 4. Do not use the POSIX function setpgrp(), since it clashes with the function 
 of the same name from BSD 4.2; use an equivalent function. After inspecting 
 the glibc sources, it is just a shortcut for setpgid(0,0).
 These changes make it compile on both Linux and FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-06-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041690#comment-14041690
 ] 

Sunil G commented on YARN-2069:
---

Hi [~mayank_bansal],
It looks like this patch also includes the code changes from YARN-2022. I think 
those can be separated out.

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch


 Preemption today only works across queues: it moves resources between queues 
 based on demand and usage. We should also have user-level preemption within a 
 queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2193) Job history UI value are wrongly rendered

2014-06-23 Thread Ashutosh Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041700#comment-14041700
 ] 

Ashutosh Jindal commented on YARN-2193:
---

During ApplicationMaster startup, JobHistoryEventHandler initializes the 
writer. This is a one-time initialization; if it fails because of an NN 
problem, none of the events are written.

In this issue, the writer was not initialized because the NN was in safe mode, 
and only the Job_Finished event was written. The history server then parses the 
jhist file, but the Job_Finished event does not contain all the fields, so some 
fields are missing.

 Job history UI value are wrongly rendered
 -

 Key: YARN-2193
 URL: https://issues.apache.org/jira/browse/YARN-2193
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ashutosh Jindal
 Attachments: issue.jpg


 Job history UI values are wrongly rendered because some fields are missing in 
 the jhist file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command

2014-06-23 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-1480:
--

Attachment: YARN-1480-6.patch

Thank you for reviewing, [~zjshen]. Attached an updated patch.
- Added a tags option as appTags.
- The queue option is also available as an application filter in this patch.
- Removed the local filtering and changed to use 
ApplicationClientProtocol#getApplications via YarnClient. Only the finalStatus 
filter is left as a local filter because the operation is not supported there.
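
As a rough sketch of that direction (server-side filtering via YarnClient 
rather than local filtering in the CLI; the exact filters wired up by the patch 
may differ):

{code}
// Rough, illustrative sketch only; the actual CLI changes in the patch may
// wire up different filters.
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class AppListingSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();
    try {
      // Ask the RM to filter by application type and state instead of
      // fetching everything and filtering locally in the CLI.
      Set<String> appTypes = new HashSet<String>();
      appTypes.add("MAPREDUCE");
      EnumSet<YarnApplicationState> states =
          EnumSet.of(YarnApplicationState.ACCEPTED,
                     YarnApplicationState.RUNNING);
      List<ApplicationReport> reports =
          yarnClient.getApplications(appTypes, states);
      for (ApplicationReport report : reports) {
        System.out.println(report.getApplicationId() + "\t"
            + report.getYarnApplicationState());
      }
    } finally {
      yarnClient.stop();
    }
  }
}
{code}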

 RM web services getApps() accepts many more filters than ApplicationCLI 
 list command
 --

 Key: YARN-1480
 URL: https://issues.apache.org/jira/browse/YARN-1480
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, 
 YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch


 Currently, the RM web services getApps() accepts many more filters than the 
 ApplicationCLI list command, which only accepts state and type. IMHO, ideally, 
 different interfaces should provide consistent functionality. Would it be 
 better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-06-23 Thread Steven Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041740#comment-14041740
 ] 

Steven Wong commented on YARN-1775:
---

[~rajesh.balamohan], can you explain why you want to exclude 'the read only 
shared memory mappings in the process (i.e r--s, r-xs)'? Thanks.

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Fix For: 2.4.0

 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, 
 yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)