[jira] [Updated] (YARN-2192) TestRMHA fails when run with a mix of Schedulers
[ https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2192: Description: If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} was: Some TestRMHA tests assume the CapacityScheduler. If the test is run with multiple schedulers, some of the tests fail because the metrics system objects are shared across tests and fail as below. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} TestRMHA fails when run with a mix of Schedulers Key: YARN-2192 URL: https://issues.apache.org/jira/browse/YARN-2192 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
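As an illustration of the failure mode (not the attached patch): the FairScheduler re-registers QueueMetrics,q0=root against the statically shared DefaultMetricsSystem, so any test that leaves the previous RM's metrics registered trips the "already exists" error. A minimal teardown sketch, assuming the usual test-only hooks QueueMetrics.clearQueueMetrics() and DefaultMetricsSystem.shutdown() are available on this branch (the test class name is illustrative): {code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.After;

public class TestRMHAMetricsCleanupSketch {

  @After
  public void tearDown() {
    // Drop the statically cached root queue metrics so the next test
    // (possibly using a different scheduler) can register them again.
    QueueMetrics.clearQueueMetrics();
    // Tear down the metrics system left behind by the previous RM instance.
    DefaultMetricsSystem.shutdown();
  }
}
{code}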
[jira] [Updated] (YARN-2192) TestRMHA fails when run with a mix of Schedulers
[ https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2192: Attachment: YARN-2192.patch Fix the cleanup of the metrics by removing the conditional that would not work with the FairScheduler. TestRMHA fails when run with a mix of Schedulers Key: YARN-2192 URL: https://issues.apache.org/jira/browse/YARN-2192 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2192.patch If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2192) TestRMHA fails when run with a mix of Schedulers
[ https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040476#comment-14040476 ] Hadoop QA commented on YARN-2192: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651933/YARN-2192.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4045//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4045//console This message is automatically generated. TestRMHA fails when run with a mix of Schedulers Key: YARN-2192 URL: https://issues.apache.org/jira/browse/YARN-2192 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2192.patch If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2191: - Attachment: YARN-2191.patch Uploaded a simplified patch and re-kicked jenkins. Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when the restart happens before the app finishes. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completes on NM1 5. RM1 finishes restarting, NM1 reports container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
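For reference, a rough sketch of how the sequence above is usually wired up with MockRM/MockNM and a shared MemoryRMStateStore. This is not the attached patch; the helper signatures are from memory and should be treated as approximate, and the NM re-registration/resync details after the restart are elided: {code}
Configuration conf = new YarnConfiguration();
conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true);
conf.set(YarnConfiguration.RM_STORE, MemoryRMStateStore.class.getName());
MemoryRMStateStore memStore = new MemoryRMStateStore();
memStore.init(conf);

// 1-2. Start RM1, register NM1/NM2, submit app1 and launch its AM on NM1.
MockRM rm1 = new MockRM(conf, memStore);
rm1.start();
MockNM nm1 = rm1.registerNode("127.0.0.1:1234", 8192);
MockNM nm2 = rm1.registerNode("127.0.0.1:5678", 8192);
RMApp app1 = rm1.submitApp(200);
MockAM am1 = MockRM.launchAndRegisterAM(app1, rm1, nm1);

// 3. "Restart" RM1 by starting a second RM backed by the same state store.
MockRM rm2 = new MockRM(conf, memStore);
rm2.start();
nm1.setResourceTrackerService(rm2.getResourceTrackerService());
nm2.setResourceTrackerService(rm2.getResourceTrackerService());
// ... NM re-registration/resync with rm2 elided ...

// 4-5. NM1 reports the completed AM container (container-0), driving app1 to completion.
nm1.nodeHeartbeat(am1.getApplicationAttemptId(), 1, ContainerState.COMPLETE);

// 6. Both NMs should eventually be told to clean up app1 via heartbeat.
NodeHeartbeatResponse resp = nm2.nodeHeartbeat(true);
Assert.assertTrue(resp.getApplicationsToCleanup().contains(app1.getApplicationId()));
{code}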
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040551#comment-14040551 ] Remus Rusanu commented on YARN-1972: [~vinodkv] I see there is no container executor topic at src/site/apt. I'm thinking of writing the WCE as part of a 'secure container' topic, which would describe LCE as well. Is this OK? Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changing the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changing the actual container run command to use the 'createAsUser' command of the winutils task instead of 'create' * running the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The approach to the WCE came from practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop environment container executions. The job container itself is already dealing with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launched as a separate container, the same issue had to be resolved and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work the nodemanager must run as a service principal that is a member of the local Administrators group, or as LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but this requirement will automatically imply that the SE_TCB privilege is held by the nodemanager.
For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2. Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service onto the project is not trivial. -- This message was sent by Atlassian JIRA (v6.2#6252)
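To make the Deployment Requirements above concrete, the two settings described would look roughly like this in yarn-site.xml. This is a sketch only; the property names are the ones quoted in the description, and the group name is a placeholder for whatever Windows security group the nodemanager service principal belongs to: {code}
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.windows-secure-container-executor.group</name>
  <!-- placeholder: a Windows security group the NM service principal is a member of -->
  <value>nodemanager-group</value>
</property>
{code}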
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040567#comment-14040567 ] Wangda Tan commented on YARN-2181: -- Discussed offline with [~tassapola]; requirements of this JIRA: *App page:* 1) Total number of task containers preempted in this app 2) Total number of am containers preempted in this app 3) Total resource preempted in this app 4) Total number of task containers preempted in latest attempt 5) Total number of am containers preempted in latest attempt 6) Total resource preempted in latest attempt *Queue page:* 1) Total number of task containers preempted in this queue 2) Total number of am containers preempted in this queue 3) Total resource preempted in this queue Please let me know if you have any comments. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan We need to add preemption info to the RM web page so that administrators/users can better understand the preemption that happened on an app/queue, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040589#comment-14040589 ] Hadoop QA commented on YARN-2191: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651949/YARN-2191.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4046//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4046//console This message is automatically generated. Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2193) Job history UI value are wrongly rendered
Ashutosh Jindal created YARN-2193: - Summary: Job history UI value are wrongly rendered Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2193) Job history UI value are wrongly rendered
[ https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Jindal updated YARN-2193: -- Attachment: issue.jpg Job history UI value are wrongly rendered - Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Attachments: issue.jpg Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040726#comment-14040726 ] Zhijie Shen commented on YARN-2181: --- Wangda, is it good to do something similar to the job page of JHS? Say we show the total number of task containers preempted in this app; this number is associated with a link, which redirects users to the list of all the preempted containers. Similarly for the other numbers. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan We need to add preemption info to the RM web page so that administrators/users can better understand the preemption that happened on an app/queue, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2193) Job history UI value are wrongly rendered
[ https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040735#comment-14040735 ] Zhijie Shen commented on YARN-2193: --- It seems that the data in jobsDataTable has been corrupted. bq. because some fields are missing in jhist file [~ashutosh_jindal], would you please share what was missing in the jhist file? Job history UI value are wrongly rendered - Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Attachments: issue.jpg Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040745#comment-14040745 ] Chen He commented on YARN-2109: --- Done TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
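A minimal sketch of the fix described in this issue (pin the scheduler for this one test and clear the statically shared queue metrics afterwards). The surrounding test body is elided, and the cleanup calls are the usual test-only helpers, assuming they are available on this branch: {code}
@Test
public void testNMTokenSentForNormalContainer() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  // This test requires the CapacityScheduler regardless of the configured default.
  conf.setClass(YarnConfiguration.RM_SCHEDULER,
      CapacityScheduler.class, ResourceScheduler.class);
  MockRM rm = new MockRM(conf);
  try {
    rm.start();
    // ... existing test body ...
  } finally {
    rm.stop();
    // Without this, QueueMetrics stays cached as the CapacityScheduler flavor and
    // a later FairScheduler.reinitialize() fails with the ClassCastException above.
    QueueMetrics.clearQueueMetrics();
    DefaultMetricsSystem.shutdown();
  }
}
{code}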
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040841#comment-14040841 ] Sunil G commented on YARN-2022: --- Thank you [~leftnoteasy] for the comments. I will update the patch to handle the changes from YARN-1368. bq.With this condition, container preemption will be interrupted when we have am-capacity reached maxAMCapacity or less, is it what the original design? As per the discussion with Mayank and Carlo, it was decided to upload a simple patch that respects the AM Resource percent only. I had an offline discussion earlier with [~curino] regarding the Max Capacity and AM Resource percent. AM Resource percent considers the max capacity of a Queue. There is scope for improving this solution in that aspect, which I feel we can do in another JIRA. I will raise a separate JIRA for the same. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when the cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2072) RM/NM UIs and webservices are missing vcore information
[ https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-2072: - Attachment: YARN-2072.patch Thanks for the review Tom! I fixed the getReservedVirtualCores() bug and the typo. I will file a followup jira for displaying the vcores the user would use (as opposed to today's default of 1) in the capacity and fifo schedulers. RM/NM UIs and webservices are missing vcore information --- Key: YARN-2072 URL: https://issues.apache.org/jira/browse/YARN-2072 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, webapp Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2072.patch, YARN-2072.patch Change RM and NM UIs and webservices to include virtual cores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2130: - Attachment: YARN-2130.5.patch Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2072) RM/NM UIs and webservices are missing vcore information
[ https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041012#comment-14041012 ] Hadoop QA commented on YARN-2072: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651991/YARN-2072.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4047//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4047//console This message is automatically generated. RM/NM UIs and webservices are missing vcore information --- Key: YARN-2072 URL: https://issues.apache.org/jira/browse/YARN-2072 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, webapp Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2072.patch, YARN-2072.patch Change RM and NM UIs and webservices to include virtual cores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041013#comment-14041013 ] Tsuyoshi OZAWA commented on YARN-2130: -- [~kkambatl], thank you for the review. Updated the patch to address the comments: 1. Made RMAppManager's and ResourceTrackerService's constructors minimal. 2. Left the fields in ClientRMService as they were. 3. Fixed the tests to pass, including the initialization order of mocks and making mocks point to the correct objects; TestClientRMService#mockResourceScheduler is one of them. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2109: Attachment: YARN-2109.001.patch TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2109: Attachment: YARN-2109.001.patch Submitting the patch TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch, YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2144: -- Attachment: YARN-2144.patch Looks good overall, did some minor edits myself. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041063#comment-14041063 ] Jian He commented on YARN-2191: --- looks good, +1 Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041077#comment-14041077 ] Junping Du commented on YARN-1341: -- bq. Yes, applications should be like containers. If we fail to store an application start in the state store then we should fail the container launch that triggered the application to be added. This already happens in the current patch for YARN-1354. If we fail to store the completion of an application then worst-case we will report an application to the RM on restart that isn't active, and the RM will correct the NM when it re-registers. That makes sense. I guess we should do additional work to check that the behavior is as we expect. bq. I wasn't planning on persisting metrics during restart, as there are quite a few (e.g.: RPC metrics, etc.), and I'm not sure it's critical that they be preserved across a restart. Does RM restart do this or are there plans to do so? I think these metrics are important, especially for users' monitoring tools, and we should keep this info consistent during restart. As far as I know, RM restart doesn't track this because these metrics are recovered during event recovery in RM restart. In the current NM restart, some metrics could be lost, e.g. allocatedContainers, etc. I think we should either count them back as part of events during recovery or persist them. Thoughts? bq. Therefore I don't believe the effort to maintain a stale tag is going to be worth it. Also if we refuse to load a state store that's stale then we are going to leak containers because we won't try to recover anything from a stale state store. If so, how about we don't apply these changes until they can be persisted? That way we keep the state store and the NM's current state consistent. Even if we choose to fail the NM, we can still load the state and recover the work. bq. Instead I think we should decide in the various store failure cases whether the error should be fatal to the operation (which may lead to it being fatal to the NM overall) or if we feel the recovery with stale information is a better outcome than taking the NM down. In the latter case we should just log the error and move on. Do we expect some operations to fail while others succeed? If this just means the store is unavailable short-term, we can handle it by adding retries. If not, we should expect the fatal operations to fail soon enough anyway, and in that case logging the error and moving on for the non-fatal operations doesn't make much difference. No? Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.7.patch I have updated the patch w.r.t. YARN-1368. Also added a test case to verify that the RMContainer is marked as an AM container even after RM restart/failover. Thank you [~leftnoteasy] for pointing this out. Please review. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when the cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041086#comment-14041086 ] Craig Welch commented on YARN-1039: --- [~ste...@apache.org] wrt the need for a container-level flag / a way for the application master to launch long lived containers - definitely, but the idea was for that to come as a later step. That may be short-sighted, though; it may be better to come up now with a common way to do this for the application master container and the containers it later launches, instead of ending up with unmatched approaches later... This first step is to provide a way for the application master to be launched in a long lived container (generally, an application master for a long lived application will itself need to be launched in a long lived container - at least, it needs to be possible to do so) - which is why there needs to be some way to indicate the need for a long lived container during application submission (necessary but not sufficient overall...). [~zjshen] I was also wondering about using the tags, but after talking with [~xgong] we don't think that is the way to go, because tags don't seem to be about changing behavior but only a freeform way to enable search/display/etc. After this discussion and some looking around, it really seems that what we are after is a way to communicate a quality of the needed container to the resource manager, both at application submission (for the application master container) and for later container launches by the master - kind of like the ResourceProto, which is already present in both cases for the same reason. (I suggested adding it there, actually, as something necessary for the container, but [~xgong] objected, thinking it is really specific to metric qualities (cpu, memory...).) I'm going to take a look at adding something alongside/similar to the ResourceProto to indicate constraints/requirements for the container, starting with long lived, that can be common to application submission and to when containers are started later by the application - not necessarily a long field for bit manipulation, but something which is also extensible. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2191: -- Attachment: YARN-2191.patch Changed the test name to be more accurate Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041096#comment-14041096 ] Hadoop QA commented on YARN-2109: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652006/YARN-2109.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4049//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4049//console This message is automatically generated. TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch, YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. 
For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041095#comment-14041095 ] Hadoop QA commented on YARN-2130: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652002/YARN-2130.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4048//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4048//console This message is automatically generated. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041105#comment-14041105 ] Steve Loughran commented on YARN-1039: -- I see. I'd assume that the service flag would imply long-lived, but maybe they could be separated. I'd like to see a {{long}} enum of flags here as it's easier to be forwards compatible. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
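Purely as an illustration of the {{long}} enum-of-flags idea (the names below are hypothetical and come neither from this JIRA nor from YARN itself): each flag owns one bit of a single long carried on the request, so an older server simply ignores bits it does not know, which is the forwards-compatibility property being discussed.
{code}
public enum ContainerRequestFlag {
  LONG_LIVED(1L << 0),
  SERVICE(1L << 1);   // hypothetical future flag

  private final long mask;
  ContainerRequestFlag(long mask) { this.mask = mask; }

  /** Pack a set of flags into one long for the wire format. */
  public static long encode(java.util.EnumSet<ContainerRequestFlag> flags) {
    long bits = 0L;
    for (ContainerRequestFlag f : flags) {
      bits |= f.mask;
    }
    return bits;
  }

  /** Decode, silently dropping bits this version does not understand. */
  public static java.util.EnumSet<ContainerRequestFlag> decode(long bits) {
    java.util.EnumSet<ContainerRequestFlag> set =
        java.util.EnumSet.noneOf(ContainerRequestFlag.class);
    for (ContainerRequestFlag f : values()) {
      if ((bits & f.mask) != 0) {
        set.add(f);
      }
    }
    return set;
  }
}
{code}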
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041110#comment-14041110 ] Craig Welch commented on YARN-1039: --- The more I look around, the better I like the idea of adding it to the resource proto. It is the same kind of thing as the items already in there - it's a characteristic required for the container (it isn't a metric style quality, but still, it's a characteristic of the resource needed) and it is already present everywhere the information is needed (at application submission and when containers are requested). Adding something so similar alongside the resource proto seems unnecessary. Do you agree with [~xgong]'s concerns or do you think it makes sense to add it there? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041128#comment-14041128 ] Lohit Vijayarenu commented on YARN-796: --- As [~tucu00] mentioned, a label sounds closely related to affinity and should be treated less as a resource. It becomes closely related to resources when it comes to exposing them on scheduler queues and exposing that to users who wish to schedule their jobs on a certain set of labeled nodes. This is definitely a very useful feature to have. Looking forward to the design document. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041161#comment-14041161 ] Hadoop QA commented on YARN-2109: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652007/YARN-2109.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4050//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4050//console This message is automatically generated. TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch, YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. 
For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
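The fix the description suggests — explicitly clearing the shared metrics at the end of the test — could look roughly like the teardown sketch below, assuming the standard QueueMetrics/DefaultMetricsSystem test helpers; it is not the attached YARN-2109 patch.
{code}
// Sketch only: reset the shared metrics state after a test so that a
// subsequently created RM (possibly with a different scheduler) can
// re-register "QueueMetrics,q0=root" from scratch.
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.After;

public class TestSchedulerMetricsCleanup {
  @After
  public void tearDown() {
    QueueMetrics.clearQueueMetrics();  // drop the cached root queue metrics
    DefaultMetricsSystem.shutdown();   // unregister all metrics sources
  }
}
{code}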
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.008.patch Addressed all comments ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041172#comment-14041172 ] Hadoop QA commented on YARN-2144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652015/YARN-2144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4051//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4051//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
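A log line satisfying the three properties listed above (INFO level, container id visible, AM vs. task container distinguishable) could look roughly like this sketch; the class, method, and parameter names are illustrative assumptions, not the attached patch.
{code}
// Illustrative only: an INFO-level preemption log line that carries the
// container id and says whether the preempted container was an AM container.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class PreemptionLogSketch {
  private static final Log LOG = LogFactory.getLog(PreemptionLogSketch.class);

  static void logPreemption(String containerId, boolean isAMContainer, String queue) {
    LOG.info("Preempting " + (isAMContainer ? "AM" : "task")
        + " container " + containerId + " from queue " + queue);
  }

  public static void main(String[] args) {
    logPreemption("container_1403000000000_0001_01_000002", false, "root.a");
  }
}
{code}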
[jira] [Created] (YARN-2194) Add Cgroup support for RedHat 7
Wei Yan created YARN-2194: - Summary: Add Cgroup support for RedHat 7 Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041228#comment-14041228 ] Hadoop QA commented on YARN-2022: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652024/YARN-2022.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4052//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4052//console This message is automatically generated. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041239#comment-14041239 ] Hadoop QA commented on YARN-2191: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652025/YARN-2191.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4053//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4053//console This message is automatically generated. Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041255#comment-14041255 ] Jon Bringhurst commented on YARN-2194: -- It might also be useful to have a SystemdNspawnContainerExecutor for yarn.nodemanager.container-executor.class. I don't know how many people would be interested in using it, however. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041263#comment-14041263 ] Wei Yan commented on YARN-2194: --- SystemdNspawnContainerExecutor is a good idea. We could add one for systemd alongside the standard CgroupsLCEHandler. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041267#comment-14041267 ] Hadoop QA commented on YARN-1365: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652034/YARN-1365.008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4054//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4054//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2195) Clean a piece of code in ResourceRequest
Wei Yan created YARN-2195: - Summary: Clean a piece of code in ResourceRequest Key: YARN-2195 URL: https://issues.apache.org/jira/browse/YARN-2195 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor {code} if (numContainersComparison == 0) { return 0; } else { return numContainersComparison; } {code} This code should be cleaned as {code} return numContainersComparison; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2195) Clean a piece of code in ResourceRequest
[ https://issues.apache.org/jira/browse/YARN-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2195: -- Attachment: YARN-2195.patch Clean a piece of code in ResourceRequest Key: YARN-2195 URL: https://issues.apache.org/jira/browse/YARN-2195 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2195.patch {code} if (numContainersComparison == 0) { return 0; } else { return numContainersComparison; } {code} This code should be cleaned as {code} return numContainersComparison; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2195) Clean a piece of code in ResourceRequest
[ https://issues.apache.org/jira/browse/YARN-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041331#comment-14041331 ] Hadoop QA commented on YARN-2195: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652055/YARN-2195.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4055//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4055//console This message is automatically generated. Clean a piece of code in ResourceRequest Key: YARN-2195 URL: https://issues.apache.org/jira/browse/YARN-2195 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2195.patch {code} if (numContainersComparison == 0) { return 0; } else { return numContainersComparison; } {code} This code should be cleaned as {code} return numContainersComparison; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041355#comment-14041355 ] Anubhav Dhoot commented on YARN-1365: - The changes for addApplication caused the failures. I am going to open a separate jira to fix that, as per Jian's suggestion, and undo those changes here. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.008.patch Without the addApplication changes. Those will be covered in YARN-2196 ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2196) Add Application duplicate APP_ACCEPTED events can be prevented with a flag
Anubhav Dhoot created YARN-2196: --- Summary: Add Application duplicate APP_ACCEPTED events can be prevented with a flag Key: YARN-2196 URL: https://issues.apache.org/jira/browse/YARN-2196 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Anubhav Dhoot YARN-1365 adds a flag to AddApplicationAttemptSchedulerEvent that prevents a duplicate ATTEMPT_ADDED event in recovery. We can do something similar to AddApplicationSchedulerEvent to avoid the following transition. {code} // ACCECPTED state can once again receive APP_ACCEPTED event, because on // recovery the app returns ACCEPTED state and the app once again go // through the scheduler and triggers one more APP_ACCEPTED event at // ACCEPTED state. .addTransition(RMAppState.ACCEPTED, RMAppState.ACCEPTED, RMAppEventType.APP_ACCEPTED) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
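The proposed flag could look roughly like the sketch below; the class and field names are assumptions for illustration and do not necessarily match the eventual patch.
{code}
// Sketch of the idea (names are hypothetical): the scheduler event carries
// whether the app is being re-added as part of RM recovery, and APP_ACCEPTED
// is only emitted for a fresh submission.
public class AddApplicationEventSketch {

  static class AppAddedEvent {
    final String applicationId;
    final boolean isAppRecovering; // true when re-added after RM restart

    AppAddedEvent(String applicationId, boolean isAppRecovering) {
      this.applicationId = applicationId;
      this.isAppRecovering = isAppRecovering;
    }
  }

  static void handle(AppAddedEvent event) {
    // add the application to the scheduler's internal structures here ...
    if (!event.isAppRecovering) {
      System.out.println("send APP_ACCEPTED for " + event.applicationId);
    } // on recovery, skip the duplicate APP_ACCEPTED
  }

  public static void main(String[] args) {
    handle(new AppAddedEvent("application_1403000000000_0001", false));
    handle(new AppAddedEvent("application_1403000000000_0002", true));
  }
}
{code}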
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041433#comment-14041433 ] Hadoop QA commented on YARN-1365: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652074/YARN-1365.008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4056//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4056//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041435#comment-14041435 ] Mayank Bansal commented on YARN-2022: - Hi [~vinodkv], is it OK with you if we commit this patch, since you had concerns before? I think we still need to avoid killing AMs even if we have a patch that prevents applications from being killed when their AM gets killed. Please suggest. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
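One way to give AM containers least priority, sketched under assumed names (this is not the attached YARN-2022 patch), is to order each queue's preemption candidates so that AM containers are considered only after all task containers:
{code}
// Illustrative sketch: sort preemption candidates so AM containers go last.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrderSketch {

  static class Candidate {
    final String containerId;
    final boolean isAMContainer;
    Candidate(String containerId, boolean isAMContainer) {
      this.containerId = containerId;
      this.isAMContainer = isAMContainer;
    }
  }

  static void sortForPreemption(List<Candidate> candidates) {
    // false (task container) sorts before true (AM container)
    candidates.sort(Comparator.comparing((Candidate c) -> c.isAMContainer));
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<>();
    candidates.add(new Candidate("container_j3_am", true));
    candidates.add(new Candidate("container_j3_map1", false));
    candidates.add(new Candidate("container_j2_map1", false));
    sortForPreemption(candidates);
    for (Candidate c : candidates) {
      System.out.println(c.containerId + " am=" + c.isAMContainer);
    }
  }
}
{code}
In the J1-J4 scenario above, this ordering would preempt maps from J3 and J2 before ever touching J3's AM container.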
[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
[ https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2078: - Description: We should document the condition when uber mode is enabled. Currently, users need to read following code to understand the condition. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} was: We should document the condition when uber mode is enabled. If not, users need to read code. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented -- Key: YARN-2078 URL: https://issues.apache.org/jira/browse/YARN-2078 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2078.1.patch We should document the condition when uber mode is enabled. Currently, users need to read following code to understand the condition. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041494#comment-14041494 ] Tsuyoshi OZAWA commented on YARN-2052: -- Jian, thank you for clarifying. I'm working to address the comments. Please wait a moment. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041506#comment-14041506 ] Jian He commented on YARN-1365: --- Can we also revert the RMAppImpl changes? ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc
Akira AJISAKA created YARN-2197: --- Summary: Add a link to YARN CHANGES.txt in the left side of doc Key: YARN-2197 URL: https://issues.apache.org/jira/browse/YARN-2197 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Priority: Minor Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left side of the document (hadoop-project/src/site/site.xml), but YARN does not exist. {code} <item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/> <item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/> <item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/> <item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/> {code} A link to YARN CHANGES.txt should be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041543#comment-14041543 ] Wangda Tan commented on YARN-2144: -- Thanks Jian for review! Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041537#comment-14041537 ] Tsuyoshi OZAWA commented on YARN-2052: -- The brief design is as follows: 1. Add a getter method for the epoch, like {{getEpoch}}, to RMContext. 2. Add {{loadEpoch}} to RMStateStore and set the epoch value on the RMContext in {{ResourceManager#serviceStart}}. One discussion point is how to serialize the epoch. Can we add an epoch definition to yarn_server_resourcemanager_service_protos.proto? [~jianhe], what do you think? ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
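A rough sketch of the two additions described in the comment above; the method names follow the comment, but the bodies and helper classes are assumptions only, not the eventual patch.
{code}
// Sketch of the proposal: the RM context exposes the epoch, and the state
// store loads/increments it on RM (re)start so that container ids minted
// after a restart cannot collide with earlier ones.
public class EpochSketch {

  // 1. Proposed getter on the RM context.
  interface RMContextView {
    long getEpoch();
  }

  // 2. Proposed load method on the state store.
  static abstract class StateStoreSketch {
    abstract long loadEpoch() throws Exception;
  }

  static class InMemoryStateStore extends StateStoreSketch {
    private long epoch = 0;
    @Override
    long loadEpoch() {
      // A real store would persist the incremented value, e.g. as a new field
      // serialized via yarn_server_resourcemanager_service_protos.proto.
      return epoch++;
    }
  }

  public static void main(String[] args) throws Exception {
    InMemoryStateStore store = new InMemoryStateStore();
    final long epoch = store.loadEpoch();     // done once in serviceStart()
    RMContextView context = () -> epoch;
    System.out.println("epoch for this RM incarnation: " + context.getEpoch());
  }
}
{code}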
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041540#comment-14041540 ] Hudson commented on YARN-2191: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5756 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5756/]) YARN-2191. Added a new test to ensure NM will clean up completed applications in the case of RM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1604949) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041544#comment-14041544 ] Wangda Tan commented on YARN-2191: -- Thanks for Jian's review and commit! Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041548#comment-14041548 ] Jian He commented on YARN-2052: --- sounds good ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.009.patch Without the RMAppImpl changes ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.009.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc
[ https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2197: Attachment: YARN-2197.patch Attaching a patch. Add a link to YARN CHANGES.txt in the left side of doc -- Key: YARN-2197 URL: https://issues.apache.org/jira/browse/YARN-2197 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Priority: Minor Labels: newbie Attachments: YARN-2197.patch Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left side of the document (hadoop-project/src/site/site.xml), but YARN does not exist. {code} <item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/> <item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/> <item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/> <item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/> {code} A link to YARN CHANGES.txt should be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041565#comment-14041565 ] Tsuyoshi OZAWA commented on YARN-2052: -- OK! ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc
[ https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041587#comment-14041587 ] Hadoop QA commented on YARN-2197: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652107/YARN-2197.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4058//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4058//console This message is automatically generated. Add a link to YARN CHANGES.txt in the left side of doc -- Key: YARN-2197 URL: https://issues.apache.org/jira/browse/YARN-2197 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: YARN-2197.patch Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left side of the document (hadoop-project/src/site/site.xml), but YARN does not exist. {code} <item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/> <item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/> <item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/> <item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/> {code} A link to YARN CHANGES.txt should be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041600#comment-14041600 ] Hadoop QA commented on YARN-1365: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652104/YARN-1365.009.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4057//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4057//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.009.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-2069: --- Assignee: Mayank Bansal (was: Vinod Kumar Vavilapalli) Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041617#comment-14041617 ] Mayank Bansal commented on YARN-2069: - Taking it over. Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-1.patch Attaching patch Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041650#comment-14041650 ] Hadoop QA commented on YARN-2069: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652121/YARN-2069-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4059//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4059//console This message is automatically generated. Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041652#comment-14041652 ] Vinod Kumar Vavilapalli commented on YARN-1039: --- I am not against a container/resource level definition of whether that container is long lived or not, but I think it is equally important to mark at the application level if _at least_ one container in the application is considered long lived. So, to summarize, how about - an app-level isLongRunning() that indicates _if at least one container of this application will be long-running_ and - a resource-request level isLongRunning() that indicates _if this container is long running or not_. The app-level flag can help UIs, making very quick scheduling distinctions etc. Thoughts? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
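For illustration only: a minimal sketch of the two-level flag discussed above, assuming hypothetical isLongRunning/setLongRunning accessors. No such fields exist in the current ApplicationSubmissionContext or ResourceRequest API; this is just to show how the two levels would relate.
{code}
// Hypothetical sketch only -- these accessors do not exist in the current YARN API.

/** App-level: true if at least one container of this application is long-running. */
interface LongRunningApp {
  boolean isLongRunning();
  void setLongRunning(boolean longRunning);
}

/** Request-level: true if containers allocated for this request are long-running. */
interface LongRunningResourceRequest {
  boolean isLongRunning();
  void setLongRunning(boolean longRunning);
}

// A scheduler or UI could cheaply skip per-request checks for applications that
// declare no long-running containers at all:
//   if (!app.isLongRunning()) { /* normal placement */ }
//   else { /* consult the request-level flag for each ResourceRequest */ }
{code}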
[jira] [Updated] (YARN-1773) ShuffleHeader should have a format that can inform about errors
[ https://issues.apache.org/jira/browse/YARN-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated YARN-1773: -- Target Version/s: 2.5.0 (was: 2.4.0) Affects Version/s: 2.4.0 ShuffleHeader should have a format that can inform about errors --- Key: YARN-1773 URL: https://issues.apache.org/jira/browse/YARN-1773 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0, 2.4.0 Reporter: Bikas Saha Priority: Critical Currently, the ShuffleHeader (which is a Writable) simply tries to read the successful header (mapid, reduceid, etc.). If there is an error, the input will have an error message instead of (mapid, reduceid, etc.). Parsing the ShuffleHeader therefore fails, and since we don't know where the error message ends, we cannot consume the remaining input stream, which may have good data from the remaining map outputs. Being able to encode the error in the ShuffleHeader will let us parse out the error correctly and move on to the remaining data. The shuffle handler response should say which maps are in error and which are fine, and what the error was for the erroneous maps. These will help report diagnostics for easier upstream reporting. -- This message was sent by Atlassian JIRA (v6.2#6252)
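For illustration, a hedged sketch of a self-describing header along the lines proposed above. The class name, the leading status byte, and the field layout are assumptions for this example, not the actual ShuffleHeader wire format; the point is only that a length-prefixed error message keeps the stream position recoverable.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

/**
 * Illustrative only -- not the real ShuffleHeader. A status byte written first
 * tells the reader whether the fields that follow are map output metadata or a
 * length-prefixed error message for that map.
 */
public class StatusShuffleHeader implements Writable {
  public static final byte OK = 0;
  public static final byte ERROR = 1;

  private byte status = OK;
  private String mapId = "";
  private long compressedLength;
  private long uncompressedLength;
  private int reduceId;
  private String errorMessage = "";

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeByte(status);
    WritableUtils.writeString(out, mapId);
    if (status == OK) {
      WritableUtils.writeVLong(out, compressedLength);
      WritableUtils.writeVLong(out, uncompressedLength);
      WritableUtils.writeVInt(out, reduceId);
    } else {
      // Length-prefixed, so the reader can skip past it and continue with the
      // next map's data instead of abandoning the rest of the stream.
      WritableUtils.writeString(out, errorMessage);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    status = in.readByte();
    mapId = WritableUtils.readString(in);
    if (status == OK) {
      compressedLength = WritableUtils.readVLong(in);
      uncompressedLength = WritableUtils.readVLong(in);
      reduceId = WritableUtils.readVInt(in);
    } else {
      errorMessage = WritableUtils.readString(in);
    }
  }
}
{code}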
[jira] [Commented] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9
[ https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041684#comment-14041684 ] Hadoop QA commented on YARN-1327: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609276/nodemgr-portability.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4060//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4060//console This message is automatically generated. Fix nodemgr native compilation problems on FreeBSD9 --- Key: YARN-1327 URL: https://issues.apache.org/jira/browse/YARN-1327 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Radim Kolar Assignee: Radim Kolar Fix For: 3.0.0, 2.5.0 Attachments: nodemgr-portability.txt There are several portability problems preventing the native component from compiling on FreeBSD. 1. libgen.h is not included. The correct function prototype is there, but Linux glibc has a workaround to define it for the user if libgen.h is not directly included. Include this file directly. 2. Query the max size of the login name using sysconf. This follows the same code style as the rest of the code, which already uses sysconf. 3. Cgroups are a Linux-only feature; make the compilation conditional and return an error if mount_cgroup is attempted on a non-Linux OS. 4. Do not use the POSIX function setpgrp() since it clashes with the same-named function from BSD 4.2; use an equivalent function. After inspecting the glibc sources, it is just a shortcut for setpgid(0,0). These changes make it compile on both Linux and FreeBSD. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041690#comment-14041690 ] Sunil G commented on YARN-2069: --- Hi [~mayank_bansal] I guess this patch also has the code changes of YARN-2022. I think this can be separated. Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2193) Job history UI value are wrongly rendered
[ https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041700#comment-14041700 ] Ashutosh Jindal commented on YARN-2193: --- During ApplicationMaster start up, JobHistoryEventHandler initializes the writer. This is a one-time initialization. If it fails because of an NN problem, then none of the events are written. In this issue, because of NN safe mode, the writer is not initialized and only the Job_Finished event is written. The history server then parses the jhist file, but the Job_Finished event does not contain all the fields, so some of the fields are missing. Job history UI value are wrongly rendered - Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Attachments: issue.jpg Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
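A simplified sketch of the failure mode described in the comment above, using a hypothetical handler class; it is not the actual JobHistoryEventHandler code, but it shows why events raised before a failed one-time writer initialization never reach the history file.
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a one-time-initialized history writer. */
class HistoryWriterSketch {
  private final List<String> written = new ArrayList<String>();
  private boolean initialized;

  /** One-time setup; if the NameNode is in safe mode this can fail and is not retried. */
  void setup() throws IOException {
    // In the real handler this is where the jhist output stream would be created.
    initialized = true;
  }

  /** Events arriving before setup() succeeded are effectively lost. */
  void handle(String event) {
    if (!initialized) {
      return; // dropped -- the resulting jhist file lacks the fields these events carry
    }
    written.add(event);
  }
}
{code}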
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1480: -- Attachment: YARN-1480-6.patch Thank you for reviewing, [~zjshen]. Attached an updated patch. - Added the tags option as appTags. - The queue option is also available as an application filter in this patch. - Removed the local filters and changed to use ApplicationClientProtocol#getApplications via YarnClient. Only the finalStatus filter is left as a local filter because that operation is unsupported. RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch Nowadays the RM web services getApps() accepts many more filters than the ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Is it better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.2#6252)
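For context, a minimal sketch of fetching filtered application reports through YarnClient, which is the path the patch description refers to. The application type and states used here are example inputs only, and filters such as finalStatus that the protocol does not support would still have to be applied client-side.
{code}
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListAppsSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      Set<String> types = new HashSet<String>();
      types.add("MAPREDUCE"); // example application type filter
      EnumSet<YarnApplicationState> states =
          EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);
      // Filtering happens on the RM side, as with the web services getApps().
      List<ApplicationReport> reports = client.getApplications(types, states);
      for (ApplicationReport r : reports) {
        System.out.println(r.getApplicationId() + "\t" + r.getYarnApplicationState());
      }
    } finally {
      client.stop();
    }
  }
}
{code}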
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041740#comment-14041740 ] Steven Wong commented on YARN-1775: --- [~rajesh.balamohan], can you explain why you want to exclude 'the read only shared memory mappings in the process (i.e r--s, r-xs)'? Thanks. Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Fix For: 2.4.0 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, yarn-1775-2.4.0.patch Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
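For reference, a hedged sketch of totalling PSS from /proc/&lt;pid&gt;/smaps. It is a simplified stand-in, not the SMAP-based process tree implementation from the attached patches, and it deliberately sums every mapping without the permission-based exclusions (r--s, r-xs) being questioned above.
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PssSketch {
  /** Sums the "Pss:" lines (reported in kB) from a process's smaps file. */
  static long totalPssKb(String pid) throws IOException {
    long totalKb = 0;
    BufferedReader reader =
        new BufferedReader(new FileReader("/proc/" + pid + "/smaps"));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("Pss:")) {
          // Line format: "Pss:                 123 kB"
          String[] parts = line.trim().split("\\s+");
          totalKb += Long.parseLong(parts[1]);
        }
      }
    } finally {
      reader.close();
    }
    return totalKb;
  }

  public static void main(String[] args) throws IOException {
    String pid = args.length > 0 ? args[0] : "self";
    System.out.println("PSS (kB): " + totalPssKb(pid));
  }
}
{code}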