[jira] [Commented] (YARN-2102) More generalized timeline ACLs

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107898#comment-14107898
 ] 

Hadoop QA commented on YARN-2102:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663830/YARN-2102.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/4705//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4705//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4705//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4705//console

This message is automatically generated.

> More generalized timeline ACLs
> --
>
> Key: YARN-2102
> URL: https://issues.apache.org/jira/browse/YARN-2102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch
>
>
> We need to differentiate the access controls of reading and writing 
> operations, and we need to think about cross-entity access control. For 
> example, if we are executing a workflow of MR jobs that writes the timeline 
> data of this workflow, we don't want other users to pollute the workflow's 
> timeline data by putting something under it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue

2014-08-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107896#comment-14107896
 ] 

Sunil G commented on YARN-2385:
---

After checking the code, *AbstractYarnScheduler#killAllAppsInQueue* and 
*ClientRMService#getApplications* could be changed to use a combination of these 
APIs as needed.
Currently the behavior differs between Fair and CS in these cases. A uniform 
decision can be derived, and then these two new APIs can be used in this context 
as needed. I feel that *killAllAppsInQueue* and *getApplications* need both 
pending and running applications.
[~zjshen], [~wangda], [~subru], please share your thoughts. If you agree, I would 
like to take up this Jira.
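
A rough sketch of how such callers could combine the two proposed APIs. Since 
getRunningAppsInQueue and getPendingAppsInQueue do not exist yet, the signatures 
below are only illustrative assumptions, not scheduler code:

{code:java}
// Illustrative sketch only: getRunningAppsInQueue/getPendingAppsInQueue are the
// APIs proposed in this Jira and do not exist yet, so their signatures are assumptions.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

abstract class QueueAppsSketch {
  // Hypothetical split APIs proposed in YARN-2385.
  abstract List<ApplicationAttemptId> getRunningAppsInQueue(String queueName);
  abstract List<ApplicationAttemptId> getPendingAppsInQueue(String queueName);

  // Callers such as killAllAppsInQueue and ClientRMService#getApplications that
  // need every app in the queue would simply concatenate the two lists.
  List<ApplicationAttemptId> getAppsInQueue(String queueName) {
    List<ApplicationAttemptId> apps =
        new ArrayList<ApplicationAttemptId>(getRunningAppsInQueue(queueName));
    apps.addAll(getPendingAppsInQueue(queueName));
    return apps;
  }
}
{code}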

> Consider splitting getAppsinQueue to getRunningAppsInQueue + 
> getPendingAppsInQueue
> --
>
> Key: YARN-2385
> URL: https://issues.apache.org/jira/browse/YARN-2385
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler
>Reporter: Subramaniam Krishnan
>  Labels: abstractyarnscheduler
>
> Currently getAppsinQueue returns both pending & running apps. The purpose of 
> the JIRA is to explore splitting it to getRunningAppsInQueue + 
> getPendingAppsInQueue that will provide more flexibility to callers



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2182) Update ContainerId#toString() to avoid conflicts before and after RM restart

2014-08-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107893#comment-14107893
 ] 

Tsuyoshi OZAWA commented on YARN-2182:
--

s/Changed to prefix epoch to ContainerId#toString()/Updated 
ContainerId#toString() to suffix epoch/
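
For illustration only (this is not the attached patch): the general idea is to fold 
the epoch from YARN-2052 into the container-id string so that ids minted before and 
after an RM restart cannot collide. The "_e<epoch>" format and names below are 
assumptions:

{code:java}
// Illustrative only, not the attached patch: append the epoch from YARN-2052 to
// the existing container-id string so that ids minted before and after an RM
// restart cannot collide. The "_e<epoch>" format here is an assumption.
public class ContainerIdEpochSketch {
  static String toStringWithEpoch(String baseContainerId, long epoch) {
    // epoch == 0 keeps the pre-restart format, preserving backward compatibility
    return epoch == 0 ? baseContainerId : baseContainerId + "_e" + epoch;
  }

  public static void main(String[] args) {
    System.out.println(toStringWithEpoch("container_1408746026958_0001_01_000002", 0));
    System.out.println(toStringWithEpoch("container_1408746026958_0001_01_000002", 2));
  }
}
{code}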

> Update ContainerId#toString() to avoid conflicts before and after RM restart
> 
>
> Key: YARN-2182
> URL: https://issues.apache.org/jira/browse/YARN-2182
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2182.1.patch
>
>
> ContainerId#toString() doesn't include any information about the current cluster 
> id. This leads to conflicts between container ids. We can avoid the conflicts 
> without breaking backward compatibility by using the epoch introduced in 
> YARN-2052.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2182) Update ContainerId#toString() to avoid conflicts before and after RM restart

2014-08-22 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2182:
-

Attachment: YARN-2182.1.patch

Changed to prefix epoch to ContainerId#toString().

> Update ContainerId#toString() to avoid conflicts before and after RM restart
> 
>
> Key: YARN-2182
> URL: https://issues.apache.org/jira/browse/YARN-2182
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2182.1.patch
>
>
> ContainerId#toString() doesn't include any information about the current cluster 
> id. This leads to conflicts between container ids. We can avoid the conflicts 
> without breaking backward compatibility by using the epoch introduced in 
> YARN-2052.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-08-22 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2446:
-

 Summary: Using TimelineNamespace to shield the entities of a user
 Key: YARN-2446
 URL: https://issues.apache.org/jira/browse/YARN-2446
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
entities, preventing them from being accessed or affected by other users' 
operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2102) More generalized timeline ACLs

2014-08-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2102:
--

Attachment: YARN-2102.1.patch

I divided the work into two halves. In this Jira, I'd like to scope the work to 
defining the TimelineNamespace data model, reading it from and writing it into the 
timeline store, adding REST APIs for users to operate on the namespace, and adding 
a TimelineClient wrapper over the PUT method. In other words, this Jira focuses on 
making the new TimelineNamespace work end to end.

I'll create a follow-up Jira to use TimelineNamespace to protect entities.
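
For readers not following the patch, a hypothetical sketch of the kind of data model 
this describes. The field names are assumptions based on this comment, not the 
contents of YARN-2102.1.patch:

{code:java}
// Hypothetical sketch of the data model described above; field names are
// assumptions based on this comment, not the contents of YARN-2102.1.patch.
// A namespace groups entities and carries its own read/write ACLs.
public class TimelineNamespaceSketch {
  private String id;       // identifier that entities are put under
  private String owner;    // user who created the namespace
  private String readers;  // users/groups allowed to read entities in it
  private String writers;  // users/groups allowed to put entities into it

  public String getId() { return id; }
  public void setId(String id) { this.id = id; }
  public String getOwner() { return owner; }
  public void setOwner(String owner) { this.owner = owner; }
  public String getReaders() { return readers; }
  public void setReaders(String readers) { this.readers = readers; }
  public String getWriters() { return writers; }
  public void setWriters(String writers) { this.writers = writers; }
}
{code}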

> More generalized timeline ACLs
> --
>
> Key: YARN-2102
> URL: https://issues.apache.org/jira/browse/YARN-2102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch
>
>
> We need to differentiate the access controls of reading and writing 
> operations, and we need to think about cross-entity access control. For 
> example, if we are executing a workflow of MR jobs that writes the timeline 
> data of this workflow, we don't want other users to pollute the workflow's 
> timeline data by putting something under it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-08-22 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-

Attachment: YARN-1879.9.patch

Refreshed the patch.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode

2014-08-22 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107859#comment-14107859
 ] 

Jonathan Eagles commented on YARN-2035:
---

[~zjshen], can you please review this new version of the patch?

> FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
> ---
>
> Key: YARN-2035
> URL: https://issues.apache.org/jira/browse/YARN-2035
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch
>
>
> Small bug that prevents ResourceManager and ApplicationHistoryService from 
> coming up while Namenode is in safemode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107852#comment-14107852
 ] 

Hadoop QA commented on YARN-2035:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663820/YARN-2035-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4703//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4703//console

This message is automatically generated.

> FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
> ---
>
> Key: YARN-2035
> URL: https://issues.apache.org/jira/browse/YARN-2035
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch
>
>
> Small bug that prevents ResourceManager and ApplicationHistoryService from 
> coming up while Namenode is in safemode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107851#comment-14107851
 ] 

zhihai xu commented on YARN-1458:
-

The test failure is not related to my change.
TestAMRestart passes in my local build.


 T E S T S
---
Running 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 89.639 sec - in 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor gets blocked when 
> clients submit lots of jobs, and it is not easy to reproduce. We ran the test cluster 
> for days to reproduce it. The output of the jstack command on the resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode

2014-08-22 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2035:
--

Attachment: YARN-2035-v3.patch

Addressed failing tests with last patch.

> FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
> ---
>
> Key: YARN-2035
> URL: https://issues.apache.org/jira/browse/YARN-2035
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch
>
>
> Small bug that prevents ResourceManager and ApplicationHistoryService from 
> coming up while Namenode is in safemode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time

2014-08-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107823#comment-14107823
 ] 

Tsuyoshi OZAWA commented on YARN-1326:
--

A patch is ready for review. [~kkambatl], could you check it?

> RM should log using RMStore at startup time
> ---
>
> Key: YARN-1326
> URL: https://issues.apache.org/jira/browse/YARN-1326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, 
> YARN-1326.4.patch, demo.png
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Currently there is no way to know which RMStore the RM uses. It's useful to log 
> this information at the RM's startup time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107822#comment-14107822
 ] 

Hadoop QA commented on YARN-1458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663814/YARN-1458.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4702//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4702//console

This message is automatically generated.

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor gets blocked when 
> clients submit lots of jobs, and it is not easy to reproduce. We ran the test cluster 
> for days to reproduce it. The output of the jstack command on the resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSh

[jira] [Commented] (YARN-2445) ATS does not reflect changes to uploaded TimelineEntity

2014-08-22 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107808#comment-14107808
 ] 

Billie Rinaldi commented on YARN-2445:
--

ATS is only designed to support aggregation.  In other words, each new primary 
filter or related entity is added to what is already there for the entity.  You 
cannot remove previously put information.  In this example, I would expect 
oldprop and newprop both to appear.
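
To make the aggregation behavior concrete, here is a minimal sketch assuming the 
standard TimelineClient/TimelineEntity API (error handling omitted); the entity 
type and ids are only placeholders:

{code:java}
// Minimal sketch of the aggregation behavior described above, assuming the
// standard TimelineClient/TimelineEntity API; error handling omitted.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class AtsAggregationSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new Configuration());
    client.start();

    TimelineEntity first = new TimelineEntity();
    first.setEntityType("test");
    first.setEntityId("testid-1");
    first.setStartTime(System.currentTimeMillis());
    first.addPrimaryFilter("oldprop", "val");
    client.putEntities(first);      // stored with primaryfilters {oldprop:[val]}

    TimelineEntity update = new TimelineEntity();
    update.setEntityType("test");
    update.setEntityId("testid-1");
    update.addPrimaryFilter("newprop", "val");
    client.putEntities(update);     // merged into the stored entity: oldprop AND newprop

    client.stop();
  }
}
{code}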

> ATS does not reflect changes to uploaded TimelineEntity
> ---
>
> Key: YARN-2445
> URL: https://issues.apache.org/jira/browse/YARN-2445
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Marcelo Vanzin
>Priority: Minor
> Attachments: ats2.java
>
>
> If you make a change to the TimelineEntity and send it to the ATS, that 
> change is not reflected in the stored data.
> For example, in the attached code, an existing primary filter is removed and 
> a new one is added. When you retrieve the entity from the ATS, it only 
> contains the old value:
> {noformat}
> {"entities":[{"events":[],"entitytype":"test","entity":"testid-ad5380c0-090e-4982-8da8-21676fe4e9f4","starttime":1408746026958,"relatedentities":{},"primaryfilters":{"oldprop":["val"]},"otherinfo":{}}]}
> {noformat}
> Perhaps this is what the design wanted, but from an API user standpoint, it's 
> really confusing, since to upload events I have to upload the entity itself, 
> and the changes are not reflected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107806#comment-14107806
 ] 

Hadoop QA commented on YARN-1326:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663809/YARN-1326.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4701//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4701//console

This message is automatically generated.

> RM should log using RMStore at startup time
> ---
>
> Key: YARN-1326
> URL: https://issues.apache.org/jira/browse/YARN-1326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, 
> YARN-1326.4.patch, demo.png
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Currently there is no way to know which RMStore the RM uses. It's useful to log 
> this information at the RM's startup time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107807#comment-14107807
 ] 

zhihai xu commented on YARN-1458:
-

I uploaded a new patch, "YARN-1458.004.patch", to fix the test failure.
The test failure is the following:
The parent queue "root.parentB" has a steady fair share of one vcore, but 
root.parentB has two child queues, root.parentB.childB1 and 
root.parentB.childB2, and we can't split one vcore across two child queues.
The new patch calculates conservatively and assigns 0 vcores to both child 
queues.
The old code assigned 1 vcore to both child queues, which exceeds the total 
resource limit.
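
A toy illustration of the rounding difference (this is not the code in the patch): 
with a 1-vcore steady fair share at the parent and two children, rounding up hands 
out 2 vcores in total, while flooring keeps the sum within the parent's share:

{code:java}
// Toy illustration of the rounding issue, not the code in the patch: splitting a
// parent's 1-vcore steady fair share across 2 equally weighted children.
public class SteadyFairShareRounding {
  public static void main(String[] args) {
    int parentVcores = 1;
    int numChildren = 2;

    // Old behavior (round up): each child gets 1 vcore, 2 in total -> exceeds the parent.
    int roundedUp = (int) Math.ceil((double) parentVcores / numChildren);
    // New behavior (conservative floor): each child gets 0 vcores, sum stays <= parent.
    int floored = parentVcores / numChildren;

    System.out.println("per-child (round up) = " + roundedUp + ", total = " + roundedUp * numChildren);
    System.out.println("per-child (floor)    = " + floored + ", total = " + floored * numChildren);
  }
}
{code}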

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor gets blocked when 
> clients submit lots of jobs, and it is not easy to reproduce. We ran the test cluster 
> for days to reproduce it. The output of the jstack command on the resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-1458:


Attachment: YARN-1458.004.patch

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor gets blocked when 
> clients submit lots of jobs, and it is not easy to reproduce. We ran the test cluster 
> for days to reproduce it. The output of the jstack command on the resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107785#comment-14107785
 ] 

Hadoop QA commented on YARN-2395:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663799/YARN-2395-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4700//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4700//console

This message is automatically generated.

> FairScheduler: Preemption timeout should be configurable per queue
> --
>
> Key: YARN-2395
> URL: https://issues.apache.org/jira/browse/YARN-2395
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2395-1.patch, YARN-2395-2.patch
>
>
> Currently in the fair scheduler, the preemption logic considers fair share 
> starvation only at the leaf queue level. This jira was created to implement it at 
> the parent queue level as well.
> It involves:
> 1. Making "check for fair share starvation" and "amount of resource to 
> preempt" recursive so that they traverse the queue hierarchy from root to 
> leaf.
> 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
> configurable on a per-queue basis, so that we can specify different timeouts 
> for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1326) RM should log using RMStore at startup time

2014-08-22 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1326:
-

Attachment: YARN-1326.4.patch

Fixed failures of TestRMWebServices.

> RM should log using RMStore at startup time
> ---
>
> Key: YARN-1326
> URL: https://issues.apache.org/jira/browse/YARN-1326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, 
> YARN-1326.4.patch, demo.png
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Currently there is no way to know which RMStore the RM uses. It's useful to log 
> this information at the RM's startup time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107754#comment-14107754
 ] 

Hadoop QA commented on YARN-2360:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663761/YARN-2360-v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4699//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4699//console

This message is automatically generated.

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
> YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-321) Generic application history service

2014-08-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-321:
-

Assignee: (was: Yu Gao)

> Generic application history service
> ---
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Luke Lu
> Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
> Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) (where T is 
> the number of application types and V is the number of application versions) trusted 
> servers is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as json (or binary avro). I propose that we create only one 
> trusted application history server, which can have a generic UI (display json 
> as a tree of strings) as well. Specific application/version can deploy 
> untrusted webapps (a la AMs) to query the application history server and 
> interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2395:
--

Attachment: YARN-2395-2.patch

Uploaded a new patch that addresses Karthik's latest comments and also adds a 
per-job preemption timeout configuration for min share.
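
As a rough illustration of the recursive starvation check described in this issue, 
here is a sketch under stated assumptions: the SchedulableQueue interface and its 
method names are stand-ins, not the FairScheduler code in the patch:

{code:java}
// Rough sketch of the recursive root-to-leaf starvation walk described in the
// issue; the SchedulableQueue interface and field names are stand-ins, not the
// actual FairScheduler code in the attached patch.
import java.util.List;

interface SchedulableQueue {
  List<SchedulableQueue> getChildQueues();   // empty for a leaf queue
  long getFairSharePreemptionTimeout();      // per-queue timeout proposed here (ms)
  long getStarvedSinceMillis();              // 0 if the queue is not starved
}

class PreemptionCheckSketch {
  // Walk the hierarchy from root to leaf and report whether any queue has been
  // starved of its fair share for longer than its own (per-queue) timeout.
  static boolean anyQueueStarvedTooLong(SchedulableQueue queue, long now) {
    long starvedSince = queue.getStarvedSinceMillis();
    if (starvedSince > 0 && now - starvedSince > queue.getFairSharePreemptionTimeout()) {
      return true;
    }
    for (SchedulableQueue child : queue.getChildQueues()) {
      if (anyQueueStarvedTooLong(child, now)) {
        return true;
      }
    }
    return false;
  }
}
{code}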

> FairScheduler: Preemption timeout should be configurable per queue
> --
>
> Key: YARN-2395
> URL: https://issues.apache.org/jira/browse/YARN-2395
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2395-1.patch, YARN-2395-2.patch
>
>
> Currently in the fair scheduler, the preemption logic considers fair share 
> starvation only at the leaf queue level. This jira was created to implement it at 
> the parent queue level as well.
> It involves:
> 1. Making "check for fair share starvation" and "amount of resource to 
> preempt" recursive so that they traverse the queue hierarchy from root to 
> leaf.
> 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
> configurable on a per-queue basis, so that we can specify different timeouts 
> for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107721#comment-14107721
 ] 

Hadoop QA commented on YARN-1458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663743/YARN-1458.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4698//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4698//console

This message is automatically generated.

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor gets blocked when 
> clients submit lots of jobs, and it is not easy to reproduce. We ran the test cluster 
> for days to reproduce it. The output of the jstack command on the resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParent

[jira] [Assigned] (YARN-321) Generic application history service

2014-08-22 Thread Yu Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Gao reassigned YARN-321:
---

Assignee: Yu Gao

> Generic application history service
> ---
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Luke Lu
>Assignee: Yu Gao
> Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
> Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) (where T is 
> the number of application types and V is the number of application versions) trusted 
> servers is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as json (or binary avro). I propose that we create only one 
> trusted application history server, which can have a generic UI (display json 
> as a tree of strings) as well. Specific application/version can deploy 
> untrusted webapps (a la AMs) to query the application history server and 
> interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2445) ATS does not reflect changes to uploaded TimelineEntity

2014-08-22 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated YARN-2445:
-

Attachment: ats2.java

> ATS does not reflect changes to uploaded TimelineEntity
> ---
>
> Key: YARN-2445
> URL: https://issues.apache.org/jira/browse/YARN-2445
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Marcelo Vanzin
>Priority: Minor
> Attachments: ats2.java
>
>
> If you make a change to the TimelineEntity and send it to the ATS, that 
> change is not reflected in the stored data.
> For example, in the attached code, an existing primary filter is removed and 
> a new one is added. When you retrieve the entity from the ATS, it only 
> contains the old value:
> {noformat}
> {"entities":[{"events":[],"entitytype":"test","entity":"testid-ad5380c0-090e-4982-8da8-21676fe4e9f4","starttime":1408746026958,"relatedentities":{},"primaryfilters":{"oldprop":["val"]},"otherinfo":{}}]}
> {noformat}
> Perhaps this is what the design wanted, but from an API user standpoint, it's 
> really confusing, since to upload events I have to upload the entity itself, 
> and the changes are not reflected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2445) ATS does not reflect changes to uploaded TimelineEntity

2014-08-22 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created YARN-2445:


 Summary: ATS does not reflect changes to uploaded TimelineEntity
 Key: YARN-2445
 URL: https://issues.apache.org/jira/browse/YARN-2445
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Marcelo Vanzin
Priority: Minor
 Attachments: ats2.java

If you make a change to the TimelineEntity and send it to the ATS, that change 
is not reflected in the stored data.

For example, in the attached code, an existing primary filter is removed and a 
new one is added. When you retrieve the entity from the ATS, it only contains 
the old value:

{noformat}
{"entities":[{"events":[],"entitytype":"test","entity":"testid-ad5380c0-090e-4982-8da8-21676fe4e9f4","starttime":1408746026958,"relatedentities":{},"primaryfilters":{"oldprop":["val"]},"otherinfo":{}}]}
{noformat}

Perhaps this is what the design wanted, but from an API user standpoint, it's 
really confusing, since to upload events I have to upload the entity itself, 
and the changes are not reflected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: YARN-2360-v5.patch

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
> YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: Screen_Shot_v5.png

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
> YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: (was: YARN-2360-v5.patch)

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
> YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2408) Resource Request REST API for YARN

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107563#comment-14107563
 ] 

Hadoop QA commented on YARN-2408:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663726/YARN-2408-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4697//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4697//console

This message is automatically generated.

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408-3.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API:
> {code:xml}
> 
>   96256
>   94
>   
> application_
> appattempt_
> default
> 96256
> 94
> 3
> 
>   
> 1024
> 1
> /default-rack
> 94
> true
> 20
>   
>   
> 1024
> 1
> *
> 94
> true
> 20
>   
>   
> 1024
> 1
> master
> 94
> true
> 20
>   
> 
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: YARN-2360-v5.patch

A new patch that adds a description to the fair scheduler .apt.vm file, and also 
shows the description in the web UI when the mouse hovers over the "steady fair 
share" label or the "instantaneous fair share" label.

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, 
> YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: (was: Screen_Shot_v5.png)

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, 
> YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: Screen_Shot_v5.png

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
> YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107479#comment-14107479
 ] 

Hadoop QA commented on YARN-2360:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663715/YARN-2360-v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4696//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4696//console

This message is automatically generated.

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, 
> YARN-2360-v3.patch, YARN-2360-v4.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107471#comment-14107471
 ] 

zhihai xu commented on YARN-1458:
-

I uploaded a new patch, "YARN-1458.003.patch", to resolve a merge conflict after 
rebasing to the latest code.

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}
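
To make the stack traces above easier to follow, here is a minimal, self-contained 
sketch of the locking pattern they show. This is an illustration only, not 
FairScheduler source and not the fix in the attached patches: the update thread 
holds the scheduler monitor for the entire share recomputation, every size-based 
getAppWeight() call happens under that same (reentrant) monitor, and so 
removeApplication() on the event processor thread has to wait for the whole pass.

{code}
// Hedged illustration of the contention pattern in the jstack output above;
// this is NOT FairScheduler source and NOT the fix in the attached patches.
public class FairSchedulerLockSketch {
    private final Object schedulerLock = new Object();

    // Analogue of FairScheduler.update(): the monitor is held for the whole
    // recompute pass, including one getAppWeight() call per application.
    void update(int numApps) {
        synchronized (schedulerLock) {
            double total = 0;
            for (int app = 0; app < numApps; app++) {
                total += getAppWeight(app);
            }
            System.out.println("recomputed shares, total weight = " + total);
        }
    }

    // Analogue of FairScheduler.getAppWeight() with size-based weight enabled:
    // synchronized on the same monitor, which the update thread already owns.
    double getAppWeight(int demandMb) {
        synchronized (schedulerLock) {
            return Math.log1p(demandMb) / Math.log(2); // stand-in for the real formula
        }
    }

    // Analogue of FairScheduler.removeApplication() on the event processor
    // thread: it blocks until the update pass above releases the monitor.
    void removeApplication() {
        synchronized (schedulerLock) {
            // mutate scheduler state here
        }
    }
}
{code}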



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-1458:


Attachment: YARN-1458.003.patch

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107429#comment-14107429
 ] 

Hadoop QA commented on YARN-2440:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12663704/apache-yarn-2440.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4694//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4694//console

This message is automatically generated.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-08-22 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-

Attachment: YARN-2408-3.patch

Bug fix

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408-3.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API:
> {code:xml}
> 
>   96256
>   94
>   
> application_
> appattempt_
> default
> 96256
> 94
> 3
> 
>   
> 1024
> 1
> /default-rack
> 94
> true
> 20
>   
>   
> 1024
> 1
> *
> 94
> true
> 20
>   
>   
> 1024
> 1
> master
> 94
> true
> 20
>   
> 
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-08-22 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-

Attachment: (was: YARN-2408-2.patch)

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408-3.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API:
> {code:xml}
> 
>   96256
>   94
>   
> application_
> appattempt_
> default
> 96256
> 94
> 3
> 
>   
> 1024
> 1
> /default-rack
> 94
> true
> 20
>   
>   
> 1024
> 1
> *
> 94
> true
> 20
>   
>   
> 1024
> 1
> master
> 94
> true
> 20
>   
> 
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107379#comment-14107379
 ] 

Hadoop QA commented on YARN-1458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663617/YARN-1458.002.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4695//console

This message is automatically generated.

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107354#comment-14107354
 ] 

Karthik Kambatla commented on YARN-2360:


Agree with Ashwin - we should definitely describe them in the apt.vm file, and 
defining them on the UI is also very useful.

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, 
> YARN-2360-v3.patch, YARN-2360-v4.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107351#comment-14107351
 ] 

Ashwin Shankar commented on YARN-2360:
--

[~ywskycn], patch looks good. Should we mention what "Instantaneous" and 
"Steady" fair share mean in the fair scheduler doc, i.e. the apt.vm file, so that 
users know what they mean? I'm also torn on whether we should define these 
terms on the UI as part of the legend tooltip or in some other way.


> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, 
> YARN-2360-v3.patch, YARN-2360-v4.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107331#comment-14107331
 ] 

Jason Lowe commented on YARN-2440:
--

Sure, for this JIRA we can go with a percent of total CPU to limit YARN. For 
something like YARN-160 we'd need the user to specify some kind of relationship 
between vcores and physical cores.
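
To make the percentage idea concrete, here is a small sketch of how a "percent of 
the node's CPU for YARN" setting could map onto the parent cgroup's CFS values. 
This is only an illustration under assumed names (yarnCpuPercent, CFS_PERIOD_US); 
it is not an actual YARN configuration key and not code from any patch attached here.

{code}
public class NodeCpuPercentSketch {
    // CFS enforcement period; 100 ms is the commonly used default.
    private static final int CFS_PERIOD_US = 100_000;

    // cfs_quota_us for the hadoop-yarn parent cgroup: quota/period is the
    // number of CPUs' worth of time the hierarchy may consume per period.
    static long quotaForNode(int physicalCores, int yarnCpuPercent) {
        return (long) CFS_PERIOD_US * physicalCores * yarnCpuPercent / 100;
    }

    public static void main(String[] args) {
        System.out.println(quotaForNode(8, 75));   // 600000 -> 6 CPUs' worth
        System.out.println(quotaForNode(4, 100));  // 400000 -> all 4 CPUs
    }
}
{code}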

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: Screen_Shot_v4.png

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, 
> YARN-2360-v3.patch, YARN-2360-v4.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: YARN-2360-v4.patch

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, 
> YARN-2360-v3.patch, YARN-2360-v4.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.

2014-08-22 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107276#comment-14107276
 ] 

Marcelo Vanzin commented on YARN-2444:
--

Ah, I'm using leveldb if that makes a difference.

> Primary filters added after first submission not indexed, cause exceptions in 
> logs.
> ---
>
> Key: YARN-2444
> URL: https://issues.apache.org/jira/browse/YARN-2444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.5.0
>Reporter: Marcelo Vanzin
> Attachments: ats.java
>
>
> See attached code for an example. The code creates an entity with a primary 
> filter, submits it to the ATS. After that, a new primary filter value is 
> added and the entity is resubmitted. At that point two things can be seen:
> - Searching for the new primary filter value does not return the entity
> - The following exception shows up in the logs:
> {noformat}
> 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying 
> access for user dr.who (auth:SIMPLE) on the events of the timeline entity { 
> id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test }
> org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the 
> timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test 
> } is corrupted.
> at 
> org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107275#comment-14107275
 ] 

Varun Vasudev commented on YARN-2440:
-

It might make things easier to go with [~sandyr]'s idea of adding a config which 
expresses the % of the node's CPU that is used by YARN. [~jlowe], would that 
address your concerns?

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.

2014-08-22 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107269#comment-14107269
 ] 

Marcelo Vanzin commented on YARN-2444:
--

The following search causes the problem described above:

{noformat}/ws/v1/timeline/test?primaryFilter=prop2:val2{noformat}

The following one works as expected:

{noformat}/ws/v1/timeline/test?primaryFilter=prop1:val1{noformat}

> Primary filters added after first submission not indexed, cause exceptions in 
> logs.
> ---
>
> Key: YARN-2444
> URL: https://issues.apache.org/jira/browse/YARN-2444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.5.0
>Reporter: Marcelo Vanzin
> Attachments: ats.java
>
>
> See attached code for an example. The code creates an entity with a primary 
> filter, submits it to the ATS. After that, a new primary filter value is 
> added and the entity is resubmitted. At that point two things can be seen:
> - Searching for the new primary filter value does not return the entity
> - The following exception shows up in the logs:
> {noformat}
> 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying 
> access for user dr.who (auth:SIMPLE) on the events of the timeline entity { 
> id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test }
> org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the 
> timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test 
> } is corrupted.
> at 
> org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.

2014-08-22 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created YARN-2444:


 Summary: Primary filters added after first submission not indexed, 
cause exceptions in logs.
 Key: YARN-2444
 URL: https://issues.apache.org/jira/browse/YARN-2444
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.5.0
Reporter: Marcelo Vanzin
 Attachments: ats.java

See attached code for an example. The code creates an entity with a primary 
filter, submits it to the ATS. After that, a new primary filter value is added 
and the entity is resubmitted. At that point two things can be seen:

- Searching for the new primary filter value does not return the entity
- The following exception shows up in the logs:

{noformat}
14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying access 
for user dr.who (auth:SIMPLE) on the events of the timeline entity { id: 
testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test }
org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the 
timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } 
is corrupted.
at 
org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67)
at 
org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
{noformat}
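
For reference, a minimal sketch of the reproduction flow described above, written 
against the Hadoop 2.5 timeline client API. It is intended to mirror the attached 
ats.java but is not that file; the entity id and filter names are placeholders.

{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrimaryFilterRepro {
    public static void main(String[] args) throws Exception {
        TimelineClient client = TimelineClient.createTimelineClient();
        client.init(new YarnConfiguration());
        client.start();

        TimelineEntity entity = new TimelineEntity();
        entity.setEntityType("test");
        entity.setEntityId("testid-placeholder");      // unique id per run
        entity.setStartTime(System.currentTimeMillis());
        entity.addPrimaryFilter("prop1", "val1");
        client.putEntities(entity);                     // first submission

        // Add a second primary filter value and resubmit the same entity:
        // querying primaryFilter=prop2:val2 then fails as described above.
        entity.addPrimaryFilter("prop2", "val2");
        client.putEntities(entity);

        client.stop();
    }
}
{code}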



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.

2014-08-22 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated YARN-2444:
-

Attachment: ats.java

> Primary filters added after first submission not indexed, cause exceptions in 
> logs.
> ---
>
> Key: YARN-2444
> URL: https://issues.apache.org/jira/browse/YARN-2444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.5.0
>Reporter: Marcelo Vanzin
> Attachments: ats.java
>
>
> See attached code for an example. The code creates an entity with a primary 
> filter, submits it to the ATS. After that, a new primary filter value is 
> added and the entity is resubmitted. At that point two things can be seen:
> - Searching for the new primary filter value does not return the entity
> - The following exception shows up in the logs:
> {noformat}
> 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying 
> access for user dr.who (auth:SIMPLE) on the events of the timeline entity { 
> id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test }
> org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the 
> timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test 
> } is corrupted.
> at 
> org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107248#comment-14107248
 ] 

Sandy Ryza commented on YARN-2440:
--

We removed it because it wasn't consistent with the vmem-pmem-ratio and was an 
unnecessary layer of indirection. If automatically configuring a node's vcore 
resource based on its physical characteristics is a goal, I wouldn't be opposed 
to adding something back in.

For the purposes of this JIRA, might it be simpler to express a config in terms 
of the % of the node's CPU power that YARN gets?

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107252#comment-14107252
 ] 

Wei Yan commented on YARN-2360:
---

Thanks, Karthik. Will update the patch with the changes, and also fix another 
problem in the FairSchedulerQueueInfo.

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107244#comment-14107244
 ] 

Karthik Kambatla commented on YARN-2360:


I would rename the legend to "Steady fairshare" and "Instantaneous fairshare". 

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-08-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107221#comment-14107221
 ] 

Wei Yan commented on YARN-810:
--

bq. With your current implementation, on a machine with 4 cores(and 4 vcores), 
a container which requests 2 vcores will have cfs_period_us set to 4096 and 
cfs_quota_us set to 2048 which will end up limiting it to 50% of one CPU. Is my 
understanding wrong?

Thanks, [~vvasudev]. I mentioned this problem after reading your YARN-2420 
patch. I'll double-check it and update the patch.

> Support CGroup ceiling enforcement on CPU
> -
>
> Key: YARN-810
> URL: https://issues.apache.org/jira/browse/YARN-810
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.0.5-alpha
>Reporter: Chris Riccomini
>Assignee: Sandy Ryza
> Attachments: YARN-810.patch, YARN-810.patch
>
>
> Problem statement:
> YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
> Containers are then allowed to request vcores between the minimum and maximum 
> defined in the yarn-site.xml.
> In the case where a single-threaded container requests 1 vcore, with a 
> pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
> the core it's using, provided that no other container is also using it. This 
> happens, even though the only guarantee that YARN/CGroups is making is that 
> the container will get "at least" 1/4th of the core.
> If a second container then comes along, the second container can take 
> resources from the first, provided that the first container is still getting 
> at least its fair share (1/4th).
> There are certain cases where this is desirable. There are also certain cases 
> where it might be desirable to have a hard limit on CPU usage, and not allow 
> the process to go above the specified resource requirement, even if it's 
> available.
> Here's an RFC that describes the problem in more detail:
> http://lwn.net/Articles/336127/
> Solution:
> As it happens, when CFS is used in combination with CGroups, you can enforce 
> a ceiling using two files in cgroups:
> {noformat}
> cpu.cfs_quota_us
> cpu.cfs_period_us
> {noformat}
> The usage of these two files is documented in more detail here:
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
> Testing:
> I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
> it behaves as described above (it is a soft cap, and allows containers to use 
> more than they asked for). I then tested CFS CPU quotas manually with YARN.
> First, you can see that CFS is in use in the CGroup, based on the file names:
> {noformat}
> [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
> total 0
> -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
> drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
> -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
> -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
> -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
> [criccomi@eat1-qa464 ~]$ sudo -u app cat
> /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
> 100000
> [criccomi@eat1-qa464 ~]$ sudo -u app cat
> /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
> -1
> {noformat}
> Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
> We can place processes in hard limits. I have process 4370 running YARN 
> container container_1371141151815_0003_01_03 on a host. By default, it's 
> running at ~300% cpu usage.
> {noformat}
> CPU
> 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
> {noformat}
> When I set the CFS quota:
> {noformat}
> echo 1000 > 
> /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
>  CPU
> 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
> {noformat}
> It drops to 1% usage, and you can see the box has room to spare:
> {noformat}
> Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 
> 0.0%st
> {noformat}
> Turning the quota back to -1:
> {noformat}
> echo -1 > 
> /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
> {noformat}
> Burns the cores again:
> {noformat}
> Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
> 0.0%st
>  

[jira] [Updated] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2440:


Attachment: apache-yarn-2440.1.patch

Uploaded a new patch to address the issue raised by [~jlowe] on the max value 
of cfs_quota_us. I'll upload further versions once there's clarity on vcore to 
physical core mapping.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: Screen_Shot_v3.png

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: YARN-2360-v3.patch

Updated the patch after YARN-2393. Screen_Shot_v3.png shows the fair scheduler 
web page.

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> Screen_Shot_v3.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: Screen_Shot_v3.png

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> YARN-2360-v1.txt, YARN-2360-v2.txt
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2360:
--

Attachment: (was: Screen_Shot_v3.png)

> Fair Scheduler : Display dynamic fair share for queues on the scheduler page
> 
>
> Key: YARN-2360
> URL: https://issues.apache.org/jira/browse/YARN-2360
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
> Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
> YARN-2360-v1.txt, YARN-2360-v2.txt
>
>
> Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
> share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107172#comment-14107172
 ] 

Varun Vasudev commented on YARN-810:


[~ywskycn] thanks for letting me know! Some comments on your patch -

1. In CgroupsLCEResourcesHandler.java, you set cfs_period_us to nmShares and 
cfs_quota_us to cpuShares. Per the RedHat documentation, cfs_period_us and 
cfs_quota_us operate on a per-CPU basis:
{quote}
   Note that the quota and period parameters operate on a CPU basis. To allow a 
process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 200000 
and cpu.cfs_period_us to 100000. 
{quote}
With your current implementation, on a machine with 4 cores (and 4 vcores), a 
container which requests 2 vcores will have cfs_period_us set to 4096 and 
cfs_quota_us set to 2048, which will end up limiting it to 50% of one CPU. Is my 
understanding wrong?

2. This is just nitpicking, but is it possible to change 
CpuEnforceCeilingEnabled (and its variants) to just CpuCeilingEnabled or 
CpuCeilingEnforced?
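
To spell out the per-CPU arithmetic in point 1, here is a small illustration of the 
concern (not code from the YARN-810 patch; quotaForContainer is a hypothetical 
helper): since quota/period is the number of CPUs' worth of time a cgroup may 
use, a ceiling for a container's vcores should keep the quota proportional to the 
period rather than derive both from cpu.shares-style values.

{code}
public class VcoreCeilingSketch {
    // 0.1 s, matching the cfs_period_us default shown in the issue description.
    static final int PERIOD_US = 100_000;

    // Hypothetical helper: ceiling quota for a container, scaling vcores to
    // physical cores. quota/period == number of CPUs the container may use.
    static long quotaForContainer(int containerVcores, int nodeVcores, int physicalCores) {
        double cpus = (double) containerVcores * physicalCores / nodeVcores;
        return Math.round(cpus * PERIOD_US);
    }

    public static void main(String[] args) {
        // 4 cores, 4 vcores, container asks for 2 vcores: 200000/100000 = 2,
        // i.e. two full CPUs rather than 50% of one CPU.
        System.out.println(quotaForContainer(2, 4, 4)); // 200000
    }
}
{code}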

> Support CGroup ceiling enforcement on CPU
> -
>
> Key: YARN-810
> URL: https://issues.apache.org/jira/browse/YARN-810
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.0.5-alpha
>Reporter: Chris Riccomini
>Assignee: Sandy Ryza
> Attachments: YARN-810.patch, YARN-810.patch
>
>
> Problem statement:
> YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
> Containers are then allowed to request vcores between the minimum and maximum 
> defined in the yarn-site.xml.
> In the case where a single-threaded container requests 1 vcore, with a 
> pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
> the core it's using, provided that no other container is also using it. This 
> happens, even though the only guarantee that YARN/CGroups is making is that 
> the container will get "at least" 1/4th of the core.
> If a second container then comes along, the second container can take 
> resources from the first, provided that the first container is still getting 
> at least its fair share (1/4th).
> There are certain cases where this is desirable. There are also certain cases 
> where it might be desirable to have a hard limit on CPU usage, and not allow 
> the process to go above the specified resource requirement, even if it's 
> available.
> Here's an RFC that describes the problem in more detail:
> http://lwn.net/Articles/336127/
> Solution:
> As it happens, when CFS is used in combination with CGroups, you can enforce 
> a ceiling using two files in cgroups:
> {noformat}
> cpu.cfs_quota_us
> cpu.cfs_period_us
> {noformat}
> The usage of these two files is documented in more detail here:
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
> Testing:
> I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
> it behaves as described above (it is a soft cap, and allows containers to use 
> more than they asked for). I then tested CFS CPU quotas manually with YARN.
> First, you can see that CFS is in use in the CGroup, based on the file names:
> {noformat}
> [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
> total 0
> -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
> drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
> -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
> -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
> -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
> [criccomi@eat1-qa464 ~]$ sudo -u app cat
> /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
> 100000
> [criccomi@eat1-qa464 ~]$ sudo -u app cat
> /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
> -1
> {noformat}
> Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
> We can place processes in hard limits. I have process 4370 running YARN 
> container container_1371141151815_0003_01_03 on a host. By default, it's 
> running at ~300% cpu usage.
> {noformat}
> CPU
> 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
> {noformat}
> When I set the CFS quota:
> {noformat}
> echo 1000 > 
> /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
>  CPU
> 4370 criccomi  20   0 1157m 563m  14m S  

[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107141#comment-14107141
 ] 

Jason Lowe commented on YARN-2440:
--

Interesting.  [~sandyr] could you comment?  I'm wondering how we're going to 
support automatically setting a node's vcore value based on the node's physical 
characteristics without some kind of property to specify how to convert from 
physical core to vcore.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1104) NMs to support rolling logs of stdout & stderr

2014-08-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1104:


Parent Issue: YARN-2443  (was: YARN-896)

> NMs to support rolling logs of stdout & stderr
> --
>
> Key: YARN-1104
> URL: https://issues.apache.org/jira/browse/YARN-1104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Steve Loughran
>Assignee: Xuan Gong
>
> Currently NMs stream the stdout and stderr streams of a container to a file. 
> For longer-lived processes those files need to be rotated so that the log 
> doesn't overflow.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2443) Log Handling for Long Running Service

2014-08-22 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2443:
---

 Summary: Log Handling for Long Running Service
 Key: YARN-2443
 URL: https://issues.apache.org/jira/browse/YARN-2443
 Project: Hadoop YARN
  Issue Type: Task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107101#comment-14107101
 ] 

Varun Vasudev commented on YARN-2440:
-

There used to be a variable for that ratio but it was removed in YARN-782.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107093#comment-14107093
 ] 

Jason Lowe commented on YARN-2440:
--

bq. does it make sense to get the number of physical cores on the machine and 
derive the vcore to physical cpu ratio?

Only if the user can specify the multiplier between a vcore and a physical CPU. 
 Not all physical CPUs are created equal, and as I mentioned earlier, some 
sites will want to allow fractions of a physical CPU to be allocated.  
Otherwise we're limiting the number of containers to the number of physical 
cores, and not all tasks require a full core.
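
As a rough illustration of such a multiplier (the values and the per-site setting 
are hypothetical here, not an existing YARN property):
{code}
public class VcoreMultiplierExample {
  public static void main(String[] args) {
    int physicalCores = 8;          // detected on the node
    int vcoresPerPhysicalCore = 4;  // hypothetical site-configured multiplier
    int advertisedVcores = physicalCores * vcoresPerPhysicalCore;  // 32
    // A task that only needs about a quarter of a core can then request 1 vcore
    // instead of tying up a whole physical core.
    System.out.println("node advertises " + advertisedVcores + " vcores");
  }
}
{code}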

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107069#comment-14107069
 ] 

Varun Vasudev commented on YARN-2440:
-

I'll update the patch to limit cfs_quota_us.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107068#comment-14107068
 ] 

Varun Vasudev commented on YARN-2440:
-

[~jlowe] does it make sense to get the number of physical cores on the machine 
and derive the vcore to physical cpu ratio?

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107066#comment-14107066
 ] 

zhihai xu commented on YARN-1458:
-

[~shurong.mai], YARN-1458.patch will cause a regression. It won't work if all the 
weights and MinShares in the active queues are less than 1, because the type 
conversion from double to int in computeShare loses precision:
{code}
private static int computeShare(Schedulable sched, double w2rRatio,
  ResourceType type) {
double share = sched.getWeights().getWeight(type) * w2rRatio;
share = Math.max(share, getResourceValue(sched.getMinShare(), type));
share = Math.min(share, getResourceValue(sched.getMaxShare(), type));
return (int) share;
  }
{code}
In the above code, the initial value of w2rRatio is 1.0. If the weight and MinShare 
are less than 1, computeShare will return 0. resourceUsedWithWeightToResourceRatio 
returns the sum of these computeShare return values (after the precision loss), so 
it will be zero if all the weights and MinShares in the active queues are less 
than 1. YARN-1458.patch will then exit the loop early with an "rMax" value of 1.0, 
the "right" variable will be less than "rMax" (1.0), and every queue's fair share 
will be set to 0 in the following code.
{code}
for (Schedulable sched : schedulables) {
  setResourceValue(computeShare(sched, right, type), sched.getFairShare(), 
type);
}
{code}

This is why TestFairScheduler fails at line 1049. testIsStarvedForFairShare 
configures queueA with weight 0.25, queueB with weight 0.75, and a total node 
resource of 4 * 1024. It creates two applications: one assigned to queueA and the 
other to queueB. After FairScheduler (update) calculates the fair shares, queueA's 
fair share should be 1 * 1024 and queueB's should be 3 * 1024, but with 
YARN-1458.patch both are set to 0. That is because the test has two active queues, 
queueA and queueB, whose weights are both less than 1 (0.25 and 0.75), and 
MinShare (minResources) is not configured for either queue, so both use the 
default value (0).
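
A minimal, self-contained sketch of that truncation, using a simplified stand-in 
for ComputeFairShares.computeShare that ignores the min/max share clamping:
{code}
public class ComputeSharePrecisionDemo {
  // Simplified stand-in: weight * w2rRatio truncated to int, as in computeShare.
  static int computeShare(double weight, double w2rRatio) {
    return (int) (weight * w2rRatio);
  }

  public static void main(String[] args) {
    // Weights from testIsStarvedForFairShare (0.25 and 0.75), MinShare = 0.
    int sum = computeShare(0.25, 1.0) + computeShare(0.75, 1.0);
    // Both calls truncate to 0, so the total used by
    // resourceUsedWithWeightToResourceRatio is 0 and the search exits early.
    System.out.println("sum of computed shares = " + sum);  // prints 0
  }
}
{code}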

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submit lots jobs, it is not easy to reapear. We run the test cluster 
> for days to reapear it. The output of  jstack command on resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcema

[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107057#comment-14107057
 ] 

Jason Lowe commented on YARN-2440:
--

I think cfs_quota_us has a maximum value of 1000000, so we may have an issue if 
vcores > 10.

I don't see how this takes into account the mapping of vcores to actual CPUs.   
It's not safe to assume 1 vcore == 1 physical CPU, as some sites will map 
multiple vcores to a physical core to allow fractions of a physical CPU to be 
allocated or to account for varying CPU performance across a heterogeneous 
cluster.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2441) NPE in nodemanager after restart

2014-08-22 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2441:


Priority: Major  (was: Minor)

> NPE in nodemanager after restart
> 
>
> Key: YARN-2441
> URL: https://issues.apache.org/jira/browse/YARN-2441
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nishan Shetty
>
> {code}
> 2014-08-22 16:43:19,640 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Blocking new container-requests as container manager rpc server is still 
> starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using 
> callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding 
> protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB 
> to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>   at 
> org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>   at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>   at 
> org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>   at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>   at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2393) FairScheduler: Add the notion of steady fair share

2014-08-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107050#comment-14107050
 ] 

Wei Yan commented on YARN-2393:
---

Thanks, [~kasha], [~ashwinshankar77]. Will post a patch for YARN-2360 for the UI.

> FairScheduler: Add the notion of steady fair share
> --
>
> Key: YARN-2393
> URL: https://issues.apache.org/jira/browse/YARN-2393
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Fix For: 2.6.0
>
> Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, 
> yarn-2393-4.patch
>
>
> Static fair share is a fair share allocation considering all (active/inactive) 
> queues. It would be shown on the UI for better predictability of the finish time 
> of applications.
> We would compute static fair share only when needed, like on queue creation or 
> node addition/removal. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107025#comment-14107025
 ] 

Wei Yan commented on YARN-2440:
---

[~vvasudev], I misunderstood this JIRA. Will post a comment later.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2431) NM restart: cgroup is not removed for reacquired containers

2014-08-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107008#comment-14107008
 ] 

Jason Lowe commented on YARN-2431:
--

Release audit problems are unrelated, see HDFS-6905.

> NM restart: cgroup is not removed for reacquired containers
> ---
>
> Key: YARN-2431
> URL: https://issues.apache.org/jira/browse/YARN-2431
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2431.patch
>
>
> The cgroup for a reacquired container is not being removed when the container 
> exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107012#comment-14107012
 ] 

Varun Vasudev commented on YARN-2440:
-

[~ywskycn] this patch doesn't limit containers to the container_vcores/NM_vcores 
ratio. What it does is limit the overall YARN usage to 
yarn.nodemanager.resource.cpu-vcores. Right now, if you have 4 cores on a machine 
and set yarn.nodemanager.resource.cpu-vcores to 2, we don't restrict the YARN 
containers to 2 cores; the containers can create threads and use up as many cores 
as they want, which defeats the purpose of setting 
yarn.nodemanager.resource.cpu-vcores.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107005#comment-14107005
 ] 

Wei Yan commented on YARN-2440:
---

[~vvasudev], for general cases, we shouldn't strictly limit cfs_quota_us. We 
always want to let co-located containers share the CPU resource proportionally, 
rather than strictly following the container_vcores/NM_vcores ratio. We have one 
runnable patch in YARN-810; I'll check with Sandy about reviewing it.
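
A rough sketch of this proportional behaviour, assuming shares scale linearly with 
requested vcores; the 1024-per-vcore base weight is an illustrative assumption, 
not a value taken from the patch:
{code}
public class ProportionalCpuShares {
  public static void main(String[] args) {
    int sharesPerVcore = 1024;       // assumed base weight per requested vcore
    int[] containerVcores = {1, 3};  // two co-located containers
    int totalShares = 0;
    for (int v : containerVcores) {
      totalShares += v * sharesPerVcore;
    }
    // Under contention, cpu.shares divides CPU time proportionally (25% / 75%)
    // instead of hard-capping each container at its vcore/NM_vcore ratio.
    for (int v : containerVcores) {
      double fraction = (double) (v * sharesPerVcore) / totalShares;
      System.out.printf("container requesting %d vcores -> %.0f%% when contended%n",
          v, fraction * 100);
    }
  }
}
{code}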

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2393) FairScheduler: Add the notion of steady fair share

2014-08-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107007#comment-14107007
 ] 

Hudson commented on YARN-2393:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6097 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6097/])
YARN-2393. FairScheduler: Add the notion of steady fair share. (Wei Yan via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619845)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerFairShare.java


> FairScheduler: Add the notion of steady fair share
> --
>
> Key: YARN-2393
> URL: https://issues.apache.org/jira/browse/YARN-2393
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, 
> yarn-2393-4.patch
>
>
> Static fair share is a fair share allocation considering all (active/inactive) 
> queues. It would be shown on the UI for better predictability of the finish time 
> of applications.
> We would compute static fair share only when needed, like on queue creation or 
> node addition/removal. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-08-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107000#comment-14107000
 ] 

Wei Yan commented on YARN-810:
--

[~vvasudev], thanks for the offer. I'm still working on this.

> Support CGroup ceiling enforcement on CPU
> -
>
> Key: YARN-810
> URL: https://issues.apache.org/jira/browse/YARN-810
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.0.5-alpha
>Reporter: Chris Riccomini
>Assignee: Sandy Ryza
> Attachments: YARN-810.patch, YARN-810.patch
>
>
> Problem statement:
> YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
> Containers are then allowed to request vcores between the minimum and maximum 
> defined in the yarn-site.xml.
> In the case where a single-threaded container requests 1 vcore, with a 
> pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
> the core it's using, provided that no other container is also using it. This 
> happens, even though the only guarantee that YARN/CGroups is making is that 
> the container will get "at least" 1/4th of the core.
> If a second container then comes along, the second container can take 
> resources from the first, provided that the first container is still getting 
> at least its fair share (1/4th).
> There are certain cases where this is desirable. There are also certain cases 
> where it might be desirable to have a hard limit on CPU usage, and not allow 
> the process to go above the specified resource requirement, even if it's 
> available.
> Here's an RFC that describes the problem in more detail:
> http://lwn.net/Articles/336127/
> Solution:
> As it happens, when CFS is used in combination with CGroups, you can enforce 
> a ceiling using two files in cgroups:
> {noformat}
> cpu.cfs_quota_us
> cpu.cfs_period_us
> {noformat}
> The usage of these two files is documented in more detail here:
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
> Testing:
> I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
> it behaves as described above (it is a soft cap, and allows containers to use 
> more than they asked for). I then tested CFS CPU quotas manually with YARN.
> First, you can see that CFS is in use in the CGroup, based on the file names:
> {noformat}
> [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
> total 0
> -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
> drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
> -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
> -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
> -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
> [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
> 100000
> [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
> -1
> {noformat}
> Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
> We can place processes in hard limits. I have process 4370 running YARN 
> container container_1371141151815_0003_01_03 on a host. By default, it's 
> running at ~300% cpu usage.
> {noformat}
> CPU
> 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
> {noformat}
> When I set the CFS quota:
> {noformat}
> echo 1000 > 
> /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
>  CPU
> 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
> {noformat}
> It drops to 1% usage, and you can see the box has room to spare:
> {noformat}
> Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 
> 0.0%st
> {noformat}
> Turning the quota back to -1:
> {noformat}
> echo -1 > 
> /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
> {noformat}
> Burns the cores again:
> {noformat}
> Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
> 0.0%st
> CPU
> 4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
> {noformat}
> On my dev box, I was testing CGroups by running a python process eight times, 
> to burn through all the cores, since it was doing as described above (giving 
> extra CPU to the process, even with a cpu.shares limit). T

[jira] [Commented] (YARN-2441) NPE in nodemanager after restart

2014-08-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106999#comment-14106999
 ] 

Jason Lowe commented on YARN-2441:
--

Ah, then this seems like a case where a client (likely an AM) is connecting to 
the NM before the NM has finished registering with the RM to get the secret 
keys.  Trying to block new container requests at the app level probably isn't 
going to work in practice because the SASL layer in RPC doesn't let the 
connection get to the point where the app can try to reject the request.

IMHO we should remove the "blocking client requests" code and instead do a 
delayed server start, sorta like the delay added by YARN-1337 when NM recovery 
is enabled.  Ideally the RPC layer would support the ability to bind to a 
server socket but not start accepting requests until later.  That would allow 
us to register with the RM knowing what our client port is but without trying 
to let clients through that port until we're really ready.

A shorter-term fix might be to have the secret manager throw an exception that 
clients can retry if the master key isn't set yet.
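
A hedged sketch of that shorter-term idea; the class, field, and exception names 
below are illustrative only, not the actual Hadoop APIs:
{code}
public class NMSecretManagerSketch {
  private volatile byte[] currentMasterKey;  // null until the NM registers with the RM

  // Illustrative retrievePassword: signal a retriable condition instead of an NPE
  // when no master key has been received yet.
  byte[] retrievePassword(byte[] tokenIdentifier) throws RetriableSecretException {
    byte[] key = currentMasterKey;
    if (key == null) {
      throw new RetriableSecretException("Master key not yet received from RM; retry");
    }
    return computePassword(tokenIdentifier, key);
  }

  private byte[] computePassword(byte[] id, byte[] key) {
    return new byte[0];  // placeholder for the real password computation
  }

  static class RetriableSecretException extends Exception {
    RetriableSecretException(String msg) { super(msg); }
  }

  public static void main(String[] args) {
    try {
      new NMSecretManagerSketch().retrievePassword(new byte[0]);
    } catch (RetriableSecretException e) {
      System.out.println("client should retry: " + e.getMessage());
    }
  }
}
{code}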

> NPE in nodemanager after restart
> 
>
> Key: YARN-2441
> URL: https://issues.apache.org/jira/browse/YARN-2441
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nishan Shetty
>Priority: Minor
>
> {code}
> 2014-08-22 16:43:19,640 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Blocking new container-requests as container manager rpc server is still 
> starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using 
> callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding 
> protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB 
> to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>   at 
> org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>   at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>   at 
> org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>   at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>   at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>   at or

[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106995#comment-14106995
 ] 

Varun Vasudev commented on YARN-160:


[~djp]
{quote}
Both physical id and core id are not guaranteed to be present in /proc/cpuinfo 
(please see below for my local VM's info). We may use the processor number 
instead in case these ids are 0 (like we did on Windows). Again, this weakens my 
confidence that this automatic way of getting CPU/memory resources should happen 
by default (not sure if there are any cross-platform issues). Maybe a safer way 
here is to keep the previous default behavior (with some static setting) with an 
extra config to enable this. We can wait for this feature to become more stable 
before changing the default behavior.
{noformat}

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 70
model name  : Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
stepping: 1
cpu MHz : 2295.265
cache size  : 6144 KB
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm 
constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc 
aperfmperf unfair_spinlock pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 
x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb 
xsaveopt pln pts dts tpr_shadow vnmi ept vpid fsgsbase smep
bogomips: 4590.53
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
{noformat}
{quote}

In the example you gave, where we have processors listed but no physical id or 
core id entries, the numProcessors will be set to the number of entries and 
numCores will be set to 1. From the diff -
{noformat}
+  numCores = 1;
{noformat}
There is also a test case to ensure this behaviour.

In addition, cluster administrators can decide whether the NodeManager should 
report numProcessors or numCores by toggling 
yarn.nodemanager.resource.count-logical-processors-as-vcores, which by default 
is true. In the VM example, by default the NodeManager will report vcores as the 
number of processor entries in /proc/cpuinfo. If 
yarn.nodemanager.resource.count-logical-processors-as-vcores is set to false, 
the NodeManager will report vcores as 1 (if there are no physical id or core id 
entries).
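
A standalone sketch of the counting logic described above, not the patch's actual 
plugin code; the fallback to a single core when no physical id / core id entries 
exist mirrors the behaviour just described:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CpuInfoCounts {
  public static void main(String[] args) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get("/proc/cpuinfo"));
    int numProcessors = 0;
    Set<String> physicalCores = new HashSet<>();
    String physicalId = null;
    for (String line : lines) {
      if (line.startsWith("processor")) {
        numProcessors++;  // one logical processor per "processor" entry
      } else if (line.startsWith("physical id")) {
        physicalId = line.substring(line.indexOf(':') + 1).trim();
      } else if (line.startsWith("core id") && physicalId != null) {
        // Distinct (physical id, core id) pairs approximate physical cores.
        physicalCores.add(physicalId + "/" + line.substring(line.indexOf(':') + 1).trim());
      }
    }
    // Fallback described above: no physical id / core id entries -> one core.
    int numCores = physicalCores.isEmpty() ? 1 : physicalCores.size();
    System.out.println("numProcessors=" + numProcessors + ", numCores=" + numCores);
  }
}
{code}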

> nodemanagers should obtain cpu/memory values from underlying OS
> ---
>
> Key: YARN-160
> URL: https://issues.apache.org/jira/browse/YARN-160
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Varun Vasudev
> Fix For: 2.6.0
>
> Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch
>
>
> As mentioned in YARN-2
> *NM memory and CPU configs*
> Currently these values are coming from the config of the NM, we should be 
> able to obtain those values from the OS (ie, in the case of Linux from 
> /proc/meminfo & /proc/cpuinfo). As this is highly OS dependent we should have 
> an interface that obtains this information. In addition implementations of 
> this interface should be able to specify a mem/cpu offset (amount of mem/cpu 
> not to be avail as YARN resource), this would allow to reserve mem/cpu for 
> the OS and other services outside of YARN containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2393) FairScheduler: Add the notion of steady fair share

2014-08-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2393:
---

Issue Type: New Feature  (was: Improvement)

> FairScheduler: Add the notion of steady fair share
> --
>
> Key: YARN-2393
> URL: https://issues.apache.org/jira/browse/YARN-2393
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, 
> yarn-2393-4.patch
>
>
> Static fair share is a fair share allocation considering all (active/inactive) 
> queues. It would be shown on the UI for better predictability of the finish time 
> of applications.
> We would compute static fair share only when needed, like on queue creation or 
> node addition/removal. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2393) FairScheduler: Add the notion of steady fair share

2014-08-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2393:
---

Summary: FairScheduler: Add the notion of steady fair share  (was: 
FairScheduler: Implement steady fair share)

> FairScheduler: Add the notion of steady fair share
> --
>
> Key: YARN-2393
> URL: https://issues.apache.org/jira/browse/YARN-2393
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, 
> yarn-2393-4.patch
>
>
> Static fair share is a fair share allocation considering all (active/inactive) 
> queues. It would be shown on the UI for better predictability of the finish time 
> of applications.
> We would compute static fair share only when needed, like on queue creation or 
> node addition/removal. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2393) FairScheduler: Implement steady fair share

2014-08-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106991#comment-14106991
 ] 

Karthik Kambatla commented on YARN-2393:


Committing this. 

> FairScheduler: Implement steady fair share
> --
>
> Key: YARN-2393
> URL: https://issues.apache.org/jira/browse/YARN-2393
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, 
> yarn-2393-4.patch
>
>
> Static fair share is a fair share allocation considering all (active/inactive) 
> queues. It would be shown on the UI for better predictability of the finish time 
> of applications.
> We would compute static fair share only when needed, like on queue creation or 
> node addition/removal. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2393) FairScheduler: Implement steady fair share

2014-08-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106990#comment-14106990
 ] 

Karthik Kambatla commented on YARN-2393:


One of the reasons we (Sandy and I) wanted the fair share used for scheduling to 
be the instantaneous one was to address the case where, with multiple queues, 
maxAMResource becomes so small that we can't run any applications at all. I think 
it is better to leave it as is. In case anyone runs into issues with maxAMResource 
(in testing), we can consider preempting AMs as an alternative. 

> FairScheduler: Implement steady fair share
> --
>
> Key: YARN-2393
> URL: https://issues.apache.org/jira/browse/YARN-2393
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, 
> yarn-2393-4.patch
>
>
> Static fair share is a fair share allocation considering all (active/inactive) 
> queues. It would be shown on the UI for better predictability of the finish time 
> of applications.
> We would compute static fair share only when needed, like on queue creation or 
> node addition/removal. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2441) NPE in nodemanager after restart

2014-08-22 Thread Nishan Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106987#comment-14106987
 ] 

Nishan Shetty commented on YARN-2441:
-

[~jlowe] Sorry, I mentioned the wrong Affected Version; it's branch-2. 
Work-preserving NM restart is not enabled; it's just a plain restart.

> NPE in nodemanager after restart
> 
>
> Key: YARN-2441
> URL: https://issues.apache.org/jira/browse/YARN-2441
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nishan Shetty
>Priority: Minor
>
> {code}
> 2014-08-22 16:43:19,640 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Blocking new container-requests as container manager rpc server is still 
> starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using 
> callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding 
> protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB 
> to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>   at 
> org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>   at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>   at 
> org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>   at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>   at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State

2014-08-22 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2442:


Affects Version/s: (was: 3.0.0)
   2.5.0

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Nishan Shetty
>Priority: Trivial
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2393) FairScheduler: Implement steady fair share

2014-08-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2393:
---

Summary: FairScheduler: Implement steady fair share  (was: Fair Scheduler : 
Implement steady fair share)

> FairScheduler: Implement steady fair share
> --
>
> Key: YARN-2393
> URL: https://issues.apache.org/jira/browse/YARN-2393
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, 
> yarn-2393-4.patch
>
>
> Static fair share is a fair share allocation considering all (active/inactive) 
> queues. It would be shown on the UI for better predictability of the finish time 
> of applications.
> We would compute static fair share only when needed, like on queue creation or 
> node addition/removal. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2441) NPE in nodemanager after restart

2014-08-22 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2441:


Affects Version/s: (was: 3.0.0)
   2.5.0

> NPE in nodemanager after restart
> 
>
> Key: YARN-2441
> URL: https://issues.apache.org/jira/browse/YARN-2441
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nishan Shetty
>Priority: Minor
>
> {code}
> 2014-08-22 16:43:19,640 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Blocking new container-requests as container manager rpc server is still 
> starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using 
> callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding 
> protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB 
> to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>   at 
> org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>   at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>   at 
> org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>   at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>   at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2436) [post-HADOOP-9902] yarn application help doesn't work

2014-08-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106935#comment-14106935
 ] 

Hudson commented on YARN-2436:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1871 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1871/])
YARN-2436. [post-HADOOP-9902] yarn application help doesn't work (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619603)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn


> [post-HADOOP-9902] yarn application help doesn't work
> -
>
> Key: YARN-2436
> URL: https://issues.apache.org/jira/browse/YARN-2436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: YARN-2436.patch
>
>
> The previous version of the yarn command plays games with the command stack 
> for some commands.  The new code needs to duplicate this wackiness.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled

2014-08-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106932#comment-14106932
 ] 

Hudson commented on YARN-2434:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1871 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1871/])
YARN-2434. RM should not recover containers from previously failed attempt when 
AM restart is not enabled. Contributed by Jian He (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619614)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


> RM should not recover containers from previously failed attempt when AM 
> restart is not enabled
> --
>
> Key: YARN-2434
> URL: https://issues.apache.org/jira/browse/YARN-2434
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 3.0.0, 2.6.0
>
> Attachments: YARN-2434.1.patch
>
>
> If container-preserving AM restart is not enabled and AM failed during RM 
> restart, RM on restart should not recover containers from previously failed 
> attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2441) NPE in nodemanager after restart

2014-08-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106902#comment-14106902
 ] 

Jason Lowe commented on YARN-2441:
--

Was this truly running trunk, as the Affected Versions field indicates, or was 
this some other version of Hadoop?  Also, was this a work-preserving NM restart 
scenario (i.e. yarn.nodemanager.recovery.enabled=true) or a typical NM startup?
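
For reference, the work-preserving scenario would mean the NM was started with 
something like the following in yarn-site.xml; this is only a minimal sketch, 
the recovery directory path is illustrative, and a fixed yarn.nodemanager.address 
port is typically configured alongside it:

{code}
<!-- Illustrative yarn-site.xml fragment for work-preserving NM restart -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Local directory where the NM persists recovery state; path is only an example -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/hadoop/yarn-nm-recovery</value>
</property>
{code}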

> NPE in nodemanager after restart
> 
>
> Key: YARN-2441
> URL: https://issues.apache.org/jira/browse/YARN-2441
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Nishan Shetty
>Priority: Minor
>
> {code}
> 2014-08-22 16:43:19,640 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Blocking new container-requests as container manager rpc server is still 
> starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using 
> callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding 
> protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB 
> to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>   at 
> org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>   at 
> org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>   at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>   at 
> org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>   at 
> org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>   at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>   at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 
> for port 45026: readAndProcess from client 10.18.40.84 threw exception 
> [java.lang.NullPointerException]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106893#comment-14106893
 ] 

Varun Vasudev commented on YARN-2440:
-

[~nroberts] there's already a ticket for your request - YARN-810. That's next 
on my todo list. I've left a comment there asking if I can take it over.

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml

2014-08-22 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106889#comment-14106889
 ] 

Nathan Roberts commented on YARN-2440:
--

Thanks, Varun, for the patch. I'm wondering if it would be possible to make this 
configurable both at the system level and per-app. For example, I'd like an 
application to be able to specify that it wants to run with strict container 
limits (to verify SLAs, for example), but in general I don't want these limits 
in place (why not let a container use additional CPU if it's available?).
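
For context, the cgroups CPU setup under discussion is roughly the following 
yarn-site.xml fragment (a minimal sketch; the vcore count is only an example, 
and the strict-vs-soft per-container switch being requested here is the part 
tracked separately in YARN-810):

{code}
<!-- Illustrative yarn-site.xml fragment for CPU isolation via cgroups -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- Cores advertised to the RM; the value here is only an example -->
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
{code}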

> Cgroups should limit YARN containers to cores allocated in yarn-site.xml
> 
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled

2014-08-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106860#comment-14106860
 ] 

Hudson commented on YARN-2434:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1845 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1845/])
YARN-2434. RM should not recover containers from previously failed attempt when 
AM restart is not enabled. Contributed by Jian He (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619614)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


> RM should not recover containers from previously failed attempt when AM 
> restart is not enabled
> --
>
> Key: YARN-2434
> URL: https://issues.apache.org/jira/browse/YARN-2434
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 3.0.0, 2.6.0
>
> Attachments: YARN-2434.1.patch
>
>
> If container-preserving AM restart is not enabled and AM failed during RM 
> restart, RM on restart should not recover containers from previously failed 
> attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2436) [post-HADOOP-9902] yarn application help doesn't work

2014-08-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106863#comment-14106863
 ] 

Hudson commented on YARN-2436:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1845 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1845/])
YARN-2436. [post-HADOOP-9902] yarn application help doesn't work (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619603)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn


> [post-HADOOP-9902] yarn application help doesn't work
> -
>
> Key: YARN-2436
> URL: https://issues.apache.org/jira/browse/YARN-2436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: YARN-2436.patch
>
>
> The previous version of the yarn command plays games with the command stack 
> for some commands.  The new code needs to duplicate this wackiness.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2345) yarn rmadmin -report

2014-08-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106816#comment-14106816
 ] 

Allen Wittenauer edited comment on YARN-2345 at 8/22/14 1:28 PM:
-

[~leftnoteasy], this is to bring consistency between HDFS and YARN. hdfs 
dfsadmin -report has existed for a very long time, while YARN doesn't have one.  
From a user perspective, it's irrelevant what is happening on the inside; YARN 
just looks "weird" when the equivalent is "yarn node -all -list".




was (Author: aw):
[~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin 
-report has existed for a very long time while YARN doesn't have one.  From a 
user perspective, it's irrelevant what is happening on the inside, just that 
YARN is "weird" if the equivalent is "yarn node -all -list".



> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Allen Wittenauer
>Assignee: Hao Gao
>  Labels: newbie
> Attachments: YARN-2345.1.patch
>
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2345) yarn rmadmin -report

2014-08-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106816#comment-14106816
 ] 

Allen Wittenauer commented on YARN-2345:


[~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin 
-report has existed for a very long time, while the RM doesn't have one.  From a 
user perspective, it's irrelevant what is happening on the inside; YARN just 
looks "weird" when the equivalent is "yarn node -all -list".



> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Allen Wittenauer
>Assignee: Hao Gao
>  Labels: newbie
> Attachments: YARN-2345.1.patch
>
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2345) yarn rmadmin -report

2014-08-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106816#comment-14106816
 ] 

Allen Wittenauer edited comment on YARN-2345 at 8/22/14 1:26 PM:
-

[~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin 
-report has existed for a very long time, while YARN doesn't have one.  From a 
user perspective, it's irrelevant what is happening on the inside; YARN just 
looks "weird" when the equivalent is "yarn node -all -list".




was (Author: aw):
[~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin 
-report has existed for a very long time while the RM doesn't have one.  From a 
user perspective, it's irrelevant what is happening on the inside, just that 
YARN is "weird" if the equivalent is "yarn node -all -list".



> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Allen Wittenauer
>Assignee: Hao Gao
>  Labels: newbie
> Attachments: YARN-2345.1.patch
>
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2442) ResourceManager JMX UI does not give HA State

2014-08-22 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2442:
---

 Summary: ResourceManager JMX UI does not give HA State
 Key: YARN-2442
 URL: https://issues.apache.org/jira/browse/YARN-2442
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Priority: Trivial


The ResourceManager JMX UI should show the haState (INITIALIZING, ACTIVE, STANDBY, 
STOPPED).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2441) NPE in nodemanager after restart

2014-08-22 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2441:
---

 Summary: NPE in nodemanager after restart
 Key: YARN-2441
 URL: https://issues.apache.org/jira/browse/YARN-2441
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Priority: Minor


{code}
2014-08-22 16:43:19,640 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Blocking new container-requests as container manager rpc server is still 
starting.
2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 45026: starting
2014-08-22 16:43:20,029 INFO 
org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
 Updating node address : host-10-18-40-95:45026
2014-08-22 16:43:20,029 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 ContainerManager started at /10.18.40.95:45026
2014-08-22 16:43:20,030 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using 
callQueue class java.util.concurrent.LinkedBlockingQueue
2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket 
Reader #1 for port 45027
2014-08-22 16:43:20,158 INFO 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding 
protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB 
to the server
2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 45027: starting
2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for 
port 45026: readAndProcess from client 10.18.40.84 threw exception 
[java.lang.NullPointerException]
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
at 
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
at 
org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
at 
org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
at 
org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
at 
com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
at 
com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
at 
org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
at 
org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
at 
org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
at 
org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
at 
org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
at 
org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for 
port 45026: readAndProcess from client 10.18.40.84 threw exception 
[java.lang.NullPointerException]
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled

2014-08-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106717#comment-14106717
 ] 

Hudson commented on YARN-2434:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #654 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/654/])
YARN-2434. RM should not recover containers from previously failed attempt when 
AM restart is not enabled. Contributed by Jian He (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619614)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


> RM should not recover containers from previously failed attempt when AM 
> restart is not enabled
> --
>
> Key: YARN-2434
> URL: https://issues.apache.org/jira/browse/YARN-2434
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 3.0.0, 2.6.0
>
> Attachments: YARN-2434.1.patch
>
>
> If container-preserving AM restart is not enabled and AM failed during RM 
> restart, RM on restart should not recover containers from previously failed 
> attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

