[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible

2014-07-26 Thread Krisztian Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Horvath updated YARN-2280:


Attachment: YARN-2280.patch

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2280.patch


 Using the resource manager's REST API 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
 REST calls return classes whose fields are not accessible after unmarshalling, 
 for example SchedulerTypeInfo - schedulerInfo. When using the same classes on 
 the client side, these fields are only accessible via reflection.
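
 A minimal sketch of that reflection workaround, using a simplified stand-in class 
 rather than the real SchedulerTypeInfo DAO (the field name matches the example above; 
 the stub class and its value are illustrative only):
{code:title=ReflectionAccessSketch.java|borderStyle=solid}
import java.lang.reflect.Field;

// Simplified stand-in for a web-service DAO whose field has no public getter.
class SchedulerTypeInfoStub {
    private String schedulerInfo = "fifo"; // would be populated during unmarshalling
}

public class ReflectionAccessSketch {
    public static void main(String[] args) throws Exception {
        SchedulerTypeInfoStub info = new SchedulerTypeInfoStub();

        // With no accessor on the class, client code must fall back to reflection.
        Field field = SchedulerTypeInfoStub.class.getDeclaredField("schedulerInfo");
        field.setAccessible(true);
        System.out.println(field.get(info)); // prints "fifo"
    }
}
{code}
 Public getters (or public fields) on the DAO classes would make this workaround 
 unnecessary, which is the gist of the report.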



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible

2014-07-26 Thread Krisztian Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Horvath updated YARN-2280:


Attachment: (was: YARN-2280.patch)

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2280.patch


 Using the resource manager's REST API 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
 REST calls return classes whose fields are not accessible after unmarshalling, 
 for example SchedulerTypeInfo - schedulerInfo. When using the same classes on 
 the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible

2014-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075334#comment-14075334
 ] 

Hadoop QA commented on YARN-2280:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657984/YARN-2280.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4447//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4447//console

This message is automatically generated.

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2280.patch


 Using the resource manager's REST API 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
 REST calls return classes whose fields are not accessible after unmarshalling, 
 for example SchedulerTypeInfo - schedulerInfo. When using the same classes on 
 the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075338#comment-14075338
 ] 

Hudson commented on YARN-2335:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/624/])
YARN-2335. Annotate all hadoop-sls APIs as @Private. (Wei Yan via kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613478)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RumenToSLSConverter.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/conf/SLSConfiguration.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/CapacitySchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ContainerSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FifoSchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/NodeUpdateSchedulerEventWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Annotate all hadoop-sls APIs as @Private
 

 Key: YARN-2335
 URL: https://issues.apache.org/jira/browse/YARN-2335
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2335-1.branch2.patch, YARN-2335-1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075346#comment-14075346
 ] 

Hudson commented on YARN-1796:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/624/])
YARN-1796. container-executor shouldn't require o-r permissions. Contributed by 
Aaron T. Myers. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613548)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 container-executor shouldn't require o-r permissions
 

 Key: YARN-1796
 URL: https://issues.apache.org/jira/browse/YARN-1796
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-1796.patch


 The container-executor currently checks that other users don't have read 
 permissions. This is unnecessary and runs contrary to the debian packaging 
 policy manual.
 This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075344#comment-14075344
 ] 

Hudson commented on YARN-2211:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/624/])
YARN-2211. Persist AMRMToken master key in RMStateStore for RM recovery. 
Contributed by Xuan Gong (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613515)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/AMRMTokenSecretManagerState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/AMRMTokenSecretManagerStatePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java


 RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

[jira] [Commented] (YARN-2214) FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075342#comment-14075342
 ] 

Hudson commented on YARN-2214:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/624/])
YARN-2214. FairScheduler: preemptContainerPreCheck() in FSParentQueue delays 
convergence towards fairness. (Ashwin Shankar via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613459)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence 
 towards fairness
 --

 Key: YARN-2214
 URL: https://issues.apache.org/jira/browse/YARN-2214
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Fix For: 2.6.0

 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt


 preemptContainerPreCheck() in FSParentQueue rejects preemption requests if 
 the parent queue is below its fair share. This can cause a delay in converging 
 towards fairness when the starved leaf queue and the queue above its fair share 
 belong under a non-root parent queue (i.e. their least common ancestor is a 
 parent queue which is not root).
 Here is an example:
 root.parent has fair share = 80% and usage = 80%
 root.parent.child1 has fair share = 40% and usage = 80%
 root.parent.child2 has fair share = 40% and usage = 0%
 Now a job is submitted to child2 and the demand is 40%.
 Preemption will kick in and try to reclaim all the 40% from child1.
 When it preempts the first container from child1, the usage of root.parent 
 drops below 80%, which is less than root.parent's fair share, causing 
 preemption to stop. So only one container gets preempted in this round 
 although the need is a lot more. child2 would eventually get to half its fair 
 share, but only after multiple rounds of preemption.
 The solution is to remove preemptContainerPreCheck() from FSParentQueue and keep 
 it only in FSLeafQueue (where it already exists).
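
 A toy simulation of the scenario above (not the real FairScheduler code; the 
 container size and the exact pre-check conditions are assumptions) showing why the 
 parent-level check caps each preemption round at a single container:
{code:title=PreemptionPreCheckSketch.java|borderStyle=solid}
public class PreemptionPreCheckSketch {
    // Illustrative numbers from the example above, in percent of cluster capacity.
    static double parentFairShare = 80, parentUsage = 80;
    static double child1FairShare = 40, child1Usage = 80;
    static final double CONTAINER = 5; // assumed container size

    // Parent-level pre-check (removed by this JIRA): preempt only while the
    // parent queue is at or above its fair share.
    static boolean parentPreCheck() { return parentUsage >= parentFairShare; }

    // Leaf-level pre-check (kept): child1 is above its own fair share.
    static boolean leafPreCheck() { return child1Usage > child1FairShare; }

    public static void main(String[] args) {
        int preempted = 0;
        while (parentPreCheck() && leafPreCheck()) { // with the fix, drop parentPreCheck()
            child1Usage -= CONTAINER;
            parentUsage -= CONTAINER;
            preempted++;
        }
        // Prints 1: the parent check fails as soon as root.parent dips below 80%,
        // even though child1 is still well over its fair share.
        System.out.println("containers preempted this round: " + preempted);
    }
}
{code}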



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1726) ResourceSchedulerWrapper broken due to AbstractYarnScheduler

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075341#comment-14075341
 ] 

Hudson commented on YARN-1726:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/624/])
YARN-1726. Add missing files. ResourceSchedulerWrapper broken due to 
AbstractYarnScheduler. (Wei Yan via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613552)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java
YARN-1726. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei 
Yan via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613547)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/TestSLSRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 ResourceSchedulerWrapper broken due to AbstractYarnScheduler
 

 Key: YARN-1726
 URL: https://issues.apache.org/jira/browse/YARN-1726
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Blocker
 Fix For: 2.5.0

 Attachments: YARN-1726-5.patch, YARN-1726-6-branch2.patch, 
 YARN-1726-6.patch, YARN-1726-7-branch2.patch, YARN-1726-7.patch, 
 YARN-1726.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch


 The YARN scheduler simulator failed when running the Fair Scheduler, due to 
 AbstractYarnScheduler introduced in YARN-1041. The ResourceSchedulerWrapper 
 should inherit from AbstractYarnScheduler instead of implementing the 
 ResourceScheduler interface directly.
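
 A self-contained sketch of the proposed design change, using stand-in types only 
 (the real AbstractYarnScheduler and ResourceScheduler signatures are more involved):
{code:title=WrapperSketch.java|borderStyle=solid}
// Stand-in for the ResourceScheduler interface.
interface Scheduler {
    void nodeUpdate(String nodeId);
}

// Stand-in for AbstractYarnScheduler: common state and behaviour shared by all
// schedulers since YARN-1041.
abstract class AbstractScheduler implements Scheduler {
    protected final java.util.Set<String> knownNodes = new java.util.HashSet<>();
    @Override
    public void nodeUpdate(String nodeId) {
        knownNodes.add(nodeId); // bookkeeping the wrapper would otherwise have to duplicate
    }
}

// After the fix: the wrapper extends the abstract base and only adds its own logic,
// instead of implementing the interface from scratch and missing the shared state.
class ResourceSchedulerWrapperSketch extends AbstractScheduler {
    @Override
    public void nodeUpdate(String nodeId) {
        super.nodeUpdate(nodeId);
        System.out.println("wrapped update from " + nodeId + ", nodes=" + knownNodes.size());
    }
}

public class WrapperSketch {
    public static void main(String[] args) {
        new ResourceSchedulerWrapperSketch().nodeUpdate("node-1");
    }
}
{code}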



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075385#comment-14075385
 ] 

Hudson commented on YARN-2335:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/])
YARN-2335. Annotate all hadoop-sls APIs as @Private. (Wei Yan via kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613478)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RumenToSLSConverter.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/conf/SLSConfiguration.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/CapacitySchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ContainerSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FifoSchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/NodeUpdateSchedulerEventWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Annotate all hadoop-sls APIs as @Private
 

 Key: YARN-2335
 URL: https://issues.apache.org/jira/browse/YARN-2335
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2335-1.branch2.patch, YARN-2335-1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2214) FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075389#comment-14075389
 ] 

Hudson commented on YARN-2214:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/])
YARN-2214. FairScheduler: preemptContainerPreCheck() in FSParentQueue delays 
convergence towards fairness. (Ashwin Shankar via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613459)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence 
 towards fairness
 --

 Key: YARN-2214
 URL: https://issues.apache.org/jira/browse/YARN-2214
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Fix For: 2.6.0

 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt


 preemptContainerPreCheck() in FSParentQueue rejects preemption requests if 
 the parent queue is below its fair share. This can cause a delay in converging 
 towards fairness when the starved leaf queue and the queue above its fair share 
 belong under a non-root parent queue (i.e. their least common ancestor is a 
 parent queue which is not root).
 Here is an example:
 root.parent has fair share = 80% and usage = 80%
 root.parent.child1 has fair share = 40% and usage = 80%
 root.parent.child2 has fair share = 40% and usage = 0%
 Now a job is submitted to child2 and the demand is 40%.
 Preemption will kick in and try to reclaim all the 40% from child1.
 When it preempts the first container from child1, the usage of root.parent 
 drops below 80%, which is less than root.parent's fair share, causing 
 preemption to stop. So only one container gets preempted in this round 
 although the need is a lot more. child2 would eventually get to half its fair 
 share, but only after multiple rounds of preemption.
 The solution is to remove preemptContainerPreCheck() from FSParentQueue and keep 
 it only in FSLeafQueue (where it already exists).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1726) ResourceSchedulerWrapper broken due to AbstractYarnScheduler

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075388#comment-14075388
 ] 

Hudson commented on YARN-1726:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/])
YARN-1726. Add missing files. ResourceSchedulerWrapper broken due to 
AbstractYarnScheduler. (Wei Yan via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613552)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java
YARN-1726. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei 
Yan via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613547)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/TestSLSRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 ResourceSchedulerWrapper broken due to AbstractYarnScheduler
 

 Key: YARN-1726
 URL: https://issues.apache.org/jira/browse/YARN-1726
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Blocker
 Fix For: 2.5.0

 Attachments: YARN-1726-5.patch, YARN-1726-6-branch2.patch, 
 YARN-1726-6.patch, YARN-1726-7-branch2.patch, YARN-1726-7.patch, 
 YARN-1726.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch


 The YARN scheduler simulator failed when running the Fair Scheduler, due to 
 AbstractYarnScheduler introduced in YARN-1041. The ResourceSchedulerWrapper 
 should inherit from AbstractYarnScheduler instead of implementing the 
 ResourceScheduler interface directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075391#comment-14075391
 ] 

Hudson commented on YARN-2211:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/])
YARN-2211. Persist AMRMToken master key in RMStateStore for RM recovery. 
Contributed by Xuan Gong (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613515)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/AMRMTokenSecretManagerState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/AMRMTokenSecretManagerStatePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java


 RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075393#comment-14075393
 ] 

Hudson commented on YARN-1796:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/])
YARN-1796. container-executor shouldn't require o-r permissions. Contributed by 
Aaron T. Myers. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613548)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 container-executor shouldn't require o-r permissions
 

 Key: YARN-1796
 URL: https://issues.apache.org/jira/browse/YARN-1796
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-1796.patch


 The container-executor currently checks that other users don't have read 
 permissions. This is unnecessary and runs contrary to the debian packaging 
 policy manual.
 This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075402#comment-14075402
 ] 

Hudson commented on YARN-2335:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/])
YARN-2335. Annotate all hadoop-sls APIs as @Private. (Wei Yan via kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613478)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RumenToSLSConverter.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/conf/SLSConfiguration.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/CapacitySchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ContainerSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FifoSchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/NodeUpdateSchedulerEventWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Annotate all hadoop-sls APIs as @Private
 

 Key: YARN-2335
 URL: https://issues.apache.org/jira/browse/YARN-2335
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2335-1.branch2.patch, YARN-2335-1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075410#comment-14075410
 ] 

Hudson commented on YARN-1796:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/])
YARN-1796. container-executor shouldn't require o-r permissions. Contributed by 
Aaron T. Myers. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613548)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 container-executor shouldn't require o-r permissions
 

 Key: YARN-1796
 URL: https://issues.apache.org/jira/browse/YARN-1796
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-1796.patch


 The container-executor currently checks that other users don't have read 
 permissions. This is unnecessary and runs contrary to the debian packaging 
 policy manual.
 This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075408#comment-14075408
 ] 

Hudson commented on YARN-2211:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/])
YARN-2211. Persist AMRMToken master key in RMStateStore for RM recovery. 
Contributed by Xuan Gong (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613515)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/AMRMTokenSecretManagerState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/AMRMTokenSecretManagerStatePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java


 RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

[jira] [Commented] (YARN-1726) ResourceSchedulerWrapper broken due to AbstractYarnScheduler

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075405#comment-14075405
 ] 

Hudson commented on YARN-1726:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/])
YARN-1726. Add missing files. ResourceSchedulerWrapper broken due to 
AbstractYarnScheduler. (Wei Yan via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613552)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java
YARN-1726. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei 
Yan via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613547)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/TestSLSRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 ResourceSchedulerWrapper broken due to AbstractYarnScheduler
 

 Key: YARN-1726
 URL: https://issues.apache.org/jira/browse/YARN-1726
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Blocker
 Fix For: 2.5.0

 Attachments: YARN-1726-5.patch, YARN-1726-6-branch2.patch, 
 YARN-1726-6.patch, YARN-1726-7-branch2.patch, YARN-1726-7.patch, 
 YARN-1726.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch


 The YARN scheduler simulator failed when running the Fair Scheduler, due to 
 AbstractYarnScheduler introduced in YARN-1041. The ResourceSchedulerWrapper 
 should inherit from AbstractYarnScheduler instead of implementing the 
 ResourceScheduler interface directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2214) FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075406#comment-14075406
 ] 

Hudson commented on YARN-2214:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/])
YARN-2214. FairScheduler: preemptContainerPreCheck() in FSParentQueue delays 
convergence towards fairness. (Ashwin Shankar via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613459)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence 
 towards fairness
 --

 Key: YARN-2214
 URL: https://issues.apache.org/jira/browse/YARN-2214
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Fix For: 2.6.0

 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt


 preemptContainerPreCheck() in FSParentQueue rejects preemption requests if 
 the parent queue is below its fair share. This can cause a delay in converging 
 towards fairness when the starved leaf queue and the queue above its fair share 
 belong under a non-root parent queue (i.e. their least common ancestor is a 
 parent queue which is not root).
 Here is an example:
 root.parent has fair share = 80% and usage = 80%
 root.parent.child1 has fair share = 40% and usage = 80%
 root.parent.child2 has fair share = 40% and usage = 0%
 Now a job is submitted to child2 and the demand is 40%.
 Preemption will kick in and try to reclaim all the 40% from child1.
 When it preempts the first container from child1, the usage of root.parent 
 drops below 80%, which is less than root.parent's fair share, causing 
 preemption to stop. So only one container gets preempted in this round 
 although the need is a lot more. child2 would eventually get to half its fair 
 share, but only after multiple rounds of preemption.
 The solution is to remove preemptContainerPreCheck() from FSParentQueue and keep 
 it only in FSLeafQueue (where it already exists).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps

2014-07-26 Thread Ram Venkatesh (JIRA)
Ram Venkatesh created YARN-2362:
---

 Summary: Capacity Scheduler apps with requests that exceed 
capacity can starve pending apps
 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh


Cluster configuration:
Total memory: 8GB
yarn.scheduler.minimum-allocation-mb 256
yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)

App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. 
It subsequently makes a request for 4.6 GB, which cannot be granted and it 
waits.

App 2 makes a request for 1 GB - never receives it, so the app stays in the 
ACCEPTED state for ever.

I think this can happen in leaf queues that are near capacity.

The fix is likely in LeafQueue.java assignContainers near line 861, where it 
returns if the assignment would exceed queue capacity, instead of checking if 
requests for other active applications can be met.

   // Check queue max-capacity limit
   if (!assignToQueue(clusterResource, required)) {
-return NULL_ASSIGNMENT;
+break;
   }

With this change, the scenario above allows App 2 to start and finish while App 
1 continues to wait.

I have a patch available, but wondering if the current behavior is by design.
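
A simplified, self-contained sketch of the control-flow difference (not the real 
LeafQueue code; the loop structure and the numbers are assumptions based on the 
description above): with return, the first request that exceeds the remaining queue 
capacity ends the whole assignment pass, so App 2's smaller request is never 
examined, while break only skips the rest of that application's requests.

{code:title=AssignContainersSketch.java|borderStyle=solid}
import java.util.List;
import java.util.Map;

public class AssignContainersSketch {
    static final int QUEUE_CAPACITY_MB = 8192;
    static int usedMb = 4710; // App 1 already holds its first ~4.6 GB container

    // Stand-in for the queue max-capacity check.
    static boolean assignToQueue(int requiredMb) {
        return usedMb + requiredMb <= QUEUE_CAPACITY_MB;
    }

    public static void main(String[] args) {
        // Application -> outstanding requests in MB (illustrative numbers only).
        List<Map.Entry<String, List<Integer>>> apps = List.of(
            Map.entry("app1", List.of(4710)),
            Map.entry("app2", List.of(1024)));

        for (Map.Entry<String, List<Integer>> app : apps) {
            for (int requiredMb : app.getValue()) {
                if (!assignToQueue(requiredMb)) {
                    // return;  // old behaviour: abort the whole pass, App 2 starves
                    break;      // proposed behaviour: give up on this app only
                }
                usedMb += requiredMb;
                System.out.println(app.getKey() + " assigned " + requiredMb + " MB");
            }
        }
    }
}
{code}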



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps

2014-07-26 Thread Ram Venkatesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh updated YARN-2362:


Description: 
Cluster configuration:
Total memory: 8GB
yarn.scheduler.minimum-allocation-mb 256
yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)

App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. 
It subsequently makes a request for 4.6 GB, which cannot be granted and it 
waits.

App 2 makes a request for 1 GB - never receives it, so the app stays in the 
ACCEPTED state for ever.

I think this can happen in leaf queues that are near capacity.

The fix is likely in LeafQueue.java assignContainers near line 861, where it 
returns if the assignment would exceed queue capacity, instead of checking if 
requests for other active applications can be met.

{code:title=LeafQueue.java|borderStyle=solid}
   // Check queue max-capacity limit
   if (!assignToQueue(clusterResource, required)) {
-return NULL_ASSIGNMENT;
+break;
   }
{code}

With this change, the scenario above allows App 2 to start and finish while App 
1 continues to wait.

I have a patch available, but wondering if the current behavior is by design.

  was:
Cluster configuration:
Total memory: 8GB
yarn.scheduler.minimum-allocation-mb 256
yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)

App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. 
It subsequently makes a request for 4.6 GB, which cannot be granted and it 
waits.

App 2 makes a request for 1 GB - never receives it, so the app stays in the 
ACCEPTED state for ever.

I think this can happen in leaf queues that are near capacity.

The fix is likely in LeafQueue.java assignContainers near line 861, where it 
returns if the assignment would exceed queue capacity, instead of checking if 
requests for other active applications can be met.

   // Check queue max-capacity limit
   if (!assignToQueue(clusterResource, required)) {
-return NULL_ASSIGNMENT;
+break;
   }

With this change, the scenario above allows App 2 to start and finish while App 
1 continues to wait.

I have a patch available, but wondering if the current behavior is by design.


 Capacity Scheduler apps with requests that exceed capacity can starve pending 
 apps
 --

 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh

 Cluster configuration:
 Total memory: 8GB
 yarn.scheduler.minimum-allocation-mb 256
 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
 App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to 
 the RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be 
 granted, so it waits.
 App 2 makes a request for 1 GB but never receives it, so the app stays in the 
 ACCEPTED state forever.
 I think this can happen in leaf queues that are near capacity.
 The fix is likely in LeafQueue.java assignContainers, near line 861, where it 
 returns as soon as an assignment would exceed the queue capacity instead of 
 checking whether requests from other active applications can be met.
 {code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
 -return NULL_ASSIGNMENT;
 +break;
}
 {code}
 With this change, the scenario above allows App 2 to start and finish while 
 App 1 continues to wait.
 I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2346) Add a 'status' command to yarn-daemon.sh

2014-07-26 Thread Nikunj Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075462#comment-14075462
 ] 

Nikunj Bansal commented on YARN-2346:
-

HADOOP-9902 is being resolved for 3.0.0. Meanwhile, for 2.5.0, I do have a 
patch based on the current scripts.

 Add a 'status' command to yarn-daemon.sh
 

 Key: YARN-2346
 URL: https://issues.apache.org/jira/browse/YARN-2346
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
Reporter: Nikunj Bansal
Assignee: Allen Wittenauer
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 Adding a 'status' command to yarn-daemon.sh will be useful for finding out 
 the status of yarn daemons.
 Running the 'status' command should exit with a 0 exit code if the target 
 daemon is running and a non-zero code if it is not.
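To illustrate the contract above (exit 0 when the daemon is alive, non-zero 
otherwise), here is a small stand-alone sketch. It is not the actual 
yarn-daemon.sh change (the script refactoring is tracked in HADOOP-9902); the 
class name and the default pid-file path are made up for illustration.

{code:title=Illustrative sketch of the requested status semantics (not the actual script change)|borderStyle=solid}
import java.nio.file.Files;
import java.nio.file.Path;

// Toy illustration only: read a daemon's pid file and exit 0 if that process
// is alive, non-zero otherwise.
public class DaemonStatusSketch {
  public static void main(String[] args) throws Exception {
    Path pidFile = Path.of(args.length > 0
        ? args[0]
        : "/tmp/yarn-resourcemanager.pid");    // hypothetical default location
    if (!Files.exists(pidFile)) {
      System.out.println("daemon not running (no pid file)");
      System.exit(1);
    }
    long pid = Long.parseLong(Files.readString(pidFile).trim());
    boolean alive = ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    System.out.println(alive
        ? "daemon running (pid " + pid + ")"
        : "daemon not running (stale pid file)");
    System.exit(alive ? 0 : 1);
  }
}
{code}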



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps

2014-07-26 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075476#comment-14075476
 ] 

Chen He commented on YARN-2362:
---

This is interesting. In general, a user may not submit an application that asks 
for 50% of the whole cluster's resources. It is also possible that a cluster 
has more than 2 applications: if a third application finishes, App 2 can get 
enough resources and run, and the deadlock breaks. Is this reasonable, 
[~venkateshrin]?

 Capacity Scheduler apps with requests that exceed capacity can starve pending 
 apps
 --

 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh

 Cluster configuration:
 Total memory: 8GB
 yarn.scheduler.minimum-allocation-mb 256
 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
 App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to 
 the RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be 
 granted, so it waits.
 App 2 makes a request for 1 GB but never receives it, so the app stays in the 
 ACCEPTED state forever.
 I think this can happen in leaf queues that are near capacity.
 The fix is likely in LeafQueue.java assignContainers, near line 861, where it 
 returns as soon as an assignment would exceed the queue capacity instead of 
 checking whether requests from other active applications can be met.
 {code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
 -return NULL_ASSIGNMENT;
 +break;
}
 {code}
 With this change, the scenario above allows App 2 to start and finish while 
 App 1 continues to wait.
 I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2346) Add a 'status' command to yarn-daemon.sh

2014-07-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075508#comment-14075508
 ] 

Allen Wittenauer commented on YARN-2346:


Making this work reliably requires a significant refactoring of how daemons 
launch. All of that refactoring has already been done in HADOOP-9902.

Specifically, pid handling has to move from the *-daemon.sh commands into the 
yarn, hdfs, and mapred commands. For example, if one runs 'yarn 
resourcemanager', it will not generate a pid file. This in turn means that if 
one were to modify only yarn-daemon.sh, the status subcommand would give 
incorrect information because it doesn't see a pid file. Now one could try to 
suss out the Java process running the RM, but that's a bit too fragile for my 
tastes. Another option would be to just do even more copypasta in the shell 
code, but that's just making bad code even worse.

There's been talk of backporting HADOOP-9902 to branch-2, so it is worthwhile 
to take a wait-and-see approach, especially given the door is pretty shut on 
getting anything more into 2.5.

 Add a 'status' command to yarn-daemon.sh
 

 Key: YARN-2346
 URL: https://issues.apache.org/jira/browse/YARN-2346
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
Reporter: Nikunj Bansal
Assignee: Allen Wittenauer
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 Adding a 'status' command to yarn-daemon.sh will be useful for finding out 
 the status of yarn daemons.
 Running the 'status' command should exit with a 0 exit code if the target 
 daemon is running and a non-zero code if it is not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2346) Add a 'status' command to yarn-daemon.sh

2014-07-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075508#comment-14075508
 ] 

Allen Wittenauer edited comment on YARN-2346 at 7/26/14 11:10 PM:
--

Making this work reliably requires a significant refactoring of how daemons 
launch. All of that refactoring has already been done in HADOOP-9902.

Specifically, pid handling has to move from the *-daemon.sh commands into the 
yarn, hdfs, and mapred commands. For example, if one runs 'yarn 
resourcemanager', it will not generate a pid file. This in turn means that if 
one were to modify only yarn-daemon.sh, the status subcommand would give 
incorrect information because it doesn't see a pid file. Now one could try to 
suss out the Java process running the RM, but that's a bit too fragile for my 
tastes. Another option would be to just do even more copypasta in the shell 
code, but that's just making bad code even worse.

There's been talk of backporting HADOOP-9902 to branch-2, so it is worthwhile 
to take a wait-and-see approach, especially given the door is pretty shut on 
getting anything more into 2.5.


was (Author: aw):
In order to make this work reliably, it's a significant refactoring of how 
daemons launch.  All of that refactoring has already been done in HADOOP-9902.  

Specifically, pid handling has to get moved to the yarn, hdfs, and mapred 
commands from the *-daemon.sh commands.  For example, if one runs 'yarn 
resourcemanager' it will not generate a pid file.  This in turn means that if 
one were to modify only yarn-daemon.sh,  the status subcommand will be giving 
incorrect information because it doesn't see a pid file.  Now one could try to 
suss out the Java process running the RM, but that's a bit to fragile for my 
tastes.  Another option would be to just do even more copypasta in the shell 
code, but that's just making bad code even worse.

There's been talking of backporting HADOOP-9902 to branch-2, so it is 
worthwhile to take a wait and see approach, especially given the door is pretty 
shut on getting anything more into 2.5.

 Add a 'status' command to yarn-daemon.sh
 

 Key: YARN-2346
 URL: https://issues.apache.org/jira/browse/YARN-2346
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
Reporter: Nikunj Bansal
Assignee: Allen Wittenauer
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 Adding a 'status' command to yarn-daemon.sh will be useful for finding out 
 the status of yarn daemons.
 Running the 'status' command should exit with a 0 exit code if the target 
 daemon is running and a non-zero code if it is not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075518#comment-14075518
 ] 

Junping Du commented on YARN-2347:
--

Thanks for the review and comments, [~zjshen]!
Nice catch on the javadoc issue; I will fix it soon.
For the naming of this generic version, I don't have a strong preference on 
which is better. YarnVersion seems a little misleading, as we already have the 
yarn version command line to list the version of YARN. Version sounds too 
generic and easily gets duplicated (we had a Writable object with the same name 
in Common). Actually, this version gets used for RMState, NMState, 
ShuffleHandler's state, etc. In this case, maybe it does not sound so weird to 
you?

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075529#comment-14075529
 ] 

Zhijie Shen commented on YARN-2347:
---

bq. Actually, this version gets used for RMState, NMState, ShuffleHandler's 
state, etc. In this case, maybe it does not sound so weird to you?

IMHO, the version belongs to the stores, but it happens that the stores are 
storing state information. It's more accurate to say the version is of the 
storage schema. On the other hand, the timeline server is stateless, but it 
will still use this version stack. StateVersion may make users consider it 
stateful. If StateVersion is only going to be used for the storage layer, 
something like StoreVersion sounds better to me. On the contrary, if it is 
going to be used to annotate other things, such as the RPC interface, it is 
good to have a more generalized name.

Anyway, it's not a critical problem, and I'm not strongly minded about 
refactoring the name. However, it reminds me of another issue: it may be better 
to add some more javadoc for StateVersion to let users know what it is really 
about.
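To make the consolidation concrete, here is a minimal sketch of what a shared 
version record could look like. The class name, fields, and the isCompatibleTo 
method are assumptions for illustration only and are not taken from the 
attached patches.

{code:title=Illustrative sketch of a consolidated version record (names are assumptions)|borderStyle=solid}
/**
 * Hypothetical sketch of a shared schema/state version once RMStateVersion and
 * NMDBSchemaVersion are folded together.  A major-version mismatch means the
 * stored data is incompatible; a minor-version bump is expected to stay
 * readable.
 */
public final class StateVersionSketch {
  private final int majorVersion;
  private final int minorVersion;

  public StateVersionSketch(int majorVersion, int minorVersion) {
    this.majorVersion = majorVersion;
    this.minorVersion = minorVersion;
  }

  public int getMajorVersion() { return majorVersion; }
  public int getMinorVersion() { return minorVersion; }

  /** Compatible as long as the major version matches. */
  public boolean isCompatibleTo(StateVersionSketch other) {
    return this.majorVersion == other.majorVersion;
  }

  @Override
  public String toString() {
    return majorVersion + "." + minorVersion;
  }
}
{code}

Whatever the final name ends up being (StateVersion, StoreVersion, or something 
else), the javadoc on such a class is the natural place to state that a 
major-version mismatch means incompatible stored data.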




 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.

2014-07-26 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2359:


Attachment: YARN-2359.001.patch

 Application is hung without timeout and retry after DNS/network is down. 
 -

 Key: YARN-2359
 URL: https://issues.apache.org/jira/browse/YARN-2359
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-2359.000.patch, YARN-2359.001.patch


 Application is hung without timeout and retry after DNS/network is down. 
 This happens because, right after the container is allocated for the AM, the 
 DNS/network goes down for the node that hosts the AM container.
 The application attempt is in state RMAppAttemptState.SCHEDULED; it receives an 
 RMAppAttemptEventType.CONTAINER_ALLOCATED event, but because an 
 IllegalArgumentException (due to the DNS error) is thrown, it stays in state 
 RMAppAttemptState.SCHEDULED. In the state machine, only two events are 
 processed in this state:
 RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL.
 The code does not handle the RMAppAttemptEventType.CONTAINER_FINISHED event, 
 which is generated when the node and container time out. So even if the node 
 is removed, the application is still stuck in state 
 RMAppAttemptState.SCHEDULED.
 The only way to make the application exit this state is to send an 
 RMAppAttemptEventType.KILL event, which is only generated when you manually 
 kill the application from the Job Client via forceKillApplication.
 To fix the issue, we should add an entry to the state machine table that 
 handles the RMAppAttemptEventType.CONTAINER_FINISHED event in state 
 RMAppAttemptState.SCHEDULED by adding the following code to the 
 StateMachineFactory:
 {code}
 .addTransition(RMAppAttemptState.SCHEDULED,
     RMAppAttemptState.FINAL_SAVING,
     RMAppAttemptEventType.CONTAINER_FINISHED,
     new FinalSavingTransition(
         new AMContainerCrashedBeforeRunningTransition(),
         RMAppAttemptState.FAILED))
 {code}
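Illustrative only: the following toy transition table (plain Java, not the real 
RMAppAttempt state machine or Hadoop's StateMachineFactory) shows why an 
unhandled CONTAINER_FINISHED at SCHEDULED leaves the attempt stuck, and how 
adding the entry routes it to FINAL_SAVING instead. All names are simplified 
stand-ins.

{code:title=Toy model of the missing transition (illustrative sketch)|borderStyle=solid}
import java.util.EnumMap;
import java.util.Map;

public class ScheduledStateSketch {
  enum State { SCHEDULED, ALLOCATED_SAVING, FINAL_SAVING, KILLED }
  enum Event { CONTAINER_ALLOCATED, CONTAINER_FINISHED, KILL }

  static final Map<Event, State> FROM_SCHEDULED = new EnumMap<>(Event.class);
  static {
    FROM_SCHEDULED.put(Event.CONTAINER_ALLOCATED, State.ALLOCATED_SAVING);
    FROM_SCHEDULED.put(Event.KILL, State.KILLED);
    // The proposed fix corresponds to adding this entry:
    FROM_SCHEDULED.put(Event.CONTAINER_FINISHED, State.FINAL_SAVING);
  }

  static State handle(State current, Event event) {
    if (current != State.SCHEDULED) {
      return current;
    }
    // Without the CONTAINER_FINISHED entry, this lookup returns null and the
    // attempt effectively stays in SCHEDULED forever.
    State next = FROM_SCHEDULED.get(event);
    return next != null ? next : current;
  }

  public static void main(String[] args) {
    System.out.println(handle(State.SCHEDULED, Event.CONTAINER_FINISHED)); // FINAL_SAVING
  }
}
{code}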



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.

2014-07-26 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075531#comment-14075531
 ] 

zhihai xu commented on YARN-2359:
-

I just added a unit test case (testAMCrashAtScheduled) to the patch to verify 
this state transition in the RMAppAttempt state machine.

 Application is hung without timeout and retry after DNS/network is down. 
 -

 Key: YARN-2359
 URL: https://issues.apache.org/jira/browse/YARN-2359
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-2359.000.patch, YARN-2359.001.patch


 Application is hung without timeout and retry after DNS/network is down. 
 This happens because, right after the container is allocated for the AM, the 
 DNS/network goes down for the node that hosts the AM container.
 The application attempt is in state RMAppAttemptState.SCHEDULED; it receives an 
 RMAppAttemptEventType.CONTAINER_ALLOCATED event, but because an 
 IllegalArgumentException (due to the DNS error) is thrown, it stays in state 
 RMAppAttemptState.SCHEDULED. In the state machine, only two events are 
 processed in this state:
 RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL.
 The code does not handle the RMAppAttemptEventType.CONTAINER_FINISHED event, 
 which is generated when the node and container time out. So even if the node 
 is removed, the application is still stuck in state 
 RMAppAttemptState.SCHEDULED.
 The only way to make the application exit this state is to send an 
 RMAppAttemptEventType.KILL event, which is only generated when you manually 
 kill the application from the Job Client via forceKillApplication.
 To fix the issue, we should add an entry to the state machine table that 
 handles the RMAppAttemptEventType.CONTAINER_FINISHED event in state 
 RMAppAttemptState.SCHEDULED by adding the following code to the 
 StateMachineFactory:
 {code}
 .addTransition(RMAppAttemptState.SCHEDULED,
     RMAppAttemptState.FINAL_SAVING,
     RMAppAttemptEventType.CONTAINER_FINISHED,
     new FinalSavingTransition(
         new AMContainerCrashedBeforeRunningTransition(),
         RMAppAttemptState.FAILED))
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.

2014-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075537#comment-14075537
 ] 

Hadoop QA commented on YARN-2359:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658009/YARN-2359.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4448//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4448//console

This message is automatically generated.

 Application is hung without timeout and retry after DNS/network is down. 
 -

 Key: YARN-2359
 URL: https://issues.apache.org/jira/browse/YARN-2359
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-2359.000.patch, YARN-2359.001.patch


 Application is hung without timeout and retry after DNS/network is down. 
 This happens because, right after the container is allocated for the AM, the 
 DNS/network goes down for the node that hosts the AM container.
 The application attempt is in state RMAppAttemptState.SCHEDULED; it receives an 
 RMAppAttemptEventType.CONTAINER_ALLOCATED event, but because an 
 IllegalArgumentException (due to the DNS error) is thrown, it stays in state 
 RMAppAttemptState.SCHEDULED. In the state machine, only two events are 
 processed in this state:
 RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL.
 The code does not handle the RMAppAttemptEventType.CONTAINER_FINISHED event, 
 which is generated when the node and container time out. So even if the node 
 is removed, the application is still stuck in state 
 RMAppAttemptState.SCHEDULED.
 The only way to make the application exit this state is to send an 
 RMAppAttemptEventType.KILL event, which is only generated when you manually 
 kill the application from the Job Client via forceKillApplication.
 To fix the issue, we should add an entry to the state machine table that 
 handles the RMAppAttemptEventType.CONTAINER_FINISHED event in state 
 RMAppAttemptState.SCHEDULED by adding the following code to the 
 StateMachineFactory:
 {code}
 .addTransition(RMAppAttemptState.SCHEDULED,
     RMAppAttemptState.FINAL_SAVING,
     RMAppAttemptEventType.CONTAINER_FINISHED,
     new FinalSavingTransition(
         new AMContainerCrashedBeforeRunningTransition(),
         RMAppAttemptState.FAILED))
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075538#comment-14075538
 ] 

Junping Du commented on YARN-2347:
--

bq. On the other hand, the timeline server is stateless, but it will still use 
this version stack. StateVersion may make users consider it stateful. If 
StateVersion is only going to be used for the storage layer, something like 
StoreVersion sounds better to me.
That's a good point. Can we think of it as the application state that is stored 
in the timeline store? If that still isn't reasonable, let's get back to 
Version. The problem with StoreVersion is that it sounds like a version of the 
store implementation, for example, v1 for LevelDB, v2 for some other store 
(HBase), etc. What do you think?
bq. it may be better to add some more javadoc for StateVersion to let users 
know what it is really about.
Also a good point. I will fix it soon.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps

2014-07-26 Thread Ram Venkatesh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075554#comment-14075554
 ] 

Ram Venkatesh commented on YARN-2362:
-

I agree that apps that need the entire cluster capacity are likely not common. 
However, I think the scenario above can happen in busy clusters, where an app 
might make a request that exceeds the _current_ capacity and hence blocks all 
other apps. Yes, only when more resources are freed up and App 1's request is 
satisfied will App 2 run. Note that since we are enumerating the set of active 
apps, the behavior is actually non-deterministic: if the new app happens to be 
enumerated before the large app, its allocation request will actually be 
satisfied. The change proposed here makes the behavior deterministic and can 
also reduce the wait for jobs that can complete; the downside, of course, is 
that the large app can now experience starvation if small apps keep getting 
through.

 Capacity Scheduler apps with requests that exceed capacity can starve pending 
 apps
 --

 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh

 Cluster configuration:
 Total memory: 8GB
 yarn.scheduler.minimum-allocation-mb 256
 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
 App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to 
 the RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be 
 granted, so it waits.
 App 2 makes a request for 1 GB but never receives it, so the app stays in the 
 ACCEPTED state forever.
 I think this can happen in leaf queues that are near capacity.
 The fix is likely in LeafQueue.java assignContainers, near line 861, where it 
 returns as soon as an assignment would exceed the queue capacity instead of 
 checking whether requests from other active applications can be met.
 {code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
 -return NULL_ASSIGNMENT;
 +break;
}
 {code}
 With this change, the scenario above allows App 2 to start and finish while 
 App 1 continues to wait.
 I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2362) Capacity Scheduler: apps with requests that exceed capacity can starve pending apps

2014-07-26 Thread Ram Venkatesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh updated YARN-2362:


Summary: Capacity Scheduler: apps with requests that exceed capacity can 
starve pending apps  (was: Capacity Scheduler apps with requests that exceed 
capacity can starve pending apps)

 Capacity Scheduler: apps with requests that exceed capacity can starve 
 pending apps
 ---

 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh

 Cluster configuration:
 Total memory: 8GB
 yarn.scheduler.minimum-allocation-mb 256
 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
 App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to 
 the RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be 
 granted, so it waits.
 App 2 makes a request for 1 GB but never receives it, so the app stays in the 
 ACCEPTED state forever.
 I think this can happen in leaf queues that are near capacity.
 The fix is likely in LeafQueue.java assignContainers, near line 861, where it 
 returns as soon as an assignment would exceed the queue capacity instead of 
 checking whether requests from other active applications can be met.
 {code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
 -return NULL_ASSIGNMENT;
 +break;
}
 {code}
 With this change, the scenario above allows App 2 to start and finish while 
 App 1 continues to wait.
 I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2362) Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps

2014-07-26 Thread Ram Venkatesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh updated YARN-2362:


Summary: Capacity Scheduler: apps with requests that exceed current 
capacity can starve pending apps  (was: Capacity Scheduler: apps with requests 
that exceed capacity can starve pending apps)

 Capacity Scheduler: apps with requests that exceed current capacity can 
 starve pending apps
 ---

 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh

 Cluster configuration:
 Total memory: 8GB
 yarn.scheduler.minimum-allocation-mb 256
 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
 App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to 
 the RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be 
 granted, so it waits.
 App 2 makes a request for 1 GB but never receives it, so the app stays in the 
 ACCEPTED state forever.
 I think this can happen in leaf queues that are near capacity.
 The fix is likely in LeafQueue.java assignContainers, near line 861, where it 
 returns as soon as an assignment would exceed the queue capacity instead of 
 checking whether requests from other active applications can be met.
 {code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
 -return NULL_ASSIGNMENT;
 +break;
}
 {code}
 With this change, the scenario above allows App 2 to start and finish while 
 App 1 continues to wait.
 I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2347:
-

Attachment: YARN-2347-v4.patch

Update patch in v4 as [~zjshen]'s comments.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)