[jira] [Commented] (YARN-2103) Fix code bug in SerializedExceptionPBImpl

2014-05-27 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009347#comment-14009347
 ] 

Binglin Chang commented on YARN-2103:
-

I plan to add a generic test covering all PBImpls in YARN-2051, so separate tests 
are not needed.

 Fix code bug in SerializedExceptionPBImpl
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2103.v1.patch


 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize the builder rather than the proto.
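
For context, a minimal sketch of the usual PBImpl idiom (illustrative only, not the exact YARN source; the generated protobuf class SerializedExceptionProto is assumed in scope): a newly constructed object should be backed by a builder, and viaProto only becomes true after getProto() builds the immutable proto. The bug is that proto, not builder, was initialized up front.

{code}
// Hedged sketch of the common *PBImpl pattern (not the actual class).
public class SerializedExceptionPBImpl {
  private SerializedExceptionProto proto = null;
  private SerializedExceptionProto.Builder builder =
      SerializedExceptionProto.newBuilder();   // initialize the builder, not the proto
  private boolean viaProto = false;

  private synchronized void maybeInitBuilder() {
    if (viaProto || builder == null) {
      builder = SerializedExceptionProto.newBuilder(proto);
    }
    viaProto = false;
  }

  public synchronized SerializedExceptionProto getProto() {
    proto = viaProto ? proto : builder.build();
    viaProto = true;
    return proto;
  }
}
{code}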



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-05-27 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2075:


Summary: TestRMAdminCLI consistently fail on trunk and branch-2  (was: 
TestRMAdminCLI consistently fail on trunk)

 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}
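
For context on the UnsupportedOperationException above: AbstractCollection.remove(Object) on a fixed-size AbstractList view (such as the list returned by java.util.Arrays.asList) walks the iterator and ends up in AbstractList.remove(int), which throws. A minimal reproduction of the same stack shape (illustrative only, not the YARN code):

{code}
import java.util.Arrays;
import java.util.List;

public class RemoveOnFixedSizeList {
  public static void main(String[] args) {
    // Arrays.asList returns a fixed-size view backed by AbstractList;
    // structural modification is unsupported.
    List<String> rmIds = Arrays.asList("rm1", "rm2");
    rmIds.remove("rm1");   // throws java.lang.UnsupportedOperationException
  }
}
{code}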



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-05-27 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009749#comment-14009749
 ] 

Mit Desai commented on YARN-2075:
-

Hi Kenji,
I applied the patch to trunk and branch-2. The tests still fail

 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1728) History server doesn't understand percent encoded paths

2014-05-27 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009795#comment-14009795
 ] 

jay vyas commented on YARN-1728:


Possible link to MAPREDUCE-5902; not sure exactly how this would pop up in two 
places, but it seems like almost exactly the same problem, just on the DFS side 
instead of on the web side.

 History server doesn't understand percent encoded paths
 ---

 Key: YARN-1728
 URL: https://issues.apache.org/jira/browse/YARN-1728
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Abraham Elmahrek

 For example, going to the job history server page 
 http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
  results in the following error:
 {code}
 Cannot get container logs. Invalid nodeId: 
 test-cdh5-hue.ent.cloudera.com%3A8041
 {code}
 Whereas the URL-decoded version works:
 http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
 It seems like both should be supported, as the former is simply the 
 percent-encoded form of the latter.
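
The two URLs differ only in whether the ':' in the nodeId path segment is percent-encoded. A small illustration of decoding that segment (an assumed approach, not the history server's actual code):

{code}
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class NodeIdDecodeExample {
  public static void main(String[] args) throws Exception {
    String encoded = "localhost%3A8041";   // nodeId as it appears in the encoded URL path
    String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8.name());
    System.out.println(decoded);           // prints localhost:8041
  }
}
{code}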



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009807#comment-14009807
 ] 

Tsuyoshi OZAWA commented on YARN-2096:
--

One piece of good news: TestRMRestart with Anubhav's patch works well - after running 
the tests hundreds of times, no failures. Good job :-)

 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.5.0

 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of the second RM. The metrics 
 that need to be incremented race with this reset, causing the test to fail 
 randomly.
 We need to wait for the right transitions.
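
A minimal sketch of the kind of wait the description calls for: poll the metric until it reaches the expected value (or times out) instead of asserting right after recovery is kicked off. The accessor here is hypothetical, not the actual test code:

{code}
// Hedged sketch: wait for a metric to settle before asserting on it.
// java.util.function.IntSupplier stands in for e.g. a QueueMetrics getter.
static void waitForMetric(java.util.function.IntSupplier metric, int expected,
                          long timeoutMs) throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (metric.getAsInt() != expected) {
    if (System.currentTimeMillis() > deadline) {
      throw new AssertionError("timed out waiting for metric == " + expected
          + ", last value was " + metric.getAsInt());
    }
    Thread.sleep(50);   // re-check after pending transitions get a chance to run
  }
}
{code}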



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-05-27 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1680:
--

Attachment: YARN-1680-v2.patch

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom still includes the blacklisted node's memory. This makes 
 the job hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes, but the availableResources it returns still counts the cluster's 
 free memory).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-05-27 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009893#comment-14009893
 ] 

Jian Fang commented on YARN-796:


Hi Bikas, I think it is better to have the node manager specify its own 
labels and then register them with the RM. 

Also, it would be great if YARN could provide an API to add/update labels on a 
node. This is based on the following scenario.

Usually a Hadoop cluster in the cloud is elastic; that is to say, the cluster size 
can be automatically or manually expanded or shrunk based on the cluster's situation, 
for example, idleness. When a node in a cluster is chosen to be shrunk, i.e., 
to be removed, we could call the API to label the node so that no more tasks 
would be assigned to it.

We could use the decommission API to achieve this goal, but I think a label 
API may be more elegant. 

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009905#comment-14009905
 ] 

Hadoop QA commented on YARN-1474:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646928/YARN-1474.17.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3833//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3833//console

This message is automatically generated.

 Make schedulers services
 

 Key: YARN-1474
 URL: https://issues.apache.org/jira/browse/YARN-1474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.3.0, 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
 YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
 YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
 YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, 
 YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, 
 YARN-1474.9.patch


 Schedulers currently have a reinitialize method but no start and stop. Fitting them 
 into the YARN service model would make things more coherent.
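
For reference, a minimal sketch of what adopting the Hadoop service model would look like, assuming the usual org.apache.hadoop.service.AbstractService lifecycle hooks (illustrative class, not the attached patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Hedged sketch: a scheduler-as-service overrides the lifecycle hooks
// instead of exposing only an ad-hoc reinitialize() method.
public class SchedulerServiceSketch extends AbstractService {
  public SchedulerServiceSketch() {
    super(SchedulerServiceSketch.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // one-time setup from configuration (queues, policies, ...)
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // start background update/monitor threads
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    // stop threads and release resources
    super.serviceStop();
  }
}
{code}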



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009928#comment-14009928
 ] 

Tsuyoshi OZAWA commented on YARN-1474:
--

The three test failures of TestFairScheduler are filed as YARN-2105. 

 Make schedulers services
 

 Key: YARN-1474
 URL: https://issues.apache.org/jira/browse/YARN-1474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.3.0, 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
 YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
 YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
 YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, 
 YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, 
 YARN-1474.9.patch


 Schedulers currently have a reinitialize method but no start and stop. Fitting them 
 into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009931#comment-14009931
 ] 

Hadoop QA commented on YARN-1680:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646932/YARN-1680-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3834//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3834//console

This message is automatically generated.

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom still includes the blacklisted node's memory. This makes 
 the job hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes, but the availableResources it returns still counts the cluster's 
 free memory).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2106) TestFairScheduler in trunk is failing

2014-05-27 Thread Wei Yan (JIRA)
Wei Yan created YARN-2106:
-

 Summary: TestFairScheduler in trunk is failing
 Key: YARN-2106
 URL: https://issues.apache.org/jira/browse/YARN-2106
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan


Some issues due to the Queue Placement policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2106) TestFairScheduler in trunk is failing

2014-05-27 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved YARN-2106.
--

Resolution: Duplicate

 TestFairScheduler in trunk is failing
 -

 Key: YARN-2106
 URL: https://issues.apache.org/jira/browse/YARN-2106
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan

 Some issues due to the Queue Placement policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-05-27 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010005#comment-14010005
 ] 

Chen He commented on YARN-1680:
---

These three errors are reported in 
[YARN-2105|https://issues.apache.org/jira/browse/YARN-2105] and are not related to 
this JIRA.

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom still includes the blacklisted node's memory. This makes 
 the job hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes, but the availableResources it returns still counts the cluster's 
 free memory).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2105) Three TestFairScheduler tests fail in trunk

2014-05-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010013#comment-14010013
 ] 

Karthik Kambatla commented on YARN-2105:


Looks good to me. I'll wait for Sandy to also take a look. 

 Three TestFairScheduler tests fail in trunk
 ---

 Key: YARN-2105
 URL: https://issues.apache.org/jira/browse/YARN-2105
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ashwin Shankar
 Attachments: YARN-2105-v1.txt


 The following tests fail in trunk:
 {code}
 Failed tests:
   TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:<1> but was:<0>
 Tests in error:
   TestFairScheduler.testQueuePlacementWithPolicy:624 » NullPointer
   TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2107:
--

Issue Type: Bug  (was: Sub-task)
Parent: (was: YARN-1530)

 Refactor timeline classes into server.timeline package
 --

 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 Right now, most of the timeline-server classes are present in an 
 applicationhistoryserver package instead of a top-level timeline package.
 This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-2107:
-

 Summary: Refactor timeline classes into server.timeline package
 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Right now, most of the timeline-server classes are present in an 
applicationhistoryserver package instead of a top-level timeline package.

This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2105) Three TestFairScheduler tests fail in trunk

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010164#comment-14010164
 ] 

Tsuyoshi OZAWA commented on YARN-2105:
--

The patch works well on my local.

 Three TestFairScheduler tests fail in trunk
 ---

 Key: YARN-2105
 URL: https://issues.apache.org/jira/browse/YARN-2105
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ashwin Shankar
 Attachments: YARN-2105-v1.txt


 The following tests fail in trunk:
 {code}
 Failed tests:
   TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:<1> but was:<0>
 Tests in error:
   TestFairScheduler.testQueuePlacementWithPolicy:624 » NullPointer
   TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2107:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1530

 Refactor timeline classes into server.timeline package
 --

 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 Right now, most of the timeline-server classes are present in an 
 applicationhistoryserver package instead of a top-level timeline package.
 This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2107:
--

Attachment: YARN-2107.txt

Here's a simple eclipse-refactor patch attached.

The easiest way to review, if on git: apply the patch, git add the new files, and 
run git diff -M.

 Refactor timeline classes into server.timeline package
 --

 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-2107.txt


 Right now, most of the timeline-server classes are present in an 
 applicationhistoryserver package instead of a top-level timeline package.
 This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010200#comment-14010200
 ] 

Hadoop QA commented on YARN-2107:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646971/YARN-2107.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer
  
org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryClientService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3835//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3835//console

This message is automatically generated.

 Refactor timeline classes into server.timeline package
 --

 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-2107.txt


 Right now, most of the timeline-server classes are present in an 
 applicationhistoryserver package instead of a top-level timeline package.
 This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2108) Show minShare on RM Scheduler page

2014-05-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2108:
--

Description: 
Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
MaxCapacity.
It would be better to also show MinShare, possibly with a different color code, so that 
we know whether a queue is running above its min share.

 Show minShare on RM Scheduler page
 --

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li

 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to also show MinShare, possibly with a different color code, so 
 that we know whether a queue is running above its min share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2014-05-27 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010234#comment-14010234
 ] 

Wei Yan commented on YARN-1021:
---

[~cristiana.voicu], the SLS directly supports Rumen traces. In general, you 
need to have some existing workload traces (e.g., from some production 
clusters), and then use Rumen to generate workload traces. Then let the SLS 
load these traces. Or you can generate some traces randomly (random # of jobs, 
requests, lifetimes, etc).
Sorry, I don't have the traces used on that page right now.

 Yarn Scheduler Load Simulator
 -

 Key: YARN-1021
 URL: https://issues.apache.org/jira/browse/YARN-1021
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf


 The Yarn Scheduler is a fertile area of interest with different 
 implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
 several optimizations are also made to improve scheduler performance for 
 different scenarios and workloads. Each scheduler algorithm has its own set of 
 features, and drives scheduling decisions by many factors, such as fairness, 
 capacity guarantee, resource availability, etc. It is very important to 
 evaluate a scheduler algorithm very well before we deploy it in a production 
 cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
 algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
 and it is also very hard to find a large-enough cluster. Hence, a simulator 
 which can predict how well a scheduler algorithm works for some specific 
 workload would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads on a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with a reasonable amount of confidence, thereby aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager, removing the 
 network factor by simulating NodeManagers and ApplicationMasters via handling 
 and dispatching NM/AM heartbeat events from within the same JVM.
 To keep track of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real-time metrics while executing, including:
 * Resource usage for the whole cluster and each queue, which can be utilized to 
 configure the cluster's and each queue's capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the scheduler behavior 
 (individual jobs' turnaround time, throughput, fairness, capacity guarantee, 
 etc).
 * Several key metrics of the scheduler algorithm, such as the time cost of each 
 scheduler operation (allocate, handle, etc), which can be utilized by Hadoop 
 developers to find code hot spots and scalability limits.
 The simulator will provide real-time charts showing the behavior of the 
 scheduler and its performance.
 A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
 how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities

2014-05-27 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010244#comment-14010244
 ] 

Ashwin Shankar commented on YARN-2099:
--

Ah, I didn't know about YARN-596; this is very nice! 
I agree with Sandy's comment. Keeping app preemption based on the leaf queue's 
scheduling policy and having a separate policy
which is purely based on priority makes sense to me.

 Preemption in fair scheduler should consider app priorities
 ---

 Key: YARN-2099
 URL: https://issues.apache.org/jira/browse/YARN-2099
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.5.0
Reporter: Ashwin Shankar

 Fair scheduler should take app priorities into account while
 preempting containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations

2014-05-27 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-1769:


Attachment: YARN-1769.patch

upmerged to latest

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact that there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required, and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers). Any time it hits the limit on the number reserved, it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is that currently reservations count against your 
 queue capacity. If you have reservations, you could hit the various limits, 
 which would then stop you from looking further at that node.
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation.
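
A rough sketch of the improvement described in the last paragraph (illustrative names only, not the attached patch): while scanning an incoming node, convert an existing reservation into a real allocation when the node now has enough room.

{code}
// Hedged sketch of "swap a reservation for an actual allocation".
// The Runnable callbacks stand in for the scheduler's unreserve/allocate paths.
static boolean tryAllocateOrSwap(int nodeAvailableMB, int requestMB,
                                 Runnable unreserveElsewhere, Runnable allocateHere) {
  if (nodeAvailableMB >= requestMB) {
    if (unreserveElsewhere != null) {
      unreserveElsewhere.run();   // release the reservation held on another node
    }
    allocateHere.run();           // satisfy the request on this node instead
    return true;
  }
  return false;                   // not enough room here; keep the reservation and keep looking
}
{code}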



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities

2014-05-27 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010251#comment-14010251
 ] 

Wei Yan commented on YARN-2099:
---

Hey, [~ashwinshankar77], Are you working on this one? If not, I would like to 
take it.

 Preemption in fair scheduler should consider app priorities
 ---

 Key: YARN-2099
 URL: https://issues.apache.org/jira/browse/YARN-2099
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.5.0
Reporter: Ashwin Shankar

 Fair scheduler should take app priorities into account while
 preempting containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities

2014-05-27 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010269#comment-14010269
 ] 

Ashwin Shankar commented on YARN-2099:
--

Hey [~ywskycn], please go ahead.

 Preemption in fair scheduler should consider app priorities
 ---

 Key: YARN-2099
 URL: https://issues.apache.org/jira/browse/YARN-2099
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.5.0
Reporter: Ashwin Shankar

 Fair scheduler should take app priorities into account while
 preempting containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010277#comment-14010277
 ] 

Zhijie Shen commented on YARN-2107:
---

+1 for the new namespace. The test failure is caused by the defaults:

{code}
  <property>
    <description>Store class name for timeline store.</description>
    <name>yarn.timeline-service.store-class</name>
    <value>org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore</value>
  </property>
{code}

We need to change yarn-default.xml accordingly.
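
Presumably the default would then point at the relocated class, along these lines (the new package name is assumed from the JIRA title, not taken from the patch):

{code:xml}
  <property>
    <description>Store class name for timeline store.</description>
    <name>yarn.timeline-service.store-class</name>
    <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
  </property>
{code}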

 Refactor timeline classes into server.timeline package
 --

 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-2107.txt


 Right now, most of the timeline-server classes are present in an 
 applicationhistoryserver package instead of a top-level timeline package.
 This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2099) Preemption in fair scheduler should consider app priorities

2014-05-27 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan reassigned YARN-2099:
-

Assignee: Wei Yan

 Preemption in fair scheduler should consider app priorities
 ---

 Key: YARN-2099
 URL: https://issues.apache.org/jira/browse/YARN-2099
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Wei Yan

 Fair scheduler should take app priorities into account while
 preempting containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010319#comment-14010319
 ] 

Hadoop QA commented on YARN-1769:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646981/YARN-1769.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3836//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3836//console

This message is automatically generated.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact that there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required, and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers). Any time it hits the limit on the number reserved, it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is that currently reservations count against your 
 queue capacity. If you have reservations, you could hit the various limits, 
 which would then stop you from looking further at that node.
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-05-27 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010358#comment-14010358
 ] 

Thomas Graves commented on YARN-1769:
-

TestFairScheduler is failing for other reasons; see 
https://issues.apache.org/jira/browse/YARN-2105.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact that there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required, and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers). Any time it hits the limit on the number reserved, it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is that currently reservations count against your 
 queue capacity. If you have reservations, you could hit the various limits, 
 which would then stop you from looking further at that node.
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2107:
--

Attachment: YARN-2107.1.txt

Tx for the review and the tip, Zhijie. I fixed both yarn-default.xml and the 
documentation.

Technically the rename is an incompatible change to the LevelDBStore impl. But the 
Timeline service wasn't 'declared' stable, so I am not creating any 
compatibility bridges.

 Refactor timeline classes into server.timeline package
 --

 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-2107.1.txt, YARN-2107.txt


 Right now, most of the timeline-server classes are present in an 
 applicationhistoryserver package instead of a top-level timeline package.
 This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010372#comment-14010372
 ] 

Tsuyoshi OZAWA commented on YARN-1474:
--

[~kkambatl], v17 is ready for review. Could you take a look?

 Make schedulers services
 

 Key: YARN-1474
 URL: https://issues.apache.org/jira/browse/YARN-1474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.3.0, 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
 YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
 YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
 YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, 
 YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, 
 YARN-1474.9.patch


 Schedulers currently have a reinitialize method but no start and stop. Fitting them 
 into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010448#comment-14010448
 ] 

Hadoop QA commented on YARN-2107:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646997/YARN-2107.1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3837//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3837//console

This message is automatically generated.

 Refactor timeline classes into server.timeline package
 --

 Key: YARN-2107
 URL: https://issues.apache.org/jira/browse/YARN-2107
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-2107.1.txt, YARN-2107.txt


 Right now, most of the timeline-server classes are present in an 
 applicationhistoryserver package instead of a top-level timeline package.
 This is one part of YARN-2043; there is more to do.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues

2014-05-27 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar reassigned YARN-1961:


Assignee: Ashwin Shankar

 Fair scheduler preemption doesn't work for non-leaf queues
 --

 Key: YARN-1961
 URL: https://issues.apache.org/jira/browse/YARN-1961
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler

 Setting minResources and minSharePreemptionTimeout to a non-leaf queue 
 doesn't cause preemption to happen when that non-leaf queue is below 
 minResources and there are outstanding demands in that non-leaf queue.
 Here is an example fs allocation config (partial):
 {code:xml}
 <queue name="abc">
   <minResources>3072 mb,0 vcores</minResources>
   <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
   <queue name="childabc1">
   </queue>
   <queue name="childabc2">
   </queue>
 </queue>
 {code}
 With the above configs, preemption doesn't seem to happen if queue abc is 
 below its minShare and it has outstanding unsatisfied demands from apps in its 
 child queues. Ideally in such cases we would like preemption to kick in and 
 reclaim resources from other queues (not under queue abc).
 Looking at the code, it seems like preemption checks for starvation only at 
 the leaf queue level and not at the parent level.
 {code:title=FairScheduler.java|borderStyle=solid}
 boolean isStarvedForMinShare(FSLeafQueue sched)
 boolean isStarvedForFairShare(FSLeafQueue sched)
 {code}
 This affects our use case, where we have a parent queue with probably 100 
 unconfigured leaf queues under it. We want to give a minShare to the parent 
 queue to protect all the leaf queues under it, but we cannot do it due to this 
 bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2105) Three TestFairScheduler tests fail in trunk

2014-05-27 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010497#comment-14010497
 ] 

Sandy Ryza commented on YARN-2105:
--

+1.  Thanks for the quick turnaround on this Ashwin.

 Three TestFairScheduler tests fail in trunk
 ---

 Key: YARN-2105
 URL: https://issues.apache.org/jira/browse/YARN-2105
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ashwin Shankar
 Attachments: YARN-2105-v1.txt


 The following tests fail in trunk:
 {code}
 Failed tests:
   TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:<1> but was:<0>
 Tests in error:
   TestFairScheduler.testQueuePlacementWithPolicy:624 » NullPointer
   TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2108) Show minShare on RM Fair Scheduler page

2014-05-27 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-2108:
-

Summary: Show minShare on RM Fair Scheduler page  (was: Show minShare on RM 
Scheduler page)

 Show minShare on RM Fair Scheduler page
 ---

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li

 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to also show MinShare, possibly with a different color code, so 
 that we know whether a queue is running above its min share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2108) Show minShare on RM Fair Scheduler page

2014-05-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2108:
--

Attachment: YARN-2108.v1.patch

 Show minShare on RM Fair Scheduler page
 ---

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2108.v1.patch


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to also show MinShare, possibly with a different color code, so 
 that we know whether a queue is running above its min share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1801) NPE in public localizer

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010507#comment-14010507
 ] 

Tsuyoshi OZAWA commented on YARN-1801:
--

Looks good to me (non-binding). [~jlowe], can you please take a look?

 NPE in public localizer
 ---

 Key: YARN-1801
 URL: https://issues.apache.org/jira/browse/YARN-1801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Jason Lowe
Assignee: Hong Zhiguo
Priority: Critical
 Attachments: YARN-1801.patch


 While investigating YARN-1800, I found the following in the NM logs; it caused the 
 public localizer to shut down:
 {noformat}
 2014-01-23 01:26:38,655 INFO  localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:addResource(651)) - Downloading public 
 rsrc:{ 
 hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
  1390440382009, FILE, null }
 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:run(726)) - Error: Shutting down
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
 2014-01-23 01:26:38,656 INFO  localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:run(728)) - Public cache exiting
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2105) Fix TestFairScheduler after YARN-2012

2014-05-27 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-2105:
-

Summary: Fix TestFairScheduler after YARN-2012  (was: Three 
TestFairScheduler tests fail in trunk)

 Fix TestFairScheduler after YARN-2012
 -

 Key: YARN-2105
 URL: https://issues.apache.org/jira/browse/YARN-2105
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ashwin Shankar
 Fix For: 2.5.0

 Attachments: YARN-2105-v1.txt


 The following tests fail in trunk:
 {code}
 Failed tests:
   TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:<1> but was:<0>
 Tests in error:
   TestFairScheduler.testQueuePlacementWithPolicy:624 » NullPointer
   TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2108) Show minShare on RM Fair Scheduler page

2014-05-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2108:
--

Attachment: YARN-2108.v2.patch

 Show minShare on RM Fair Scheduler page
 ---

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2108.v1.patch, YARN-2108.v2.patch


 Today RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to show MinShare, possibly with a different color code, so 
 that we know when a queue is running above its min share. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1801) NPE in public localizer

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010560#comment-14010560
 ] 

Hadoop QA commented on YARN-1801:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646195/YARN-1801.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3839//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3839//console

This message is automatically generated.

 NPE in public localizer
 ---

 Key: YARN-1801
 URL: https://issues.apache.org/jira/browse/YARN-1801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Jason Lowe
Assignee: Hong Zhiguo
Priority: Critical
 Attachments: YARN-1801.patch


 While investigating YARN-1800 found this in the NM logs that caused the 
 public localizer to shutdown:
 {noformat}
 2014-01-23 01:26:38,655 INFO  localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:addResource(651)) - Downloading public 
 rsrc:{ 
 hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
  1390440382009, FILE, null }
 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:run(726)) - Error: Shutting down
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
 2014-01-23 01:26:38,656 INFO  localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:run(728)) - Public cache exiting
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA reassigned YARN-2091:


Assignee: Tsuyoshi OZAWA

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA

 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010580#comment-14010580
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

ContainerManagerImpl cannot distinguish the exit reason because 
ContainersMonitorImpl currently dispatches ContainerKillEvent without it. I plan 
to add the exit reason to ContainerKillEvent. Please let me know if you have a 
better idea.
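
To make this concrete, here is a minimal sketch of the kind of change I have in 
mind (not the actual patch; the extra field and accessor names are assumptions):

{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType;

// Hypothetical sketch: a kill event that also carries the container exit status
// (e.g. the proposed ContainerExitStatus.KILL_EXCEEDED_MEMORY), so that
// ContainerManagerImpl can tell why the container was killed.
public class ContainerKillEvent extends ContainerEvent {

  private final int containerExitStatus;
  private final String diagnostic;

  public ContainerKillEvent(ContainerId containerId,
      int containerExitStatus, String diagnostic) {
    super(containerId, ContainerEventType.KILL_CONTAINER);
    this.containerExitStatus = containerExitStatus;
    this.diagnostic = diagnostic;
  }

  public int getContainerExitStatus() {
    return containerExitStatus;
  }

  public String getDiagnostic() {
    return diagnostic;
  }
}
{code}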

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA

 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010585#comment-14010585
 ] 

Bikas Saha commented on YARN-2091:
--

That's the missing piece, AFAIK. The exit reason needs to be passed along 
internally through the NM and then on to the RM and AM. Maybe simply use 
ContainerExitStatus directly instead of a new reason object inside 
ContainerKillEvent.

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA

 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-27 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-596:
-

Attachment: YARN-596.patch

Updated with a new patch now that YARN-2105 is in.

 In fair scheduler, intra-application container priorities affect 
 inter-application preemption decisions
 ---

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted in order of the priority that the container was requested 
 in.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share, is just as likely to have containers preempted 
 as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2108) Show minShare on RM Fair Scheduler page

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010611#comment-14010611
 ] 

Hadoop QA commented on YARN-2108:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647016/YARN-2108.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3840//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3840//console

This message is automatically generated.

 Show minShare on RM Fair Scheduler page
 ---

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2108.v1.patch, YARN-2108.v2.patch


 Today RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to show MinShare, possibly with a different color code, so 
 that we know when a queue is running above its min share. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010614#comment-14010614
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

Hi Bikas, let me clarify what "simply directly use" means. I meant passing the 
exit reason via ContainerKillEvent, like {{ContainerKillEvent(containerId, 
ContainerExitStatus.KILL_EXCEEDED_MEMORY, msg)}}. Is this off track?

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA

 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-27 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
  Labels: easyfix
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.
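
As an illustration only (not the committed patch; the method, parameter names, 
and where the check lives are assumptions), the scheduler-side guard could look 
roughly like this:

{code}
// Illustrative sketch: before launching an application's AM container, check
// that total AM resource usage would stay within a configured share of the
// cluster. "maxAMShare" here is a hypothetical config value in [0, 1].
boolean canRunAppAM(Resource amResource, Resource amResourceUsage,
    Resource clusterResource, float maxAMShare) {
  if (maxAMShare < 0) {
    return true; // a negative value could mean "no limit"
  }
  Resource maxAMResource = Resources.multiply(clusterResource, maxAMShare);
  Resource ifRunAMResource = Resources.add(amResourceUsage, amResource);
  return Resources.fitsIn(ifRunAMResource, maxAMResource);
}
{code}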



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010630#comment-14010630
 ] 

Bikas Saha commented on YARN-2091:
--

We are on the same page. The kill reason is directly a ContainerExitStatus.

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA

 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010643#comment-14010643
 ] 

Hadoop QA commented on YARN-596:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647026/YARN-596.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3841//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3841//console

This message is automatically generated.

 In fair scheduler, intra-application container priorities affect 
 inter-application preemption decisions
 ---

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted in order of the priority that the container was requested 
 in.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share, is just as likely to have containers preempted 
 as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010659#comment-14010659
 ] 

Hadoop QA commented on YARN-1913:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647029/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3842//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3842//console

This message is automatically generated.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
  Labels: easyfix
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-27 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010700#comment-14010700
 ] 

Binglin Chang commented on YARN-2103:
-

Hi [~ozawa], thanks for reviewing the patch and for the comments. I kept the 
original title because the bug isn't just the inconsistent viaProto flag; the 
class also lacks equals() and hashCode() methods (which affects other records 
that use SerializedException). I should point out all of the bugs in the JIRA.
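
For the viaProto part, a minimal sketch of the kind of fix I mean (the actual 
patch may differ) is to make the initial state consistent with the flag by 
starting from a builder instead of the default proto instance:

{code}
// Sketch only, not the committed patch: with viaProto == false, the builder
// should be the object that is initialized, not the proto.
SerializedExceptionProto proto = null;
SerializedExceptionProto.Builder builder = SerializedExceptionProto.newBuilder();
boolean viaProto = false;
{code}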

About the code format: most PBImpl classes use this common code:
{code}
  private void maybeInitBuilder() {
if (viaProto || builder == null) {
  builder = GetApplicationsRequestProto.newBuilder(proto);
}
viaProto = false;
  }

  @Override
  public int hashCode() {
return getProto().hashCode();
  }

  @Override
  public boolean equals(Object other) {
if (other == null)
  return false;
if (other.getClass().isAssignableFrom(this.getClass())) {
  return this.getProto().equals(this.getClass().cast(other).getProto());
}
return false;
  }

{code}

You can see GetApplicationsRequestPBImpl/GetApplicationsResponsePBImpl; I just 
followed those patterns. Maybe we can change them all in another JIRA, since 
changing them all may not fit into this one.
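
For the missing equals() and hashCode(), a sketch of applying the same pattern 
to SerializedExceptionPBImpl (the patch may differ slightly in formatting) looks 
like this:

{code}
// Sketch: delegate equality and hashing to the underlying proto,
// mirroring the common PBImpl pattern quoted above.
@Override
public int hashCode() {
  return getProto().hashCode();
}

@Override
public boolean equals(Object other) {
  if (other == null) {
    return false;
  }
  if (other.getClass().isAssignableFrom(this.getClass())) {
    return this.getProto().equals(this.getClass().cast(other).getProto());
  }
  return false;
}
{code}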

bq.  How about adding concrete tests as a first step of generic tests on 
YARN-2051. 
After the generic tests are added, those old tests will probably be redundant 
and can be removed. I guess we can discuss that in the future. For now, I can 
provide a separate test.



 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2103.v1.patch


 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize the builder rather than the proto



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-27 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2103:


Description: 
Bug 1:
{code}
  SerializedExceptionProto proto = SerializedExceptionProto
  .getDefaultInstance();
  SerializedExceptionProto.Builder builder = null;
  boolean viaProto = false;
{code}

Since viaProto is false, we should initialize the builder rather than the proto

Bug 2:
The class does not provide hashCode() and equals() like other PBImpl records do. 
Since this class is used inside other records, it may affect their behavior. 



  was:
{code}
  SerializedExceptionProto proto = SerializedExceptionProto
  .getDefaultInstance();
  SerializedExceptionProto.Builder builder = null;
  boolean viaProto = false;
{code}

Since viaProto is false, we should initiate build rather than proto



 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2103.v1.patch


 Bug 1:
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize the builder rather than the proto
 Bug 2:
 The class does not provide hashCode() and equals() like other PBImpl records do. 
 Since this class is used inside other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010728#comment-14010728
 ] 

Karthik Kambatla commented on YARN-1474:


Thanks Tsuyoshi. We are very close to getting this in. A few minor comments:
# In each of the schedulers, I don't think we need the following snippet, or for 
that matter the variable {{initialized}}, at all. {{reinitialize()}} would have 
just the contents of the else-block. When using the scheduler, one should call 
setRMContext(), init() and then reinitialize() thereafter (see the sketch after 
this list).
{code}
if (!initialized) {
  this.rmContext = rmContext;
  initScheduler(configuration);
  startSchedulerThreads();
} else {
{code}
# ResourceSchedulerWrapper should override the serviceInit, serviceStart and 
serviceStop methods, not init, start and stop. 
# I have a feeling we'll have to update some tests, including the ones that are 
modified in the latest patch, to call scheduler.init() right after 
scheduler.setRMContext(), if we are not using the scheduler from a MockRM or 
ResourceManager instance.
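
To illustrate the intended call order from point 1, here is a test-style sketch 
(the concrete scheduler class and the helper methods are just examples, not part 
of the patch):

{code}
// Sketch of the lifecycle when driving a scheduler outside MockRM/ResourceManager:
// set the RM context, init once, then use reinitialize() for later config reloads.
FairScheduler scheduler = new FairScheduler();
Configuration conf = createConfiguration();   // hypothetical test helper
RMContext rmContext = createMockRMContext();  // hypothetical test helper

scheduler.setRMContext(rmContext);
scheduler.init(conf);                     // would run initScheduler()/startSchedulerThreads()
scheduler.reinitialize(conf, rmContext);  // subsequent reconfiguration only
{code}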

 Make schedulers services
 

 Key: YARN-1474
 URL: https://issues.apache.org/jira/browse/YARN-1474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.3.0, 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
 YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
 YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
 YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, 
 YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, 
 YARN-1474.9.patch


 Schedulers currently have a reinitialize but no start and stop.  Fitting them 
 into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-27 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010791#comment-14010791
 ] 

Sandy Ryza commented on YARN-596:
-

Thanks Wei.  Getting close - a few more comments.

{code}
+  private static final ResourceCalculator RESOURCE_CALCULATOR =
+  new DefaultResourceCalculator();
{code}
This is no longer needed in FSQueue, right?

FIFOPolicy should throw an unsupported operation exception if its 
checkIfUsageOverFairShare is called.
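
Something like this minimal sketch would do (the exact method signature in the 
patch may differ):

{code}
// Sketch: FIFO ordering has no per-app fair-share usage check, so calling this
// on FifoPolicy should fail loudly instead of returning a misleading answer.
@Override
public boolean checkIfUsageOverFairShare(Resource usage, Resource fairShare) {
  throw new UnsupportedOperationException(
      "FifoPolicy does not support checkIfUsageOverFairShare");
}
{code}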

fairshare should be fairShare

 In fair scheduler, intra-application container priorities affect 
 inter-application preemption decisions
 ---

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted in order of the priority that the container was requested 
 in.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share, is just as likely to have containers preempted 
 as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-05-27 Thread Kenji Kikushima (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010804#comment-14010804
 ] 

Kenji Kikushima commented on YARN-2075:
---

Hi [~mitdesai], thanks for your testing.
I also tested the patch on trunk locally and confirmed TestRMAdminCLI passed.
This patch contains a modification to HAAdmin.java. Please rebuild o.a.h.ha if 
you haven't refreshed it yet.
{noformat}
$ mvn test -Dtest=org.apache.hadoop.yarn.client.TestRMAdminCLI
[INFO] Scanning for projects...
[INFO]
[INFO] 
[INFO] Building hadoop-yarn-client 3.0.0-SNAPSHOT
[INFO] 
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-yarn-client 
---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-resources-plugin:2.2:resources (default-resources) @ 
hadoop-yarn-client ---
[INFO] Using default encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ 
hadoop-yarn-client ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.2:testResources (default-testResources) @ 
hadoop-yarn-client ---
[INFO] Using default encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ 
hadoop-yarn-client ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hadoop-yarn-client 
---
[INFO] Surefire report directory: 
/home/user/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/surefire-reports

---
 T E S T S
---

---
 T E S T S
---
Running org.apache.hadoop.yarn.client.TestRMAdminCLI
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.308 sec - in 
org.apache.hadoop.yarn.client.TestRMAdminCLI

Results :

Tests run: 13, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 4.266s
[INFO] Finished at: Wed May 28 13:33:31 UTC 2014
[INFO] Final Memory: 17M/268M
[INFO] 
{noformat}


 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)