[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-11 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623676#comment-14623676
 ] 

Varun Saxena commented on YARN-3893:


For the 2nd option, we will have to return STANDBY to the client if the state 
is WAITING_FOR_ACTIVE. So it can primarily be an RM-internal state.
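A minimal sketch of that mapping, assuming a hypothetical RM-internal enum 
(these names are not the actual YARN classes):

{code}
// Hypothetical sketch only: WAITING_FOR_ACTIVE stays internal and is
// reported to clients (getServiceState, web UI) as standby.
enum InternalHAState { ACTIVE, STANDBY, WAITING_FOR_ACTIVE }

static String toClientState(InternalHAState state) {
  // Only a fully transitioned RM may advertise itself as active.
  return state == InternalHAState.ACTIVE ? "active" : "standby";
}
{code}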

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: yarn-site.xml
>
>
> Cases that can cause this:
> # Capacity scheduler XML is wrongly configured during the switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> Continuously, both RMs will try to become active:
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both web UIs show as active
> # Status shown as active for both RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3917) getResourceCalculatorPlugin for the default should intercept all exceptions

2015-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623675#comment-14623675
 ] 

Hadoop QA commented on YARN-3917:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 18s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| | |  40m 20s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744858/HADOOP-1.001.patch 
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1df39c1 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8512/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8512/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8512/console |


This message was automatically generated.

> getResourceCalculatorPlugin for the default should intercept all exceptions
> ---
>
> Key: YARN-3917
> URL: https://issues.apache.org/jira/browse/YARN-3917
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Fix For: 2.8.0
>
> Attachments: HADOOP-1.001.patch
>
>
> Since the user has not configured a specific plugin, any problems with the 
> default resource calculator instantiation should be ignored.
> {code}
> 2015-07-10 08:16:18,445 INFO org.apache.hadoop.service.AbstractService: 
> Service containers-monitor failed in state INITED; cause: 
> java.lang.UnsupportedOperationException: Could not determine OS
> java.lang.UnsupportedOperationException: Could not determine OS
> at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.<init>(ResourceCalculatorPlugin.java:37)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getResourceCalculatorPlugin(ResourceCalculatorPlugin.java:160)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.serviceInit(ContainersMonitorImpl.java:108)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:249)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:312)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:547)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:595)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3917) getResourceCalculatorPlugin for the default should intercept all exceptions

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623673#comment-14623673
 ] 

Hudson commented on YARN-3917:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8152 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8152/])
YARN-3917. getResourceCalculatorPlugin for the default should intercept all 
exceptions. (gera) (gera: rev d7319dee37ea93f1a1ba4153ea63ea8010ba2441)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java
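For illustration, a minimal sketch of the behavior the fix summary describes 
(simplified; not the committed patch): only the default, unconfigured path 
swallows instantiation failures such as the UnsupportedOperationException 
quoted below.

{code}
// Illustrative sketch, not the committed patch. An explicitly configured
// plugin class still fails loudly; only the default path ignores errors
// such as "Could not determine OS" and returns null, so callers simply
// skip resource monitoring.
public static ResourceCalculatorPlugin getResourceCalculatorPlugin(
    Class<? extends ResourceCalculatorPlugin> clazz, Configuration conf) {
  if (clazz != null) {
    return ReflectionUtils.newInstance(clazz, conf);
  }
  try {
    return new ResourceCalculatorPlugin();
  } catch (Throwable t) {
    // Default plugin could not be created; fall back to "no plugin".
    return null;
  }
}
{code}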


> getResourceCalculatorPlugin for the default should intercept all exceptions
> ---
>
> Key: YARN-3917
> URL: https://issues.apache.org/jira/browse/YARN-3917
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: HADOOP-1.001.patch
>
>
> Since the user has not configured a specific plugin, any problems with the 
> default resource calculator instantiation should be ignored.
> {code}
> 2015-07-10 08:16:18,445 INFO org.apache.hadoop.service.AbstractService: 
> Service containers-monitor failed in state INITED; cause: 
> java.lang.UnsupportedOperationException: Could not determine OS
> java.lang.UnsupportedOperationException: Could not determine OS
> at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.<init>(ResourceCalculatorPlugin.java:37)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getResourceCalculatorPlugin(ResourceCalculatorPlugin.java:160)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.serviceInit(ContainersMonitorImpl.java:108)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:249)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:312)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:547)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:595)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3917) getResourceCalculatorPlugin for the default should intercept all exceptions

2015-07-11 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-3917:

Summary: getResourceCalculatorPlugin for the default should intercept all 
exceptions  (was: getResourceCalculatorPlugin for the default should intercept 
all excpetions)

> getResourceCalculatorPlugin for the default should intercept all exceptions
> ---
>
> Key: YARN-3917
> URL: https://issues.apache.org/jira/browse/YARN-3917
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: HADOOP-1.001.patch
>
>
> Since the user has not configured a specific plugin, any problems with the 
> default resource calculator instantiation should be ignored.
> {code}
> 2015-07-10 08:16:18,445 INFO org.apache.hadoop.service.AbstractService: 
> Service containers-monitor failed in state INITED; cause: 
> java.lang.UnsupportedOperationException: Could not determine OS
> java.lang.UnsupportedOperationException: Could not determine OS
> at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.<init>(ResourceCalculatorPlugin.java:37)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getResourceCalculatorPlugin(ResourceCalculatorPlugin.java:160)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.serviceInit(ContainersMonitorImpl.java:108)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:249)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:312)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:547)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:595)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3917) getResourceCalculatorPlugin for the default should intercept all excpetions

2015-07-11 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623663#comment-14623663
 ] 

Gera Shegalov commented on YARN-3917:
-

Thanks [~chris.douglas] for the review. Moved the JIRA to YARN because 
{{ResourceCalculatorPlugin.java}} is in hadoop-yarn-common.

> getResourceCalculatorPlugin for the default should intercept all excpetions
> ---
>
> Key: YARN-3917
> URL: https://issues.apache.org/jira/browse/YARN-3917
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: HADOOP-1.001.patch
>
>
> Since the user has not configured a specific plugin, any problems with the 
> default resource calculator instantiation should be ignored.
> {code}
> 2015-07-10 08:16:18,445 INFO org.apache.hadoop.service.AbstractService: 
> Service containers-monitor failed in state INITED; cause: 
> java.lang.UnsupportedOperationException: Could not determine OS
> java.lang.UnsupportedOperationException: Could not determine OS
> at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.<init>(ResourceCalculatorPlugin.java:37)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getResourceCalculatorPlugin(ResourceCalculatorPlugin.java:160)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.serviceInit(ContainersMonitorImpl.java:108)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:249)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:312)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:547)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:595)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3917) getResourceCalculatorPlugin for the default should intercept all excpetions

2015-07-11 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov moved HADOOP-1 to YARN-3917:
--

Affects Version/s: (was: 2.8.0)
   2.8.0
 Target Version/s: 2.8.0  (was: 2.8.0)
  Key: YARN-3917  (was: HADOOP-1)
  Project: Hadoop YARN  (was: Hadoop Common)

> getResourceCalculatorPlugin for the default should intercept all excpetions
> ---
>
> Key: YARN-3917
> URL: https://issues.apache.org/jira/browse/YARN-3917
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: HADOOP-1.001.patch
>
>
> Since the user has not configured a specific plugin, any problems with the 
> default resource calculator instantiation should be ignored.
> {code}
> 2015-07-10 08:16:18,445 INFO org.apache.hadoop.service.AbstractService: 
> Service containers-monitor failed in state INITED; cause: 
> java.lang.UnsupportedOperationException: Could not determine OS
> java.lang.UnsupportedOperationException: Could not determine OS
> at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.<init>(ResourceCalculatorPlugin.java:37)
> at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getResourceCalculatorPlugin(ResourceCalculatorPlugin.java:160)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.serviceInit(ContainersMonitorImpl.java:108)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:249)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:312)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:547)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:595)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled

2015-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623623#comment-14623623
 ] 

Hadoop QA commented on YARN-3916:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 56s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 34s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| | |  40m 19s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744910/YARN-3916.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1df39c1 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8511/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8511/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8511/console |


This message was automatically generated.

> DrainDispatcher#await should wait till event has been completely handled
> 
>
> Key: YARN-3916
> URL: https://issues.apache.org/jira/browse/YARN-3916
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-3916.01.patch
>
>
> DrainDispatcher#await should wait till the event has been completely handled.
> Currently it only checks whether the event queue has become empty.
> In many tests we directly check for a state change after calling await.
> Sometimes the state has not changed by the time we check it, because the 
> event has not been completely handled.
> This is causing test failures such as YARN-3909 and YARN-3910, and may cause 
> other test failures as well.
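A rough sketch of the idea behind the fix (illustrative, not the attached 
patch): the drained flag is cleared by the producer when it enqueues, and set 
back only by the dispatcher thread after the previous handler has returned.

{code}
// Illustrative sketch only. Two rules close the gap described above:
// 1) the producer clears "drained" when it enqueues an event, and
// 2) only the dispatcher thread sets it back, at the top of its loop,
//    i.e. after the handler for the previous event has returned.
private volatile boolean drained = true;

void enqueue(Event event) throws InterruptedException {
  drained = false;                    // visible before the event is dequeued
  eventQueue.put(event);
}

// Dispatcher thread loop:
//   while (!stopped) {
//     drained = eventQueue.isEmpty(); // true only between fully handled events
//     Event event = eventQueue.take();
//     dispatch(event);                // finish handling before re-checking
//   }

public void await() {
  while (!drained) {
    Thread.yield();                   // acceptable busy-wait in a test helper
  }
}
{code}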



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk

2015-07-11 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623617#comment-14623617
 ] 

Anubhav Dhoot commented on YARN-3910:
-

LGTM. I tried out the changes, and even if I forcibly slow down the scheduler 
dispatcher thread to simulate a slow machine, the test still passes.

> TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
> 
>
> Key: YARN-3910
> URL: https://issues.apache.org/jira/browse/YARN-3910
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3910.001.patch, YARN-3910.02.patch
>
>
> Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.049 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.031 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-07-11 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623616#comment-14623616
 ] 

Varun Saxena commented on YARN-3644:


[~raju.bairishetti],
bq. I ran the test in debugger mode. also. Test is hitting all the source 
changes
I did not mean that the test is not hitting the change. It does, and I verified 
as much from the logs.
What I meant is that some Mockito#verify statements, or other assertions, could 
be added to check that the required functions or flows are actually hit, 
because the assertion that the service state is STARTED can hold regardless of 
whether your code ran.
Say somebody changes the code in the future so that your part of it becomes 
conditional. Unlikely, but you never know what happens 6 months down the line.
If you have verification statements checking whether your code was called, the 
test would fail after such future changes whenever those parts of the code are 
no longer invoked, forcing the developer to update the test as well.
If you only check that the service is STARTED, the tests may still pass after 
future changes whether or not the relevant flow is hit, which can mask mistakes 
in this or related parts of the main code. So the test case should verify that 
the flow is hit, either by checking function invocations or through a set of 
assertions that are reasonably unique to the test.
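A hedged illustration of the kind of verification meant here 
({{spiedStatusUpdater}} and {{retryRegisterWithRM}} are made-up names, not the 
actual test code):

{code}
// Hypothetical illustration; names are made up. Instead of only asserting
// the end state...
Assert.assertEquals(Service.STATE.STARTED, nodeManager.getServiceState());
// ...also verify that the code path under test was actually exercised:
Mockito.verify(spiedStatusUpdater, Mockito.atLeastOnce()).retryRegisterWithRM();
{code}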

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Raju Bairishetti
> Attachments: YARN-3644.001.patch, YARN-3644.001.patch, 
> YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch
>
>
> When the NM is unable to connect to the RM, it shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all 
> the NMs shut themselves down, requiring additional work to bring them back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects: 
> non-connection failures are retried infinitely by all YarnClients (via 
> RMProxy).
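For illustration, a sketch of the direction discussed here, assuming a 
hypothetical NM-local retry setting (not one of the attached patches):

{code}
// Illustrative sketch only: keep retrying registration instead of
// dispatching SHUTDOWN, so an RM maintenance window does not take down
// every NM in the cluster. retryIntervalMs is a hypothetical NM-local
// setting, independent of the shared RMProxy wait time.
// (Interrupt handling elided for brevity.)
while (!stopped) {
  try {
    registerWithRM();
    break;                             // connected; resume normal operation
  } catch (ConnectException e) {
    LOG.warn("RM not reachable, retrying in " + retryIntervalMs + " ms", e);
    Thread.sleep(retryIntervalMs);
  }
}
{code}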



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2007) AM expressing the estimated endtime to RM

2015-07-11 Thread Maysam Yabandeh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maysam Yabandeh resolved YARN-2007.
---
Resolution: Not A Problem

> AM expressing the estimated endtime to RM
> -
>
> Key: YARN-2007
> URL: https://issues.apache.org/jira/browse/YARN-2007
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
>
> YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler, 
> which requires the RM to know the estimated end time of jobs. The end time 
> is estimated by the AppMaster as part of MAPREDUCE-5871. This JIRA focuses on 
> API updates that allow the AM to express this estimate to the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled

2015-07-11 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3916:
---
Attachment: YARN-3916.01.patch

> DrainDispatcher#await should wait till event has been completely handled
> 
>
> Key: YARN-3916
> URL: https://issues.apache.org/jira/browse/YARN-3916
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-3916.01.patch
>
>
> DrainDispatcher#await should wait till the event has been completely handled.
> Currently it only checks whether the event queue has become empty.
> In many tests we directly check for a state change after calling await.
> Sometimes the state has not changed by the time we check it, because the 
> event has not been completely handled.
> This is causing test failures such as YARN-3909 and YARN-3910, and may cause 
> other test failures as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled

2015-07-11 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3916:
---
Attachment: (was: YARN-3916.01.patch)

> DrainDispatcher#await should wait till event has been completely handled
> 
>
> Key: YARN-3916
> URL: https://issues.apache.org/jira/browse/YARN-3916
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
>
> DrainDispatcher#await should wait till the event has been completely handled.
> Currently it only checks whether the event queue has become empty.
> In many tests we directly check for a state change after calling await.
> Sometimes the state has not changed by the time we check it, because the 
> event has not been completely handled.
> This is causing test failures such as YARN-3909 and YARN-3910, and may cause 
> other test failures as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled

2015-07-11 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3916:
---
Attachment: YARN-3916.01.patch

Updated patch

> DrainDispatcher#await should wait till event has been completely handled
> 
>
> Key: YARN-3916
> URL: https://issues.apache.org/jira/browse/YARN-3916
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-3916.01.patch
>
>
> DrainDispatcher#await should wait till the event has been completely handled.
> Currently it only checks whether the event queue has become empty.
> In many tests we directly check for a state change after calling await.
> Sometimes the state has not changed by the time we check it, because the 
> event has not been completely handled.
> This is causing test failures such as YARN-3909 and YARN-3910, and may cause 
> other test failures as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled

2015-07-11 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623590#comment-14623590
 ] 

Varun Saxena commented on YARN-3916:


The pre-YARN-3878 code would actually have worked for DrainDispatcher.

> DrainDispatcher#await should wait till event has been completely handled
> 
>
> Key: YARN-3916
> URL: https://issues.apache.org/jira/browse/YARN-3916
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
>
> DrainDispatcher#await should wait till the event has been completely handled.
> Currently it only checks whether the event queue has become empty.
> In many tests we directly check for a state change after calling await.
> Sometimes the state has not changed by the time we check it, because the 
> event has not been completely handled.
> This is causing test failures such as YARN-3909 and YARN-3910, and may cause 
> other test failures as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-11 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623567#comment-14623567
 ] 

Varun Saxena commented on YARN-3893:


[~sunilg] 
We can do the cleanup (i.e. stop active services) when we switch to standby; we 
do this already. Cleanup will also happen when we stop the RM. So this 
shouldn't be an issue.

What is happening is as follows:

Let us assume there are RM1 and RM2.
When the exception occurs, RM1 waits for RM2 to become active and joins leader 
election again. As both RMs have the wrong configuration, RM1 will try to 
become active again (and not switch to standby) after RM2 has tried the same.
Now, as the problem is in the call to {{refreshAll}}, both RMs are marked as 
ACTIVE in their respective RM contexts, because we set the state to ACTIVE 
before calling refreshAll.

*The problem reported here is that the RM is shown as active when it is not 
actually ACTIVE, i.e. the UI is accessible and getServiceState returns active 
for both RMs. When we access the UI or get the service state, we check the 
state in the RM context, and that state is ACTIVE.*
So for anyone reaching the RM from the command line or via the UI, the RM is 
active (*because the RM context says so*) when it is not really active. Both 
RMs are just trying incessantly to become active and failing.

That is why I suggested that we update the RM context. In fact, changing the RM 
context is necessary. We can decide separately when, if at all, to stop active 
services.

So there are 2 options:
# Set the RM context to standby when the exception occurs and stop active 
services. But then we would have to redo the work of starting active services 
if this RM were to become ACTIVE again.
# Introduce a new state (say WAITING_FOR_ACTIVE), set it when the exception is 
thrown, and check it so that active services are stopped when switching to 
standby but not restarted when switching to ACTIVE. A rough sketch follows 
below.

Thoughts, [~sunilg], [~xgong] ?
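A rough sketch of option 2 (names illustrative; WAITING_FOR_ACTIVE does not 
exist today):

{code}
// Rough sketch of option 2; names are illustrative. Today the context is
// set to ACTIVE before refreshAll(), which is why both RMs report active.
rmContext.setHAServiceState(HAServiceState.ACTIVE);
try {
  refreshAll();
} catch (Exception e) {
  // Stop advertising ACTIVE, keep active services up, let the elector retry.
  rmContext.setHAServiceState(WAITING_FOR_ACTIVE); // hypothetical new state
  throw new ServiceFailedException("refreshAll() failed", e);
}
{code}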

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: yarn-site.xml
>
>
> Cases that can cause this:
> # Capacity scheduler XML is wrongly configured during the switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> Continuously, both RMs will try to become active:
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both web UIs show as active
> # Status shown as active for both RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623566#comment-14623566
 ] 

Bibin A Chundatt commented on YARN-3894:


The test case failures are not due to this patch, to my knowledge. Please check.

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3894.patch, capacity-scheduler.xml
>
>
> Currently in the capacity scheduler, when the capacity configuration is 
> wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the label-level 
> capacity mismatch calculation happens in {{parseQueue}}. So during the 
> initial {{parseQueue}}, the label set is still empty.
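For illustration, a sketch of the per-label check that is effectively skipped 
at that point (names simplified; not the attached patch):

{code}
// Illustrative sketch only: for each accessible label, the children's
// capacities under a parent must sum to 1.0f. During serviceInit the label
// set is still empty, so this loop body never runs and a bad per-label
// capacity slips through until the next reinitialize.
final float EPSILON = 1e-5f;
for (String label : parentQueueLabels) { // empty during the initial parseQueue
  float childrenCapacity = 0f;
  for (CSQueue child : childQueues) {
    childrenCapacity += child.getQueueCapacities().getCapacity(label);
  }
  if (Math.abs(1.0f - childrenCapacity) > EPSILON) {
    throw new IllegalArgumentException("Illegal capacity of " + childrenCapacity
        + " for children of queue root for label=" + label);
  }
}
{code}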
> *Steps to reproduce*
> # Configure the RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity XML with the node label, but with a capacity 
> configuration issue for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
> ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActive TARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.Service

[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623564#comment-14623564
 ] 

Hadoop QA commented on YARN-3894:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  5s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 46s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  51m 25s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestApplicationCleanup 
|
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744898/0001-YARN-3894.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1df39c1 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8509/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8509/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8509/console |


This message was automatically generated.

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3894.patch, capacity-scheduler.xml
>
>
> Currently in the capacity scheduler, when the capacity configuration is 
> wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the label-level 
> capacity mismatch calculation happens in {{parseQueue}}. So during the 
> initial {{parseQueue}}, the label set is still empty.
> *Steps to reproduce*
> # Configure the RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity XML with the node label, but with a capacity 
> configuration issue for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.

[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623527#comment-14623527
 ] 

Bibin A Chundatt commented on YARN-3894:


[~sunilg] and [~leftnoteasy], please review the patch.

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3894.patch, capacity-scheduler.xml
>
>
> Currently in the capacity scheduler, when the capacity configuration is 
> wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the label-level 
> capacity mismatch calculation happens in {{parseQueue}}. So during the 
> initial {{parseQueue}}, the label set is still empty.
> *Steps to reproduce*
> # Configure the RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity XML with the node label, but with a capacity 
> configuration issue for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
> ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActive TARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could

[jira] [Updated] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3894:
---
Attachment: 0001-YARN-3894.patch

Attached the patch as per the discussion.
Please review.

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3894.patch, capacity-scheduler.xml
>
>
> Currently in the capacity scheduler, when the capacity configuration is 
> wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the label-level 
> capacity mismatch calculation happens in {{parseQueue}}. So during the 
> initial {{parseQueue}}, the label set is still empty.
> *Steps to reproduce*
> # Configure the RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity XML with the node label, but with a capacity 
> configuration issue for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
> ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActive TARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transitio

[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623508#comment-14623508
 ] 

Bibin A Chundatt commented on YARN-3894:


Hi [~leftnoteasy],
Thank you for sharing your thoughts.
Will upload the patch soon.

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: capacity-scheduler.xml
>
>
> Currently in the capacity scheduler, when the capacity configuration is 
> wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the label-level 
> capacity mismatch calculation happens in {{parseQueue}}. So during the 
> initial {{parseQueue}}, the label set is still empty.
> *Steps to reproduce*
> # Configure the RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity XML with the node label, but with a capacity 
> configuration issue for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
> ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActiveTARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedExcepti

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refreshAll()

2015-07-11 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623502#comment-14623502
 ] 

Sunil G commented on YARN-3893:
---

refreshAll() performs a whole set of refresh operations, and an exception may 
come from any of them. It's better to close those out gracefully, so setting the 
state directly won't help much; we may need to go through part of 
transitionToStandby.
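
To sketch that flow (a minimal illustration, assuming the failure is caught 
inside AdminService#transitionToActive; the rollback path shown is the idea 
under discussion, not a committed patch):

{code}
@Override
public synchronized void transitionToActive(
    HAServiceProtocol.StateChangeRequestInfo reqInfo) throws IOException {
  // ... existing access checks and the actual switch to active ...
  try {
    refreshAll();
  } catch (Exception e) {
    // Each refresh* call may have partially applied its changes, so merely
    // resetting the HA state flag is not enough: walk back through (part of)
    // the standby transition to close the touched services gracefully.
    transitionToStandby(reqInfo);
    throw new ServiceFailedException(
        "Error on refreshAll during transition to Active", e);
  }
}
{code}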

> Both RM in active state when Admin#transitionToActive failure from refreshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: yarn-site.xml
>
>
> Cases that can cause this:
> # Capacity scheduler xml is wrongly configured during the switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> Both RMs will then continuously try to become active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both web UIs show active
> # Status is shown as active for both RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuous killing of containers across queues

2015-07-11 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623499#comment-14623499
 ] 

Sunil G commented on YARN-3849:
---

Thank you very much [~leftnoteasy] for reviewing and committing this patch. 
Thank you [~rohithsharma] for the analysis and review.

> Too much of preemption activity causing continuous killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity
> 2. After an app is submitted in QueueB, there is some demand, which invokes 
> preemption in QueueA
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, so preemption now kicks in against QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop. Thus none of the 
> apps complete.
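
For reference, the preemption policy already applies a dead zone around each 
queue's guaranteed capacity before selecting it for preemption; in rough form 
(a simplified sketch of the check in ProportionalCapacityPreemptionPolicy, not 
the committed fix):

{code}
// A queue is considered over capacity only when its usage exceeds its
// guarantee by more than the configured dead zone. For example, with
// yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity
// set to 0.1, a queue guaranteed 0.5 is left alone until it uses over 0.55.
boolean shouldPreemptFrom(double used, double guaranteed,
    double maxIgnoredOverCapacity) {
  return used > guaranteed * (1.0 + maxIgnoredOverCapacity);
}
{code}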



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623498#comment-14623498
 ] 

Sunil G commented on YARN-3894:
---

Hi [~leftnoteasy]
Thank you for sharing your thoughts.
+1 for using QueueCapacities.getExistingNodeLabels. It will solve the problem.

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: capacity-scheduler.xml
>
>
> Currently in the Capacity Scheduler, when the capacity configuration is wrong
> the RM will shut down, but not in the case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the calculation for 
> label-level capacity mismatch happens in {{parseQueue}}. So during the 
> initial {{parseQueue}} call the labels will be empty.
> *Steps to reproduce*
> # Configure RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity xml with the node labels, but with a wrong capacity 
> configuration for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
> ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActiveTARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.ap

[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuous killing of containers across queues

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623497#comment-14623497
 ] 

Hudson commented on YARN-3849:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8151 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8151/])
YARN-3849. Too much of preemption activity causing continuous killing of 
containers across queues. (Sunil G via wangda) (wangda: rev 
1df39c1efc9ed26d3f1a5887c31c38c873e0b784)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java


> Too much of preemption activity causing continuous killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity
> 2. After an app is submitted in QueueB, there is some demand, which invokes 
> preemption in QueueA
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, so preemption now kicks in against QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop. Thus none of the 
> apps complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuous killing of containers across queues

2015-07-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623494#comment-14623494
 ] 

Wangda Tan commented on YARN-3849:
--

Latest patch LGTM, committing.

> Too much of preemption activity causing continuous killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity
> 2. After an app is submitted in QueueB, there is some demand, which invokes 
> preemption in QueueA
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, so preemption now kicks in against QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop. Thus none of the 
> apps complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623491#comment-14623491
 ] 

Wangda Tan commented on YARN-3894:
--

After reading the comment from [~sunilg], I see what the problem is. In 
ParentQueue.setChildQueues, using QueueCapacities.getExistingNodeLabels instead 
of labelManager.getClusterNodeLabels should solve this problem.

Thoughts? [~sunilg], [~bibinchundatt].
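
To sketch the suggestion (assuming the trunk QueueCapacities#getExistingNodeLabels 
API; the loop body and the PRECISION epsilon are illustrative, not the final 
patch):

{code}
// In ParentQueue.setChildQueues: validate per-label child capacities over the
// labels that actually appear in the capacity configuration, instead of asking
// the node label manager, which may not be initialized yet at this point.
for (String label : queueCapacities.getExistingNodeLabels()) {
  float childrenCapacitySum = 0f;
  for (CSQueue child : childQueues) {
    childrenCapacitySum += child.getQueueCapacities().getCapacity(label);
  }
  // Simplified: a real check would also tolerate labels with no configured
  // child capacities (sum == 0).
  if (Math.abs(1.0f - childrenCapacitySum) > PRECISION) {
    throw new IllegalArgumentException("Illegal capacity of "
        + childrenCapacitySum + " for children of queue " + getQueueName()
        + " for label=" + label);
  }
}
{code}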





> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: capacity-scheduler.xml
>
>
> Currently in the Capacity Scheduler, when the capacity configuration is wrong
> the RM will shut down, but not in the case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the calculation for 
> label-level capacity mismatch happens in {{parseQueue}}. So during the 
> initial {{parseQueue}} call the labels will be empty.
> *Steps to reproduce*
> # Configure RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity xml with the node labels, but with a wrong capacity 
> configuration for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
> ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActiveTARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMIS

[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623489#comment-14623489
 ] 

Wangda Tan commented on YARN-3894:
--

Hi [~bibinchundatt],
Are you using the latest trunk? In the latest trunk, node-label-related 
capacity checking for the capacity scheduler does not depend on node label 
manager initialization, so a misconfigured node label capacity should fail 
the CS.

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: capacity-scheduler.xml
>
>
> Currently in the Capacity Scheduler, when the capacity configuration is wrong
> the RM will shut down, but not in the case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the calculation for 
> label-level capacity mismatch happens in {{parseQueue}}. So during the 
> initial {{parseQueue}} call the labels will be empty.
> *Steps to reproduce*
> # Configure RM with the capacity scheduler
> # Add one or two node labels from rmadmin
> # Configure the capacity xml with the node labels, but with a wrong capacity 
> configuration for an already added label
> # Restart both RMs
> # Check that on service init of the capacity scheduler the node label list 
> is populated
> *Expected*
> RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
> queues.
> java.io.IOException: Failed to re-init queues
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
> children of queue root for label=node2
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
> ... 8 more
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
> DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=transitionToActiveTARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25

[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-07-11 Thread Raju Bairishetti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623462#comment-14623462
 ] 

Raju Bairishetti commented on YARN-3644:


Thanks [~varun_saxena] for the review and comments.

bq. The config name is yarn.nodemanager.shutdown.on.RM.connection.failures. All 
our config names are in lowercase, just for the sake of consistency, maybe RM 
can be in lowercase too. Thoughts?
  Agreed. I will change it to lower case.

bq. The test doesn't really check whether ConnectionException was thrown or 
the NM shutdown event was fired or not.
  I ran the test in debug mode as well; it exercises all the source changes. 
*I agree, I will rewrite this test using Mockito to make it more generic.*
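
For instance, a rough Mockito shape for such a test (assuming the usual 
org.mockito.Mockito static imports; the wiring of nodeStatusUpdater and its 
registerWithRM() entry point are placeholders, not the actual patch):

{code}
@Test
public void testShutdownEventOnRMConnectionFailure() throws Exception {
  Dispatcher dispatcher = mock(Dispatcher.class);
  @SuppressWarnings("unchecked")
  EventHandler<Event> handler = mock(EventHandler.class);
  when(dispatcher.getEventHandler()).thenReturn(handler);

  // Drive registration against a ResourceTracker stub that always throws
  // ConnectException, so the retry budget gets exhausted.
  try {
    nodeStatusUpdater.registerWithRM();   // placeholder entry point
    fail("Expected YarnRuntimeException once retries are exhausted");
  } catch (YarnRuntimeException expected) {
    assertTrue(expected.getCause() instanceof ConnectException);
  }

  // The NM must have asked for shutdown exactly once.
  ArgumentCaptor<NodeManagerEvent> captor =
      ArgumentCaptor.forClass(NodeManagerEvent.class);
  verify(handler).handle(captor.capture());
  assertEquals(NodeManagerEventType.SHUTDOWN, captor.getValue().getType());
}
{code}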

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Raju Bairishetti
> Attachments: YARN-3644.001.patch, YARN-3644.001.patch, 
> YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all 
> the NMs shut themselves down, requiring additional work to bring them back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects: 
> non-connection failures are then retried infinitely by all YarnClients (via 
> RMProxy).
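
One possible shape for the fix being discussed, guarding the shutdown behind 
the new property mentioned in the review comments above (the default value 
used here is an assumption):

{code}
} catch (ConnectException e) {
  // Only dispatch SHUTDOWN when configured to fail fast; otherwise just
  // rethrow and let the caller keep retrying, so NMs can survive a long RM
  // maintenance window without operator intervention.
  if (conf.getBoolean(
      "yarn.nodemanager.shutdown.on.rm.connection.failures", true)) {
    dispatcher.getEventHandler().handle(
        new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
  }
  throw new YarnRuntimeException(e);
}
{code}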



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3898) YARN web console only proxies GET to application master but doesn't provide any feedback for other HTTP methods

2015-07-11 Thread Kam Kasravi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623456#comment-14623456
 ] 

Kam Kasravi commented on YARN-3898:
---

Yes - should I mark this as a duplicate?



> YARN web console only proxies GET to application master but doesn't provide 
> any feedback for other HTTP methods
> ---
>
> Key: YARN-3898
> URL: https://issues.apache.org/jira/browse/YARN-3898
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Kam Kasravi
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The YARN web console should provide some feedback when filtering (and 
> preventing) DELETE, POST, PUT, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623458#comment-14623458
 ] 

Hudson commented on YARN-3116:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2199 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2199/])
YARN-3116. RM notifies NM whether a container is an AM container or normal task 
container. Contributed by Giovanni Matteo Fumarola. (zjshen: rev 
1ea36299a47af302379ae0750b571ec021eb54ad)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerInitializationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerTerminationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java


> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Fix For: 2.8.0
>
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether a container is an AM container or 
> not from the context in the NM (on the RM we can already do it). This 
> information is missing, so we worked around it by considering the container 
> with ID "_01" as the AM container. Unfortunately, this is neither a 
> necessary nor a sufficient condition. We need a way to determine whether a 
> container is an AM container on the NM. We can add a flag to the container 
> object or create an API to make the judgement. Perhaps the distributed AM 
> information may also be useful to YARN-2877.
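
With this change in place, an NM-side check can read the container type that 
the RM stamps into the container token instead of guessing from the ID; 
roughly (startPerAppCollector() is a hypothetical hook for the YARN-3030 
aggregator, not part of this patch):

{code}
// On container start in the NM: the token now carries the container type,
// so the per-app collector is started only for the actual AM container.
ContainerTokenIdentifier tokenId = container.getContainerTokenIdentifier();
if (tokenId.getContainerType() == ContainerType.APPLICATION_MASTER) {
  startPerAppCollector(
      container.getContainerId().getApplicationAttemptId().getApplicationId());
}
{code}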



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623457#comment-14623457
 ] 

Hudson commented on YARN-3445:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2199 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2199/])
YARN-3445. Cache runningApps in RMNode for getting running apps on given 
NodeId. (Junping Du via mingma) (mingma: rev 
08244264c0583472b9c4e16591cfde72c6db62a2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java


> Cache runningApps in RMNode for getting running apps on given NodeId
> 
>
> Key: YARN-3445
> URL: https://issues.apache.org/jira/browse/YARN-3445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, 
> YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, 
> YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch
>
>
> Per the discussion in YARN-3334, we need to filter out unnecessary collector 
> info from the RM in the heartbeat response. Our proposal is to add a cache 
> of runningApps in RMNode, so the RM only sends back the collectors for apps 
> running locally. This is also needed in YARN-914 (graceful decommission): if 
> an NM in the decommissioning stage has no running apps, it will get 
> decommissioned immediately.
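
In rough outline, the cache lives in RMNodeImpl and is maintained from the 
container statuses reported in heartbeats (a minimal sketch following the 
description; the update sites and locking are simplified):

{code}
// In RMNodeImpl: apps that currently have containers running on this node,
// added when an app's first container starts here and removed when the app
// finishes on this node. The heartbeat response can then carry collector
// info for just these apps.
private final List<ApplicationId> runningApplications =
    new ArrayList<ApplicationId>();

@Override
public List<ApplicationId> getRunningApps() {
  // Real code must synchronize this with the write path under the RMNode lock.
  return runningApplications;
}
{code}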



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2015-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623434#comment-14623434
 ] 

Hadoop QA commented on YARN-2031:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12699147/YARN-2031-002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1ea3629 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8508/console |


This message was automatically generated.

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: BB2015-05-TBR
> Attachments: YARN-2031-002.patch, YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy needs to forward the 
> verbs, or the AM filter should not filter the REST part of the web site.
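
To make the first point concrete: a redirect that lets REST clients replay 
non-GET verbs needs a 307 rather than a 302 (a sketch against the standard 
Servlet API, not the actual AmIpFilter change):

{code}
// A 302 makes most HTTP clients replay the request as a GET, dropping the
// original method and body; a 307 tells the client to repeat the same verb
// (PUT/POST/DELETE) against the proxy URL.
httpResponse.setHeader("Location", proxyUrl);
httpResponse.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT); // 307
{code}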



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3898) YARN web console only proxies GET to application master but doesn't provide any feedback for other HTTP methods

2015-07-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623430#comment-14623430
 ] 

Steve Loughran commented on YARN-3898:
--

YARN-2084 covers not filtering the other methods, but instead proxying them all 
the way through. Individual AMs would get to handle the operations.

Would that suit you?

> YARN web console only proxies GET to application master but doesn't provide 
> any feedback for other HTTP methods
> ---
>
> Key: YARN-3898
> URL: https://issues.apache.org/jira/browse/YARN-3898
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Kam Kasravi
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The YARN web console should provide some feedback when filtering (and 
> preventing) DELETE, POST, PUT, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623414#comment-14623414
 ] 

Hudson commented on YARN-3445:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #241 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/241/])
YARN-3445. Cache runningApps in RMNode for getting running apps on given 
NodeId. (Junping Du via mingma) (mingma: rev 
08244264c0583472b9c4e16591cfde72c6db62a2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


> Cache runningApps in RMNode for getting running apps on given NodeId
> 
>
> Key: YARN-3445
> URL: https://issues.apache.org/jira/browse/YARN-3445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, 
> YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, 
> YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch
>
>
> Per the discussion in YARN-3334, we need to filter out unnecessary collector 
> info from the RM in the heartbeat response. Our proposal is to add a cache 
> of runningApps in RMNode, so the RM only sends back the collectors for apps 
> running locally. This is also needed in YARN-914 (graceful decommission): if 
> an NM in the decommissioning stage has no running apps, it will get 
> decommissioned immediately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623415#comment-14623415
 ] 

Hudson commented on YARN-3116:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #241 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/241/])
YARN-3116. RM notifies NM whether a container is an AM container or normal task 
container. Contributed by Giovanni Matteo Fumarola. (zjshen: rev 
1ea36299a47af302379ae0750b571ec021eb54ad)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerTerminationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerInitializationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java


> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Fix For: 2.8.0
>
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether a container is an AM container or 
> not from the context in the NM (on the RM we can already do it). This 
> information is missing, so we worked around it by considering the container 
> with ID "_01" as the AM container. Unfortunately, this is neither a 
> necessary nor a sufficient condition. We need a way to determine whether a 
> container is an AM container on the NM. We can add a flag to the container 
> object or create an API to make the judgement. Perhaps the distributed AM 
> information may also be useful to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623390#comment-14623390
 ] 

Hudson commented on YARN-3445:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2180 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2180/])
YARN-3445. Cache runningApps in RMNode for getting running apps on given 
NodeId. (Junping Du via mingma) (mingma: rev 
08244264c0583472b9c4e16591cfde72c6db62a2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java


> Cache runningApps in RMNode for getting running apps on given NodeId
> 
>
> Key: YARN-3445
> URL: https://issues.apache.org/jira/browse/YARN-3445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, 
> YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, 
> YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch
>
>
> Per the discussion in YARN-3334, we need to filter out unnecessary collector 
> info from the RM in the heartbeat response. Our proposal is to add a cache 
> of runningApps in RMNode, so the RM only sends back the collectors for apps 
> running locally. This is also needed in YARN-914 (graceful decommission): if 
> an NM in the decommissioning stage has no running apps, it will get 
> decommissioned immediately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623391#comment-14623391
 ] 

Hudson commented on YARN-3116:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2180 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2180/])
YARN-3116. RM notifies NM whether a container is an AM container or normal task 
container. Contributed by Giovanni Matteo Fumarola. (zjshen: rev 
1ea36299a47af302379ae0750b571ec021eb54ad)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerInitializationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerTerminationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java


> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Fix For: 2.8.0
>
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether a container is an AM container or 
> not from the context in the NM (on the RM we can already do it). This 
> information is missing, so we worked around it by considering the container 
> with ID "_01" as the AM container. Unfortunately, this is neither a 
> necessary nor a sufficient condition. We need a way to determine whether a 
> container is an AM container on the NM. We can add a flag to the container 
> object or create an API to make the judgement. Perhaps the distributed AM 
> information may also be useful to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623385#comment-14623385
 ] 

Hudson commented on YARN-3116:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #251 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/251/])
YARN-3116. RM notifies NM whether a container is an AM container or normal task 
container. Contributed by Giovanni Matteo Fumarola. (zjshen: rev 
1ea36299a47af302379ae0750b571ec021eb54ad)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerInitializationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerTerminationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java


> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Fix For: 2.8.0
>
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether a container is an AM container or 
> not from the context in the NM (on the RM we can already do it). This 
> information is missing, so we worked around it by considering the container 
> with ID "_01" as the AM container. Unfortunately, this is neither a 
> necessary nor a sufficient condition. We need a way to determine whether a 
> container is an AM container on the NM. We can add a flag to the container 
> object or create an API to make the judgement. Perhaps the distributed AM 
> information may also be useful to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623384#comment-14623384
 ] 

Hudson commented on YARN-3445:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #251 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/251/])
YARN-3445. Cache runningApps in RMNode for getting running apps on given 
NodeId. (Junping Du via mingma) (mingma: rev 
08244264c0583472b9c4e16591cfde72c6db62a2)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java


> Cache runningApps in RMNode for getting running apps on given NodeId
> 
>
> Key: YARN-3445
> URL: https://issues.apache.org/jira/browse/YARN-3445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, 
> YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, 
> YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch
>
>
> Per discussion in YARN-3334, we need to filter out unnecessary collector 
> info from the RM in the heartbeat response. Our proposal is to add a cache 
> of runningApps in RMNode, so the RM sends back only the collectors for apps 
> running locally. This is also needed for YARN-914 (graceful decommission): 
> if an NM in the decommissioning stage has no running apps, it will get 
> decommissioned immediately.
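
For illustration, a minimal sketch of the proposed runningApps cache, with 
simplified names (CachingRMNode, collectorsToSend) that are assumptions, not 
the actual patch:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class CachingRMNode {
  // Apps with at least one container running on this node.
  private final Set<String> runningApps = ConcurrentHashMap.newKeySet();

  void appStarted(String appId)  { runningApps.add(appId); }
  void appFinished(String appId) { runningApps.remove(appId); }

  // Heartbeat response: send back only the collectors for apps actually
  // running locally, instead of the cluster-wide collector map.
  Map<String, String> collectorsToSend(Map<String, String> allCollectors) {
    Map<String, String> local = new HashMap<>();
    for (String appId : runningApps) {
      String collectorAddr = allCollectors.get(appId);
      if (collectorAddr != null) {
        local.put(appId, collectorAddr);
      }
    }
    return local;
  }

  // YARN-914: a decommissioning node with no running apps can be
  // decommissioned immediately.
  boolean canDecommissionNow() { return runningApps.isEmpty(); }
}
{code}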



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk

2015-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623367#comment-14623367
 ] 

Hadoop QA commented on YARN-3910:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   5m 38s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 39s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 22s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  51m 26s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  68m 46s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestApplicationCleanup 
|
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744879/YARN-3910.02.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 1ea3629 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8506/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8506/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8506/console |


This message was automatically generated.

> TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
> 
>
> Key: YARN-3910
> URL: https://issues.apache.org/jira/browse/YARN-3910
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3910.001.patch, YARN-3910.02.patch
>
>
> Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.049 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.031 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled

2015-07-11 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623362#comment-14623362
 ] 

Varun Saxena commented on YARN-3916:


In YARN-3878 we introduced a check in DrainDispatcher#await for whether the 
event queue is empty.
Even before YARN-3878, the code was essentially doing the same thing, but 
through a volatile flag, and that may have failed sometimes as well. However, 
changes to the volatile flag were not seen by the other thread as quickly as 
the event queue becoming empty, so a few of these tests did not fail and the 
async dispatcher got time to handle the event.

We should ideally check whether the event has been handled, in addition to 
the event queue being empty, as in the sketch below.
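
For illustration, a minimal sketch of the proposed semantics, using an 
illustrative DrainAwareDispatcher rather than the real AsyncDispatcher 
internals: a counter is decremented only after the handler returns, so 
await() cannot return while an event is still mid-handling.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: class and field names are illustrative, not YARN API.
public class DrainAwareDispatcher {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  // Events dispatched but not yet fully handled. Decremented only AFTER
  // the handler returns, so an in-flight event keeps await() waiting.
  private final AtomicInteger unhandled = new AtomicInteger();

  public DrainAwareDispatcher() {
    Thread dispatcher = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          Runnable event = queue.take();
          try {
            event.run();
          } finally {
            unhandled.decrementAndGet();
          }
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    });
    dispatcher.setDaemon(true);
    dispatcher.start();
  }

  public void dispatch(Runnable event) {
    unhandled.incrementAndGet();
    queue.add(event);
  }

  // Checking queue.isEmpty() alone would miss the event currently being
  // handled; the counter covers both the queue and the in-flight event.
  public void await() throws InterruptedException {
    while (unhandled.get() > 0) {
      Thread.sleep(1);
    }
  }
}
{code}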

> DrainDispatcher#await should wait till event has been completely handled
> 
>
> Key: YARN-3916
> URL: https://issues.apache.org/jira/browse/YARN-3916
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
>
> DrainDispatcher#await should wait till the event has been completely handled.
> Currently it only checks whether the event queue has become empty.
> In many tests we directly check for a state change after calling await.
> Sometimes the state has not changed by the time we check it, because the 
> event has not been completely handled.
> This is causing test failures such as YARN-3909 and YARN-3910 and may cause 
> other test failures as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled

2015-07-11 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-3916:
--

 Summary: DrainDispatcher#await should wait till event has been 
completely handled
 Key: YARN-3916
 URL: https://issues.apache.org/jira/browse/YARN-3916
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical


DrainDispatcher#await should wait till the event has been completely handled.
Currently it only checks whether the event queue has become empty.

In many tests we directly check for a state change after calling await.
Sometimes the state has not changed by the time we check it, because the 
event has not been completely handled.

This is causing test failures such as YARN-3909 and YARN-3910 and may cause 
other test failures as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk

2015-07-11 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3910:
---
Attachment: YARN-3910.02.patch

[~adhoot], updated the patch after addressing your comments.

> TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
> 
>
> Key: YARN-3910
> URL: https://issues.apache.org/jira/browse/YARN-3910
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3910.001.patch, YARN-3910.02.patch
>
>
> Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.049 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.031 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623347#comment-14623347
 ] 

Hudson commented on YARN-3445:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #983 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/983/])
YARN-3445. Cache runningApps in RMNode for getting running apps on given 
NodeId. (Junping Du via mingma) (mingma: rev 
08244264c0583472b9c4e16591cfde72c6db62a2)
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* hadoop-yarn-project/CHANGES.txt


> Cache runningApps in RMNode for getting running apps on given NodeId
> 
>
> Key: YARN-3445
> URL: https://issues.apache.org/jira/browse/YARN-3445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, 
> YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, 
> YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch
>
>
> Per discussion in YARN-3334, we need to filter out unnecessary collector 
> info from the RM in the heartbeat response. Our proposal is to add a cache 
> of runningApps in RMNode, so the RM sends back only the collectors for apps 
> running locally. This is also needed for YARN-914 (graceful decommission): 
> if an NM in the decommissioning stage has no running apps, it will get 
> decommissioned immediately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623348#comment-14623348
 ] 

Hudson commented on YARN-3116:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #983 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/983/])
YARN-3116. RM notifies NM whether a container is an AM container or normal task 
container. Contributed by Giovanni Matteo Fumarola. (zjshen: rev 
1ea36299a47af302379ae0750b571ec021eb54ad)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerInitializationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerTerminationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java


> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Fix For: 2.8.0
>
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine from the context in the NM whether the 
> container is an AM container or not (we can do it on the RM). This 
> information is missing, so we worked around it by considering the container 
> with ID "_01" to be the AM container. Unfortunately, that is neither a 
> necessary nor a sufficient condition. We need a way to determine whether a 
> container is an AM container on the NM. We can add a flag to the container 
> object or create an API to make the judgement. Perhaps the distributed AM 
> information may also be useful to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623343#comment-14623343
 ] 

Hudson commented on YARN-3445:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #253 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/253/])
YARN-3445. Cache runningApps in RMNode for getting running apps on given 
NodeId. (Junping Du via mingma) (mingma: rev 
08244264c0583472b9c4e16591cfde72c6db62a2)
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* hadoop-yarn-project/CHANGES.txt


> Cache runningApps in RMNode for getting running apps on given NodeId
> 
>
> Key: YARN-3445
> URL: https://issues.apache.org/jira/browse/YARN-3445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, 
> YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, 
> YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch
>
>
> Per discussion in YARN-3334, we need to filter out unnecessary collector 
> info from the RM in the heartbeat response. Our proposal is to add a cache 
> of runningApps in RMNode, so the RM sends back only the collectors for apps 
> running locally. This is also needed for YARN-914 (graceful decommission): 
> if an NM in the decommissioning stage has no running apps, it will get 
> decommissioned immediately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623344#comment-14623344
 ] 

Hudson commented on YARN-3116:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #253 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/253/])
YARN-3116. RM notifies NM whether a container is an AM container or normal task 
container. Contributed by Giovanni Matteo Fumarola. (zjshen: rev 
1ea36299a47af302379ae0750b571ec021eb54ad)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerTerminationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerInitializationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java


> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Fix For: 2.8.0
>
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine from the context in the NM whether the 
> container is an AM container or not (we can do it on the RM). This 
> information is missing, so we worked around it by considering the container 
> with ID "_01" to be the AM container. Unfortunately, that is neither a 
> necessary nor a sufficient condition. We need a way to determine whether a 
> container is an AM container on the NM. We can add a flag to the container 
> object or create an API to make the judgement. Perhaps the distributed AM 
> information may also be useful to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3915) scmadmin help message correction

2015-07-11 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3915:
---
Attachment: 0001-YARN-3915.patch

Patch attached for the bug. Please review.

> scmadmin help message correction 
> -
>
> Key: YARN-3915
> URL: https://issues.apache.org/jira/browse/YARN-3915
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-3915.patch
>
>
> Help message for scmadmin
> *Actual*  {{hadoop scmadmin}} *expected*  {{yarn scmadmin}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3915) scmadmin help message correction

2015-07-11 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3915:
--

 Summary: scmadmin help message correction 
 Key: YARN-3915
 URL: https://issues.apache.org/jira/browse/YARN-3915
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor


Help message for scmadmin
*Actual*  {{hadoop scmadmin}} actual {{yarn scmadmin}}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3915) scmadmin help message correction

2015-07-11 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3915:
---
Description: 
Help message for scmadmin
*Actual*  {{hadoop scmadmin}} *expected*  {{yarn scmadmin}}



  was:
Help message for scmadmin
*Actual*  {{hadoop scmadmin}} actual {{yarn scmadmin}}




> scmadmin help message correction 
> -
>
> Key: YARN-3915
> URL: https://issues.apache.org/jira/browse/YARN-3915
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
>
> Help message for scmadmin
> *Actual*  {{hadoop scmadmin}} *expected*  {{yarn scmadmin}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-07-11 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623300#comment-14623300
 ] 

Arun Suresh commented on YARN-3656:
---

[~jyaniv], [~curino] and [~imenache], this looks like a really interesting 
optimization to the existing reservation algorithm. Thanks for working on 
this!

I took an initial pass at the latest patch. A couple of minor nits:
* TryManyReservationAgents.java
in both create and updateReservation:
line 85/54: you don't need the if; just return alg.update/create directly.
* minor suggestion: I was wondering if, instead of a TryManyReservationAgents, 
we could allow ReservationAgent itself to have a fallbackReservationAgent. An 
agent can then call its fallback (and if that fails, the fallback's fallback, 
and so on until no more fallbacks exist). That way, you don't really need to 
maintain a LinkedList etc. (see the sketch after this list).
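
For illustration, a rough sketch of that fallback idea under a simplified 
interface; this is not the actual ReservationAgent API:

{code:java}
// Simplified stand-in for the real ReservationAgent interface.
interface ReservationAgent {
  boolean createReservation(String reservationId) throws Exception;
}

class FallbackReservationAgent implements ReservationAgent {
  private final ReservationAgent primary;
  private final ReservationAgent fallback; // may itself wrap another fallback

  FallbackReservationAgent(ReservationAgent primary,
                           ReservationAgent fallback) {
    this.primary = primary;
    this.fallback = fallback;
  }

  @Override
  public boolean createReservation(String reservationId) throws Exception {
    try {
      return primary.createReservation(reservationId);
    } catch (Exception e) {
      if (fallback == null) {
        throw e; // end of the chain: surface the failure
      }
      // Recurses naturally: the fallback may delegate to its own fallback,
      // so no LinkedList of agents needs to be maintained.
      return fallback.createReservation(reservationId);
    }
  }
}
{code}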

I will spend some time on the paper and review the actual algorithm over the 
weekend.

> LowCost: A Cost-Based Placement Agent for YARN Reservations
> ---
>
> Key: YARN-3656
> URL: https://issues.apache.org/jira/browse/YARN-3656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Ishai Menache
>Assignee: Jonathan Yaniv
>  Labels: capacity-scheduler, resourcemanager
> Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, 
> YARN-3656-v1.2.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf
>
>
> YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
> ahead of time. YARN-1710 introduced a greedy agent for placing user 
> reservations. The greedy agent makes fast placement decisions, but at the 
> cost of ignoring the cluster's committed resources, which might block 
> cluster resources for certain periods of time and, in turn, reject some 
> arriving jobs.
> We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
> the demand of the job throughout the allowed time-window according to a 
> global, load-based cost function. 
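
To make the "spreading" idea concrete, a toy sketch under a simple 
load-proportional cost; this is a heavy simplification of LowCost, not the 
algorithm from the attached paper:

{code:java}
// Toy model: place each unit of demand at the currently least-loaded
// timestep in the allowed window, so no single timestep gets blocked.
class LowCostSketch {
  // load[t] = capacity already committed at timestep t
  private final double[] load;

  LowCostSketch(int windowLength) {
    this.load = new double[windowLength];
  }

  // Spread `demand` units across [start, end) by repeatedly picking the
  // cheapest (least-loaded) timestep under a load-based cost function.
  void place(int demand, int start, int end) {
    for (int d = 0; d < demand; d++) {
      int cheapest = start;
      for (int t = start; t < end; t++) {
        if (load[t] < load[cheapest]) {
          cheapest = t;
        }
      }
      load[cheapest] += 1.0;
    }
  }
}
{code}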



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)