[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2015-11-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022796#comment-15022796
 ] 

Sangjin Lee commented on YARN-2975:
---

I think we should backport this to branch-2.6. This is a very important 
follow-up fix to YARN-2910. [~kasha]?

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 
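
A minimal sketch of the guarded-access pattern being discussed (hypothetical class and field names, not the actual FSLeafQueue code): every read happens under a read lock and returns a snapshot, so callers can never iterate the live list unlocked.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: guard app lists so callers cannot iterate them unlocked.
class GuardedAppList<T> {
  private final List<T> apps = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void add(T app) {
    lock.writeLock().lock();
    try {
      apps.add(app);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Instead of exposing the live list, return an immutable snapshot taken under the read lock.
  List<T> snapshot() {
    lock.readLock().lock();
    try {
      return Collections.unmodifiableList(new ArrayList<>(apps));
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}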



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-23 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022842#comment-15022842
 ] 

Carlo Curino commented on YARN-4358:


(1)/(2) done.
(3) As discussed above... agreed to circle back later.
(4) Was not addressed, as the two fields are long and the diff might exceed int.
(5) I implemented the requested changes and refactored the PlanView interface and 
InMemoryPlan a little further, getting rid of a few methods that were no longer 
used, as we are switching to the more efficient RLE-centric requests to the plan. 
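
As context for (4), a small illustration of why a subtraction-based comparison is unsafe when the fields are long (generic example, not the patch code): the difference can overflow, while Long.compare is always safe.
{code}
// Illustration only: why "return (int) (a - b);" style comparators are unsafe for long fields.
public class LongCompareDemo {
  public static void main(String[] args) {
    long a = Long.MAX_VALUE;
    long b = -1L;

    // a - b overflows long (result is Long.MIN_VALUE) and the int cast discards the high bits.
    int broken = (int) (a - b);
    int correct = Long.compare(a, b);   // always returns the right sign

    System.out.println("broken  = " + broken);   // prints 0, suggesting a == b
    System.out.println("correct = " + correct);  // prints 1, i.e. a > b
  }
}
{code}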

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but has 
> no visibility into extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented, some (e.g., max-instantaneous resources) are.
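
A hypothetical sketch of the kind of hook the description suggests (interface and names invented for illustration): the agent asks the policy for the instantaneous cap it would enforce and folds that into its placement bound.
{code}
// Hypothetical interface sketch; not the actual SharingPolicy/ReservationAgent API.
interface ConstraintAwarePolicy {
  /** Maximum allocation this policy would allow at a given instant, in abstract units. */
  long maxInstantaneousAllocation(String user, long timestamp);
}

class ConstraintAwareAgent {
  private final ConstraintAwarePolicy policy;

  ConstraintAwareAgent(ConstraintAwarePolicy policy) {
    this.policy = policy;
  }

  /** Placement bound = min(what the plan has free, what the policy would accept). */
  long placementBound(long availableAtTime, String user, long timestamp) {
    return Math.min(availableAtTime, policy.maxInstantaneousAllocation(user, timestamp));
  }
}
{code}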



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-23 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-4386:
-

 Summary: refreshNodesGracefully() looks at active RMNode list for 
recommissioning decommissioned nodes
 Key: YARN-4386
 URL: https://issues.apache.org/jira/browse/YARN-4386
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


In refreshNodesGracefully(), during recommissioning, the entry set from 
getRMNodes(), which contains only active nodes (RUNNING, DECOMMISSIONING, etc.), 
is used to check for 'decommissioned' nodes, which are present only in the 
getInactiveRMNodes() map. 
{code}
for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
.
 // Recommissioning the nodes
if (entry.getValue().getState() == NodeState.DECOMMISSIONING
    || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
  this.rmContext.getDispatcher().getEventHandler()
      .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022782#comment-15022782
 ] 

Varun Saxena commented on YARN-4380:


Thanks [~ozawa].
The InterruptedException indicates that there is a race: 
LocalizerRunner#interrupt has been called while the credential file is being 
written.

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Attachments: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4204) ConcurrentModificationException in FairSchedulerQueueInfo

2015-11-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022808#comment-15022808
 ] 

Sangjin Lee commented on YARN-4204:
---

This looks like a great candidate for branch-2.6. [~adhoot]?

> ConcurrentModificationException in FairSchedulerQueueInfo
> -
>
> Key: YARN-4204
> URL: https://issues.apache.org/jira/browse/YARN-4204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4204.001.patch, YARN-4204.002.patch
>
>
> Saw this exception which caused RM to go down
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.FairSchedulerQueueInfo.(FairSchedulerQueueInfo.java:100)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.FairSchedulerInfo.(FairSchedulerInfo.java:46)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getSchedulerInfo(RMWebServices.java:229)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:589)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:552)
>   at 
> 

[jira] [Commented] (YARN-4365) FileSystemNodeLabelStore should check for root dir existence on startup

2015-11-23 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022816#comment-15022816
 ] 

Kuhu Shukla commented on YARN-4365:
---

The test failure is not reproducible locally and, as far as I can see, is 
unrelated to the patch. The findbugs warnings come from 
{{org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl}} 
and are likewise not related to this patch.

> FileSystemNodeLabelStore should check for root dir existence on startup
> ---
>
> Key: YARN-4365
> URL: https://issues.apache.org/jira/browse/YARN-4365
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
> Attachments: YARN-4365-1.patch
>
>
> If the namenode is in safe mode for some reason then FileSystemNodeLabelStore 
> will prevent the RM from starting since it unconditionally tries to create 
> the root directory for the label store.  If the root directory already exists 
> and no labels are changing then we shouldn't prevent the RM from starting 
> even if the namenode is in safe mode.
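
A minimal sketch of the proposed startup check, using the generic Hadoop FileSystem API (simplified, not the actual FileSystemNodeLabelStore code): only attempt to create the root directory when it does not already exist, so an existing store can be opened even while the namenode is in safe mode.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Simplified sketch of the proposed startup behaviour (not the actual store code).
public class LabelStoreStartup {
  public static void ensureRootDir(Configuration conf, Path rootDir) throws Exception {
    FileSystem fs = rootDir.getFileSystem(conf);
    // Reads are allowed while the namenode is in safe mode, so this check is safe.
    if (!fs.exists(rootDir)) {
      // Only mutate the namespace when the directory is genuinely missing.
      fs.mkdirs(rootDir);
    }
  }
}
{code}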



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022818#comment-15022818
 ] 

Sunil G commented on YARN-4386:
---

Hi [~kshukla],
As I see it, we can RECOMMISSION only those nodes which are in the DECOMMISSIONING 
state. Such nodes are present in {{getRMNodes}}, which is correct.

Also, if you look at {{RMNodeImpl}}, the RECOMMISSION event is not defined from the 
DECOMMISSIONED state. Hence, even if that code is hit, it will throw an invalid 
state transition exception. So looping only over {{rmContext.getRMNodes()}} looks 
fine to me; however, I also feel we do not need that extra if check for 
DECOMMISSIONED.
cc/ [~djp]
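
A sketch of the simplification suggested above, reusing the snippet quoted in the description (not a tested patch): keep looping over the active nodes only and drop the DECOMMISSIONED branch, since RECOMMISSION is only a legal transition from DECOMMISSIONING.
{code}
// Sketch of the suggested simplification; assumes the same surrounding context as the
// refreshNodesGracefully() snippet quoted in the description.
for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
  NodeId nodeId = entry.getKey();
  // Recommission only nodes that are still draining; DECOMMISSIONED nodes are not in this
  // map and the RECOMMISSION transition is not defined for them anyway.
  if (entry.getValue().getState() == NodeState.DECOMMISSIONING) {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
  }
}
{code}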

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which contains only active nodes (RUNNING, DECOMMISSIONING, etc.), 
> is used to check for 'decommissioned' nodes, which are present only in the 
> getInactiveRMNodes() map. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
>     || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>       .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4384) updateNodeResource CLI should not accept negative values for resource

2015-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022840#comment-15022840
 ] 

Hadoop QA commented on YARN-4384:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 26s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 37s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 114m 7s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_85 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_85 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
\\
\\
|| Subsystem || Report/Notes ||
| 

[jira] [Updated] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-23 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4358:
---
Attachment: YARN-4358.2.patch

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.2.patch, YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but has 
> no visibility into extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented, some (e.g., max-instantaneous resources) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4385) TestDistributedShell times out

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022660#comment-15022660
 ] 

Tsuyoshi Ozawa edited comment on YARN-4385 at 11/23/15 6:25 PM:


From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/

{quote}


  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime 
java.io.IOException:...

Tests run: 14, Failures: 0, Errors: 12, Skipped: 0

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop YARN  SUCCESS [  4.803 s]
[INFO] Apache Hadoop YARN API  SUCCESS [04:44 min]
[INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min]
[INFO] Apache Hadoop YARN Server . SUCCESS [  0.109 s]
[INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s]
[INFO] Apache Hadoop YARN NodeManager  SUCCESS [10:05 min]
[INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min]
[INFO] Apache Hadoop YARN ResourceManager  SUCCESS [  01:03 h]
[INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min]
[INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min]
[INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s]
[INFO] Apache Hadoop YARN Applications ... SUCCESS [  0.053 s]
[INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED
[INFO] Apache Hadoop YARN Site ... SKIPPED
[INFO] Apache Hadoop YARN Registry ... SKIPPED
[INFO] Apache Hadoop YARN Project  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:37 h
[INFO] Finished at: 2015-11-09T20:36:25+00:00
[INFO] Final Memory: 81M/690M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-yarn-applications-distributedshell: There are test failures.
[ERROR]
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports
 for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-yarn-applications-distributedshell
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Updating HDFS-9234
Sending e-mails to: yarn-...@hadoop.apache.org
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
12 tests failed.
FAILED:  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithInvalidArgs

Error Message:
java.io.IOException: ResourceManager failed to start. Final state is STOPPED

Stack Trace:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
ResourceManager failed to start. Final state is STOPPED
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:331)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.access$500(MiniYARNCluster.java:99)
at 

[jira] [Comment Edited] (YARN-4385) TestDistributedShell times out

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022660#comment-15022660
 ] 

Tsuyoshi Ozawa edited comment on YARN-4385 at 11/23/15 6:26 PM:


On my local log: 

{quote}
Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 437.156 sec 
<<< FAILURE! - in 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testDSShellWithCustomLogPropertyFile(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 115.558 sec  <<< ERROR!
java.lang.Exception: test timed out after 9 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:734)
at 
org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:715)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithCustomLogPropertyFile(TestDistributedShell.java:502)
{quote}


was (Author: ozawa):
From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/

{quote}


  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime 
java.io.IOException:...

Tests run: 14, Failures: 0, Errors: 12, Skipped: 0

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop YARN  SUCCESS [  4.803 s]
[INFO] Apache Hadoop YARN API  SUCCESS [04:44 min]
[INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min]
[INFO] Apache Hadoop YARN Server . SUCCESS [  0.109 s]
[INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s]
[INFO] Apache Hadoop YARN NodeManager  SUCCESS [10:05 min]
[INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min]
[INFO] Apache Hadoop YARN ResourceManager  SUCCESS [  01:03 h]
[INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min]
[INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min]
[INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s]
[INFO] Apache Hadoop YARN Applications ... SUCCESS [  0.053 s]
[INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED
[INFO] Apache Hadoop YARN Site ... SKIPPED
[INFO] Apache Hadoop YARN Registry ... SKIPPED
[INFO] Apache Hadoop YARN Project  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:37 h
[INFO] Finished at: 2015-11-09T20:36:25+00:00
[INFO] Final Memory: 81M/690M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-yarn-applications-distributedshell: There are test failures.
[ERROR]
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports
 for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-yarn-applications-distributedshell
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Updating HDFS-9234
Sending e-mails to: yarn-...@hadoop.apache.org
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any




[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4380:
-
Attachment: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt

[~varun_saxena] attaching a log from a run where the test fails. 

I use this simple script to reproduce intermittent failures: 
https://github.com/oza/failchecker

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Attachments: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-23 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022698#comment-15022698
 ] 

Carlo Curino commented on YARN-4358:


[~asuresh] thanks for the comments. I agree with them. 

In particular, regarding (3), we are currently slightly abusing 
RLESparseResourceAllocation to efficiently track time-varying quantities (which 
are not memory/core resources). Once YARN-3926 lands, this can be made to look 
much cleaner, as we will be able to define new logical resources. 

I will address your comments and upload a new version soon.

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but has 
> no visibility into extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented, some (e.g., max-instantaneous resources) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart

2015-11-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022621#comment-15022621
 ] 

Naganarasimha G R commented on YARN-3127:
-

Seems like the test failures are unrelated to the fix, and the checkstyle warning is not valid.

> Avoid timeline events during RM recovery or restart
> ---
>
> Key: YARN-3127
> URL: https://issues.apache.org/jira/browse/YARN-3127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0, 2.7.1
> Environment: RM HA with ATS
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: AppTransition.png, YARN-3127.20150213-1.patch, 
> YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch, 
> YARN-3127.20151123-1.patch
>
>
> 1. Start the RM with HA and ATS configured and run some YARN applications
> 2. Once the applications have finished successfully, start the timeline server
> 3. Now fail over HA from active to standby
> 4. Access the timeline server URL: /applicationhistory
> // Note: earlier an exception was thrown when this was accessed. 
> Incomplete information is shown in the ATS web UI, i.e. attempt, container and 
> other information is not displayed.
> Also, even if the timeline server is started with the RM, on RM restart/recovery 
> the ATS events for applications already existing in ATS are resent, which is 
> not required.
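
A hypothetical sketch of the kind of guard this implies (invented names; the actual patch may differ): tag applications restored from the state store and skip publishing timeline events for them.
{code}
// Hypothetical guard; not the actual RM/SystemMetricsPublisher code.
class RecoveryAwarePublisher {
  private final java.util.Set<String> recoveredApps =
      java.util.concurrent.ConcurrentHashMap.newKeySet();

  void markRecovered(String appId) {
    recoveredApps.add(appId);
  }

  void publishAppEvent(String appId, String event) {
    // Events for applications restored from the state store were already sent
    // before the restart, so do not resend them.
    if (recoveredApps.contains(appId)) {
      return;
    }
    sendToTimelineServer(appId, event);
  }

  private void sendToTimelineServer(String appId, String event) {
    System.out.println("ATS event " + event + " for " + appId);
  }
}
{code}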



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart

2015-11-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022644#comment-15022644
 ] 

Naganarasimha G R commented on YARN-3127:
-

YARN-4306 and YARN-4318 have already been raised for the test failures.

> Avoid timeline events during RM recovery or restart
> ---
>
> Key: YARN-3127
> URL: https://issues.apache.org/jira/browse/YARN-3127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0, 2.7.1
> Environment: RM HA with ATS
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: AppTransition.png, YARN-3127.20150213-1.patch, 
> YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch, 
> YARN-3127.20151123-1.patch
>
>
> 1. Start the RM with HA and ATS configured and run some YARN applications
> 2. Once the applications have finished successfully, start the timeline server
> 3. Now fail over HA from active to standby
> 4. Access the timeline server URL: /applicationhistory
> // Note: earlier an exception was thrown when this was accessed. 
> Incomplete information is shown in the ATS web UI, i.e. attempt, container and 
> other information is not displayed.
> Also, even if the timeline server is started with the RM, on RM restart/recovery 
> the ATS events for applications already existing in ATS are resent, which is 
> not required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4385) TestDistributedShell times out

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022660#comment-15022660
 ] 

Tsuyoshi Ozawa commented on YARN-4385:
--

From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/

{quote}

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 11262 lines...]
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime 
java.io.IOExcept...
  TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime 
java.io.IOException:...

Tests run: 14, Failures: 0, Errors: 12, Skipped: 0

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop YARN  SUCCESS [  4.803 s]
[INFO] Apache Hadoop YARN API  SUCCESS [04:44 min]
[INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min]
[INFO] Apache Hadoop YARN Server . SUCCESS [  0.109 s]
[INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s]
[INFO] Apache Hadoop YARN NodeManager  SUCCESS [10:05 min]
[INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min]
[INFO] Apache Hadoop YARN ResourceManager  SUCCESS [  01:03 h]
[INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min]
[INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min]
[INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s]
[INFO] Apache Hadoop YARN Applications ... SUCCESS [  0.053 s]
[INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED
[INFO] Apache Hadoop YARN Site ... SKIPPED
[INFO] Apache Hadoop YARN Registry ... SKIPPED
[INFO] Apache Hadoop YARN Project  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:37 h
[INFO] Finished at: 2015-11-09T20:36:25+00:00
[INFO] Final Memory: 81M/690M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-yarn-applications-distributedshell: There are test failures.
[ERROR]
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports
 for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-yarn-applications-distributedshell
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Updating HDFS-9234
Sending e-mails to: yarn-...@hadoop.apache.org
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
12 tests failed.
FAILED:  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithInvalidArgs

Error Message:
java.io.IOException: ResourceManager failed to start. Final state is STOPPED

Stack Trace:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
ResourceManager failed to start. Final state is STOPPED
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:331)
at 

[jira] [Moved] (YARN-4385) TestDistributedShell times out

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa moved HADOOP-12591 to YARN-4385:
---

Key: YARN-4385  (was: HADOOP-12591)
Project: Hadoop YARN  (was: Hadoop Common)

> TestDistributedShell times out
> --
>
> Key: YARN-4385
> URL: https://issues.apache.org/jira/browse/YARN-4385
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4385) TestDistributedShell times out

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4385:
-
Component/s: test

> TestDistributedShell times out
> --
>
> Key: YARN-4385
> URL: https://issues.apache.org/jira/browse/YARN-4385
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Tsuyoshi Ozawa
> Attachments: 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3454) Add efficient merge operation to RLESparseResourceAllocation

2015-11-23 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022688#comment-15022688
 ] 

Carlo Curino commented on YARN-3454:


[~asuresh] Thank you so much for the thoughtful review and commit. 

> Add efficient merge operation to RLESparseResourceAllocation
> 
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-3454.1.patch, YARN-3454.2.patch, YARN-3454.3.patch, 
> YARN-3454.4.patch, YARN-3454.5.patch, YARN-3454.patch
>
>
> The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
> interval removals well, but does not correctly handle partial overlaps. 
> In the context of this fix, we also introduced static methods to "merge" two 
> RLESparseResourceAllocation objects while applying an operator in the process 
> (add/subtract/min/max/subtractTestPositive).
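
A compact illustration of the merge-with-operator idea, over plain TreeMap step functions (sketch only, not the RLESparseResourceAllocation implementation): walk the union of change points and apply the operator to the values in effect at each point.
{code}
import java.util.TreeMap;
import java.util.function.LongBinaryOperator;

// Illustration of merging two RLE step functions (time -> value) under an operator.
public class RleMergeSketch {
  static TreeMap<Long, Long> merge(TreeMap<Long, Long> a, TreeMap<Long, Long> b,
                                   LongBinaryOperator op) {
    TreeMap<Long, Long> out = new TreeMap<>();
    TreeMap<Long, Long> points = new TreeMap<>();
    points.putAll(a);
    points.putAll(b);
    // At every change point of either input, combine the values currently in effect.
    for (Long t : points.keySet()) {
      out.put(t, op.applyAsLong(valueAt(a, t), valueAt(b, t)));
    }
    return out;
  }

  // Value in effect at time t: the entry at or before t, or 0 if none.
  static long valueAt(TreeMap<Long, Long> f, long t) {
    java.util.Map.Entry<Long, Long> e = f.floorEntry(t);
    return e == null ? 0L : e.getValue();
  }

  public static void main(String[] args) {
    TreeMap<Long, Long> a = new TreeMap<>();
    a.put(0L, 4L); a.put(10L, 0L);
    TreeMap<Long, Long> b = new TreeMap<>();
    b.put(5L, 2L); b.put(15L, 0L);
    // add: prints {0=4, 5=6, 10=2, 15=0}
    System.out.println(merge(a, b, Long::sum));
  }
}
{code}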



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4385) TestDistributedShell times out

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4385:
-
Attachment: 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt

Attaching a log when it fails. 

> TestDistributedShell times out
> --
>
> Key: YARN-4385
> URL: https://issues.apache.org/jira/browse/YARN-4385
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Tsuyoshi Ozawa
> Attachments: 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-23 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022866#comment-15022866
 ] 

Kuhu Shukla commented on YARN-4386:
---

Yes, I agree. I missed correlating the DECOMMISSIONED state transition with this 
check. Changing the priority to Minor.

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which contains only active nodes (RUNNING, DECOMMISSIONING, etc.), 
> is used to check for 'decommissioned' nodes, which are present only in the 
> getInactiveRMNodes() map. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
>     || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>       .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-23 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4386:
--
Priority: Minor  (was: Major)

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which contains only active nodes (RUNNING, DECOMMISSIONING, etc.), 
> is used to check for 'decommissioned' nodes, which are present only in the 
> getInactiveRMNodes() map. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
>     || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>       .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-11-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022885#comment-15022885
 ] 

Sangjin Lee commented on YARN-3762:
---

This should be a good candidate for branch-2.6 (2.6.3). [~kasha], what do you 
think?

> FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
> ---
>
> Key: YARN-3762
> URL: https://issues.apache.org/jira/browse/YARN-3762
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch
>
>
> In our testing, we ran into the following ConcurrentModificationException:
> {noformat}
> halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
> 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
> queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
> queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
> 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
> {noformat}
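
A minimal sketch of one way to avoid this kind of CME (a generic read/write-lock pattern, not necessarily the committed fix): readers iterate the child-queue list only while holding a read lock that writers also respect.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Generic pattern (not the committed FSParentQueue fix): readers iterate under the read
// lock, writers mutate under the write lock, so iteration never observes a concurrent
// structural modification.
class ChildQueues<Q> {
  private final List<Q> children = new ArrayList<>();
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

  void addChild(Q q) {
    rwLock.writeLock().lock();
    try {
      children.add(q);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  void forEachChild(Consumer<Q> action) {
    rwLock.readLock().lock();
    try {
      for (Q q : children) {
        action.accept(q);
      }
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
{code}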



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022933#comment-15022933
 ] 

Jason Lowe commented on YARN-4344:
--

+1 for branch-2.6 patch as well, committing this.

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> situations can arise where the overall cluster resource calculation will be 
> incorrect, leading to inconsistencies in scheduling.
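
A hypothetical sketch of the bookkeeping involved (invented names, not the actual RM node-tracking code): when a node reconnects with a different capability, the cluster total must drop the old contribution before adding the new one, otherwise repeated reconnects skew the total.
{code}
// Hypothetical bookkeeping sketch; not the actual ResourceTrackerService code.
class ClusterResourceTracker {
  private long totalMemoryMB;
  private final java.util.Map<String, Long> nodeMemoryMB = new java.util.HashMap<>();

  synchronized void nodeJoined(String nodeId, long memoryMB) {
    Long old = nodeMemoryMB.put(nodeId, memoryMB);
    if (old != null) {
      // Reconnect with (possibly) changed capability: remove the stale contribution first.
      totalMemoryMB -= old;
    }
    totalMemoryMB += memoryMB;
  }

  synchronized long getTotalMemoryMB() {
    return totalMemoryMB;
  }
}
{code}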



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"

2015-11-23 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4334:
---
Attachment: YARN-4334.4.patch

The .4 patch fixes some checkstyle and whitespace issues. TestWebApp is tracked by 
YARN-4379 and is not related to my change. TestAMAuthorization and TestClientRMTokens 
are not caused by my patch either. 
[~jlowe], please help review the latest patch, thanks!

> Ability to avoid ResourceManager recovery if state store is "too old"
> -
>
> Key: YARN-4334
> URL: https://issues.apache.org/jira/browse/YARN-4334
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4334.2.patch, YARN-4334.3.patch, YARN-4334.4.patch, 
> YARN-4334.patch, YARN-4334.wip.2.patch, YARN-4334.wip.3.patch, 
> YARN-4334.wip.4.patch, YARN-4334.wip.patch
>
>
> There are times when a ResourceManager has been down long enough that 
> ApplicationMasters and potentially external client-side monitoring mechanisms 
> have given up completely.  If the ResourceManager starts back up and tries to 
> recover we can get into situations where the RM launches new application 
> attempts for the AMs that gave up, but then the client _also_ launches 
> another instance of the app because it assumed everything was dead.
> It would be nice if the RM could be optionally configured to avoid trying to 
> recover if the state store was "too old."  The RM would come up without any 
> applications recovered, but we would avoid a double-submission situation.
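
A minimal sketch of the proposed check (hypothetical names; the real patch would read the timestamp from the state store and the limit from configuration): compare the store's age against a configured maximum and skip recovery when it is exceeded.
{code}
// Hypothetical sketch of the "too old" check; names and config semantics are invented.
class RecoveryGate {
  static final long DISABLED = -1L;

  /**
   * @param storeLastModifiedMs  timestamp the state store was last updated
   * @param maxAgeMs             configured maximum age, or DISABLED to always recover
   */
  static boolean shouldRecover(long storeLastModifiedMs, long maxAgeMs, long nowMs) {
    if (maxAgeMs == DISABLED) {
      return true;
    }
    long age = nowMs - storeLastModifiedMs;
    // Skip recovery (start "fresh") when the store is older than the configured limit,
    // to avoid re-launching attempts that clients have already given up on.
    return age <= maxAgeMs;
  }
}
{code}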



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-23 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.003.patch
YARN-3946.v1.003.Images.zip

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch
>
>
> Currently there is no direct way to get the exact reason why a 
> submitted app is still in the ACCEPTED state. It should be possible to know 
> through the RM REST API which requirement is not being met - say, queue limits 
> being reached, core/memory requirements not being met, or the AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-11-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022069#comment-15022069
 ] 

Junping Du commented on YARN-4131:
--

Cool. I will keep this JIRA open until we are sure nothing else needs to be 
added. Thanks!

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-11-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021988#comment-15021988
 ] 

Junping Du commented on YARN-4131:
--

I think YARN-1897 covers most of this JIRA's work. [~ste...@apache.org], do you 
see any remaining gap for providing a chaos monkey for YARN? 
Btw, [~adhoot], sorry for replying late; I was on a long vacation just after your 
comments above and missed them until I came back.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-11-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022013#comment-15022013
 ] 

Steve Loughran commented on YARN-4131:
--

I think we are OK... someone needs to write the chaos monkey and see what, if 
anything, they're missing.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-23 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong reassigned YARN-4382:
--

Assignee: Jun Gong

> Container hierarchy in cgroup may remain for ever after the container have be 
> terminated
> 
>
> Key: YARN-4382
> URL: https://issues.apache.org/jira/browse/YARN-4382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.2
>Reporter: lachisis
>Assignee: Jun Gong
>
> If we use LinuxContainerExecutor to execute the containers, this problem may 
> happen.
> In the common case, when a container runs, a corresponding hierarchy is 
> created in the cgroup dir, and when the container terminates, the hierarchy is 
> deleted within some seconds (the delay can be configured by 
> yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code I find that CgroupsLCEResource sends a signal to kill the container 
> process asynchronously and, at the same time, tries to delete the container 
> hierarchy within the configured "delete-delay-ms" window. 
> But if killing the container process takes longer than the "delete-delay-ms" 
> window, the container hierarchy will remain forever.
>   
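
For illustration, here is a minimal sketch of the cleanup pattern described above, but with the delete retried until the cgroup is actually empty rather than only within a fixed "delete-delay-ms" window. The class and method names are hypothetical; this is not the actual NodeManager cgroups handler code:

{code}
import java.io.File;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: keep retrying until the per-container cgroup can be removed. */
public final class CgroupCleanup {

  /**
   * A cgroup directory can only be removed once every task inside it has exited,
   * so poll until the delete succeeds or the caller-supplied deadline expires.
   */
  public static boolean deleteWhenEmpty(File cgroupDir, long timeoutMs, long pollMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      // File.delete() on a directory behaves like rmdir: it fails while the
      // cgroup still contains live tasks, and succeeds once it is empty.
      if (cgroupDir.delete() || !cgroupDir.exists()) {
        return true;
      }
      TimeUnit.MILLISECONDS.sleep(pollMs);
    }
    return false; // still non-empty; report it so a later sweep can clean it up
  }
}
{code}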



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain forever after the container has been terminated

2015-11-23 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022059#comment-15022059
 ] 

Jun Gong commented on YARN-4382:


[~lachisis] Thanks for reporting the issue. Please feel free to re-assign it to 
yourself if you want to work on it. 

> Container hierarchy in cgroup may remain forever after the container has been 
> terminated
> 
>
> Key: YARN-4382
> URL: https://issues.apache.org/jira/browse/YARN-4382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.2
>Reporter: lachisis
>Assignee: Jun Gong
>
> If we use LinuxContainerExecutor to execute the containers, this problem may 
> happen.
> In the common case, when a container runs, a corresponding hierarchy is 
> created in the cgroup dir, and when the container terminates, the hierarchy is 
> deleted within some seconds (the delay can be configured by 
> yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code I find that CgroupsLCEResource sends a signal to kill the container 
> process asynchronously and, at the same time, tries to delete the container 
> hierarchy within the configured "delete-delay-ms" window. 
> But if killing the container process takes longer than the "delete-delay-ms" 
> window, the container hierarchy will remain forever.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022035#comment-15022035
 ] 

Naganarasimha G R commented on YARN-3946:
-

Hi [~wangda], 
Sorry for the delay. As per the offline discussion, we concluded that:
# We should only record AM-launch-related events with this patch, so we don't 
need to record the recover/running state. (I think we can clear the 
am-launch-diagnostic once the AM container is allocated.)
# Recording the event time is good, but I think we should put it in a separate 
JIRA; maybe we need to do some refactoring of the existing diagnostics part.

I have taken care of the first point and keep AM launch diagnostic messages 
until a container is assigned to the AM process. For the second point, as it 
was a simple modification, I have captured it in this JIRA itself. Please check 
it.
Another difference from the previous patch: as I mentioned earlier, in some 
cases the reason why the node is not assigned was getting overwritten by the 
following modification in LeafQueue.
{code}
@@ -904,7 +919,9 @@ public synchronized CSAssignment assignContainers(Resource 
clusterResource,
 
 // Done
 return assignment;
-  } else if (!assignment.getSkipped()) {
+  } else if (assignment.getSkipped()) {
+application.updateNodeDiagnostics(node);
+  } else {
{code}
Hence this patch handles it by storing the diagnostic message temporarily and 
clearing it once the final message is created.
I have also attached some images related to the patch.
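
For illustration, a minimal sketch of the "store temporarily, clear once the message is created" idea described above; the class and method names are hypothetical and not part of the patch (the patch itself keeps this state on the application object, via updateNodeDiagnostics in the diff above):

{code}
/**
 * Hypothetical holder for the last "why was this node skipped" message.
 */
class AmLaunchDiagnostics {
  private final StringBuilder pending = new StringBuilder();

  synchronized void recordSkippedNode(String nodeId, String reason) {
    // Remember only the latest reason; it may legitimately be overwritten
    // while the scheduler keeps trying other nodes.
    pending.setLength(0);
    pending.append("Skipping node ").append(nodeId).append(": ").append(reason);
  }

  synchronized String buildAndClear() {
    String message = pending.toString();
    pending.setLength(0); // clear once the diagnostic message has been published
    return message;
  }
}
{code}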

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4383) TeraGen Application allows same output directory for multiple jobs

2015-11-23 Thread tongshiquan (JIRA)
tongshiquan created YARN-4383:
-

 Summary: TeraGen Application allows same output directory for 
multiple jobs 
 Key: YARN-4383
 URL: https://issues.apache.org/jira/browse/YARN-4383
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: tongshiquan


When TeraGen is run multiple times with the same output directory, it should 
normally validate the output path and fail.

In some cases, however, it continues and causes exceptions that fail the job at 
a later time.


I think the reason behind it is that 
{code} 
org.apache.hadoop.examples.terasort.TeraOutputFormat.checkOutputSpecs(TeraOutputFormat.java)
{code}
has an issue: it permits the output directory to already exist if it has only 
one child and that child is PARTITION_FILENAME.
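
For illustration, a minimal sketch of a stricter check using the standard FileSystem API; the helper below is hypothetical and is not the actual TeraOutputFormat code:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative helper: refuse to run if the output directory already exists at all. */
public final class OutputPathCheck {
  public static void failIfExists(Configuration conf, Path output) throws IOException {
    FileSystem fs = output.getFileSystem(conf);
    if (fs.exists(output)) {
      // No special case for a lone partition file: any existing output
      // directory is rejected up front instead of failing later in the job.
      throw new IOException("Output directory " + output + " already exists");
    }
  }
}
{code}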



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4344:

Attachment: YARN-4344-branch-2.6.001.patch

Uploaded a version for branch-2.6

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation for 
> the cluster will be incorrect leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3127) Avoid timeline events during RM recovery or restart

2015-11-23 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3127:

Attachment: YARN-3127.20151123-1.patch

Hi [~sjlee0], [~rohithsharma] & [~xgong], I have rebased the patch; can you 
please take a look at it? Based on this we can get YARN-4350 corrected.

> Avoid timeline events during RM recovery or restart
> ---
>
> Key: YARN-3127
> URL: https://issues.apache.org/jira/browse/YARN-3127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0, 2.7.1
> Environment: RM HA with ATS
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: AppTransition.png, YARN-3127.20150213-1.patch, 
> YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch, 
> YARN-3127.20151123-1.patch
>
>
> 1. Start RM with HA and ATS configured and run some YARN applications.
> 2. Once the applications have finished successfully, start the timeline server.
> 3. Now fail over HA from active to standby.
> 4. Access the timeline server URL :/applicationhistory
> // Note: earlier an exception was thrown when this was accessed. 
> Incomplete information is shown in the ATS web UI, i.e. attempt, container and 
> other information is not displayed.
> Also, even if the timeline server is started with the RM, then on RM restart/recovery 
> the ATS events for applications already existing in ATS are resent, which is 
> not required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-23 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4381:
-
Assignee: Lin Yiqun

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently I found an issue with the nodemanager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number of 
> successfully launched containers, because sometimes the launch fails after 
> receiving a kill command or after a container localization failure, which leads 
> to a failed container. Currently this counter is increased by the code below 
> whether the container starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.
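
For illustration, a minimal sketch of how such counters could be exposed with the metrics2 annotations that NodeManagerMetrics already uses; the class and counter names below are hypothetical and not part of the attached patch:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

@Metrics(about = "Illustrative NM container launch metrics", context = "yarn")
public class ContainerLaunchMetrics {

  @Metric("# of containers whose localization failed")
  MutableCounterInt containersLocalizationFailed;

  @Metric("# of containers that actually launched")
  MutableCounterInt containersReallyLaunched;

  // Increment only from the code path that really launches the container,
  // not from startContainers(), so killed/failed starts are not counted.
  public void launchedContainer() {
    containersReallyLaunched.incr();
  }

  // Increment when the container fails localization.
  public void localizationFailedContainer() {
    containersLocalizationFailed.incr();
  }
}
{code}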



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022087#comment-15022087
 ] 

Junping Du commented on YARN-4381:
--

[~linyiqun], thank you for contributing the patch to the YARN project. I have 
just added you as a YARN contributor and assigned this JIRA to you.

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently I found an issue with the nodemanager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number of 
> successfully launched containers, because sometimes the launch fails after 
> receiving a kill command or after a container localization failure, which leads 
> to a failed container. Currently this counter is increased by the code below 
> whether the container starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4298) Fix findbugs warnings in hadoop-yarn-common

2015-11-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022089#comment-15022089
 ] 

Sunil G commented on YARN-4298:
---

Thanks [~varun_saxena] !!

Even after that fix, the same warnings are shown. I will take another look at 
this now and verify it locally.

> Fix findbugs warnings in hadoop-yarn-common
> ---
>
> Key: YARN-4298
> URL: https://issues.apache.org/jira/browse/YARN-4298
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-4298.patch, 0002-YARN-4298.patch
>
>
> {noformat}
> Inconsistent synchronization warnings (category MT_CORRECTNESS) reported for
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl, lineNumber 390:
>   AllocateResponsePBImpl.builder  - locked 95% of time
>   AllocateResponsePBImpl.proto    - locked 94% of time
>   AllocateResponsePBImpl.viaProto - locked 94% of time
> {noformat}
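
For context, a generic illustration of what findbugs flags with these MT_CORRECTNESS "inconsistent synchronization" warnings; this is a made-up example, not the actual AllocateResponsePBImpl code:

{code}
// The same field is read both with and without the lock, so findbugs reports
// it as "locked N% of time".
public class InconsistentSyncExample {
  private Object builder = new Object();

  public synchronized void rebuild() {
    builder = new Object();   // guarded write
  }

  public Object peekBuilder() {
    return builder;           // unguarded read -> findbugs warning
  }

  // Typical fix: route every access to the field through the same monitor.
  public synchronized Object getBuilder() {
    return builder;
  }
}
{code}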



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-11-23 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022161#comment-15022161
 ] 

Chang Li commented on YARN-4132:


The TestWebApp failure is tracked by YARN-4379 and is not related to my change. 
[~jlowe], please help review the updated patch, thanks!

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, 
> YARN-4132.5.patch, YARN-4132.6.2.patch, YARN-4132.6.patch, YARN-4132.7.patch, 
> YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.
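
For illustration, a minimal sketch of building a more patient retry policy for the NM-to-RM connection from the existing RetryPolicies helper; the wrapper class and parameter names are hypothetical, and the wait budget would come from a new NM-only config rather than the client-side one:

{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

/** Hypothetical wrapper: retry the NM->RM connection for an NM-specific budget. */
public final class NmRmRetry {
  public static RetryPolicy nmConnectionPolicy(long maxWaitMs, long retryIntervalMs) {
    // Keep retrying with a fixed sleep until the NM-specific wait budget is spent.
    return RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        maxWaitMs, retryIntervalMs, TimeUnit.MILLISECONDS);
  }
}
{code}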



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022176#comment-15022176
 ] 

Hadoop QA commented on YARN-3946:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 15 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 655, now 666). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 55s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 51s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 183m 52s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimitsByPartition
 |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023190#comment-15023190
 ] 

Hadoop QA commented on YARN-4358:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 21 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 64, now 85). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 53 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 19s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 2 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 44s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85
 with JDK v1.7.0_85 generated 15 new issues (was 2, now 17). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 22s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 36s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 135m 56s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-11-23 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3878:
--
Fix Version/s: 2.6.3

+1. Committed it to branch-2.6. Thanks [~varun_saxena]!

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-3878-branch-2.6.01.patch, YARN-3878.01.patch, 
> YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, 
> YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, 
> YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h
>
>
> The sequence of events is as follows:
> # The RM is stopped while putting an RMStateStore event on RMStateStore's 
> AsyncDispatcher. This leads to an InterruptedException being thrown.
> # As the RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we check whether all events have been drained and wait for the 
> event queue to drain (as the RM State Store dispatcher is configured to drain its 
> queue on stop). 
> # This condition never becomes true and the AsyncDispatcher keeps waiting 
> incessantly for the dispatcher event queue to drain until the JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 

[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023219#comment-15023219
 ] 

Hudson commented on YARN-4344:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2647 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2647/])
YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: 
rev d36b6e045f317c94e97cb41a163aa974d161a404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation for 
> the cluster will be incorrect leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-11-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3925:
-
Fix Version/s: 2.6.3

I pulled this into 2.6.3 as well.

> ContainerLogsUtils#getContainerLogFile fails to read container log files from 
> full disks.
> -
>
> Key: YARN-3925
> URL: https://issues.apache.org/jira/browse/YARN-3925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-3925.000.patch, YARN-3925.001.patch
>
>
> ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
> {{getContainerLogFile}} depends on 
> {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
> {{LocalDirsHandlerService#getLogPathToRead}} calls 
> {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
> configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
> include full disks in {{LocalDirsHandlerService#checkDirs}}:
> {code}
> Configuration conf = getConfig();
> List localDirs = getLocalDirs();
> conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
> localDirs.toArray(new String[localDirs.size()]));
> List logDirs = getLogDirs();
> conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
>   logDirs.toArray(new String[logDirs.size()]));
> {code}
> ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
> ContainerLogsPage.ContainersLogsBlock#render to read the log.
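
For illustration, a minimal sketch of the read-path idea implied above: when reading an existing log file, probe every configured log dir (including full ones) instead of only the healthy set used for writes. The helper name is hypothetical:

{code}
import java.io.File;
import java.util.List;

/** Illustrative only: a full disk can still serve existing files even though it must not take writes. */
public final class LogFileLookup {
  public static File findContainerLogFile(List<String> allLogDirs, String containerRelativePath) {
    for (String dir : allLogDirs) {
      File candidate = new File(dir, containerRelativePath);
      if (candidate.exists()) {
        return candidate; // found, possibly on a dir excluded from the write set
      }
    }
    return null; // genuinely missing, not just filtered out by the disk checker
  }
}
{code}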



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4372) Cannot enable system-metrics-publisher inside MiniYARNCluster

2015-11-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023169#comment-15023169
 ] 

Vinod Kumar Vavilapalli commented on YARN-4372:
---

bq. Even after the patch TestDistributedShell.testDSShellWithoutDomain is 
failing (the test case passes, but the console logs show an unreachable timeline 
server for each SMP event).
You are right, *sigh*, this is the same bug we ran into at YARN-3087: Guice not 
letting us run two UI services at the same time. This used to work because the 
Timeline Service started last before this patch. Need to think more; I'm not sure 
how we can fix this.

> Cannot enable system-metrics-publisher inside MiniYARNCluster
> -
>
> Key: YARN-4372
> URL: https://issues.apache.org/jira/browse/YARN-4372
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-4372-20151119.1.txt
>
>
> [~Naganarasimha] found this at YARN-2859, see [this 
> comment|https://issues.apache.org/jira/browse/YARN-2859?focusedCommentId=15005746=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15005746].
> The way daemons are started inside MiniYARNCluster, RM is not setup correctly 
> to send information to TimelineService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4380:
-
Attachment: 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt

[~varun_saxena], thank you for the fix. The fix itself looks good to me. 

I got another error, though it happens rarely: 

{quote}
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
  Time elapsed: 0.093 sec  <<< FAILURE!
org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
Argument(s) are different! Wanted:
eventHandler.handle(

);
-> at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
Actual invocation has different arguments:
eventHandler.handle(
EventType: APPLICATION_INITED
);
-> at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
{quote}

Attaching a log for the failure. Could you take a look?

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4343) Need to support Application History Server on ATSV2

2015-11-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023786#comment-15023786
 ] 

Naganarasimha G R commented on YARN-4343:
-

Hi [~sjlee0], 
I think we need the *"yarn-2928-1st-milestone"* label for this too. 
YARNClientImpl tries to look up the apps (attempt & container details) from ATS 
for the CLI if they are not present in the RM, and AppReportFetcher tries to get 
the application report from ATS for the web service. So without this I feel it 
breaks functionality, and I think we need to support it for ATS v2 as well.

Coming to the discussion about the approach, I had an offline discussion with 
[~varun_saxena] and he said that, as per the 
[discussion|https://issues.apache.org/jira/browse/YARN-3047?focusedCommentId=14368563=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14368563]
 with [~zjshen] in YARN-3047, RPC need not be supported; it can be handled on the 
client side for ATSv2 (have a timeline client get the timeline entities and 
convert them into report objects). 
{quote}
For some legacy problem, AHS exposes an RPC interface. However, IMHO, we don't 
need to create the RPC interface again in v2 as we're building the new system 
from the ground up. What we can do is wrap over the REST APIs in the Java 
client, and provide YARN CLI commands.
{quote}
Maybe I can have some kind of factory and instantiate the client based on the 
configuration and try to do it that way, but I am a little doubtful about the 
dependencies part. Let me give it a try.
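
For illustration, a minimal sketch of that factory idea, with entirely hypothetical class names and config key; the real client classes and configuration may look quite different:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical: pick an application-history client based on the timeline service version.
interface AppHistoryClient {
  String fetchApplicationReport(String appId) throws Exception;
}

public final class AppHistoryClientFactory {
  public static AppHistoryClient create(Configuration conf) {
    // "yarn.timeline-service.version" is assumed here for the sake of the sketch.
    float version = conf.getFloat("yarn.timeline-service.version", 1.0f);
    if (version >= 2.0f) {
      // v2: read timeline entities over REST and convert them into report objects.
      return appId -> "report built from ATSv2 timeline entities for " + appId;
    }
    // v1: fall back to the existing ApplicationHistoryServer client.
    return appId -> "report fetched from the v1 ApplicationHistoryServer for " + appId;
  }
}
{code}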

> Need to support Application History Server on ATSV2
> ---
>
> Key: YARN-4343
> URL: https://issues.apache.org/jira/browse/YARN-4343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> AHS is used by the CLI and the web proxy (REST); if the application-related 
> information is not found in the RM, they try to fetch it from AHS and show it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023799#comment-15023799
 ] 

Hudson commented on YARN-4344:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2572 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2572/])
YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: 
rev d36b6e045f317c94e97cb41a163aa974d161a404)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation for 
> the cluster will be incorrect leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-11-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023819#comment-15023819
 ] 

Naganarasimha G R commented on YARN-3623:
-

Hi [~vinodkv], [~sjlee0], [~gtCarrera] & [~xgong],
If the above solution is fine, then we can take the following steps:
# Make this JIRA a sub-JIRA of 1.5 and introduce the timeline version config.
# As part of YARN-4183, we can create 
*"yarn.timeline-service.client.require-delegation-token"* so that it removes the 
client-side dependency on *"yarn.timeline-service.enabled"* for getting tokens.
# YARN-4356 or a *new JIRA* can handle modifications and updates of the timeline 
version in ATSv2.
# Raise a new JIRA to handle REST interface support for getting the supported ATS 
version and config from the server for ATS 1.5.
# Raise a new JIRA to handle version checking for the ATSv2 interface methods in 
TimelineClient, and also fetching of the version.


> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Naganarasimha G R
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far the RM, MR AM, and DA AM have added/changed new configs to enable the 
> feature of writing timeline data to the v2 server. It would be good to have a YARN 
> timeline-service.version config, like timeline-service.enabled, to indicate the 
> version of the running timeline service in the given YARN cluster. It is 
> beneficial for users to move more smoothly from v1 to v2, as they don't need 
> to change the existing configs but just switch this config from v1 to v2, and 
> each framework doesn't need to have its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023870#comment-15023870
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

{quote}
Archiving artifacts
[description-setter] Description set: YARN-4348
Recording test results
ERROR: Publisher 'Publish JUnit test result report' failed: No test report 
files were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
An attempt to send an e-mail to empty list of recipients, ignored.
Finished: FAILURE
{quote}

Hmm, Jenkins looks to be unhealthy. 

> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, 
> log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out, 
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.
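
For illustration, a minimal sketch of waiting for an asynchronous sync callback with the longer resync budget; the names are illustrative and this is not the actual ZKRMStateStore code:

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public final class SyncWait {
  /**
   * Wait for the ZK sync callback using zkResyncWaitTime rather than the
   * (shorter) session timeout, so a sync that completes late is not reported
   * as a failure.
   */
  public static boolean waitForSync(CountDownLatch syncDone, long zkResyncWaitTimeMs)
      throws InterruptedException {
    return syncDone.await(zkResyncWaitTimeMs, TimeUnit.MILLISECONDS);
  }
}
{code}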



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-11-23 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023880#comment-15023880
 ] 

Bibin A Chundatt commented on YARN-4304:


Hi [~sunilg]

# Could you also check the total memory when a container reservation is done for 
an NM?

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configuration for 
> the same. 
> For example: the current UI still shows the am-resource percentage at the queue 
> level. This is to be updated correctly when a label config is used.
> - Display max-am-percentage per partition in the Scheduler UI (labels also) and 
> on the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. the per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023487#comment-15023487
 ] 

Hudson commented on YARN-4344:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #633 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/633/])
YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: 
rev d36b6e045f317c94e97cb41a163aa974d161a404)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* hadoop-yarn-project/CHANGES.txt


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation for 
> the cluster will be incorrect leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023435#comment-15023435
 ] 

Hadoop QA commented on YARN-4248:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 23s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66
 with JDK v1.8.0_66 generated 1 new issues (was 2, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} Patch generated 22 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 40, now 62). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 18s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 27s 
{color} | {color:red} Patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 162m 26s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_85 Failed junit tests | 

[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-23 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023481#comment-15023481
 ] 

lachisis commented on YARN-4382:


Thanks for your reply, Jun Gong. 
I think it is a good idea to use "release_agent" to clean up the empty container 
hierarchies. But I am afraid the "release_agent" option may not be supported by 
all cgroup versions.
I just tested the "release_agent" option; maybe I made a mistake, but it does not 
work for me yet.


> Container hierarchy in cgroup may remain for ever after the container have be 
> terminated
> 
>
> Key: YARN-4382
> URL: https://issues.apache.org/jira/browse/YARN-4382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.2
>Reporter: lachisis
>Assignee: Jun Gong
>
> If we use LinuxContainerExecutor to run the containers, this problem may happen.
> In the common case, when a container runs, a corresponding hierarchy is created 
> in the cgroup directory. And when the container terminates, the hierarchy is 
> deleted within a few seconds (this time can be configured by 
> yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code, I find that CgroupsLCEResourcesHandler sends a signal to kill the 
> container process asynchronously, and at the same time it tries to delete the 
> container hierarchy within the configured "delete-delay-ms" window. 
> But if killing the container process takes longer than "delete-delay-ms", the 
> container hierarchy will remain forever.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023549#comment-15023549
 ] 

Hudson commented on YARN-4349:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8868 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8868/])
YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: 
rev 8676a118a12165ae5a8b80a2a4596c133471ebc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java


> Support CallerContext in 

[jira] [Commented] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"

2015-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023410#comment-15023410
 ] 

Hadoop QA commented on YARN-4334:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 32s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 6 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 608, now 611). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 39s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 2 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 16s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s 
{color} | {color:green} hadoop-yarn-api in the 

[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-23 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023415#comment-15023415
 ] 

Lin Yiqun commented on YARN-4381:
-

Thanks [~djp]!

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently, I found an issue with the nodemanager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number of 
> successfully launched containers. A container can still fail later, for example 
> when it receives a kill command or when container localization fails, which 
> leads to a failed container. But today this counter is incremented in the code 
> below whenever a container start is attempted, whether it succeeds or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.
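
To make the proposal concrete, here is a minimal sketch of the direction described 
above, assuming a hypothetical containerLocalizationFailed() counter and a launch 
counter that is only bumped once the container actually reaches RUNNING. These 
method names are illustrative, not the existing NodeManagerMetrics API (only 
launchedContainer() exists today, as shown in the snippet above):

{code}
// Hypothetical sketch: count a "launched" container only after it is actually
// running, and track localization failures explicitly. Names are assumptions.
public class ContainerMetricsSketch {

  interface NodeManagerMetricsLike {
    void launchedContainer();            // exists today, bumped at start-request time
    void containerLocalizationFailed();  // hypothetical new counter
  }

  enum ContainerState { LOCALIZING, LOCALIZATION_FAILED, RUNNING, KILLED }

  static void onStateTransition(ContainerState newState, NodeManagerMetricsLike metrics) {
    switch (newState) {
      case RUNNING:
        // Only now do we know the launch really happened.
        metrics.launchedContainer();
        break;
      case LOCALIZATION_FAILED:
        metrics.containerLocalizationFailed();
        break;
      default:
        break;
    }
  }
}
{code}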



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-11-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023527#comment-15023527
 ] 

Sangjin Lee commented on YARN-3862:
---

Whether we make TimelineFilter part of the object model or not, we'll still 
need to come up with a way to support filter queries on the URLs, no?

While we're at it, today there are no reads done through the TimelineClient 
API, correct? Today there are only the REST-based queries. Of course this 
doesn't mean we won't support more programmatic reads via TimelineClient (and 
RPC?) in the future, and also there may be value in making TimelineFilter part 
of the common API. I just wanted to understand whether we need to make that 
call as part of this JIRA. Did I understand this correctly, or did I miss 
something important?

> Decide which contents to retrieve and send back in response in TimelineReader
> -
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-YARN-2928.wip.03.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options :
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.
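
To make the options above concrete, here is a small sketch of what prefix and 
regex filtering over returned configs could look like on the reader side; the 
keys and filter strings are illustrative only, and this is not the patch's API:

{code}
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ConfigFilterSketch {
  public static void main(String[] args) {
    Map<String, String> configs = new TreeMap<>();
    configs.put("yarn.scheduler.minimum-allocation-mb", "1024");
    configs.put("yarn.scheduler.maximum-allocation-mb", "8192");
    configs.put("mapreduce.map.memory.mb", "2048");

    // Option 1: prefix match ("yarn.scheduler.")
    Map<String, String> byPrefix = configs.entrySet().stream()
        .filter(e -> e.getKey().startsWith("yarn.scheduler."))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

    // Option 2: regex match (any key ending in "memory.mb")
    Pattern p = Pattern.compile(".*memory\\.mb$");
    Map<String, String> byRegex = configs.entrySet().stream()
        .filter(e -> p.matcher(e.getKey()).matches())
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

    System.out.println(byPrefix);
    System.out.println(byRegex);
  }
}
{code}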



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023375#comment-15023375
 ] 

Hudson commented on YARN-4344:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #717 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/717/])
YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: 
rev d36b6e045f317c94e97cb41a163aa974d161a404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> situations can arise where the overall cluster resource calculation is 
> incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-11-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023646#comment-15023646
 ] 

Sangjin Lee commented on YARN-3862:
---

I had a chance to go over the latest patch in a little more detail. I think 
this is now closer to being ready. I do have some comments and suggestions, 
some major and others minor.

(TimelineFilterUtils.java)
- createHBaseColQualPrefixFilter(): this is still trying to compute the column 
prefix by hand. The main point of introducing getColumnPrefixBytes() on 
ColumnPrefix was to avoid doing this for confs and metrics. Can we rework the 
signatures of createHBaseFilterList() so that we can rely on 
ColumnPrefix.getColumnPrefixBytes()? Ideally all computations of qualifier 
bytes should go through ColumnPrefix.getColumnPrefixBytes().

(TestHBaseTimelineReaderImpl.java)
- I'm not too sure about the name; for other tests we basically combined the 
reader and writer tests. Thoughts on how to make this best fit into the 
existing tests?

(GenericEntityReader.java)
- l.139: nit: typo: releated -> related
- I keep confusing configFilters and confs. The names are so similar that I 
have to go check the implementations to distinguish them (configFilters select 
which rows we return, while confs selects which config contents of the matching 
rows are included). Could there be a better way to name them so that their 
meanings are clearer? I don't have a great idea at the moment, and you might 
want to think about better names...
- On a related note, this is probably outside the scope of this JIRA, but I see 
that the configFilter and metricFilter are applied on the client-side. Probably 
on a separate JIRA, we should see if we can do this on the HBase side. This is 
just a reminder.
- l.156: Why do we need to check if configFilters == null? Is it because if 
configFilters are specified we implicitly assume we want the config columns 
returned in the content? Is that a valid assumption?

(TimelineReader.java)
- Related to one of the points above, at least we should add javadoc that 
clearly explains confs and metrics and how they are different from 
configFilters and metricFilters. That will help us a great deal in maintaining 
this.

(FlowRunColumnPrefix.java)
- As a result of YARN-4053 being committed, getColumnPrefixBytes(String) 
already exists. It should be removed from this patch.

(TestHBaseStorageFlowRun.java)
- testWriteFlowRunMetricsPrefix() and testWriteFlowRunsMetricFields() are 
failing possibly due to changes in YARN-4053.
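
To illustrate the TimelineFilterUtils point above: the intent is that filter 
builders never concatenate prefix and qualifier bytes by hand but always go 
through the prefix abstraction. A rough sketch of that shape, where 
ColumnPrefixLike is a deliberate simplification and not the real ColumnPrefix 
interface:

{code}
import java.nio.charset.StandardCharsets;

public class ColumnPrefixSketch {

  // Simplified stand-in for the real ColumnPrefix abstraction.
  interface ColumnPrefixLike {
    byte[] getColumnPrefixBytes(String qualifier);
  }

  static class MetricPrefix implements ColumnPrefixLike {
    private static final byte[] PREFIX = "m:".getBytes(StandardCharsets.UTF_8);

    @Override
    public byte[] getColumnPrefixBytes(String qualifier) {
      byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
      byte[] result = new byte[PREFIX.length + q.length];
      System.arraycopy(PREFIX, 0, result, 0, PREFIX.length);
      System.arraycopy(q, 0, result, PREFIX.length, q.length);
      return result;
    }
  }

  public static void main(String[] args) {
    // A filter builder should take the prefix object plus the qualifier,
    // instead of re-deriving "m:" + qualifier on its own.
    ColumnPrefixLike metrics = new MetricPrefix();
    byte[] qualifierBytes = metrics.getColumnPrefixBytes("MAP_SLOT_MILLIS");
    System.out.println(new String(qualifierBytes, StandardCharsets.UTF_8));
  }
}
{code}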



> Decide which contents to retrieve and send back in response in TimelineReader
> -
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-YARN-2928.wip.03.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options :
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023722#comment-15023722
 ] 

Hudson commented on YARN-4349:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1442 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1442/])
YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: 
rev 8676a118a12165ae5a8b80a2a4596c133471ebc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


> Support CallerContext in YARN

[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023728#comment-15023728
 ] 

Naganarasimha G R commented on YARN-4350:
-

Hi [~sjlee0], 
I have rebased YARN-3127; could you have a look at it? Also, YARN-4372 seems 
like it might take a little more time (at least I was not able to crack it). So 
if it is urgent, we can take the approach I mentioned in YARN-2859 (a fixed port 
obtained from ServerSocketUtil at setup time) and continue. Once YARN-4372 is 
done, we can revert back to an ephemeral port (the YARN-2859 solution). Thoughts?
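
For reference, the fixed-port approach mentioned above picks a port that is known 
to be free at test setup time; the real tests use hadoop-common's 
ServerSocketUtil, and the sketch below only approximates that behaviour with a 
plain ServerSocket probe:

{code}
import java.io.IOException;
import java.net.ServerSocket;

// Approximation of the "fixed port obtained at setup time" idea; illustrative only.
public final class FreePortFinder {

  static int getFreePort(int preferred, int retries) throws IOException {
    for (int i = 0; i <= retries; i++) {
      int candidate = preferred + i;
      try (ServerSocket socket = new ServerSocket(candidate)) {
        return candidate;  // port was free when we probed it
      } catch (IOException bindFailed) {
        // try the next candidate
      }
    }
    throw new IOException("No free port found starting at " + preferred);
  }

  public static void main(String[] args) throws IOException {
    System.out.println("timeline service port: " + getFreePort(8190, 10));
  }
}
{code}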


> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023652#comment-15023652
 ] 

Hudson commented on YARN-4349:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #709 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/709/])
YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: 
rev 8676a118a12165ae5a8b80a2a4596c133471ebc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java


> Support 

[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023671#comment-15023671
 ] 

Hudson commented on YARN-4349:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2650 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2650/])
YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: 
rev 8676a118a12165ae5a8b80a2a4596c133471ebc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> Support 

[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023581#comment-15023581
 ] 

Hudson commented on YARN-4349:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #719 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/719/])
YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: 
rev 8676a118a12165ae5a8b80a2a4596c133471ebc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Support 

[jira] [Assigned] (YARN-4384) updateNodeResource CLI should not accept negative values for resource

2015-11-23 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-4384:


Assignee: Junping Du

> updateNodeResource CLI should not accept negative values for resource
> -
>
> Key: YARN-4384
> URL: https://issues.apache.org/jira/browse/YARN-4384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
> Fix For: 2.8.0
>
>
> updateNodeResource CLI should not accept negative values for MemSize and 
> vCores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-11-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022480#comment-15022480
 ] 

Jason Lowe commented on YARN-4225:
--

Yes, I was thinking the client would refrain from reporting on a field it knew 
wasn't provided.  However, I think exposing both getPreemptionDisabled and 
hasPreemptionDisabled methods outside the protobuf is very confusing; someone 
might call has when they should be calling get.  Maybe a name like 
"isPreemptionDisabledValid" or something would be clearer.

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-23 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022715#comment-15022715
 ] 

Karthik Kambatla commented on YARN-3980:


+1. Will commit this later today. 

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, 
> YARN-3980-v8.patch, YARN-3980-v9.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4384) updateNodeResource CLI should not accept negative values for resource

2015-11-23 Thread Sushmitha Sreenivasan (JIRA)
Sushmitha Sreenivasan created YARN-4384:
---

 Summary: updateNodeResource CLI should not accept negative values 
for resource
 Key: YARN-4384
 URL: https://issues.apache.org/jira/browse/YARN-4384
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Sushmitha Sreenivasan
 Fix For: 2.8.0


updateNodeResource CLI should not accept negative values for MemSize and vCores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4384) updateNodeResource CLI should not accept negative values for resource

2015-11-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022459#comment-15022459
 ] 

Junping Du commented on YARN-4384:
--

Thanks for reporting this, [~ssreenivasan]! I agree that we should validate the 
values to make sure an admin/user cannot unintentionally set invalid values for 
memory and vCores. I will upload a patch to fix it.
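
A minimal sketch of the intended check, assuming the CLI parses MemSize and 
vCores as integers before building the request; the class and messages below are 
illustrative, not the actual RMAdminCLI code:

{code}
// Hypothetical argument validation for updateNodeResource; illustrative only.
public final class UpdateNodeResourceArgs {

  static void validate(int memSizeMB, int vCores) {
    if (memSizeMB < 0 || vCores < 0) {
      throw new IllegalArgumentException(
          "Invalid resource value: memory-mb=" + memSizeMB + ", vcores=" + vCores
              + ". MemSize and vCores must be non-negative.");
    }
  }

  public static void main(String[] args) {
    validate(4096, 4);  // OK
    try {
      validate(-1024, 2);
    } catch (IllegalArgumentException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}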

> updateNodeResource CLI should not accept negative values for resource
> -
>
> Key: YARN-4384
> URL: https://issues.apache.org/jira/browse/YARN-4384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
> Fix For: 2.8.0
>
>
> updateNodeResource CLI should not accept negative values for MemSize and 
> vCores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022458#comment-15022458
 ] 

Jason Lowe commented on YARN-4380:
--

[~varun_saxena] could you take a look?

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Tsuyoshi Ozawa
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4384) updateNodeResource CLI should not accept negative values for resource

2015-11-23 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4384:
-
Attachment: YARN-4384.patch

Uploaded a patch to fix this.

> updateNodeResource CLI should not accept negative values for resource
> -
>
> Key: YARN-4384
> URL: https://issues.apache.org/jira/browse/YARN-4384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-4384.patch
>
>
> updateNodeResource CLI should not accept negative values for MemSize and 
> vCores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-23 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4348:
-
Attachment: YARN-4348-branch-2.7.003.patch

> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, 
> log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out, 
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.
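
A simplified sketch of the intended change (assuming a latch-based wait inside 
{{ZKRMStateStore#syncInternal}}; the variable names are illustrative and the 
surrounding code is omitted):
{code}
// Sketch only: bound the wait for the asynchronous ZK sync by the resync wait
// time rather than the session timeout, so a sync that completes late is not
// reported as a failure.
boolean synced = latch.await(zkResyncWaitTime, TimeUnit.MILLISECONDS);
{code}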



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4380:
--

Assignee: Varun Saxena

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022590#comment-15022590
 ] 

Varun Saxena commented on YARN-4380:


Thanks for reporting this, [~ozawa]. I tried running this test several times on 
branch-2 but could not reproduce the failure.
If you are able to reproduce it, it would be helpful if you could share the logs.

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-11-23 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022606#comment-15022606
 ] 

Eric Payne commented on YARN-4225:
--

bq. someone might call has when they should be calling get. Maybe a name like 
"isPreemptionDisabledValid" or something would be more clear
In order to remove the need for two methods, another alternative would be to 
have {{QueueInfoPBImpl#getPreemptionDisabled}} return a {{Boolean}} rather than 
a native {{boolean}}, and have it return null if the underlying protobuf field 
is not set.

 So, in {{QueueCLI#printQueueInfo}}, the code would look something like this:
{code}
Boolean preemptStatus = queueInfo.getPreemptionDisabled();
if (preemptStatus != null) {
  writer.print("\tPreemption : ");
  writer.println(preemptStatus ? "disabled" : "enabled");
}
{code}
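
For completeness, a minimal sketch of what the getter side could look like 
under this alternative (not the actual {{QueueInfoPBImpl}} code; it assumes the 
usual PBImpl pattern and generated 
{{hasPreemptionDisabled()}}/{{getPreemptionDisabled()}} proto accessors):
{code}
// Sketch only: map a missing protobuf field to null instead of a default value.
public Boolean getPreemptionDisabled() {
  QueueInfoProtoOrBuilder p = viaProto ? proto : builder;
  if (!p.hasPreemptionDisabled()) {
    return null;  // field never set, e.g. the response came from an older server
  }
  return p.getPreemptionDisabled();
}
{code}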


In general, what is the Hadoop policy when a newer client talks to an older 
server and the protobuf output is different from what is expected? Should we 
expose some form of the {{has}} method, or should we overload the {{get}} 
method as I described here?

 I would appreciate any additional feedback from the community in general 
([~vinodkv], do you have any thoughts?)

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart

2015-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022310#comment-15022310
 ] 

Hadoop QA commented on YARN-3127:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 43s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85
 with JDK v1.7.0_85 generated 1 new issues (was 2, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 147, now 148). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 26s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 138m 10s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_85 Failed junit tests | 

[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022963#comment-15022963
 ] 

Hudson commented on YARN-4344:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8864 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8864/])
YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: 
rev d36b6e045f317c94e97cb41a163aa974d161a404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> situations can arise where the overall cluster resource calculation will be 
> incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023041#comment-15023041
 ] 

Varun Saxena commented on YARN-4380:


Checking that the localizer runner thread has finished should be enough for 
the test to pass.
However, to avoid an InterruptedException in the logs (due to a race), I have 
also added a check that localization has begun before initiating localizer 
heartbeats in the test.
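
A minimal sketch of the waits described above (assuming 
{{GenericTestUtils.waitFor}} is usable here; {{localizationStarted}} and 
{{localizerThread}} are hypothetical placeholders for however the test actually 
observes that state):
{code}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

// Sketch only: wait until localization has begun before sending localizer
// heartbeats, then wait for the localizer runner thread to finish.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return localizationStarted.get();   // hypothetical flag set when localization starts
  }
}, 100, 10000);

// ... initiate localizer heartbeats here ...

GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return !localizerThread.isAlive();  // localizer runner has finished
  }
}, 100, 10000);
{code}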

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4380:
---
Attachment: YARN-4380.01.patch

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-23 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3840:
---
Attachment: RMApps_Sorted.png

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Varun Saxena
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, 
> YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, 
> YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023084#comment-15023084
 ] 

Varun Saxena commented on YARN-3840:


Added an image showing the sorted app IDs with the script in the patch.

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Varun Saxena
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, 
> YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, 
> YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023097#comment-15023097
 ] 

Hadoop QA commented on YARN-4380:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 59s 
{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 22s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 51s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 11s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12773897/YARN-4380.01.patch |
| JIRA Issue | YARN-4380 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 983711f1122c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d36b6e0 |
| mvninstall | 

[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023127#comment-15023127
 ] 

Hudson commented on YARN-4344:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1439 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1439/])
YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: 
rev d36b6e045f317c94e97cb41a163aa974d161a404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> situations can arise where the overall cluster resource calculation will be 
> incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements

2015-11-23 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4360:
---
Attachment: YARN-4360.2.patch

> Improve GreedyReservationAgent to support "early" allocations, and 
> performance improvements 
> 
>
> Key: YARN-4360
> URL: https://issues.apache.org/jira/browse/YARN-4360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4360.2.patch, YARN-4360.patch
>
>
> The GreedyReservationAgent allocates "as late as possible". Per various 
> conversations, it seems useful to have a mirror behavior that allocates as 
> early as possible. Also in the process we leverage improvements from 
> YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which 
> significantly speeds up allocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023142#comment-15023142
 ] 

Hudson commented on YARN-4344:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #706 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/706/])
YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: 
rev d36b6e045f317c94e97cb41a163aa974d161a404)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, 
> YARN-4344.002.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> situations can arise where the overall cluster resource calculation will be 
> incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements

2015-11-23 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023143#comment-15023143
 ] 

Carlo Curino commented on YARN-4360:


Rebasing after YARN-3454 got committed. A point of discussion: the 
configuration could either select "GreedyReservationAgent" and set the 
allocation direction via a parameter, or add another top-level class (e.g., 
"LeftGreedyReservationAgent") that invokes the same internals but is configured 
for left-to-right allocation. One less config param, one more class... thoughts?
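
To make the trade-off concrete, a rough sketch of the "one more class" option 
(the class name is from the comment above; the boolean constructor parameter is 
purely illustrative, not the existing GreedyReservationAgent API):
{code}
// Sketch only: a thin top-level agent that reuses the greedy internals but
// fixes the allocation direction to left-to-right (earliest-first).
public class LeftGreedyReservationAgent extends GreedyReservationAgent {
  public LeftGreedyReservationAgent() {
    super(/* allocateLeft = */ true);  // illustrative constructor parameter
  }
}
{code}
The alternative is a single agent class plus one boolean config knob selecting 
the direction; the internals stay shared either way.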

> Improve GreedyReservationAgent to support "early" allocations, and 
> performance improvements 
> 
>
> Key: YARN-4360
> URL: https://issues.apache.org/jira/browse/YARN-4360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4360.2.patch, YARN-4360.patch
>
>
> The GreedyReservationAgent allocates "as late as possible". Per various 
> conversations, it seems useful to have a mirror behavior that allocates as 
> early as possible. Also in the process we leverage improvements from 
> YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which 
> significantly speeds up allocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)