[jira] [Updated] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container

2015-09-13 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4152:
---
Attachment: 0001-YARN-4152.patch

> NM crash when LogAggregationService#stopContainer called for absent container
> -
>
> Key: YARN-4152
> URL: https://issues.apache.org/jira/browse/YARN-4152
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4152.patch
>
>
> NM crashes during log aggregation.
> Ran a Pi job with 500 containers and killed the application in between.
> *Logs*
> {code}
> 2015-09-12 18:44:25,597 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_e51_1442063466801_0001_01_99 is : 143
> 2015-09-12 18:44:25,670 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101
> 2015-09-12 18:44:25,670 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_e51_1442063466801_0001_01_000101 from application 
> application_1442063466801_0001
> 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_STOP for appId application_1442063466801_0001
> 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf   
> OPERATION=Container Finished - SucceededTARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1442063466801_0001
> CONTAINERID=container_e51_1442063466801_0001_01_000100
> {code}
> *Analysis*
> Looks like {{stopContainer}} is called even for an absent container:
> {code}
>   case CONTAINER_FINISHED:
> LogHandlerContainerFinishedEvent containerFinishEvent =
> (LogHandlerContainerFinishedEvent) event;
> stopContainer(containerFinishEvent.getContainerId(),
> containerFinishEvent.getExitCode());
> break;
> {code}
> *Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101*
> We should skip the call when {{null == context.getContainers().get(containerId)}}.
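
A minimal sketch of the guard suggested above, assuming it sits at the top of {{LogAggregationService#stopContainer}}; the method shape and log message are illustrative, not the actual patch:
{code}
private void stopContainer(ContainerId containerId, int exitCode) {
  // Skip containers the NM no longer tracks (e.g. KILL_CONTAINER was sent to
  // an absent container) instead of dereferencing a null Container reference.
  if (null == context.getContainers().get(containerId)) {
    LOG.warn("Log aggregation cannot stop container " + containerId
        + " because it is an absent container");
    return;
  }
  // ... existing stopContainer logic continues here ...
}
{code}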



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742532#comment-14742532
 ] 

Hadoop QA commented on YARN-4152:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  7s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 37s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 41s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  46m 56s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755607/0001-YARN-4152.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7269906 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9102/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9102/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9102/console |


This message was automatically generated.

> NM crash when LogAggregationService#stopContainer called for absent container
> -
>
> Key: YARN-4152
> URL: https://issues.apache.org/jira/browse/YARN-4152
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4152.patch
>
>
> NM crashes during log aggregation.
> Ran a Pi job with 500 containers and killed the application in between.
> *Logs*
> {code}
> 2015-09-12 18:44:25,597 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_e51_1442063466801_0001_01_99 is : 143
> 2015-09-12 18:44:25,670 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101
> 2015-09-12 18:44:25,670 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_e51_1442063466801_0001_01_000101 from application 
> application_1442063466801_0001
> 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_STOP for appId application_1442063466801_0001
> 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf   
> OPERATION=Container Finished - Succeeded

[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Attachment: YARN-2902.05.patch

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-13 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4140:
---
Attachment: 0001-YARN-4140.patch

Attaching the first patch for review.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5-10 min for 500 containers. 
> Of 3 machines in total, 2 machines were in the same partition and the app was submitted to that partition.
> After enabling debug I was able to find the below:
> # From the AM the container ask is for OFF_SWITCH.
> # The RM is allocating all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after AM allocation.
> # Tested with about 1K maps using a Pi job, it took 17 minutes to allocate the next 
> container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742554#comment-14742554
 ] 

Hadoop QA commented on YARN-2902:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 47s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 9 new or modified test files. |
| {color:green}+1{color} | javac |  10m 10s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 30s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 25s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:red}-1{color} | whitespace |   7m  9s | The patch has 10  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 47s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 57s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 13s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   7m 56s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  70m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks |
|   | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755609/YARN-2902.05.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7269906 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9103/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9103/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9103/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9103/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9103/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9103/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9103/console |


This message was automatically generated.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3951) Test case failures in TestLogAggregationService, TestResourceLocalizationService

2015-09-13 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R resolved YARN-3951.
-
Resolution: Cannot Reproduce

Was not able to reproduce this, and while testing in a local setup the tests seem to 
pass successfully even after several retries. Hence closing this issue.

> Test case failures in TestLogAggregationService, 
> TestResourceLocalizationService 
> ---
>
> Key: YARN-3951
> URL: https://issues.apache.org/jira/browse/YARN-3951
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Found some test case failures in a YARN-3045 build which were not related to the 
> YARN-3045 patch:
> TestContainer.testKillOnLocalizedWhenContainerNotLaunched
> {quote}
> java.lang.AssertionError: expected: but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer.testKillOnLocalizedWhenContainerNotLaunched(TestContainer.java:413)
> {quote}
> TestResourceLocalizationService.testLocalizationHeartbeat
> {quote}
> Wanted but not invoked:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testLocalizationHeartbeat(TestResourceLocalizationService.java:900)
> Actually, there were zero interactions with this mock.
> {quote}
> TestResourceLocalizationService.testPublicResourceAddResourceExceptions
> {quote}
> java.lang.AssertionError: expected null, but was:<\{ \{ 
> file:/local/PRIVATE/ef9783a7514fda92, 2411, FILE, null 
> \},pending,\[(container_314159265358979_0003_01_42)\],2661055154305048,DOWNLOADING}>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceAddResourceExceptions(TestResourceLocalizationService.java:1366)
> {quote}
> TestLogAggregationService.testLogAggregationCreateDirsFailsWithoutKillingNM
> {quote}
> org.mortbay.util.MultiException: Multiple exceptions
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.checkEvents(TestLogAggregationService.java:1046)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationCreateDirsFailsWithoutKillingNM(TestLogAggregationService.java:736)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742662#comment-14742662
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe], kindly review.

The patch, at a very high level, does the following (a rough sketch of the flow 
follows below):
# On container kill, the NM (localization service) will create a deletion task for all 
the downloading resources and schedule it to run after a configured delay (a new 
config has been added for it). I decided not to wait for the HB from the localizer first, 
because we would not want to depend on the localizer if there is some problem there 
and it does not send a HB.
# On the subsequent HB from the localizer, the NM will indicate to the localizer that 
it can delete the downloading resources by itself after cancelling the download tasks. 
A boolean flag has been added in the proto for this.
# After the localizer deletes the resources, it will send a last HB to the NM. A 
boolean flag has been added in the proto to indicate this to the NM. On receiving this 
HB, the NM will cancel the deletion task so that deletion is not attempted by the NM as 
well. It is not a problem even if we do attempt deletion, because if nothing 
can be deleted, the deletion task won't do anything. But if the deletion task can be 
cancelled, then why not.
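
A rough, illustrative sketch of the flow described above; the class, method names, and 
flags below are assumptions for illustration, not the actual patch code:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class DownloadingResourceCleanup {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private ScheduledFuture<?> deletionTask;

  // 1. On container kill: schedule NM-side deletion of the DOWNLOADING
  //    resources after a configured delay, instead of waiting for (and
  //    depending on) a heartbeat from the localizer.
  void onContainerKill(Runnable deleteDownloadingResources, long delayMs) {
    deletionTask = scheduler.schedule(deleteDownloadingResources,
        delayMs, TimeUnit.MILLISECONDS);
  }

  // 2./3. When the localizer's last heartbeat reports that it has already
  //       cancelled its downloads and cleaned up by itself, cancel the NM-side
  //       deletion task so deletion is not attempted a second time.
  void onLocalizerReportedCleanup() {
    if (deletionTask != null) {
      deletionTask.cancel(false);
    }
  }
}
{code}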

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Attachment: YARN-2902.06.patch

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service

2015-09-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4072:
-
Affects Version/s: 2.8.0
 Target Version/s: 2.8.0
 Priority: Minor  (was: Major)
   Issue Type: Improvement  (was: Bug)

> ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager 
> to support JvmPauseMonitor as a service
> 
>
> Key: YARN-4072
> URL: https://issues.apache.org/jira/browse/YARN-4072
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-4072.patch, HADOOP-12407-001.patch
>
>
> As JvmPauseMonitor is made as an AbstractService, subsequent method changes 
> are needed in all places which uses the monitor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service

2015-09-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4072:
-
Attachment: HADOOP-12407-001.patch

aggregate patch

> ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager 
> to support JvmPauseMonitor as a service
> 
>
> Key: YARN-4072
> URL: https://issues.apache.org/jira/browse/YARN-4072
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-4072.patch, HADOOP-12407-001.patch
>
>
> As JvmPauseMonitor is made as an AbstractService, subsequent method changes 
> are needed in all places which uses the monitor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Attachment: YARN-2902.06.patch

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742658#comment-14742658
 ] 

Hadoop QA commented on YARN-2902:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 51s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 50s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:red}-1{color} | checkstyle |   2m 38s | The applied patch generated  6 
new checkstyle issues (total was 344, now 298). |
| {color:green}+1{color} | whitespace |   7m 19s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  3s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 55s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  64m 23s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755626/YARN-2902.06.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7269906 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9107/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9107/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9107/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9107/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9107/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9107/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9107/console |


This message was automatically generated.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Attachment: (was: YARN-2902.06.patch)

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742593#comment-14742593
 ] 

Varun Saxena commented on YARN-2902:


Sorry, I haven't fixed the checkstyle issue; it is about the file having more than 2000 lines.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742612#comment-14742612
 ] 

Hadoop QA commented on YARN-2902:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 32s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 48s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:red}-1{color} | checkstyle |   2m 36s | The applied patch generated  
10 new checkstyle issues (total was 344, now 306). |
| {color:green}+1{color} | whitespace |   6m 49s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 25s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  0s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 47s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  64m  3s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755616/YARN-2902.06.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7269906 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9105/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9105/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9105/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9105/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9105/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9105/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9105/console |


This message was automatically generated.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-09-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742702#comment-14742702
 ] 

Naganarasimha G R commented on YARN-3816:
-

Hi [~djp], sorry for the delay and for pitching in late.
A few doubts:
# I am not sure whether all the things mentioned in the JIRA description are 
achieved by the patch (would it be good to update the description with what all 
will be completed as part of the patch?). The following are not completely achieved, 
right?
#* number of containers launched/completed/failed,
#* framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to 
show details of states at the framework level.
# In the doc, the ApplicationState table (aggregated from 
AppLevelTimelineCollector) has container aggregate metrics (allocated: 0, 
preempted: 0, failed: 0, reuse: 0). Is this required at AppLevelTimelineCollector? I felt 
it should only be aggregated from RMTimelineCollector. Also, is time (start, 
last_modification, avg_execution) required as a metric? Maybe I misread the 
table description?
# In the doc {{aggregation-design-discussion.pdf}} you had mentioned that 
{{time average & max}} is what will be considered, but in the patch it seems 
more like only {{SUM}} is supported, neither avg nor max; is {{sum}} more important 
than the others (or am I missing something)? I would also like to know the 
significance of this measurement, as I felt a {{per-container average}} would be more 
helpful, since it can be useful for calibrating the RM.
# IIUC, based on the current design, aggregation seems to be happening at the 
collector end. In that case, do we require 
{{TimelineWriter.aggregate(TimelineEntity data, TimelineAggregationTrack 
track)}}? Is there any plan to push some logic to the writer for aggregation?
# {{TimelineAggregationBasis}} doesn't have a value for {{queue}}; as this is used 
in {{TimelineReaderWebServices}}, isn't it required for the reader?
# Will it be required to accumulate time-series data with single-value data and 
vice versa? Would accumulation need to be done on the same type? If not, what are 
some real scenarios where that can possibly happen?
# Would it be better to have a set of {{operation}}s which can be performed on a 
TimelineMetric, so that accumulateTo automatically detects and accumulates for 
different operations? Currently it seems to be statically set to {{SUM}} in 
{{TimelineCollector}}.
# Currently, for each putEntity call in the collector, we are not only aggregating and 
invoking accumulateTo but also sending it to be written to the writer, but in 
the doc it is mentioned that it will cache for 15 seconds and then update, right?
# I am not sure why {{pid}} was added earlier for a container's CPU and memory usage 
metrics, and not sure why we are removing it. But it seems that for a given 
container we do not need the pid to be appended, as the metric will be unique to it. 
Is that the reason we are removing it?
# Do we need to set {{aggregateTo}} to true for the container metrics (cputotalCore% 
and pmemUsage) too? Also, we are currently not capturing {{vmemUsage}}; do we need 
to capture it?
# In the doc it is mentioned that we are going to split the "ApplicationState 
table" into 2 ??It can be split into two tables by aggregated from 
RMTimelineCollector or AppLevelTimelineCollector??; is that required?

Some nits:
# yarn.timeline-service.aggregation.accumulation.enabled can have its default value 
explicitly set as true in yarn-default.xml, as per the default value in the yarn 
config.
# In {{TestTimelineMetric.testAccumulationOnTimelineMetrics}}, the assertEquals 
expected value should come as the first arg and the actual expression as the next; 
otherwise, when it fails, the exception message will come out wrong. There is also an 
unused import in that class.
# The 2 static TimelineCollector.aggregateMetrics(TimelineEntities) methods are 
public; are they planned to be used by some other class? If not, we can make them 
private. Also, aggregateMetrics returns a map; can it be a List/Set, which would 
suffice for {{appendAggregatedMetricsToEntities}}?
# EntityColumnPrefix.AGGREGATED_METRICS is not used anywhere; is it required?

I am trying to create a setup and test the patch in the cluster; if I come across 
more queries I will inform.

> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742710#comment-14742710
 ] 

Naganarasimha G R commented on YARN-4140:
-

Hi [~bibinchundatt],
Thanks for updating with a patch. I just took a high-level look at it. It seems like 
we can optimize a bit here, as node labels might not always be set.
Currently you always loop through twice. Instead, in the first loop, while checking 
for the ANY requests, we can populate the anyPriorityMap with the additional check
{code}
(null != anyResourceRequest.getNodeLabelExpression())
    && (!anyResourceRequest.getNodeLabelExpression().equals(
        RMNodeLabelsManager.NO_LABEL))
{code}
After this loop, if the map contains any element, then we can loop again to update the 
node- and rack-local requests with the NodeLabelExpression (see the sketch below).
Also, rename HashMap anyPriorityMap => Map 
priorityToNodeLabelMapping 
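
A rough sketch of the two-pass flow suggested above, written as a standalone helper; 
the class and method names and the surrounding structure are assumptions for 
illustration, not the actual patch:
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager;

class NodeLabelRequestUtil {
  static void propagateLabelsFromAnyRequests(List<ResourceRequest> requests) {
    Map<Priority, String> priorityToNodeLabelMapping =
        new HashMap<Priority, String>();

    // First pass: remember only the ANY (off-switch) requests that carry a
    // non-default node label expression.
    for (ResourceRequest req : requests) {
      if (ResourceRequest.ANY.equals(req.getResourceName())
          && null != req.getNodeLabelExpression()
          && !req.getNodeLabelExpression().equals(
              RMNodeLabelsManager.NO_LABEL)) {
        priorityToNodeLabelMapping.put(req.getPriority(),
            req.getNodeLabelExpression());
      }
    }

    // Second pass, only if a labelled ANY request was found: copy the label to
    // the node-local and rack-local requests of the same priority.
    if (!priorityToNodeLabelMapping.isEmpty()) {
      for (ResourceRequest req : requests) {
        String label = priorityToNodeLabelMapping.get(req.getPriority());
        if (label != null
            && !ResourceRequest.ANY.equals(req.getResourceName())) {
          req.setNodeLabelExpression(label);
        }
      }
    }
  }
}
{code}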

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5-10 min for 500 containers. 
> Of 3 machines in total, 2 machines were in the same partition and the app was submitted to that partition.
> After enabling debug I was able to find the below:
> # From the AM the container ask is for OFF_SWITCH.
> # The RM is allocating all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after AM allocation.
> # Tested with about 1K maps using a Pi job, it took 17 minutes to allocate the next 
> container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 

[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742763#comment-14742763
 ] 

Hudson commented on YARN-2005:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #387 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/387/])
YARN-2005. Blacklisting support for scheduling AMs. (Anubhav Dhoot via kasha) 
(kasha: rev 81df7b586a16f8226c7b01c139c1c70c060399c3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/DisabledBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/TestBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/SimpleBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> 

[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-09-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742766#comment-14742766
 ] 

Karthik Kambatla commented on YARN-1680:


Assuming everyone agrees with the approach here, and moving forward with 
YARN-3446. 

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith Sharma K S
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running and its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
> became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable 
> NodeManager (NM-4). All reducer tasks are running in the cluster now.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom is considering the blacklisted nodes' memory. This makes 
> jobs hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes but returns an availableResource that considers the cluster's free 
> memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.

2015-09-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742773#comment-14742773
 ] 

Karthik Kambatla commented on YARN-3697:


+1

> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
> sometimes. 
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread can't be shut down after stop 
> sometimes. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> 

[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3697:
---
Summary: FairScheduler: ContinuousSchedulingThread can fail to shutdown  
(was: FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
sometimes. )

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
> sometimes. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> 

[jira] [Commented] (YARN-1772) Fair Scheduler documentation should indicate that admin ACLs also give submit permissions

2015-09-13 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742681#comment-14742681
 ] 

Neelesh Srinivas Salian commented on YARN-1772:
---

This looks to be resolved. [~d4rr3ll]'s comment matches the line that now exists. 
If any additional clarity is needed, I can add it accordingly.

[~sandyr]

Thank you.

> Fair Scheduler documentation should indicate that admin ACLs also give submit 
> permissions
> -
>
> Key: YARN-1772
> URL: https://issues.apache.org/jira/browse/YARN-1772
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Priority: Minor
>  Labels: newbie
>
> I can submit to a Fair Scheduler queue if I'm in the submit ACL OR if I'm in 
> the administer ACL.  The Fair Scheduler docs seem to leave out the second 
> part. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4149) yarn logs -am should provide an option to fetch all the log files

2015-09-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742682#comment-14742682
 ] 

Xuan Gong commented on YARN-4149:
-

Thanks, Varun, for the patch. The patch looks good overall, but I have several 
comments:
* Can we do the same for fetching regular container logs? In the current patch, 
it looks like we only do it for AM containers.
* If the application is finished, it looks like we cannot get any logs when we 
specify --logfiles ALL, because we cannot get the information from the web 
service call. Setting requestedLogFiles to null when the application is finished 
should fix this; a rough sketch of the idea follows this list.
* Are the test case failures related?
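
A rough, hedged sketch of the suggestion in the second bullet; the class and method
names (LogFileSelection, effectiveLogFiles) are illustrative, not the actual LogsCLI
internals:

{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

class LogFileSelection {
  // For a finished application the NM web services can no longer be queried,
  // so return null and let the log reader dump every aggregated file.
  static List<String> effectiveLogFiles(ApplicationReport report,
      List<String> requestedLogFiles) {
    YarnApplicationState state = report.getYarnApplicationState();
    boolean finished = state == YarnApplicationState.FINISHED
        || state == YarnApplicationState.FAILED
        || state == YarnApplicationState.KILLED;
    return finished ? null : requestedLogFiles;
  }
}
{code}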

> yarn logs -am should provide an option to fetch all the log files
> -
>
> Key: YARN-4149
> URL: https://issues.apache.org/jira/browse/YARN-4149
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, nodemanager
>Affects Versions: 2.7.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4149.001.patch
>
>
> From [~gopalv] -
> {quote}
> Trying to collect the logs of a hanging Tez AM, by killing the container and running 
> yarn logs -applicationId application_1437098194051_0178 -am ALL
> The output contains only one log file, which does not contain any of the 
> actual execution logs, only the initialization logs.
> From YARN-3347, I note that
>   // if we do not specify the value for CONTAINER_LOG_FILES option,
>  // we will only output syslog
> This means that the person calling the yarn logs command has to list it out 
> like this, to collect logs 
> yarn logs -applicationId application_1437098194051_0178 -am ALL -logFiles \
> syslog_dag_1437098194051_0178_2_post,\
> dag_1437098194051_0178_2-tez-dag.pb.txt,\
> syslog_dag_1437098194051_0178_2,\
> syslog_dag_1437098194051_0178_1_post,\
> syslog_dag_1437098194051_0178_1,\
> syslog,\
> stdout,\
> stderr,\
> dag_1437098194051_0178_2.dot,\
> dag_1437098194051_0178_1.dot,\
> dag_1437098194051_0178_1-tez-dag.pb.txt
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3455) Document CGroup support

2015-09-13 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742709#comment-14742709
 ] 

Neelesh Srinivas Salian commented on YARN-3455:
---

[~rohit12sh] does this need anything additional?


> Document CGroup support 
> 
>
> Key: YARN-3455
> URL: https://issues.apache.org/jira/browse/YARN-3455
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Rohith Sharma K S
>
> It would be very useful if CGroup support were documented, with sections like 
> the ones below:
> # Introduction
> # Configuring CGroups
> # Any specific configuration that controls CPU scheduling
> # How/when to use CGroups, with some use-case explanations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742743#comment-14742743
 ] 

Hudson commented on YARN-2005:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8445 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8445/])
YARN-2005. Blacklisting support for scheduling AMs. (Anubhav Dhoot via kasha) 
(kasha: rev 81df7b586a16f8226c7b01c139c1c70c060399c3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/DisabledBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/SimpleBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/TestBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> 

[jira] [Commented] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare

2015-09-13 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742795#comment-14742795
 ] 

Xianyin Xin commented on YARN-4134:
---

Just found that this issue duplicates YARN-3405, but the two have different 
solutions. Since this issue would be addressed by YARN-2154, I am linking them together.

> FairScheduler preemption stops at queue level that all child queues are not 
> over their fairshare
> 
>
> Key: YARN-4134
> URL: https://issues.apache.org/jira/browse/YARN-4134
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4134.001.patch, YARN-4134.002.patch, 
> YARN-4134.003.patch
>
>
> Now FairScheduler uses a choose-a-candidate method to select, from the leaf 
> queues, a container to be preempted, in {{FSParentQueue.preemptContainer()}},
> {code}
> readLock.lock();
> try {
>   for (FSQueue queue : childQueues) {
> if (candidateQueue == null ||
> comparator.compare(queue, candidateQueue) > 0) {
>   candidateQueue = queue;
> }
>   }
> } finally {
>   readLock.unlock();
> }
> // Let the selected queue choose which of its container to preempt
> if (candidateQueue != null) {
>   toBePreempted = candidateQueue.preemptContainer();
> }
> {code}
> a candidate child queue is selected. However, if the queue's usage isn't over 
> its fair share, preemption will not happen:
> {code}
> if (!preemptContainerPreCheck()) {
>   return toBePreempted;
> }
> {code}
>  A scenario:
> {code}
> root
>/\
>   queue1   queue2
>/\
>   queue2.3, (  queue2.4  )
> {code}
> Suppose there are 8 containers, and queues at any level have the same weight. 
> queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their 
> fair share. Now we submit an app in queue2.4 that needs 4 containers; it 
> should preempt 2 from queue2.3, but the candidate-container selection 
> procedure stops at queue1, so none of the containers will be preempted.
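
To make the failure mode concrete, here is a hedged sketch of one possible traversal
that keeps descending instead of stopping at a parent that is merely at its fair
share; the simplified interface and names are illustrative, not the actual
FairScheduler classes:

{code}
import java.util.List;

// Simplified stand-in for FSQueue; names are hypothetical.
interface FSQueueLike {
  boolean isOverFairShare();
  boolean isLeaf();
  List<FSQueueLike> getChildQueues();
  Object preemptContainerFromLeaf(); // only meaningful for leaf queues
}

class PreemptionWalker {
  // Returns a container to preempt, or null if no leaf queue is over its fair share.
  static Object preemptContainer(FSQueueLike queue) {
    if (queue.isLeaf()) {
      return queue.isOverFairShare() ? queue.preemptContainerFromLeaf() : null;
    }
    for (FSQueueLike child : queue.getChildQueues()) {
      // Descend even when this parent as a whole is only at its fair share:
      // an over-share descendant (queue2.3 in the example) can still be found.
      Object candidate = preemptContainer(child);
      if (candidate != null) {
        return candidate;
      }
    }
    return null;
  }
}
{code}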



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742805#comment-14742805
 ] 

Hudson commented on YARN-2005:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #381 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/381/])
YARN-2005. Blacklisting support for scheduling AMs. (Anubhav Dhoot via kasha) 
(kasha: rev 81df7b586a16f8226c7b01c139c1c70c060399c3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/SimpleBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/DisabledBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/TestBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java


> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, 

[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742706#comment-14742706
 ] 

Varun Saxena commented on YARN-2902:


Implementation details are as follows:

(FSDownload.java)
* While going through the code, I found one issue in the current implementation. 
FSDownload#call is a mix of interruptible and uninterruptible code. This means 
that when {{Future#cancel(true)}} is called, the code might not be interrupted 
and we may continue downloading, even though CancellationException is thrown on 
Future#get. But in ContainerLocalizer, on CancellationException we send 
FETCH_FAILURE back to the NM and do not do any cleanup (deletion of downloaded 
resources), expecting FSDownload#call to do the cleanup as it indeed does on 
other exceptions. FSDownload does not provide any method to do the cleanup 
either. So to resolve this, I have added a wait/notify construct to let the 
download task complete (either through an exception or normally) and then do 
the cleanup; a rough sketch of the construct follows below. I thought of 
checking whether the thread is interrupted at the end of FSDownload#call, but 
FileContext#rename in that code resets the interrupted flag, so I am going with 
this solution.
This will increase the time taken for the localizer to send its last heartbeat 
(HB) to the NM, though.
*Should I raise another JIRA for this?*
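
A minimal sketch of the wait/notify construct described above, assuming a
hypothetical helper class; it only shows the shape of the synchronization, not the
actual FSDownload code:

{code}
// Hedged sketch: a flag guarded by a monitor records when the download task
// has finished (normally or with an exception); the canceller waits on it
// before deleting the possibly partially downloaded resource.
class DownloadCompletionLatch {
  private boolean finished = false;

  // Called by the download task at the very end of call(), success or failure.
  synchronized void markFinished() {
    finished = true;
    notifyAll();
  }

  // Called by the canceller before it cleans up the destination directory.
  synchronized void awaitFinished() throws InterruptedException {
    while (!finished) {
      wait();
    }
  }
}
{code}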
 
(ContainerLocalizer.java)
* A set of paths is maintained for the resources which were successfully 
downloaded and reported to the NM in a HB. This is done because, if the container 
has been killed, the NM will pay no heed to this FETCH_SUCCESS status. So if the 
HB response indicates that resources have to be deleted, we need a reference to 
what we sent in the HB as well. This extra set is required because we currently 
do not wait for the response from the NM before deleting the entry from the 
pending resources map (we delete it at the time of sending the HB). Should we 
change that?
* We use this set of paths to delete the resources reported to the NM at the 
time of localizer DIE. The pendingResources map is used for cleanup of 
resources which have not been reported to the NM.

(ResourceLocalizationService.java)
* On container kill, iterate over all the scheduled resources and, for the 
resources which are in the DOWNLOADING state, schedule a deletion task which runs 
after a delay specified by the config 
yarn.nodemanager.localizer.downloading-rsrcs.deletion.wait-sec (see the sketch 
after this list).
* An additional map of localizer id (key) to deletion task future plus deletion 
task id for the state store (value) is maintained so the deletion task can be 
cancelled later (upon a HB from the localizer).
* On a localizer HB, we would try to cancel the deletion task and remove the task 
from the state store. Moreover, a custom deletion task is created which calls the 
FileDeletionTask and also removes the entry from the map above after the deletion 
task completes. This, however, can lead to a minor race for removal of the task 
from the state store (i.e. a race between task completion and cancellation) if the 
code flow is not interruptible. But I guess that won't be a problem, as removing a 
task by ID from leveldb simply does nothing if the key does not exist.
* In case of an NM restart (which should be quite rare), the map above would be 
lost. But that should be fine. The only side effect is that running deletion 
tasks may not be cancelled, and even if we attempt to delete a non-existent 
directory, it should not be a problem.
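
A hedged sketch of the delayed-deletion bookkeeping in the list above; the config
key is the one quoted in the first bullet, while the class and method names are
illustrative rather than the actual ResourceLocalizationService code:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class DownloadingResourceCleaner {
  private final ScheduledExecutorService deletionService =
      Executors.newSingleThreadScheduledExecutor();
  // localizer id -> pending delayed deletion of its DOWNLOADING resources
  private final Map<String, ScheduledFuture<?>> pendingDeletions =
      new ConcurrentHashMap<String, ScheduledFuture<?>>();

  // On container kill: schedule the deletion after the wait configured by
  // yarn.nodemanager.localizer.downloading-rsrcs.deletion.wait-sec.
  void scheduleDeletion(final String localizerId,
      final Runnable deleteDownloadingDirs, long waitSec) {
    ScheduledFuture<?> future = deletionService.schedule(new Runnable() {
      @Override
      public void run() {
        try {
          deleteDownloadingDirs.run();
        } finally {
          pendingDeletions.remove(localizerId);
        }
      }
    }, waitSec, TimeUnit.SECONDS);
    pendingDeletions.put(localizerId, future);
  }

  // On a heartbeat from the localizer: it is still alive, so cancel the deletion.
  void cancelDeletion(String localizerId) {
    ScheduledFuture<?> future = pendingDeletions.remove(localizerId);
    if (future != null) {
      future.cancel(false);
    }
  }
}
{code}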

(DeletionService.java and Container Executor(s))
* Made the relevant changes in DeletionService to allow scheduling deletion 
tasks with a delay.
* Made changes to ensure the deletion task's future and the task id used for the 
state store are returned to the caller for cancellation later. Also added code for 
cancelling a running task.
* Added a new flag "ignoreMissingDir" to the deletion task. Currently executors 
throw an error if any directory amongst the list of baseDirs passed for deletion 
does not exist. But for this use case we would ignore such a missing dir and 
continue with the deletion of the other directories, because the NM may have an 
inconsistent view of everything that needs to be deleted. The flag has therefore 
been added to support this new behavior while keeping the old behavior (if the 
flag is false).
* Changes have been made in {{container-executor}} as well to support this 
flag. An additional command line parameter has been added (for the delete-as-user 
command). A value of 0 is analogous to the flag being false and 1 means true.

A few additional points to note:
# _tmp directories should be deleted by the localizer itself (in FSDownload). 
I have added this handling because, as you said, localizers can sometimes turn 
rogue; in that case, the NM will delete these dirs. Should I handle the tmp dirs then?
# I am assuming it would be fine to change the command line parameters for a 
specific container-executor command in 2.8. If we do not want to change the command 
line parameters for {{container-executor}}, there can be workarounds.


> Killing a container that is localizing can orphan 

[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742735#comment-14742735
 ] 

Karthik Kambatla commented on YARN-2005:


Looks good to me, trusting my previous review. 

+1. Checking this in. 

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch, YARN-2005.009.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742779#comment-14742779
 ] 

Karthik Kambatla commented on YARN-3697:


Committed to trunk and branch-2. 

TestAsyncDispatcher was failing against branch-2.7. [~zxu] - could you please 
look into it? 

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
> sometimes. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> 

[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742780#comment-14742780
 ] 

Hudson commented on YARN-3697:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8446 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8446/])
YARN-3697. FairScheduler: ContinuousSchedulingThread can fail to shutdown. 
(Zhihai Xu via kasha) (kasha: rev 332b520a480994b7bd56c135f7941aad30b05e9c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
> sometimes. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR 

[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2015-09-13 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742793#comment-14742793
 ] 

Xianyin Xin commented on YARN-4120:
---

Hi [~asuresh], thanks for your comment. I've gone through YARN-2154; I believe it 
is a nice solution for the problems of the current preemption logic. But I think 
the current patch of YARN-2154 could not solve the issue raised in this JIRA 
(please correct me if I have misunderstood YARN-2154). We should distinguish 
{{usage}} and {{usage - preemption}} in {{getResourceUsage}}, because 
{{getResourceUsage}} is used both by the preemption logic and by the resource 
allocation logic. Of course, we can consider this in the new implementation in 
YARN-2154 and solve them together.
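
For illustration, a hedged sketch of the distinction argued for above, using
made-up names (UsageViews, getResourceUsageForPreemption) rather than the actual
FSAppAttempt API:

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class UsageViews {
  private final Resource currentConsumption;
  private final Resource preemptedResources;

  UsageViews(Resource currentConsumption, Resource preemptedResources) {
    this.currentConsumption = currentConsumption;
    this.preemptedResources = preemptedResources;
  }

  // What the app actually holds right now; suitable for the allocation path.
  Resource getResourceUsage() {
    return currentConsumption;
  }

  // Usage net of resources already marked for preemption; suitable for the
  // preemption path, so the same container is not targeted twice.
  Resource getResourceUsageForPreemption() {
    return Resources.subtract(currentConsumption, preemptedResources);
  }
}
{code}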

> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is involved,
> {{FSAppAttempt.getResourceUsage}},
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated into FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # it is something in the future, i.e., even though these resources are marked 
> as preempted, they are currently used by the app, and they will be 
> subtracted from {{currentConsumption}} once the preemption is finished; it is 
> not reasonable to make arrangements for them ahead of time. 
> # there is another problem here; consider the following case,
> {code}
> root
>/\
>   queue1   queue2
>   /\
> queue1.3, queue1.4
> {code}
> suppose queue1.3 needs resources and it can preempt resources from queue1.4; 
> the preemption happens in the interior of queue1. But when computing the 
> resource usage of queue1, {{queue1.resourceUsage = its_current_resource_usage 
> - preemption}} according to the current code, which is unfair to queue2 when 
> doing resource allocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742801#comment-14742801
 ] 

Hudson commented on YARN-2005:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1119 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1119/])
YARN-2005. Blacklisting support for scheduling AMs. (Anubhav Dhoot via kasha) 
(kasha: rev 81df7b586a16f8226c7b01c139c1c70c060399c3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/SimpleBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/TestBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/DisabledBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> 

[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2015-09-13 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742812#comment-14742812
 ] 

Xianyin Xin commented on YARN-4120:
---

Hi [~kasha], [~asuresh], [~ashwinshankar77], now both the preemption logic and the 
resource allocation logic use a {{comparator}} to sort the {{Schedulables}}. I 
think we have to introduce a different comparator to separate {{usage}} and 
{{usage - preemption}}, just as in the patch for YARN-4134. There is also some 
discussion on changing {{Comparator.compare()}} in YARN-3453. I think for a 
collection of comparables, we can use different comparators to compare 
different attributes for different purposes. Any thoughts?
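
A hedged sketch of the "different comparators for different purposes" idea; the
interface and names are illustrative simplifications, not the real SchedulingPolicy
comparators:

{code}
import java.util.Comparator;

// Simplified stand-in for a Schedulable; names are hypothetical.
interface UsageReporting {
  long getUsedMemory();                // raw current consumption
  long getUsedMemoryMinusPreemption(); // consumption minus resources marked for preemption
}

final class UsageComparators {
  // Allocation path: order queues/apps by what they actually consume.
  static final Comparator<UsageReporting> BY_USAGE =
      new Comparator<UsageReporting>() {
        @Override
        public int compare(UsageReporting a, UsageReporting b) {
          return Long.compare(a.getUsedMemory(), b.getUsedMemory());
        }
      };

  // Preemption path: order by usage net of already-planned preemption.
  static final Comparator<UsageReporting> BY_USAGE_MINUS_PREEMPTION =
      new Comparator<UsageReporting>() {
        @Override
        public int compare(UsageReporting a, UsageReporting b) {
          return Long.compare(a.getUsedMemoryMinusPreemption(),
              b.getUsedMemoryMinusPreemption());
        }
      };

  private UsageComparators() {
  }
}
{code}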

> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is involved,
> {{FSAppAttempt.getResourceUsage}},
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated into FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # it is something in the future, i.e., even though these resources are marked 
> as preempted, they are currently used by the app, and they will be 
> subtracted from {{currentConsumption}} once the preemption is finished; it is 
> not reasonable to make arrangements for them ahead of time. 
> # there is another problem here; consider the following case,
> {code}
> root
>/\
>   queue1   queue2
>   /\
> queue1.3, queue1.4
> {code}
> suppose queue1.3 needs resources and it can preempt resources from queue1.4; 
> the preemption happens in the interior of queue1. But when computing the 
> resource usage of queue1, {{queue1.resourceUsage = its_current_resource_usage 
> - preemption}} according to the current code, which is unfair to queue2 when 
> doing resource allocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742827#comment-14742827
 ] 

Hudson commented on YARN-3697:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #388 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/388/])
YARN-3697. FairScheduler: ContinuousSchedulingThread can fail to shutdown. 
(Zhihai Xu via kasha) (kasha: rev 332b520a480994b7bd56c135f7941aad30b05e9c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
> sometimes. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR 

[jira] [Updated] (YARN-4089) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-09-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4089:
---
Labels:   (was: patch)

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-4089
> URL: https://issues.apache.org/jira/browse/YARN-4089
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Shiwei Guo
> Attachments: YARN-4089.001.patch
>
>
> There is a race condition in calling 
> AbstractYarnScheduler.completedContainer, which can make the usedResource 
> counter of an application inaccurate. In the worst case, the scheduler will 
> not allocate any resource to any application in some queue (when the 
> usedResource becomes negative) even though there is in fact plenty of free 
> resource to be allocated.
> It also causes the Scheduler UI and metrics to report negative resource usage 
> values. Our cluster has the capacity to run 13000+ containers, but the web 
> UI says:
> - Containers Running: -26546
> - Memory Used: -82.38 TB
> - VCores Used: -26451
> This is how it happens in FairScheduler:
> the completedContainer method calls application.containerCompleted, which 
> subtracts the resources used by this container from the usedResource 
> counter of the application. So, if completedContainer is called twice 
> with the same container, too much is subtracted from the counter. The same 
> applies to the updateRootQueueMetrics call, so we can see negative 
> allocatedMemory on rootQueue.
> The solution is to check whether the container being supplied is still live 
> inside completedContainer (as shown in the patch; a rough sketch of such a 
> guard follows this description). There is some check before calling 
> completedContainer, but that is not enough.
> Digging deeper, completedContainer may be called from two 
> places:
> 1. Triggered by the RMContainerEventType.FINISHED event:
> {code:title=FairScheduler.nodeUpdate}
> // Process completed containers
> for (ContainerStatus completedContainer : completedContainers) {
>   ContainerId containerId = completedContainer.getContainerId();
>   LOG.debug("Container FINISHED: " + containerId);
>   completedContainer(getRMContainer(containerId),
>   completedContainer, RMContainerEventType.FINISHED);
> }
> {code}
> 2. Triggered by RMContainerEventType.RELEASED:
> {code:title=AbstractYarnScheduler.releaseContainers}
> completedContainer(rmContainer,
> SchedulerUtils.createAbnormalContainerStatus(containerId,
>   SchedulerUtils.RELEASED_CONTAINER), RMContainerEventType.RELEASED);
> {code}
> RMContainerEventType.RELEASED is not triggered by the MapReduce 
> ApplicationMaster, so we won't see this problem on MR jobs. But TEZ triggers 
> it when it no longer needs a container, while the NodeManager also reports a 
> container-complete message to the RM, which in turn triggers the 
> RMContainerEventType.FINISHED event. If the RMContainerEventType.FINISHED 
> event reaches the RM earlier than the TEZ AM's release, the problem happens.
> This behavior is easier to see if the cluster has set up a TimelineServer for 
> TEZ, which makes it more likely that the TEZ AM sends the 
> RMContainerEventType.RELEASED event later than the NM sends 
> RMContainerEventType.FINISHED.
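
To make the double-accounting hazard concrete, here is a small standalone Java sketch (illustrative only, not the YARN-4089 patch or FairScheduler code): guarding the completion path with a liveness check turns a duplicate FINISHED/RELEASED event into a no-op, so the used-resource counter cannot go negative.

{code}
import java.util.HashSet;
import java.util.Set;

// Standalone sketch (not YARN code): duplicate completion events must not be
// subtracted twice from the used-resource counter.
public class DoubleCompletionDemo {
  private long usedMemoryMb = 0;
  private final Set<String> liveContainers = new HashSet<String>();

  void allocate(String containerId, long memoryMb) {
    liveContainers.add(containerId);
    usedMemoryMb += memoryMb;
  }

  // Guarded completion: only a container that is still live is subtracted,
  // so a duplicate FINISHED/RELEASED event becomes a no-op.
  void completedContainer(String containerId, long memoryMb) {
    if (!liveContainers.remove(containerId)) {
      return; // already completed; ignore the duplicate event
    }
    usedMemoryMb -= memoryMb;
  }

  public static void main(String[] args) {
    DoubleCompletionDemo app = new DoubleCompletionDemo();
    app.allocate("container_1", 1024);
    app.completedContainer("container_1", 1024); // e.g. RELEASED from the AM
    app.completedContainer("container_1", 1024); // duplicate FINISHED from the NM
    System.out.println("used memory: " + app.usedMemoryMb + " MB"); // prints 0, not -1024
  }
}
{code}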



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-09-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3933:
---
Labels:   (was: patch)

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
> Attachments: YARN-3933.001.patch
>
>
> In our cluster we are seeing available memory and cores go negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess container reservations for an application; if they 
> are no longer needed it calls queue.completedContainer(), which drives the 
> resources negative even though they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742810#comment-14742810
 ] 

zhihai xu commented on YARN-3697:
-

This does not look like an issue. I tried the latest code with YARN-3697 on branch-2; 
TestAsyncDispatcher succeeded.

---
 T E S T S
---
Running org.apache.hadoop.yarn.event.TestAsyncDispatcher
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.606 sec - in 
org.apache.hadoop.yarn.event.TestAsyncDispatcher

Results :

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> 
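
The swallowed-interrupt problem can be reproduced outside YARN. The following standalone Java sketch (illustrative only; it is not the committed YARN-3697 fix) shows a worker loop whose catch (Throwable) hides a wrapped InterruptedException: unless the interrupt flag is restored, the loop never observes the interrupt and the thread cannot be shut down.

{code}
import java.util.concurrent.TimeUnit;

// Standalone sketch (not YARN code): a catch (Throwable) that swallows a
// wrapped InterruptedException keeps the worker thread alive forever.
public class InterruptSwallowDemo {

  // Simulates work that hits an InterruptedException and wraps it, similar to
  // the way the dispatcher wraps it in a runtime exception.
  static void doWork() {
    try {
      TimeUnit.MILLISECONDS.sleep(100);
    } catch (InterruptedException ie) {
      throw new RuntimeException(ie); // the interrupt status is now cleared
    }
  }

  public static void main(String[] args) throws Exception {
    Thread worker = new Thread(new Runnable() {
      public void run() {
        while (!Thread.currentThread().isInterrupted()) {
          try {
            doWork();
          } catch (Throwable t) {
            // Without restoring the flag here, the loop never sees the
            // interrupt and the thread can never be shut down.
            if (t.getCause() instanceof InterruptedException) {
              Thread.currentThread().interrupt();
            }
          }
        }
      }
    });
    worker.start();
    TimeUnit.MILLISECONDS.sleep(50);
    worker.interrupt();
    worker.join(1000);
    System.out.println("worker alive after join: " + worker.isAlive()); // false
  }
}
{code}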

[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742811#comment-14742811
 ] 

zhihai xu commented on YARN-3697:
-

Thanks [~kasha] for reviewing and committing the patch!

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> 

[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742820#comment-14742820
 ] 

Hudson commented on YARN-2005:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2306 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2306/])
YARN-2005. Blacklisting support for scheduling AMs. (Anubhav Dhoot via kasha) 
(kasha: rev 81df7b586a16f8226c7b01c139c1c70c060399c3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/SimpleBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/TestBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistUpdates.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/DisabledBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistManager.java


> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> 

[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742819#comment-14742819
 ] 

zhihai xu commented on YARN-3697:
-

Yes, you are right. TestAsyncDispatcher was failing on branch-2.7, not branch-2.

---
 T E S T S
---
Running org.apache.hadoop.yarn.event.TestAsyncDispatcher
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.549 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.event.TestAsyncDispatcher
testDispatchStopOnTimeout(org.apache.hadoop.yarn.event.TestAsyncDispatcher)  
Time elapsed: 10.007 sec  <<< ERROR!
java.lang.Exception: test timed out after 1 milliseconds
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:148)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
at 
org.apache.hadoop.yarn.event.TestAsyncDispatcher.testDispatchStopOnTimeout(TestAsyncDispatcher.java:76)


Results :

Tests in error: 
  TestAsyncDispatcher.testDispatchStopOnTimeout:76->Object.wait:-2 »  test 
timed...

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 

[jira] [Updated] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-09-13 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3216:
--
Attachment: 0002-YARN-3216.patch

Attaching an initial work-in-progress patch. I will add tests in a coming patch. 
Kindly help review it.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.
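
A rough sketch of the intended computation follows (illustrative only; the names and numbers are hypothetical, not the actual CapacityScheduler API): rather than deriving the AM resource limit from the DEFAULT_PARTITION resource alone, the limit would be computed from the resource of each partition the queue can access.

{code}
// Illustrative sketch only, not CapacityScheduler code: per-partition AM limit.
public class AmLimitSketch {
  // Memory (MB) that AMs in the queue may use on the given partition.
  static long maxAmMemoryMb(long partitionMemoryMb,
                            float queueCapacityOnPartition,
                            float maxAmResourcePercent) {
    return (long) (partitionMemoryMb * queueCapacityOnPartition * maxAmResourcePercent);
  }

  public static void main(String[] args) {
    // Hypothetical numbers: a 100 GB partition, queue capacity 40%,
    // max-am-resource-percent 0.1 -> 4 GB of AM headroom on that partition.
    System.out.println(maxAmMemoryMb(102400, 0.4f, 0.1f) + " MB");
  }
}
{code}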



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742776#comment-14742776
 ] 

Hadoop QA commented on YARN-3697:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12751923/YARN-3697.001.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 332b520 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9108/console |


This message was automatically generated.

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=

[jira] [Commented] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2015-09-13 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742797#comment-14742797
 ] 

Xianyin Xin commented on YARN-2154:
---

The new logic would solve the issues raised in YARN-3405 and YARN-4134. Linking 
them for tracking.

> FairScheduler: Improve preemption to preempt only those containers that would 
> satisfy the incoming request
> --
>
> Key: YARN-2154
> URL: https://issues.apache.org/jira/browse/YARN-2154
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Arun Suresh
>Priority: Critical
> Attachments: YARN-2154.1.patch
>
>
> Today, FairScheduler uses a spray-gun approach to preemption. Instead, it 
> should only preempt resources that would satisfy the incoming request. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742798#comment-14742798
 ] 

Hadoop QA commented on YARN-3405:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12727554/YARN-3405.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 332b520 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9109/console |


This message was automatically generated.

> FairScheduler's preemption cannot happen between sibling in some case
> -
>
> Key: YARN-3405
> URL: https://issues.apache.org/jira/browse/YARN-3405
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: YARN-3405.01.patch, YARN-3405.02.patch
>
>
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100.
> # queue-1-1 and queue-2 each have an app. Each gets 50 usage and 50 fairshare. 
> # When queue-1-2 becomes active, it causes a new preemption request for a 
> fairshare of 25.
> # When preemption starts from root, it may pick queue-2 as the preemption 
> candidate. If so, preemptContainerPreCheck for queue-2 returns false because 
> its usage is equal to its fairshare.
> # Finally, queue-1-2 ends up waiting for resources to be released from 
> queue-1-1 itself.
> What I expect here is that queue-1-2 preempts from queue-1-1.
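
Plugging the numbers above into the fair-share precheck (an illustrative sketch, not the actual FSQueue/preemption code, and assuming equal weights so queue-1's 50 fair share splits 25/25 between its children) shows why queue-2 is rejected as a source while the sibling queue-1-1 should be eligible:

{code}
// Illustrative sketch only: a queue is a valid preemption source only when
// its usage strictly exceeds its fair share.
public class PreemptPreCheckSketch {
  static boolean preemptable(double usage, double fairShare) {
    return usage > fairShare;
  }

  public static void main(String[] args) {
    // Cluster resource 100; once queue-1-2 is active, queue-1's 50 fair share
    // is split between queue-1-1 and queue-1-2 (25 each, assuming equal weights).
    System.out.println("queue-2:   " + preemptable(50, 50)); // false -> not a source
    System.out.println("queue-1-1: " + preemptable(50, 25)); // true  -> preempt here
  }
}
{code}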



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742813#comment-14742813
 ] 

Hudson commented on YARN-3697:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1120 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1120/])
YARN-3697. FairScheduler: ContinuousSchedulingThread can fail to shutdown. 
(Zhihai Xu via kasha) (kasha: rev 332b520a480994b7bd56c135f7941aad30b05e9c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 

[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742833#comment-14742833
 ] 

Hudson commented on YARN-2005:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2329 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2329/])
YARN-2005. Blacklisting support for scheduling AMs. (Anubhav Dhoot via kasha) 
(kasha: rev 81df7b586a16f8226c7b01c139c1c70c060399c3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/TestBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/DisabledBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/SimpleBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java


> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> 

[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-09-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3635:
-
Attachment: YARN-3635.8.patch

Rebased to latest trunk (ver.8)

> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Tan, Wangda
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch, YARN-3635.7.patch, 
> YARN-3635.8.patch
>
>
> Currently, both the fair and capacity schedulers support queue mapping, which 
> lets the scheduler change the queue of an application after it is submitted.
> One issue with doing this inside a specific scheduler is: if the queue after 
> mapping has a different maximum_allocation/default-node-label-expression from 
> the original queue, {{validateAndCreateResourceRequest}} in RMAppManager 
> checks the wrong queue.
> I propose making queue mapping a common interface of the scheduler, so that 
> RMAppManager can set the queue after mapping before doing validations.
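
As a rough illustration of the proposal (hypothetical names, not the committed YARN-3635 API), the scheduler could expose the mapping so RMAppManager resolves the final queue before validation:

{code}
// Hypothetical sketch of a common queue-mapping interface; the names are
// illustrative and not the actual YarnScheduler API.
public interface QueueMappingResolver {
  /**
   * Returns the queue the application will actually run in after the
   * scheduler's mapping rules (user/group placement, etc.) are applied.
   */
  String resolveQueue(String requestedQueue, String user);
}
{code}

RMAppManager would then call resolveQueue before validateAndCreateResourceRequest, so the maximum-allocation and default-node-label-expression checks run against the mapped queue rather than the originally requested one.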



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4153) TestAsyncDispatcher failed at branch-2.7

2015-09-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742882#comment-14742882
 ] 

zhihai xu commented on YARN-4153:
-

I uploaded a patch, YARN-4153.000.patch, which adds the missing code from 
YARN-3999 to branch-2.7. The test TestAsyncDispatcher passes with the patch:
{code}
---
 T E S T S
---
Running org.apache.hadoop.yarn.event.TestAsyncDispatcher
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.583 sec - in 
org.apache.hadoop.yarn.event.TestAsyncDispatcher
Results :
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
{code}

> TestAsyncDispatcher failed at branch-2.7
> 
>
> Key: YARN-4153
> URL: https://issues.apache.org/jira/browse/YARN-4153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4153.000.patch
>
>
> TestAsyncDispatcher fails on branch-2.7 because the change from 
> YARN-3999 was not merged to branch-2.7 completely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4153) TestAsyncDispatcher failed at branch-2.7

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742886#comment-14742886
 ] 

Hadoop QA commented on YARN-4153:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755654/YARN-4153.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 332b520 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9115/console |


This message was automatically generated.

> TestAsyncDispatcher failed at branch-2.7
> 
>
> Key: YARN-4153
> URL: https://issues.apache.org/jira/browse/YARN-4153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4153.000.patch
>
>
> TestAsyncDispatcher fails on branch-2.7 because the change from 
> YARN-3999 was not merged to branch-2.7 completely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742925#comment-14742925
 ] 

Hadoop QA commented on YARN-3635:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 16s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 48s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 46s | The applied patch generated  
14 new checkstyle issues (total was 234, now 241). |
| {color:red}-1{color} | whitespace |   0m  6s | The patch has 15  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 25s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  45m 40s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  84m  5s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | 
hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions |
|   | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
|   | hadoop.yarn.server.resourcemanager.TestRM |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
|   | hadoop.yarn.server.resourcemanager.TestResourceManager |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755647/YARN-3635.8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 332b520 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9112/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9112/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9112/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9112/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9112/console |


This message was automatically generated.

> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Tan, Wangda
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch, YARN-3635.7.patch, 
> YARN-3635.8.patch
>
>
> Currently, both the fair and capacity schedulers support queue mapping, which 
> lets the scheduler change the queue of an application after it is submitted to 

[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742930#comment-14742930
 ] 

Hudson commented on YARN-3697:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #367 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/367/])
YARN-3697. FairScheduler: ContinuousSchedulingThread can fail to shutdown. 
(Zhihai Xu via kasha) (kasha: rev 332b520a480994b7bd56c135f7941aad30b05e9c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR 

[jira] [Commented] (YARN-4153) TestAsyncDispatcher failed at branch-2.7

2015-09-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742887#comment-14742887
 ] 

zhihai xu commented on YARN-4153:
-

The patch can only be applied to branch-2.7

> TestAsyncDispatcher failed at branch-2.7
> 
>
> Key: YARN-4153
> URL: https://issues.apache.org/jira/browse/YARN-4153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4153.000.patch
>
>
> TestAsyncDispatcher fails on branch-2.7 because the change from 
> YARN-3999 was not merged to branch-2.7 completely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742906#comment-14742906
 ] 

Hudson commented on YARN-3697:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #382 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/382/])
YARN-3697. FairScheduler: ContinuousSchedulingThread can fail to shutdown. 
(Zhihai Xu via kasha) (kasha: rev 332b520a480994b7bd56c135f7941aad30b05e9c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR 
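
For reference, a minimal sketch of the interrupt-aware loop idea (the class and 
thread body below are invented for illustration; this is not the actual 
YARN-3697 patch): re-checking the thread's interrupt status keeps a stop-time 
interrupt from being lost inside the broad catch shown in the description above.

{code}
// Illustrative only: the loop exits once the interrupt flag is set, even
// though other errors from the scheduling attempt are still caught and logged.
public class InterruptAwareSchedulingLoop {

  public static void main(String[] args) throws Exception {
    Thread scheduler = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Thread.sleep(5);                     // stands in for attemptScheduling(node)
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();  // restore the flag; the loop condition ends it
        } catch (Throwable t) {
          System.err.println("Error while attempting scheduling: " + t);
        }
      }
    }, "FairSchedulerContinuousScheduling");

    scheduler.start();
    Thread.sleep(100);
    scheduler.interrupt();                     // what a stop() call would do
    scheduler.join(1000);
    System.out.println("scheduler alive after join: " + scheduler.isAlive());
  }
}
{code}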

[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change

2015-09-13 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-4154:
-
Description: 
{code}
[ERROR] 
/mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
 no suitable constructor found for 
MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)

{code}

> Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
> ---
>
> Key: YARN-4154
> URL: https://issues.apache.org/jira/browse/YARN-4154
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jeff Zhang
>
> {code}
> [ERROR] 
> /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
>  no suitable constructor found for 
> MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
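
For reference, one way a test could tolerate both constructor sets listed in 
the compile error above is to select the MiniYARNCluster constructor 
reflectively. The sketch below is purely illustrative (the helper class is 
invented for this example) and is not taken from the Tez or YARN patches.

{code}
// Illustrative only: choose the MiniYARNCluster constructor reflectively so
// the same test code works against releases with and without the trailing
// boolean parameter.
import java.lang.reflect.Constructor;

import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniYarnClusterFactory {

  public static MiniYARNCluster create(String name, int numRMs, int numNMs,
      int numLocalDirs, int numLogDirs, boolean enableAHS) throws Exception {
    Class<MiniYARNCluster> clazz = MiniYARNCluster.class;
    try {
      // Releases that expose the six-argument constructor.
      Constructor<MiniYARNCluster> c = clazz.getConstructor(String.class,
          int.class, int.class, int.class, int.class, boolean.class);
      return c.newInstance(name, numRMs, numNMs, numLocalDirs, numLogDirs, enableAHS);
    } catch (NoSuchMethodException e) {
      // Fall back to the five-argument constructor reported as available in
      // the error above; enableAHS would then have to be handled some other
      // way (for example through configuration) by the caller.
      Constructor<MiniYARNCluster> c = clazz.getConstructor(String.class,
          int.class, int.class, int.class, int.class);
      return c.newInstance(name, numRMs, numNMs, numLocalDirs, numLogDirs);
    }
  }
}
{code}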


[jira] [Created] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change

2015-09-13 Thread Jeff Zhang (JIRA)
Jeff Zhang created YARN-4154:


 Summary: Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster 
change
 Key: YARN-4154
 URL: https://issues.apache.org/jira/browse/YARN-4154
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jeff Zhang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change

2015-09-13 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-4154:
-
Description: 
{code}
[ERROR] 
/mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
 no suitable constructor found for 
MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)

{code}

\cc [~vinodkv]

  was:
{code}
[ERROR] 
/mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
 no suitable constructor found for 
MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)

{code}


> Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
> ---
>
> Key: YARN-4154
> URL: https://issues.apache.org/jira/browse/YARN-4154
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jeff Zhang
>
> {code}
> [ERROR] 
> /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
>  no suitable constructor found for 
> MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> {code}
> \cc [~vinodkv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-09-13 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2513:
--
Attachment: YARN-2513.v4.patch

[~hitesh], fixed the NPE when multiple UIs are hosted. It looks like the 
filterHolders were being invalidated after the first UI filter was defined.

[~zjshen], I added the yarn-default.xml and TimelineServer.md changes. I didn't 
get a chance to add the test you suggested.

[~xgong], I have updated the patch to help get this going again.

> Host framework UIs in YARN for use with the ATS
> ---
>
> Key: YARN-2513
> URL: https://issues.apache.org/jira/browse/YARN-2513
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>  Labels: 2.6.1-candidate
> Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
> YARN-2513.v3.patch, YARN-2513.v4.patch
>
>
> Allow for pluggable UIs as described by TEZ-8. YARN can provide the 
> infrastructure to host JavaScript and possibly Java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742854#comment-14742854
 ] 

Hudson commented on YARN-2005:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #366 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/366/])
YARN-2005. Blacklisting support for scheduling AMs. (Anubhav Dhoot via kasha) 
(kasha: rev 81df7b586a16f8226c7b01c139c1c70c060399c3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/SimpleBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/TestBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/DisabledBlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/blacklist/BlacklistManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> 

[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-9.YARN-1197.patch

Uploaded ver.9, fixed a bunch of long lines.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch, YARN-1651-7.YARN-1197.patch, 
> YARN-1651-8.YARN-1197.patch, YARN-1651-9.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4153) TestAsyncDispatcher failed at branch-2.7

2015-09-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4153:

Attachment: YARN-4153.000.patch

> TestAsyncDispatcher failed at branch-2.7
> 
>
> Key: YARN-4153
> URL: https://issues.apache.org/jira/browse/YARN-4153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4153.000.patch
>
>
> TestAsyncDispatcher failed at branch-2.7. This is because the change from 
> YARN-3999 was not completely merged to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742896#comment-14742896
 ] 

zhihai xu commented on YARN-3697:
-

Hi [~kasha], TestAsyncDispatcher failed without my change. I created YARN-4153 
to fix the TestAsyncDispatcher failure at branch-2.7.
We should commit YARN-3697 to branch-2.7 after YARN-4153.

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: the ContinuousSchedulingThread sometimes can't be shut down 
> after stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> 

[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change

2015-09-13 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-4154:
-
Description: 
{code}
[ERROR] 
/mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
 no suitable constructor found for 
MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)

{code}

MR might have the same issue.

\cc [~vinodkv]



  was:
{code}
[ERROR] 
/mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
 no suitable constructor found for 
MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)
constructor 
org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
 is not applicable
  (actual and formal argument lists differ in length)

{code}

\cc [~vinodkv]


> Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
> ---
>
> Key: YARN-4154
> URL: https://issues.apache.org/jira/browse/YARN-4154
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jeff Zhang
>
> {code}
> [ERROR] 
> /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
>  no suitable constructor found for 
> MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> {code}
> MR might have the same issue.
> \cc [~vinodkv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742978#comment-14742978
 ] 

Hadoop QA commented on YARN-1651:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   2m 59s | YARN-1197 compilation may be 
broken. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 23 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |  10m  1s | The applied patch generated  65  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 16s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |  51m 21s | The patch has 177  line(s) 
that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   7m  0s | The patch appears to introduce 8 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests |   9m  9s | Tests failed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | tools/hadoop tests |   0m 53s | Tests passed in 
hadoop-sls. |
| {color:green}+1{color} | yarn tests |   6m 51s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  0s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  55m 28s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 158m 38s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755643/YARN-1651-8.YARN-1197.patch
 |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-1197 / 78ad04d |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/diffJavadocWarnings.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9111/console |


This message was automatically generated.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, 

[jira] [Updated] (YARN-4140) RM container allocation delayed in case of app submitted to Nodelabel partition

2015-09-13 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4140:
---
Attachment: 0002-YARN-4140.patch

> RM container allocation delayed in case of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch
>
>
> While trying to run an application on a Nodelabel partition I found that the 
> application execution time was delayed by 5 – 10 min for 500 containers. 
> Of 3 machines in total, 2 machines were in the same partition and the app was 
> submitted to that partition.
> After enabling debug I was able to find the following:
> # From the AM the container ask is for OFF-SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using the Pi job, it took 17 minutes to allocate 
> the next container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
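
For reference, a purely illustrative sketch of the ask pattern visible in the 
showRequests output above (priority, capability, and host names are assumed): 
the node-label expression is set only on the ANY / off-switch request, while 
the node- and rack-level requests carry the default (empty) label.

{code}
// Illustrative only: builds the three request levels the debug log shows,
// with the label expression ("3") attached only to the ANY ask.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class LabeledAskSketch {

  static ResourceRequest newAsk(String resourceName, String labelExpr) {
    ResourceRequest ask = Records.newRecord(ResourceRequest.class);
    ask.setPriority(Priority.newInstance(20));
    ask.setCapability(Resource.newInstance(1024, 1));  // assumed capability
    ask.setNumContainers(500);
    ask.setResourceName(resourceName);
    ask.setRelaxLocality(true);
    ask.setNodeLabelExpression(labelExpr);
    return ask;
  }

  public static void main(String[] args) {
    ResourceRequest anyAsk  = newAsk(ResourceRequest.ANY, "3");  // label only here
    ResourceRequest rackAsk = newAsk("/default-rack", "");
    ResourceRequest nodeAsk = newAsk("host-10-19-92-143", "");
    System.out.println(anyAsk + "\n" + rackAsk + "\n" + nodeAsk);
  }
}
{code}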


[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-8.YARN-1197.patch

Attached ver.8, rebased to latest trunk.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch, YARN-1651-7.YARN-1197.patch, 
> YARN-1651-8.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4153) TestAsyncDispatcher failed at branch-2.7

2015-09-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742868#comment-14742868
 ] 

zhihai xu commented on YARN-4153:
-

The test failure is:
---
T E S T S
---
Running org.apache.hadoop.yarn.event.TestAsyncDispatcher
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.549 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.event.TestAsyncDispatcher
testDispatchStopOnTimeout(org.apache.hadoop.yarn.event.TestAsyncDispatcher) 
Time elapsed: 10.007 sec <<< ERROR!
java.lang.Exception: test timed out after 1 milliseconds
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:148)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
at 
org.apache.hadoop.yarn.event.TestAsyncDispatcher.testDispatchStopOnTimeout(TestAsyncDispatcher.java:76)
Results :
Tests in error: 
TestAsyncDispatcher.testDispatchStopOnTimeout:76->Object.wait:-2 » test timed...

> TestAsyncDispatcher failed at branch-2.7
> 
>
> Key: YARN-4153
> URL: https://issues.apache.org/jira/browse/YARN-4153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> TestAsyncDispatcher failed at branch-2.7. This is because the change from 
> YARN-3999 was not completely merged to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
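
For reference, the hang above is in the wait inside AsyncDispatcher.serviceStop. 
A minimal sketch of the bounded drain-on-stop idea follows (field and method 
names are assumed; this is not the actual AsyncDispatcher or YARN-3999/YARN-4153 
code): instead of waiting indefinitely for the event queue to drain, stop gives 
up after a timeout so it cannot hang a test or a shutdown.

{code}
// Illustrative only: wait for the queue to drain, but never past the deadline.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedDrainSketch {

  private final BlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();

  // Called from serviceStop(): wait for the handler thread to drain the queue,
  // but never longer than drainTimeoutMs.
  synchronized void waitForDrained(long drainTimeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + drainTimeoutMs;
    while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
      wait(100);  // the handler thread would notify after taking each event
    }
  }
}
{code}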


[jira] [Created] (YARN-4153) TestAsyncDispatcher failed at branch-2.7

2015-09-13 Thread zhihai xu (JIRA)
zhihai xu created YARN-4153:
---

 Summary: TestAsyncDispatcher failed at branch-2.7
 Key: YARN-4153
 URL: https://issues.apache.org/jira/browse/YARN-4153
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: zhihai xu
Assignee: zhihai xu


TestAsyncDispatcher failed at branch-2.7. This is because the change from 
YARN-3999 was not completely merged to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742915#comment-14742915
 ] 

Hadoop QA commented on YARN-2513:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  23m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  5s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  7s | Site still builds. |
| {color:red}-1{color} | checkstyle |   1m 46s | The applied patch generated  6 
new checkstyle issues (total was 211, now 216). |
| {color:red}-1{color} | checkstyle |   2m 14s | The applied patch generated  1 
new checkstyle issues (total was 38, now 38). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m  5s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 15s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| | |  59m 19s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755650/YARN-2513.v4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 332b520 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9113/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9113/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9113/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9113/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9113/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9113/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9113/console |


This message was automatically generated.

> Host framework UIs in YARN for use with the ATS
> ---
>
> Key: YARN-2513
> URL: https://issues.apache.org/jira/browse/YARN-2513
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>  Labels: 2.6.1-candidate
> Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
> YARN-2513.v3.patch, YARN-2513.v4.patch
>
>
> Allow for pluggable UIs as described by TEZ-8. YARN can provide the 
> infrastructure to host JavaScript and possibly Java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742963#comment-14742963
 ] 

Hadoop QA commented on YARN-1651:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   3m 10s | YARN-1197 compilation may be 
broken. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 23 new or modified test files. |
| {color:green}+1{color} | javac |   8m 15s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |  10m 17s | The applied patch generated  65  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 21s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |  37m 18s | The patch has 177  line(s) 
that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   7m  9s | The patch appears to introduce 8 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 25s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | tools/hadoop tests |   0m 53s | Tests passed in 
hadoop-sls. |
| {color:green}+1{color} | yarn tests |   6m 58s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  3s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  55m 46s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 146m 45s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755643/YARN-1651-8.YARN-1197.patch
 |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-1197 / 78ad04d |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/diffJavadocWarnings.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9110/console |


This message was automatically generated.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, 
> YARN-1651-6.YARN-1197.patch, 

[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change

2015-09-13 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-4154:
-
Affects Version/s: 2.6.1
 Target Version/s: 2.6.1
 Priority: Blocker  (was: Major)

> Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
> ---
>
> Key: YARN-4154
> URL: https://issues.apache.org/jira/browse/YARN-4154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Jeff Zhang
>Priority: Blocker
>
> {code}
> [ERROR] 
> /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5]
>  no suitable constructor found for 
> MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> constructor 
> org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int)
>  is not applicable
>   (actual and formal argument lists differ in length)
> {code}
> MR might have the same issue.
> \cc [~vinodkv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742965#comment-14742965
 ] 

Hudson commented on YARN-3697:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2330 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2330/])
YARN-3697. FairScheduler: ContinuousSchedulingThread can fail to shutdown. 
(Zhihai Xu via kasha) (kasha: rev 332b520a480994b7bd56c135f7941aad30b05e9c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: the ContinuousSchedulingThread sometimes can't be shut down 
> after stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR 

[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742972#comment-14742972
 ] 

Hudson commented on YARN-3697:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2307 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2307/])
YARN-3697. FairScheduler: ContinuousSchedulingThread can fail to shutdown. 
(Zhihai Xu via kasha) (kasha: rev 332b520a480994b7bd56c135f7941aad30b05e9c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: the ContinuousSchedulingThread sometimes can't be shut down 
> after stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling]