[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558618#comment-14558618 ]
Hadoop QA commented on YARN-3655:
---------------------------------

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 50m 14s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 86m 38s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12735229/YARN-3655.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ada233b |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8077/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8077/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8077/console |

This message was automatically generated.

> FairScheduler: potential livelock due to maxAMShare limitation and container
> reservation
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3655
>                 URL: https://issues.apache.org/jira/browse/YARN-3655
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-3655.000.patch, YARN-3655.001.patch,
>                      YARN-3655.002.patch, YARN-3655.003.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container
> reservation.
> If a node is reserved by an application, no other application has any chance
> to assign a new container on this node until the application which reserves
> the node either assigns a new container on it or releases the reserved
> container.
> The problem is that if an application tries to call assignReservedContainer
> and fails to get a new container due to the maxAMShare limitation, it will
> block all other applications from using the nodes it reserves. If all the
> other running applications can't release their AM containers because they
> are blocked by these reserved containers, a livelock situation can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential
> livelock.
> {code}
>     // Check the AM resource usage for the leaf queue
>     if (!isAmRunning() && !getUnmanagedAM()) {
>       List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>       if (ask.isEmpty() || !getQueue().canRunAppAM(
>           ask.get(0).getCapability())) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug("Skipping allocation because maxAMShare limit would " +
>               "be exceeded");
>         }
>         return Resources.none();
>       }
>     }
> {code}
> To fix this issue, we can unreserve the node when the application holds the
> reservation on it but can't allocate its AM container there due to the
> maxAMShare limitation.
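For reference, a minimal sketch of the unreserve-on-limit idea described above follows. This is not the attached patch, only an illustration: it assumes {{node}} is the FSSchedulerNode currently being scheduled, and that SchedulerNode#getReservedContainer, RMContainer#getReservedPriority, and FSAppAttempt#unreserve behave as their names suggest.

{code}
// Sketch only (not the YARN-3655 patch as attached): before skipping
// allocation due to the maxAMShare limit, release this application's own
// reservation on the node so other applications can be scheduled there.
if (!isAmRunning() && !getUnmanagedAM()) {
  List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
  if (ask.isEmpty() || !getQueue().canRunAppAM(
      ask.get(0).getCapability())) {
    RMContainer reserved = node.getReservedContainer();
    // Only unreserve if the reservation belongs to this attempt.
    if (reserved != null && reserved.getApplicationAttemptId()
        .equals(getApplicationAttemptId())) {
      unreserve(reserved.getReservedPriority(), node);
    }
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping allocation because maxAMShare limit would " +
          "be exceeded");
    }
    return Resources.none();
  }
}
{code}

The key point is that the unreserve happens before the early return of Resources.none(), so a reservation held by an application that cannot pass the maxAMShare check never pins a node indefinitely.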