[
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575924#comment-14575924
]
Hadoop QA commented on YARN-3655:
---------------------------------
\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 2s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 23s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 50m 17s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 86m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12738190/YARN-3655.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 71de367 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8208/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8208/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8208/console |
This message was automatically generated.
> FairScheduler: potential livelock due to maxAMShare limitation and container reservation
> -----------------------------------------------------------------------------------------
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container reservation.
> If a node is reserved by an application, no other application has any chance to assign a new container on this node until the application which reserves the node either assigns a new container on it or releases the reserved container.
> The problem is that if an application calls assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it blocks all other applications from using the nodes it reserves. If all the other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock situation can happen.
> The following is the code in FSAppAttempt#assignContainer which can cause this potential livelock.
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node when the AM container can't be allocated on it due to the maxAMShare limitation and the node is reserved by that same application, as sketched below.
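> The following is only a rough sketch of that idea against the assignContainer code above; the condition and helper calls (for example unreserve) are illustrative and may differ from the attached patch.
> {code}
> // Sketch: when the maxAMShare check fails, drop this application's own
> // reservation on the node so other applications can allocate there.
> if (ask.isEmpty() || !getQueue().canRunAppAM(
>     ask.get(0).getCapability())) {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Skipping allocation because maxAMShare limit would " +
>         "be exceeded");
>   }
>   // Illustrative: if the reservation on this node belongs to this attempt,
>   // release it instead of leaving the node blocked (avoids the livelock).
>   RMContainer reserved = node.getReservedContainer();
>   if (reserved != null && reserved.getApplicationAttemptId().equals(
>       getApplicationAttemptId())) {
>     unreserve(reserved.getReservedPriority(), node);
>   }
>   return Resources.none();
> }
> {code}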
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)