MENG DING commented on YARN-1902:

I was almost going to log the same issue when I saw this thread (and also 
YARN-3020) :-).

After reading all the discussions, and after reading the related code, I still 
believe this is a bug.

I understand [~bikassaha]'s point that the AM-RM protocol is NOT a delta 
protocol, and that the user (i.e., the ApplicationMaster) is currently 
responsible for calling removeContainerRequest() after receiving an 
allocation, but consider the following simple modification to the packaged 
*distributedshell* application:

@@ -805,6 +805,8 @@ public void onContainersAllocated(List<Container> allocatedContainers) {
         // as all containers may not be allocated at one go.
+        ContainerRequest containerAsk = setupContainerAskForRM();
+        amRMClient.removeContainerRequest(containerAsk);

The code simply removes a container request after successfully receiving an 
allocated container in the ApplicationMaster. When you submit this application 
by specifying, say, 3 containers on the CLI, you will sometimes get 4 
containers allocated (not counting the AM container)! 

root@node2:~# hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar  -shell_command "sleep 100000" -num_containers 3 -timeout 200000000
root@node2:~# yarn container -list appattempt_1431531743796_0015_000001
15/05/15 20:49:01 INFO client.RMProxy: Connecting to ResourceManager at 
15/05/15 20:49:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Total number of containers :5
                          Container-Id                      Start Time  Finish Time    State         Host  Node Http Address
container_1431531743796_0015_01_000005  Fri May 15 20:44:12 +0000 2015          N/A  RUNNING  node3:50093
container_1431531743796_0015_01_000001  Fri May 15 20:44:06 +0000 2015          N/A  RUNNING  node3:50093
container_1431531743796_0015_01_000002  Fri May 15 20:44:10 +0000 2015          N/A  RUNNING  node3:50093
container_1431531743796_0015_01_000004  Fri May 15 20:44:11 +0000 2015          N/A  RUNNING  node3:50093
container_1431531743796_0015_01_000003  Fri May 15 20:44:10 +0000 2015          N/A  RUNNING  node4:41128

The *fundamental* problem here, I believe, is that the AMRMClient maintains an 
internal request table, *remoteRequestsTable*, that keeps track of the *total* 
container requests (i.e., both the requests that have already been satisfied 
and those that are still outstanding):

  protected final
    Map<Priority, Map<String, TreeMap<Resource, ResourceRequestInfo>>>
      remoteRequestsTable =
      new TreeMap<Priority, Map<String, TreeMap<Resource, ResourceRequestInfo>>>();

However, the corresponding table *requests* at the scheduler side (inside 
AppSchedulingInfo.java) keeps track of *outstanding* container requests (i.e., 
container requests that are not yet satisfied):

  final Map<Priority, Map<String, ResourceRequest>> requests =
    new ConcurrentHashMap<Priority, Map<String, ResourceRequest>>();

Every time an allocation is successfully made, the decResourceRequest() or 
decrementOutstanding() call updates the *requests* table so that it contains 
only outstanding requests. Unfortunately, on every ApplicationMaster 
heartbeat, that same *requests* table is overwritten by the 
updateResourceRequests() call with the total requests coming from the 
AMRMClient.
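To make the mismatch concrete, here is a toy model (plain Java, NOT YARN code; the single string key and the method names only loosely mirror remoteRequestsTable, updateResourceRequests(), and decrementOutstanding()) of a client that keeps totals feeding a scheduler that keeps outstanding counts:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bookkeeping mismatch (NOT YARN code): the client table
// holds *total* requests ever added, the scheduler table holds *outstanding*
// requests, and every heartbeat overwrites the scheduler's view with the
// client's totals.
public class RequestTableMismatch {
    // Stand-in for AMRMClientImpl's remoteRequestsTable (totals).
    public static final Map<String, Integer> clientTotals = new HashMap<>();
    // Stand-in for AppSchedulingInfo's requests table (outstanding).
    public static final Map<String, Integer> schedulerOutstanding = new HashMap<>();

    public static void addContainerRequest(String key) {
        clientTotals.merge(key, 1, Integer::sum);
    }

    // Models updateResourceRequests(): the scheduler's view is replaced
    // wholesale by whatever totals the client sends on the heartbeat.
    public static void heartbeat(String key) {
        schedulerOutstanding.put(key, clientTotals.getOrDefault(key, 0));
    }

    // Models the scheduler satisfying every request it currently sees,
    // decrementing the outstanding count for each grant.
    public static int allocateAll(String key) {
        int granted = schedulerOutstanding.getOrDefault(key, 0);
        schedulerOutstanding.put(key, 0);
        return granted;
    }

    public static void main(String[] args) {
        String key = "priority=0/*/1GB";
        for (int i = 0; i < 3; i++) {
            addContainerRequest(key);          // AM asks for 3 containers
        }
        heartbeat(key);
        int first = allocateAll(key);          // 3 containers granted
        // The client never shrank its totals, so the next heartbeat
        // re-asserts 3 outstanding requests although all were satisfied.
        heartbeat(key);
        int extra = allocateAll(key);
        System.out.println("first=" + first + ", extra=" + extra); // first=3, extra=3
    }
}
```

Calling removeContainerRequest() after each allocation does shrink the client's totals, but only after the fact; a heartbeat that races ahead of the removal can still carry stale totals, which is how the distributedshell run above ends up with 4 containers instead of 3.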

This inconsistency between the total requests tracked on the AMRMClient side 
and the outstanding requests tracked on the scheduler side is, in my opinion, 
very confusing to say the least.

I see that a solution has already been proposed by [~wangda] in YARN-3020, 
which I think is the correct thing to do:
{quote}
maybe we should add a default implementation to deduct pending resource 
requests by priority/resource-name/capacity of allocated containers 
automatically (User can disable this default behavior, implement their own 
logic to deduct pending resource requests.)
{quote}

This solution will make *remoteRequestsTable* in AMRMClient only keep track of 
outstanding container requests, which is then consistent with the *requests* 
table at the Scheduler side.
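As a sketch of that proposal (again a toy model under my own simplifications, not the actual YARN-3020 patch; in particular, a single string key stands in for the priority/resource-name/capability tuple), auto-deducting on allocation keeps the client-side table purely outstanding:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the YARN-3020 proposal (NOT the real patch): each allocated
// container automatically deducts the matching pending request, so the
// client-side table only ever holds *outstanding* requests.
public class AutoDeductSketch {
    // Requests keyed by a string standing in for
    // priority/resource-name/capability (a simplification).
    public static final Map<String, Integer> outstanding = new HashMap<>();

    public static void addContainerRequest(String key) {
        outstanding.merge(key, 1, Integer::sum);
    }

    // Invoked once per allocated container; removes the entry entirely
    // once its count drops to zero.
    public static void onContainerAllocated(String key) {
        outstanding.computeIfPresent(key, (k, v) -> v > 1 ? v - 1 : null);
    }

    public static int pending(String key) {
        return outstanding.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        String key = "priority=0/*/1GB";
        for (int i = 0; i < 3; i++) {
            addContainerRequest(key);       // ask for 3 containers
        }
        for (int i = 0; i < 3; i++) {
            onContainerAllocated(key);      // all 3 allocations arrive
        }
        // Nothing is left to be re-sent on the next heartbeat.
        System.out.println("pending=" + pending(key)); // pending=0
    }
}
```

Because the deduction happens inside the client before the next heartbeat is built, the heartbeat carries only genuinely outstanding requests, matching the scheduler's *requests* table.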

Any comments or thoughts? We are currently investigating YARN-1197, and are 
faced with a similar issue of properly tracking container resource increase 
requests on both the client and the server side.


> Allocation of too many containers when a second request is done with the same 
> resource capability
> -------------------------------------------------------------------------------------------------
>                 Key: YARN-1902
>                 URL: https://issues.apache.org/jira/browse/YARN-1902
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: Sietse T. Au
>            Assignee: Sietse T. Au
>              Labels: client
>         Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called, and at least one of the z 
> allocated containers is started, then if another addContainerRequest call is 
> made and subsequently an allocate call to the RM, (z+1) containers will be 
> allocated, where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I have found that (z+1) 
> containers are indeed requested in both scenarios, but that only in the 
> second scenario is the correct behavior observed.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
> ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.

This message was sent by Atlassian JIRA
