[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

MENG DING (JIRA) Wed, 20 May 2015 07:25:37 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552375#comment-14552375
 ]


MENG DING commented on YARN-1902:
---------------------------------

I have been experimenting with changing AppSchedulingInfo to 
maintain a total request table and a fulfilled allocation table, and to 
calculate the difference between the two as the real outstanding request 
table used for scheduling. All was fine until I realized that this cannot handle 
one use case: an AMRMClient, right before sending the allocation heartbeat, 
removes all container requests and adds new container requests at the same 
priority and location (possibly with a different resource capability). 
AppSchedulingInfo does not know about this, and may not treat the newly added 
container requests as outstanding requests.
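
To make the failure case concrete, here is a minimal, self-contained sketch of 
the two-table idea. It is a toy model, not the actual AppSchedulingInfo code: 
the class OutstandingTableSketch, its methods, and the string key built from 
priority and location are all hypothetical names chosen for illustration. It 
assumes the heartbeat carries an absolute container count per key.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two-table experiment; NOT the real AppSchedulingInfo. */
class OutstandingTableSketch {
    // Both tables are keyed by a hypothetical "priority@location" string.
    private final Map<String, Integer> totalRequested = new HashMap<>();
    private final Map<String, Integer> fulfilled = new HashMap<>();

    static String key(int priority, String location) {
        return priority + "@" + location;
    }

    /** Assumption: each heartbeat carries the absolute total for a key. */
    void updateTotal(String key, int numContainers) {
        totalRequested.put(key, numContainers);
    }

    /** Record one container allocated against the key. */
    void recordAllocation(String key) {
        fulfilled.merge(key, 1, Integer::sum);
    }

    /** Outstanding = total requested minus fulfilled, never below zero. */
    int outstanding(String key) {
        int total = totalRequested.getOrDefault(key, 0);
        int done = fulfilled.getOrDefault(key, 0);
        return Math.max(0, total - done);
    }

    public static void main(String[] args) {
        OutstandingTableSketch table = new OutstandingTableSketch();
        String k = key(1, "*");

        // The AM asks for 3 containers and all 3 are allocated.
        table.updateTotal(k, 3);
        for (int i = 0; i < 3; i++) {
            table.recordAllocation(k);
        }
        System.out.println(table.outstanding(k)); // prints 0

        // Before the next heartbeat the client removes all 3 requests and
        // adds 3 new ones at the same priority and location. The heartbeat
        // again says "3", which is indistinguishable from the old total.
        table.updateTotal(k, 3);
        System.out.println(table.outstanding(k)); // prints 0, although 3 new containers are wanted
    }
}
```

In this model the second heartbeat looks identical to the first, so the 
difference of the two tables stays at zero and the new requests are never 
scheduled.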

I agree that, at present, I do not see a clean solution that preserves backward 
compatibility. 

> Allocation of too many containers when a second request is done with the same 
> resource capability
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1902
>                 URL: https://issues.apache.org/jira/browse/YARN-1902
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: Sietse T. Au
>            Assignee: Sietse T. Au
>              Labels: client
>         Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, addContainerRequest is 
> called z times with x, allocate is called, and at least one of the z allocated 
> containers is started. If another addContainerRequest call is then made, 
> followed by an allocate call to the RM, (z+1) containers are allocated 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing the debug logs of AMRMClientImpl, I have found that (z+1) containers 
> are indeed requested in both scenarios, but that the correct behavior is 
> observed only in the second scenario.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
> ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
