[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188516#comment-14188516 ]

Bikas Saha commented on YARN-1902:
----------------------------------

bq. Given a ContainerRequest x with Resource y, when addContainerRequest is 
called z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Firstly, I am not sure whether the same ContainerRequest object can be passed 
multiple times to addContainerRequest. It should be a different object each 
time, even if they all ask for the same resource. This might have something to 
do with the internal book-keeping done for matching requests.
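The following is a minimal, hypothetical model of how reusing one object could confuse that book-keeping. It assumes the client keeps two things for each capability: a running container count and a *set* of request objects used for matching. The class names and the int memory-size key are illustrative only. They are not the AMRMClientImpl internals.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified per-capability entry (NOT the real AMRMClientImpl).
// Assumption: a count that is sent to the RM plus a set of request objects.
class RequestEntry {
    int numContainers = 0;                               // count that would be asked for
    final Set<Object> requests = new LinkedHashSet<>();  // objects available for matching
}

class RequestTable {
    // Keyed by a hypothetical memory size in MB, standing in for a Resource.
    private final Map<Integer, RequestEntry> byCapabilityMb = new HashMap<>();

    void add(int capabilityMb, Object request) {
        RequestEntry e = byCapabilityMb.computeIfAbsent(capabilityMb, k -> new RequestEntry());
        e.numContainers++;        // incremented on every call
        e.requests.add(request);  // a repeated object is stored only once
    }

    int count(int capabilityMb) {
        RequestEntry e = byCapabilityMb.get(capabilityMb);
        return e == null ? 0 : e.numContainers;
    }

    int matchable(int capabilityMb) {
        RequestEntry e = byCapabilityMb.get(capabilityMb);
        return e == null ? 0 : e.requests.size();
    }
}

class SameObjectDemo {
    public static void main(String[] args) {
        RequestTable sameObject = new RequestTable();
        Object x = new Object();
        for (int i = 0; i < 3; i++) sameObject.add(1024, x);            // same object, z = 3

        RequestTable distinct = new RequestTable();
        for (int i = 0; i < 3; i++) distinct.add(1024, new Object());   // new object each time

        System.out.println(sameObject.count(1024) + " asked, " + sameObject.matchable(1024) + " matchable");
        System.out.println(distinct.count(1024) + " asked, " + distinct.matchable(1024) + " matchable");
    }
}
```

Under these assumptions, the reused object yields "3 asked, 1 matchable". Distinct objects keep the count and the matchable requests in step.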

Secondly, after z requests are made and 1 allocation is received, z-1 
requests remain. If you are using AMRMClientImpl, then it is your (the user's) 
responsibility to call removeContainerRequest() for the request that was 
matched to this container. The AMRMClient does not know which of your z 
requests could be assigned to this container, so in the general case it cannot 
automatically remove a request from the internal table: it does not know 
which request to remove. If the javadocs don't clarify these semantics, then we 
can improve the javadocs.
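The following is a hypothetical single-capability sketch of why the user-side removal matters. It assumes that each heartbeat sends the *total* number of outstanding requests as an absolute ask, not as a delta. In the real client, the removal step would be a removeContainerRequest() call on the request the user picked for the container, for example one found through getMatchingRequests(). The class and method names below are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model: outstanding requests for one capability. Each heartbeat
// sends their total, which (by assumption) replaces the previous ask at the RM.
class OutstandingAsk {
    private final Deque<String> outstanding = new ArrayDeque<>();

    void addRequest(String request) { outstanding.addLast(request); }

    // The caller, not the client, decides which request a container satisfied.
    void removeRequest(String request) { outstanding.remove(request); }

    // Absolute number that would be sent on the next heartbeat.
    int askSentToRm() { return outstanding.size(); }

    // Here the caller simply picks the oldest outstanding request.
    String anyOutstanding() { return outstanding.peekFirst(); }
}

class RemoveAfterMatchDemo {
    static int askAfterOneMore(int z, boolean removeMatched) {
        OutstandingAsk client = new OutstandingAsk();
        for (int i = 0; i < z; i++) client.addRequest("req-" + i);
        // Suppose the RM allocates all z containers.
        if (removeMatched) {
            for (int i = 0; i < z; i++) client.removeRequest(client.anyOutstanding());
        }
        client.addRequest("req-extra");  // the user wants exactly one more container
        return client.askSentToRm();
    }

    public static void main(String[] args) {
        System.out.println("without removal: " + askAfterOneMore(3, false));
        System.out.println("with removal:    " + askAfterOneMore(3, true));
    }
}
```

Without the removals, the extra request turns the ask into z+1 (4 here), which matches the symptom in the report. With them, the ask is 1.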

Thirdly, the protocol between the AMRMClient and the RM has an inherent race. 
So if the client had earlier asked for z containers and in the next heartbeat 
reduces that to z-1, the RM may actually return z containers to it because it 
had already allocated them to this client before the client updated the RM with 
the new value.
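The following is a hypothetical timeline sketch of that race for one capability. The RM allocates against the last ask it saw. The client has already lowered its ask, so it keeps what it still needs and releases the surplus. In the real client, the release would go through releaseAssignedContainer(). The names and string container IDs here are illustrative only, not the YARN protocol classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one allocate round trip; not the YARN protocol classes.
class RaceDemo {
    // The RM allocates against the last ask it received.
    static List<String> rmAllocate(int lastAskSeen) {
        List<String> containers = new ArrayList<>();
        for (int i = 0; i < lastAskSeen; i++) containers.add("container-" + i);
        return containers;
    }

    // The client keeps what it still needs and returns the rest for release.
    static List<String> excess(List<String> received, int stillNeeded) {
        int keep = Math.min(stillNeeded, received.size());
        return new ArrayList<>(received.subList(keep, received.size()));
    }

    public static void main(String[] args) {
        int z = 3;
        // Heartbeat 1: the client asks for z; the RM allocates z against that ask.
        List<String> received = rmAllocate(z);
        // Heartbeat 2: the client has lowered its ask to z - 1, but the z
        // containers were already allocated before the RM saw the update.
        List<String> toRelease = excess(received, z - 1);
        System.out.println("received " + received.size() + ", releasing " + toRelease);
    }
}
```

With z = 3 this prints "received 3, releasing [container-2]". Releasing the extra container is the same workaround the issue description mentions.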

> Allocation of too many containers when a second request is done with the same 
> resource capability
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1902
>                 URL: https://issues.apache.org/jira/browse/YARN-1902
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: Sietse T. Au
>              Labels: client
>         Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called and at least one of the z allocated 
> containers is started, then if another addContainerRequest call is done and 
> subsequently an allocate call to the RM, (z+1) containers will be allocated, 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I found that (z+1) containers 
> are indeed requested in both scenarios, but the correct behavior is observed 
> only in the second scenario.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
> ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
