[ 
https://issues.apache.org/jira/browse/YARN-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684104#comment-13684104
 ] 

Maysam Yabandeh commented on YARN-779:
--------------------------------------

Thanks [~sandyr]. Let me run my understanding of the problem by you, to ensure 
that we are on the same page. The reported erroneous scenario could be 
addressed by resetting the outstanding requests at the RM whenever ANY reaches 
0. The actual problem, however, still remains, since the AMRMClient receives a 
ContainerRequest and decomposes it into independent ResourceRequests. The 
information about the disjunction between the requested resources is thus not 
available at the RM to properly maintain the list of outstanding requests. 
Building on the original example, here is the erroneous scenario:

{code}
@AMRMClient
ContainerRequest(..., {node1, node2}, ..., 10)
ContainerRequest(..., {node3}, ..., 5)
{code}

The internal state at RM will be:

{code}
@AppSchedulingInfo
Resource  #
-------------
node1    10
node2    10
node3    5
ANY      15
{code}
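
A minimal sketch of that decomposition, written in Python rather than the 
actual YARN Java code, may make the table easier to follow. The 
ContainerRequest tuple and the decompose helper are hypothetical stand-ins, not 
the AMRMClient API, and the rack-level entry is left out to match the table 
above.

```python
from collections import Counter
from typing import NamedTuple

ANY = "ANY"


class ContainerRequest(NamedTuple):
    """Hypothetical stand-in: `count` containers on any one of `nodes`."""
    nodes: frozenset
    count: int


def decompose(requests):
    """Flatten requests into independent per-location counts.

    Every node named in a request is charged the full count, and ANY is
    charged once per request, so the "or" between nodes is not recorded.
    """
    table = Counter()
    for req in requests:
        for node in req.nodes:
            table[node] += req.count
        table[ANY] += req.count
    return dict(table)


requests = [
    ContainerRequest(frozenset({"node1", "node2"}), 10),
    ContainerRequest(frozenset({"node3"}), 5),
]
table = decompose(requests)

# Print node rows first and the ANY row last, like the table above.
for location in sorted(table, key=lambda k: (k == ANY, k)):
    print(f"{location:<8}{table[location]}")
```

Nothing in the resulting counts says that the node1 and node2 rows came from a 
single "either node" request.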

In other words, the original request of "(10*(node1 or node2)) and 5*node3" 
could be interpreted in different ways, such as "10*node1 and (5*(node2 or 
node3))". If my understanding is correct, then the solution lies in changing 
the API between the AM and the RM to also send the original disjunction between 
the requested resources. We then need to change AppSchedulingInfo to properly 
maintain the added information. Does this make sense?
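
To make the ambiguity concrete, here is a small Python sketch (not YARN code) 
that compares what the flattened counts permit with what the original requests 
meant. The helpers allowed_by_table and satisfies_requests are hypothetical, 
and the sketch assumes an allocation is simply a map from node to container 
count.

```python
from itertools import combinations

ANY = "ANY"

# Original requests, each a (candidate nodes, container count) pair.
requests = [({"node1", "node2"}, 10), ({"node3"}, 5)]

# What the RM sees after decomposition (rack entry omitted, as in the
# AppSchedulingInfo table above).
table = {"node1": 10, "node2": 10, "node3": 5, ANY: 15}


def allowed_by_table(allocation, table):
    """True if no per-node count and not the ANY total is exceeded."""
    within_nodes = all(
        n <= table.get(node, 0) for node, n in allocation.items()
    )
    return within_nodes and sum(allocation.values()) <= table[ANY]


def satisfies_requests(allocation, requests):
    """True if the containers can be split so every request is met exactly.

    Uses a Hall-style condition: every group of requests must find enough
    containers on the nodes that the group accepts.
    """
    if sum(allocation.values()) != sum(count for _, count in requests):
        return False
    for size in range(1, len(requests) + 1):
        for group in combinations(requests, size):
            nodes = set().union(*(n for n, _ in group))
            demand = sum(count for _, count in group)
            supply = sum(allocation.get(node, 0) for node in nodes)
            if demand > supply:
                return False
    return True


# Matches the unintended reading "10*node1 and 5*(node2 or node3)".
unintended = {"node1": 10, "node2": 5}
# Matches the original "(10*(node1 or node2)) and 5*node3".
intended = {"node1": 4, "node2": 6, "node3": 5}

for allocation in (unintended, intended):
    print(allocation,
          allowed_by_table(allocation, table),
          satisfies_requests(allocation, requests))
```

Both allocations pass the flattened check, but only the second one satisfies 
the original requests. A check like satisfies_requests is only possible if the 
grouping of nodes per request is still known on the RM side.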


                
> AMRMClient should clean up dangling unsatisfied request
> -------------------------------------------------------
>
>                 Key: YARN-779
>                 URL: https://issues.apache.org/jira/browse/YARN-779
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.0.4-alpha
>            Reporter: Alejandro Abdelnur
>            Assignee: Maysam Yabandeh
>            Priority: Critical
>
> If an AMRMClient places a ContainerRequest for 10 containers in node1 or 
> node2 (assuming a single rack), the resulting ResourceRequests will be:
> {code}
> location - containers
> ---------------------
> node1    - 10
> node2    - 10
> rack     - 10
> ANY      - 10
> {code}
> Assuming 5 containers are allocated in node1 and 5 containers are allocated 
> in node2, the following ResourceRequests will be outstanding on the RM.
> {code}
> location - containers
> ---------------------
> node1    - 5
> node2    - 5
> {code}
> If the AMRMClient does a new ContainerRequest allocation, this time for 5 
> containers in node3, the resulting outstanding ResourceRequests on the RM 
> will be:
> {code}
> location - containers
> ---------------------
> node1    - 5
> node2    - 5
> node3    - 5
> rack     - 5
> ANY      - 5
> {code}
> At this point, the scheduler may assign 5 containers to node1 and never 
> assign the 5 containers requested on node3.
> AMRMClient should keep track of the outstanding allocation counts per 
> ContainerRequest and, when a count gets to zero, update the RACK/ANY 
> requests by decrementing the dangling requests.
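
The per-ContainerRequest bookkeeping proposed at the end of the quoted 
description could look roughly like the following Python sketch. It is an 
illustration only: RequestTracker and its methods are hypothetical names, not 
the AMRMClient API, and a single rack is assumed as in the description.

```python
from collections import Counter


class RequestTracker:
    """Hypothetical client-side bookkeeping, not the real AMRMClient."""

    def __init__(self):
        # One [candidate nodes, remaining containers] entry per request.
        self.outstanding = []

    def add_request(self, nodes, count):
        self.outstanding.append([set(nodes), count])

    def on_allocated(self, node):
        """Charge one container on `node` to the first request wanting it."""
        for entry in self.outstanding:
            if node in entry[0] and entry[1] > 0:
                entry[1] -= 1
                return True
        return False  # no outstanding request lists this node

    def asks(self):
        """Rebuild the per-location asks from what is still outstanding."""
        table = Counter()
        for nodes, remaining in self.outstanding:
            if remaining == 0:
                continue  # satisfied request: leaves no dangling rows
            for location in (*nodes, "rack", "ANY"):
                table[location] += remaining
        return dict(table)


tracker = RequestTracker()
tracker.add_request({"node1", "node2"}, 10)
for node in ["node1"] * 5 + ["node2"] * 5:
    tracker.on_allocated(node)
tracker.add_request({"node3"}, 5)
print(tracker.asks())
```

Because the asks are derived from the per-request counts, the satisfied 
node1/node2 request contributes nothing once its count reaches zero, and only 
the node3, rack and ANY rows for the second request remain.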

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
