[ 
https://issues.apache.org/jira/browse/YARN-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655384#comment-13655384
 ] 

Bikas Saha commented on YARN-660:
---------------------------------

bq. Coming back to current issues at hand. Correct me if I am wrong, but it 
looks like these StoredContainerRequests are only useful if I have a Task in my 
job which needs a container and the returned container will only be used by 
that task?
I am afraid that is wrong. Every scheduler has a bunch of tasks that it needs 
to allocate. It makes requests for containers. When containers are returned 
then it chooses which tasks to assign to those containers. That is the generic 
use case for schedulers. StoredContaineRequest is unrelated to it. 
AMRMClientImpl/AMRMClientAsync are already built for the use case specified. A 
bunch of requests can be made and when containers are allocated then the user 
can get them and do its second pass of scheduling.
bq.The current test-case is clearly not doing enough to explain how it is 
usable - we can implement a simple job which has tasks and needs containers 
inside this test-case and then illustrate the library usage.
Please read the code in TestAMRMClient.testAMRMClient() and the code in the 
DistributedShell example app. Both create an RM, register an application, 
request containers, make allocations and deallocations and then unregister the 
application. I believe the code clearly shows how all the api's can be used for 
the full lifecycle of an application. The new tests only test for newly added 
storage feature and bug fixes. I could add more test code to go through the 
same lifecycle operations when using StoreContainerRequest.

It might be useful if you can go through all the comments in YARN-103, YARN-277 
and YARN-417. It may help in understanding why the code/apis are the way they 
are. 7 contributors were involved in those jiras in writing and reviewing the 
code and I am confident in the technical and stylistic quality of the code as 
it stands. That being said, there is always room for improvement if you find 
issues.

Let me try to explain the contribution made by this jira so that we all can be 
on the same page.

When schedulers receive container allocations then they need to figure out 
which of their requests matched those containers so that they can remove those 
requests from the pending requests. Containers can match requests at 
node/rack/* level and hence the library itself cannot determine which pending 
request to remove. It needs to be told by the scheduler which request to 
remove. All of this already works if the scheduler can find matching requests 
for a container on its own or when requests were made at only at * priority 
(for locality other than * the RM needs client help to do book-keeping so that 
it does not end up allocating containers at unwanted locality).

This jira takes up the problem of helping schedulers find matching requests for 
allocated containers. Since the library is the entity that translates request 
to the RM, it would be best to let the library provide matching containers. If 
the scheduler maintains its own separate mapping information then its logic may 
diverge from the library's. So the main contribution of the jira is to store 
requests made via addContainerRequest() and retrieve matching requests via 
getMatchingRequests(). Stored containers must be removed or else the library 
will end up holding onto a lot of memory. If ContainerRequest is not 
constrained to a container count of 1 then doing this add and removal 
book-keeping becomes difficult to implement in a water-tight manner. That is 
why StoredContainerRequest was created which only constrains the container 
count to 1. Now the API becomes elegant and intuitive. addContainerRequest() to 
add requests. getMatchingRequests() to get all matching requests. Pick a 
container from matching requests and call removeContainerRequest() with it to 
remove it. That is all that there is to it. There are some other minor fixes to 
AMRMClientAsync. We are on the same page wrt this being a needed functionality.
The cookie added to StoredContainerRequest is very useful IMO but it strictly 
does not have to be there. Users can extend StoreContainerRequest to do the 
same thing and most users will IMO and so I added it there.
StoredContainerRequest or any extensions of ContainerRequest can be passed into 
the library. Doing it without using the help of generics allows for the 
possibility of writing non-type checked code as well as mixing different types 
of ContainerRequests. Besides, it will require users to make ugly unsafe casts 
in their code for which the compiler will generate warnings. While it may be 
fine for internal YARN code to have 100's of suppressed unchecked cast 
warnings, we should not force external user code to have to live with that. 
They may have their own coding guidelines. This makes the use of generics 
essential for this user-facing library. I can attach the patch with the 
non-generic version of the library and compare the number of compiler warnings 
in both patches.

Hopefully this clarifies and explains the changes in this patch. Please feel 
free to make comments on bugs and technical/logical incorrectness issues in the 
patch.
                
> Improve AMRMClient with cookies and matching requests
> -----------------------------------------------------
>
>                 Key: YARN-660
>                 URL: https://issues.apache.org/jira/browse/YARN-660
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.0.5-beta
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>             Fix For: 2.0.5-beta
>
>         Attachments: YARN-660.1.patch, YARN-660.2.patch, YARN-660.3.patch, 
> YARN-660.4.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to