[
https://issues.apache.org/jira/browse/YARN-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655384#comment-13655384
]
Bikas Saha commented on YARN-660:
---------------------------------
bq. Coming back to current issues at hand. Correct me if I am wrong, but it
looks like these StoredContainerRequests are only useful if I have a Task in my
job which needs a container and the returned container will only be used by
that task?
I am afraid that is wrong. Every scheduler has a bunch of tasks that it needs
to allocate. It makes requests for containers. When containers are returned
then it chooses which tasks to assign to those containers. That is the generic
use case for schedulers. StoredContaineRequest is unrelated to it.
AMRMClientImpl/AMRMClientAsync are already built for the use case specified. A
bunch of requests can be made and when containers are allocated then the user
can get them and do its second pass of scheduling.
bq.The current test-case is clearly not doing enough to explain how it is
usable - we can implement a simple job which has tasks and needs containers
inside this test-case and then illustrate the library usage.
Please read the code in TestAMRMClient.testAMRMClient() and the code in the
DistributedShell example app. Both create an RM, register an application,
request containers, make allocations and deallocations and then unregister the
application. I believe the code clearly shows how all the api's can be used for
the full lifecycle of an application. The new tests only test for newly added
storage feature and bug fixes. I could add more test code to go through the
same lifecycle operations when using StoreContainerRequest.
It might be useful if you can go through all the comments in YARN-103, YARN-277
and YARN-417. It may help in understanding why the code/apis are the way they
are. 7 contributors were involved in those jiras in writing and reviewing the
code and I am confident in the technical and stylistic quality of the code as
it stands. That being said, there is always room for improvement if you find
issues.
Let me try to explain the contribution made by this jira so that we all can be
on the same page.
When schedulers receive container allocations then they need to figure out
which of their requests matched those containers so that they can remove those
requests from the pending requests. Containers can match requests at
node/rack/* level and hence the library itself cannot determine which pending
request to remove. It needs to be told by the scheduler which request to
remove. All of this already works if the scheduler can find matching requests
for a container on its own or when requests were made at only at * priority
(for locality other than * the RM needs client help to do book-keeping so that
it does not end up allocating containers at unwanted locality).
This jira takes up the problem of helping schedulers find matching requests for
allocated containers. Since the library is the entity that translates request
to the RM, it would be best to let the library provide matching containers. If
the scheduler maintains its own separate mapping information then its logic may
diverge from the library's. So the main contribution of the jira is to store
requests made via addContainerRequest() and retrieve matching requests via
getMatchingRequests(). Stored containers must be removed or else the library
will end up holding onto a lot of memory. If ContainerRequest is not
constrained to a container count of 1 then doing this add and removal
book-keeping becomes difficult to implement in a water-tight manner. That is
why StoredContainerRequest was created which only constrains the container
count to 1. Now the API becomes elegant and intuitive. addContainerRequest() to
add requests. getMatchingRequests() to get all matching requests. Pick a
container from matching requests and call removeContainerRequest() with it to
remove it. That is all that there is to it. There are some other minor fixes to
AMRMClientAsync. We are on the same page wrt this being a needed functionality.
The cookie added to StoredContainerRequest is very useful IMO but it strictly
does not have to be there. Users can extend StoreContainerRequest to do the
same thing and most users will IMO and so I added it there.
StoredContainerRequest or any extensions of ContainerRequest can be passed into
the library. Doing it without using the help of generics allows for the
possibility of writing non-type checked code as well as mixing different types
of ContainerRequests. Besides, it will require users to make ugly unsafe casts
in their code for which the compiler will generate warnings. While it may be
fine for internal YARN code to have 100's of suppressed unchecked cast
warnings, we should not force external user code to have to live with that.
They may have their own coding guidelines. This makes the use of generics
essential for this user-facing library. I can attach the patch with the
non-generic version of the library and compare the number of compiler warnings
in both patches.
Hopefully this clarifies and explains the changes in this patch. Please feel
free to make comments on bugs and technical/logical incorrectness issues in the
patch.
> Improve AMRMClient with cookies and matching requests
> -----------------------------------------------------
>
> Key: YARN-660
> URL: https://issues.apache.org/jira/browse/YARN-660
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.0.5-beta
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Fix For: 2.0.5-beta
>
> Attachments: YARN-660.1.patch, YARN-660.2.patch, YARN-660.3.patch,
> YARN-660.4.patch
>
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira