[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858926#comment-15858926 ]

Hitesh Sharma commented on YARN-5501:
-------------------------------------

[~jlowe], thanks for the great feedback and time taken to respond.

Some more details on how container attach and detach actually work:

The PoolManager creates the pre-initialized containers, and they are not different 
from regular containers in any real way. When the ContainerManager receives a 
startContainer request, it issues a DETACH_CONTAINER event. The detach exists to 
ensure that we can clean up the state associated with the pre-init container 
while avoiding cleanup of the resources that were localized. The ContainerManager 
listens for the CONTAINER_DETACHED event, and once it receives that it creates 
the ContainerImpl for the requested container, passing the information related to 
the detached container to the ContainerImpl constructor. The ContainerManager 
then follows the regular code path for starting the container, which means 
resource localization happens for the new container; when it comes time to raise 
the launch event, the ContainerImpl instead raises the ATTACH_CONTAINER event. 
This allows the ContainersLauncher to call attachContainer on the executor, which 
is where we make the choice of launching the other processes required for that 
container. I hope this helps clarify things a little bit more.
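
To make the flow concrete, here is a very simplified, self-contained sketch. Only 
the event names match what I described above; the classes and methods are 
illustrative stand-ins, not the real ContainerManager / ContainerImpl / 
ContainersLauncher.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Very simplified model of the flow above. Only the event names match what is
// described in the comment; the classes are illustrative stand-ins.
public class PoolingFlowSketch {

  enum Event { DETACH_CONTAINER, CONTAINER_DETACHED, ATTACH_CONTAINER, LAUNCH_CONTAINER }

  // State carried over from a pre-initialized container (localized resources,
  // already-running heavy processes, ...) -- fields are hypothetical.
  static class PreInitContainer {
    final String id;
    PreInitContainer(String id) { this.id = id; }
  }

  // Stand-in for ContainerImpl: it receives the detached container's info via
  // its constructor and raises ATTACH_CONTAINER instead of the launch event.
  static class Container {
    final String requestedId;
    final PreInitContainer reused;
    Container(String requestedId, PreInitContainer reused) {
      this.requestedId = requestedId;
      this.reused = reused;
    }
    Event launchEvent() {
      return reused != null ? Event.ATTACH_CONTAINER : Event.LAUNCH_CONTAINER;
    }
  }

  // Stand-in for the ContainerManager side of a startContainer request.
  static class Manager {
    final Queue<PreInitContainer> pool = new ArrayDeque<>();

    Container startContainer(String requestedId) {
      PreInitContainer preInit = pool.poll();
      if (preInit != null) {
        // 1. DETACH_CONTAINER: tear down the pre-init container's bookkeeping
        //    but keep its localized resources and running processes.
        // 2. CONTAINER_DETACHED: create the new container, handing it the
        //    detached container's information.
        return new Container(requestedId, preInit);
      }
      return new Container(requestedId, null); // regular (non-pooled) path
    }
  }

  public static void main(String[] args) {
    Manager nm = new Manager();
    nm.pool.add(new PreInitContainer("container1234"));
    Container c = nm.startContainer("containerABCD");
    // Prints ATTACH_CONTAINER: the launcher attaches instead of launching.
    System.out.println(c.requestedId + " -> " + c.launchEvent());
  }
}
{code}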

bq. I'm thinking of a use-case where the container is a base set that applies 
to all instances of an app framework, but each app may need a few extra things 
localized to do an app-specific thing (think UDFs for Hive/Pig, etc.). Curious 
if that is planned and how to deal with the lifecycle of those "extra" per-app 
things.

Yes, the base set of things applies to all instances of the app framework, but 
localization is still done for each instance. For example, you can download a set 
of binaries via pre-initialization, while more job-specific things can come later.
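
For illustration, the job-specific bits are just the usual ContainerLaunchContext 
setup on the AM side; a rough sketch using the public YARN records (the file name 
and path are placeholders):

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

// Sketch: the pre-init container already carries the framework binaries, so
// the AM only localizes the job-specific bits (e.g. a UDF jar).
public class PerJobLocalization {

  public static ContainerLaunchContext buildContext(FileSystem fs, Path udfJar)
      throws Exception {
    FileStatus status = fs.getFileStatus(udfJar);

    LocalResource udfResource = LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(udfJar),  // where to fetch it from
        LocalResourceType.FILE,
        LocalResourceVisibility.APPLICATION,        // private to this app
        status.getLen(),
        status.getModificationTime());

    Map<String, LocalResource> resources = new HashMap<>();
    resources.put("udfs.jar", udfResource);

    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setLocalResources(resources);
    return ctx;
  }
}
{code}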

bq. So it sounds like there is a new container ID generated in the 
application's container namespace as part of the "allocation" to fill the app's 
request, but this container ID is aliased to an already existing container ID 
in another application's namespace, not only at the container executor level 
but all the way up to the container ID seen at the app level, correct?

The application gets a container ID from the YARN RM and uses that for all 
purposes. On the NM we internally switch to using the pre-init container ID as 
the PID. For example, say the pre-init container had the ID container1234 while 
the AM-requested container had the ID containerABCD. Even though we reuse the 
existing pre-init container1234 to service the start container request on the NM, 
we never surface container1234 to the application; the app always sees 
containerABCD.
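
Roughly, the NM-side bookkeeping boils down to something like the following 
(illustrative only, not the actual patch):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the NM-internal bookkeeping: the app only ever sees
// the ID it was allocated (containerABCD), while the NM resolves it to the
// pre-init container (container1234) it is actually reusing.
public class PreInitContainerAliases {

  // app-visible container ID -> reused pre-init container ID
  private final Map<String, String> aliases = new ConcurrentHashMap<>();

  public void recordReuse(String appVisibleId, String preInitId) {
    aliases.put(appVisibleId, preInitId);
  }

  // Used internally, e.g. when resolving the PID; never surfaced to the app.
  public String resolve(String appVisibleId) {
    return aliases.getOrDefault(appVisibleId, appVisibleId);
  }
}
{code}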

bq. One idea is to treat these things like the page cache in Linux. In other 
words, we keep a cache of idle containers as apps run them. These containers, 
like page cache entries, will be quickly discarded if they are unused and we 
need to make room for other containers. We're simply caching successful 
containers that have been run on the cluster, ready to run another task just 
like it. Apps would still need to make some tweaks to their container code so 
it talks the yet-to-be-detailed-and-mysterious attach/detach protocol so they 
can participate in this automatic container cache, and there would need to be 
changes in how containers are requested so the RM can properly match a request 
to an existing container (something that already has to be done for any reuse 
approach). Seems like it would adapt well to shifting loads on the cluster and 
doesn't require a premeditated, static config by users to get their app load to 
benefit. Has something like that been considered?

That is a very interesting idea. If the app can provide some hints as to when a 
container is a good candidate for pre-initialization, then when the container 
finishes we can carry out the operations required to return it to the pre-init 
state. Thanks for bringing this up.

bq. I think that's going to be challenging for the apps in practice and will 
limit which apps can leverage this feature reliably. This is going to be 
challenging for containers running VMs whose memory limits need to be set up at 
startup (e.g.: JVMs). Minimally I think this feature needs a way for apps to 
specify that they do not have a way to communicate (or at least act upon) 
memory changes. In those cases YARN will have to decide on tradeoffs like a 
primed-but-oversized container that will run fast but waste grid resources and 
also avoid reusing a container that needs to grow to satisfy the app 
request.

Hmm, let me look at the code and see how container resizing works today. What 
you are saying makes sense, but in that case container resizing won't work 
either. For our scenarios, resource constraints are enforced via job objects or 
cgroups, so things are OK.

bq. Also the container is already talking this yet-to-be-detailed attach/detach 
protocol, so I would expect any memory change request to also arrive via that 
communication channel. Why isn't that the case?

I gave some details above on how attach/detach works; it is not really a protocol 
but a set of state machine changes that ensure the YARN machinery is updated 
accordingly.
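
To show what I mean by "state machine changes", here is a toy transition table 
for the pooled path. The real ContainerImpl state machine is built with YARN's 
StateMachineFactory; the state names below are made up for illustration.

{code:java}
// Toy transition table for the pooled path, just to show that attach/detach
// is a set of state machine transitions rather than a wire protocol.
public class AttachDetachTransitions {

  enum State { PRE_INIT, DETACHED, LOCALIZING, LOCALIZED, RUNNING }
  enum Event { DETACH_CONTAINER, CONTAINER_DETACHED, RESOURCES_LOCALIZED, ATTACH_CONTAINER }

  static State transition(State current, Event event) {
    switch (current) {
      case PRE_INIT:
        if (event == Event.DETACH_CONTAINER) return State.DETACHED;
        break;
      case DETACHED:
        // The new ContainerImpl is created once CONTAINER_DETACHED fires and
        // goes through normal localization for the new request.
        if (event == Event.CONTAINER_DETACHED) return State.LOCALIZING;
        break;
      case LOCALIZING:
        if (event == Event.RESOURCES_LOCALIZED) return State.LOCALIZED;
        break;
      case LOCALIZED:
        // Instead of a launch event, ATTACH_CONTAINER is raised so that
        // ContainersLauncher calls attachContainer() on the executor.
        if (event == Event.ATTACH_CONTAINER) return State.RUNNING;
        break;
      default:
        break;
    }
    throw new IllegalStateException(current + " cannot handle " + event);
  }

  public static void main(String[] args) {
    State s = State.PRE_INIT;
    s = transition(s, Event.DETACH_CONTAINER);
    s = transition(s, Event.CONTAINER_DETACHED);
    s = transition(s, Event.RESOURCES_LOCALIZED);
    s = transition(s, Event.ATTACH_CONTAINER);
    System.out.println("final state: " + s);  // RUNNING
  }
}
{code}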

bq. Making sure we don't mix users is the most basic step, but there's still 
the issue of credentials. There needs to be a way to convey app-specific 
credentials to these containers and make sure they don't leak between apps. The 
security design should be addressed sooner rather than later, because it's 
going to be difficult to patch it in after the fact.

I agree we need to do more thinking here. Let me get back to you on this.


bq. It sounds like you already have a working PoC and scenarios for it. These 
would be great to detail via flow/message sequence diagrams detailing the 
operation order for container init, attach, detach, restart, etc. It would also 
be great to detail what changes apps using this feature will see over what they 
do today (i.e.: if there's something changing re: container IDs, container 
killing, etc.) and what changes are required on their part in order to 
participate.

In practice we are using container pooling as a pure optimization. As I mentioned 
earlier, one of our use cases involves starting some heavy processes, waiting for 
them to come up, and then doing the actual work by launching other processes 
within the same cgroup or job object. With pooling, our AM requests a pre-init 
container; since we have a static config for how many pre-init containers are 
running, it may or may not receive one. In the container launch command we check 
for the presence of these heavy processes, and if they are found we skip 
initializing them, which saves quite some time.
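
The check in our launch command is environment-specific, but as a rough Java 9+ 
illustration of the idea (the process name is a placeholder):

{code:java}
// Illustrative only: probe whether the heavy process is already up (i.e. we
// got a pre-init container) and skip re-initializing it if so.
public class SkipHeavyInit {

  static boolean heavyProcessRunning(String processName) {
    return ProcessHandle.allProcesses()
        .map(p -> p.info().command().orElse(""))
        .anyMatch(cmd -> cmd.contains(processName));
  }

  public static void main(String[] args) throws Exception {
    if (!heavyProcessRunning("heavy-daemon")) {
      // Cold container: pay the full startup cost.
      new ProcessBuilder("heavy-daemon", "--init").inheritIO().start();
    }
    // Either way, continue with the job-specific work.
    System.out.println("launching job-specific processes...");
  }
}
{code}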

Please note that this is a PoC to put some of the ideas into practice. We are 
eager to see how this can evolve into something more generic that is useful to 
the community. I will be happy to share more details and maybe post a WIP patch 
to give some clarity.

Open to ideas and suggestions.

> Container Pooling in YARN
> -------------------------
>
>                 Key: YARN-5501
>                 URL: https://issues.apache.org/jira/browse/YARN-5501
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Hitesh Sharma
>         Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.


