[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664849#comment-15664849 ]
Arun Suresh commented on YARN-1593:
-----------------------------------
Thanks for driving this, [~vvasudev].
At first glance, this looks similar in spirit to YARN-5501, and maybe even
supersedes it. It would be advantageous to model pooled containers as a system
container.
Further to the point raised by [~hitesh] about formalizing how we affinitize an
application's container to a Node on which a dependent system container is
running, we were also investigating a scenario where an application might need
a countable number of system containers on a Node. An initial thought was to
expose such a container as a generalized resource (YARN-3926). For example,
assume Spark executors can be started as pre-started containers on select
nodes. Assume Node A has 2 pre-started Spark executors and Node B has 4. A
Spark app might have 3 ContainerRequests that each require <4 VCores, 2 GB, 2
spark-executors>, in which case the ResourceManager will ensure that 1 such
container is allocated on Node A and 2 on Node B.
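To make the counting concrete, here is a rough sketch of the placement
arithmetic, not actual scheduler code. It assumes a simple first-fit policy
and treats "spark-executors" as just another countable resource on each node.
The names Node, fits and allocate, and the per-node VCore and memory
capacities, are made up for illustration; only the executor counts and the
request shape come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free: dict  # resource name -> units still available on this node


def fits(node, need):
    """True if the node has enough of every requested resource."""
    return all(node.free.get(res, 0) >= amount for res, amount in need.items())


def allocate(nodes, requests):
    """First-fit placement (an assumed policy, not the real RM scheduler).

    Returns one node name per request, or None if no node can satisfy it.
    """
    placements = []
    for need in requests:
        chosen = next((n for n in nodes if fits(n, need)), None)
        if chosen is not None:
            # Deduct every requested resource, including the countable
            # pre-started executors, from the chosen node.
            for res, amount in need.items():
                chosen.free[res] -= amount
        placements.append(chosen.name if chosen else None)
    return placements


# Executor counts are from the example; vcores and memory_gb are assumed.
nodes = [
    Node("A", {"vcores": 16, "memory_gb": 64, "spark-executors": 2}),
    Node("B", {"vcores": 16, "memory_gb": 64, "spark-executors": 4}),
]
request = {"vcores": 4, "memory_gb": 2, "spark-executors": 2}
placements = allocate(nodes, [dict(request) for _ in range(3)])
print(placements)  # ['A', 'B', 'B']
```

With 2 executors on Node A and 4 on Node B, only one request fits on A and the
remaining two land on B, so a fourth identical request would go unplaced.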
Thoughts?
> support out-of-proc AuxiliaryServices
> -------------------------------------
>
> Key: YARN-1593
> URL: https://issues.apache.org/jira/browse/YARN-1593
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, rolling upgrade
> Reporter: Ming Ma
> Assignee: Varun Vasudev
> Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as
> NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, the NM restart will force
> the ShuffleHandler to restart. If ShuffleHandler runs as a separate process,
> it can continue to run during the NM restart. NM can reconnect to the
> running ShuffleHandler after restart.
> 2. Resource management. It is possible that other types of AuxiliaryServices
> will be implemented. AuxiliaryServices are considered YARN application
> specific and could consume lots of resources. Running AuxiliaryServices in
> separate processes allows easier resource management. NM could potentially
> stop a specific AuxiliaryServices process from running if it consumes
> resources well above its allocation.
> Here are some high level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. The existing
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService proxy
> object inside NM to connect to.
> 3. When we do a rolling restart of NM, the existing AuxiliaryService
> processes will continue to run. NM could reconnect to the running
> AuxiliaryService processes upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have
> an immediate need for this. An AuxiliaryService could run inside a container,
> its resource utilization could be taken into account by RM, and RM could
> determine that a specific type of application overutilizes cluster resources.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)