[
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844740#comment-13844740
]
Sandy Ryza commented on YARN-1404:
----------------------------------
Arun, I think I agree with most of the above and your proposal makes a lot of
sense to me.
There are numerous issues to tackle. On the YARN side:
* YARN has assumed since its inception that a container's resources belong to a
single application - we are likely to come across many subtle issues when
rethinking this assumption.
* While YARN has promise as a platform for deploying long-running services,
that functionality currently isn't stable in the way that much of the rest of
YARN is.
* Currently preemption means killing a container process - we would need to
change the way this mechanism works.
On the Datanode/Impala side:
* Rethink the way we deploy these services to allow them to run inside YARN
containers.
Stepping back a little, YARN does three things:
* Central Scheduling - decides who gets to run and when and where they get to
do so
* Deployment - ships bits across the cluster and runs container processes
* Enforcement - monitors container processes to make sure they stay within
scheduled limits
The central scheduling part is the most valuable to a framework like Impala
because it allows it to truly share resources on a cluster with other
processing frameworks. The latter two are helpful - they allow us to
standardize the way work is deployed on a Hadoop cluster - but they don't
enable anything that is fundamentally impossible without them. While these will
simplify things in the long term and create a more cohesive platform, Impala
currently has little tangible to gain by doing deployment and enforcement
inside YARN.
So, to summarize, I like the idea and would be both happy to see YARN move in
this direction and to help it do so. However, making Impala-YARN integration
depend on this fairly involved work would unnecessarily set it back. In the
short term, we have proposed a minimally invasive change (making it possible to
launch containers without starting processes) that would allow YARN to satisfy
our use case. I am confident that the change poses no risk from a security
perspective, from a stability perspective, or in terms of detracting from the
longer-term vision.
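As a rough illustration of the minimally invasive change described above, here is a toy model of the current versus proposed NodeManager behavior. All names here are hypothetical and this is not the YARN API; it only sketches the idea that a process-less launch would keep an allocation from being reclaimed.

```python
# Toy model of container lifecycle (hypothetical names, not the YARN API).
# Today, an allocation with no launched process is cancelled; the proposal
# would let a launch succeed without forking a process.

class Container:
    def __init__(self, cid, vcores, memory_mb):
        self.cid = cid
        self.vcores = vcores
        self.memory_mb = memory_mb
        self.state = "ALLOCATED"

    def launch(self, command=None):
        # Proposed behavior: command=None marks the container as active
        # without starting a process, so the allocation is retained.
        self.state = "RUNNING" if command is None else "RUNNING:" + command

    def expire_if_unlaunched(self):
        # Current behavior: an allocation that never launched a process is
        # cancelled and its capacity returned to the pool.
        if self.state == "ALLOCATED":
            self.state = "RELEASED"

c = Container("container_01", vcores=1, memory_mb=512)
c.launch(command=None)       # process-less launch keeps the allocation
c.expire_if_unlaunched()
print(c.state)               # RUNNING rather than RELEASED
```

Under this sketch, Impala would no longer need its 'sleep 10y' dummy process: the external framework supplies its own enforcement while YARN retains the bookkeeping.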
> Enable external systems/frameworks to share resources with Hadoop leveraging
> Yarn resource scheduling
> -----------------------------------------------------------------------------------------------------
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 2.2.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes in
> which its applications run their workload. External frameworks/systems could
> benefit from sharing resources with other Yarn applications while running
> their workload within long-running processes owned by the external framework
> (in other words, running their workload outside the context of a Yarn
> container process).
> Because Yarn provides robust and scalable resource management, it is
> desirable for some external systems to leverage the resource governance
> capabilities of Yarn (queues, capacities, scheduling, access control) while
> supplying their own resource enforcement.
> Impala is an example of such system. Impala uses Llama
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster. When a user
> submits a query, the processing is broken into 'query fragments' which are
> run in multiple impalad processes, leveraging data locality (similar to
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in
> the impalad, as the impalad shares the host with other services (HDFS
> DataNode, Yarn NodeManager, HBase Region Server) and Yarn applications
> (MapReduce tasks).
> To ensure that cluster utilization follows the Yarn scheduler policies and
> does not overload the cluster nodes, before running a 'query fragment' on a
> node, Impala requests the required amount of CPU and memory from Yarn. Once
> the requested CPU and memory have been allocated, Impala starts running the
> 'query fragment', taking care that the 'query fragment' does not use more
> resources than the ones that have been allocated. Memory is bookkept per
> 'query fragment', and the threads used for processing the 'query fragment'
> are placed under a cgroup to contain CPU utilization.
> Today, for all resources that have been requested from the Yarn RM, a
> (container) process must be started via the corresponding NodeManager.
> Failing to do this will result in the cancellation of the container
> allocation, relinquishing the acquired resource capacity back to the pool of
> available resources. To avoid this, Impala starts a dummy container process
> running 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process sits in a cgroup with a given number of CPU
> shares that are never used, and Impala re-issues those CPU shares to another
> cgroup for the threads running the 'query fragment'. The cgroup CPU
> enforcement works correctly because of the CPU controller implementation (but
> the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other; some
> requests may be memory-only with no CPU, or vice versa. Because a container
> requires a process, the complete absence of memory or CPU is not possible:
> even if the dummy process is just 'sleep', a minimal amount of memory and
> CPU is required for it.
> Because of this it is desirable to be able to have a container without a
> backing process.
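The cgroup containment and CPU-shares accounting described in the issue can be sketched as follows. The cgroup-v1 mount point and the shares-per-vcore mapping below are assumptions based on common conventions, not Impala's or Llama's actual implementation.

```python
# Sketch of the containment arithmetic behind the dummy-container drawback.
# CGROUP_CPU_ROOT and SHARES_PER_VCORE are assumptions (typical cgroup-v1
# defaults), not values taken from Impala's code.
import os

CGROUP_CPU_ROOT = "/sys/fs/cgroup/cpu"  # typical cgroup-v1 mount (assumption)
SHARES_PER_VCORE = 1024                 # cgroup default: 1024 shares ~ one core

def cpu_shares(vcores):
    """Translate an allocated number of vcores into a cpu.shares value."""
    return vcores * SHARES_PER_VCORE

def fragment_cgroup(fragment_id):
    """Files an enforcer would write for one query fragment (illustrative)."""
    base = os.path.join(CGROUP_CPU_ROOT, "impala", fragment_id)
    return {
        "cpu.shares": os.path.join(base, "cpu.shares"),
        "tasks": os.path.join(base, "tasks"),  # worker thread IDs go here
    }

# A fragment allocated 2 vcores would get 2048 shares; its worker thread
# IDs would be written to the tasks file of its own cgroup, while the dummy
# container's cgroup holds shares that are never exercised.
print(cpu_shares(2))  # 2048
```

This is exactly the double bookkeeping the issue calls out: the shares sit in the dummy container's cgroup while the real work runs under a second cgroup managed by Impala itself.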
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)