[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844740#comment-13844740 ]

Sandy Ryza commented on YARN-1404:
----------------------------------

Arun, I think I agree with most of the above and your proposal makes a lot of 
sense to me.

There are numerous issues to tackle.  On the YARN side:
* YARN has assumed since its inception that a container's resources belong to a 
single application - we are likely to come across many subtle issues when 
rethinking this assumption.
* While YARN has promise as a platform for deploying long-running services, 
that functionality currently isn't stable in the way that much of the rest of 
YARN is.
* Currently preemption means killing a container process - we would need to 
change the way this mechanism works.

On the Datanode/Impala side:
* Rethink the way we deploy these services to allow them to run inside YARN 
containers.

Stepping back a little, YARN does three things:
* Central Scheduling - decides who gets to run and when and where they get to 
do so
* Deployment - ships bits across the cluster and runs container processes
* Enforcement - monitors container processes to make sure they stay within 
scheduled limits

The central scheduling part is the most valuable to a framework like Impala 
because it allows it to truly share resources on a cluster with other 
processing frameworks.  The latter two are helpful - they allow us to 
standardize the way work is deployed on a Hadoop cluster - but they don't 
enable anything that would be fundamentally impossible without them.  While 
these will 
simplify things in the long term and create a more cohesive platform, Impala 
currently has little tangible to gain by doing deployment and enforcement 
inside YARN.

So, to summarize, I like the idea and would be happy both to see YARN move in 
this direction and to help it do so. However, making Impala-YARN integration 
depend on this fairly involved work would unnecessarily set it back.  In the 
short term, we have proposed a minimally invasive change (making it possible to 
launch containers without starting processes) that would allow YARN to satisfy 
our use case. I am confident that the change poses no risk from a security 
perspective, from a stability perspective, or in terms of detracting from the 
longer-term vision.
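
For illustration, here is a minimal sketch of what that launch could look 
like from the application's side. The setProcessless flag is hypothetical - a 
stand-in for whatever mechanism the patch actually defines - while 
ContainerLaunchContext and NMClient are the existing YARN client APIs:

    import java.util.Collections;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.client.api.NMClient;

    public class ProcesslessLaunch {
        // Claim the container's resources on the NodeManager without
        // exec'ing any process. "setProcessless" is a made-up name for
        // whatever the patch ends up exposing.
        static void launch(NMClient nmClient, Container container)
                throws Exception {
            ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                Collections.<String, LocalResource>emptyMap(), // nothing to ship
                Collections.<String, String>emptyMap(),        // no environment
                Collections.<String>emptyList(),               // no command to exec
                null, null, null);
            ctx.setProcessless(true);  // hypothetical flag for this proposal
            nmClient.startContainer(container, ctx);  // NM books resources only
        }
    }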


> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1404
>                 URL: https://issues.apache.org/jira/browse/YARN-1404
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes in 
> which its applications run their workload. External frameworks/systems could 
> benefit from sharing resources with other Yarn applications while running 
> their workload within long-running processes owned by the external framework 
> (in other words, running their workload outside the context of a Yarn 
> container process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster. When a user 
> submits a query, the processing is broken into 'query fragments', which are 
> run in multiple impalad processes to leverage data locality (similar to 
> Map-Reduce Mappers processing a co-located HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad, and the impalad shares its host with other services (HDFS 
> DataNode, Yarn NodeManager, HBase Region Server) and Yarn applications 
> (MapReduce tasks).
> To ensure that cluster utilization follows the Yarn scheduler policies and 
> does not overload the cluster nodes, before running a 'query fragment' on a 
> node Impala requests the required amount of CPU and memory from Yarn. Once 
> the requested CPU and memory have been allocated, Impala starts running the 
> 'query fragment', taking care that it does not use more resources than those 
> that have been allocated. Memory is bookkept per 'query fragment', and the 
> threads used to process the 'query fragment' are placed under a cgroup to 
> contain CPU utilization (a sketch of this flow follows below).
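> 
> As a rough illustration of that flow (the class name, cgroup layout, and 
> values below are made up; the AMRMClient/Resource calls are the standard 
> Yarn client API, and 'cpu.shares'/'tasks' are cgroup v1 control files):
> 
>     import java.nio.charset.StandardCharsets;
>     import java.nio.file.Files;
>     import java.nio.file.Path;
>     import java.nio.file.Paths;
>     import org.apache.hadoop.yarn.api.records.Priority;
>     import org.apache.hadoop.yarn.api.records.Resource;
>     import org.apache.hadoop.yarn.client.api.AMRMClient;
> 
>     public class FragmentResources {
>         // Assumed cgroup v1 cpu controller mount; varies by system.
>         private static final Path CPU_ROOT = Paths.get("/sys/fs/cgroup/cpu");
> 
>         // Ask Yarn for the fragment's CPU and memory on the node holding
>         // its data, before the fragment is allowed to run.
>         static void request(AMRMClient<AMRMClient.ContainerRequest> rm,
>                             String node, int memMb, int vcores) {
>             rm.addContainerRequest(new AMRMClient.ContainerRequest(
>                 Resource.newInstance(memMb, vcores),
>                 new String[] { node }, null, Priority.newInstance(1)));
>         }
> 
>         // Once the allocation arrives, confine the fragment's threads to
>         // a cgroup so they cannot exceed the granted CPU share.
>         static void confine(String fragmentId, int cpuShares, long tid)
>                 throws Exception {
>             Path group = CPU_ROOT.resolve("impala").resolve(fragmentId);
>             Files.createDirectories(group);
>             Files.write(group.resolve("cpu.shares"),
>                 Integer.toString(cpuShares).getBytes(StandardCharsets.UTF_8));
>             // Writing a kernel TID to 'tasks' moves that thread only.
>             Files.write(group.resolve("tasks"),
>                 Long.toString(tid).getBytes(StandardCharsets.UTF_8));
>         }
>     }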
> Today, for all resources that have been requested from the Yarn RM, a 
> (container) process must be started via the corresponding NodeManager. 
> Failing to do so results in the cancellation of the container allocation, 
> relinquishing the acquired resource capacity back to the pool of available 
> resources. To avoid this, Impala starts a dummy container process doing 
> 'sleep 10y' (roughly as sketched below).
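> 
> For reference, the workaround looks roughly like this (the class and method 
> names are illustrative; the ContainerLaunchContext and NMClient calls are 
> the standard Yarn client API):
> 
>     import java.util.Arrays;
>     import java.util.Collections;
>     import org.apache.hadoop.yarn.api.records.Container;
>     import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
>     import org.apache.hadoop.yarn.api.records.LocalResource;
>     import org.apache.hadoop.yarn.client.api.NMClient;
> 
>     public class DummyContainer {
>         // Start a placeholder process so the NodeManager does not cancel
>         // the allocation; the process itself does no useful work.
>         static void holdAllocation(NMClient nmClient, Container container)
>                 throws Exception {
>             ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
>                 Collections.<String, LocalResource>emptyMap(),
>                 Collections.<String, String>emptyMap(),
>                 Arrays.asList("sleep 10y"),  // the dummy command
>                 null, null, null);
>             nmClient.startContainer(container, ctx);
>         }
>     }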
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU 
> shares that are not used, and Impala re-issues those CPU shares to another 
> cgroup for the threads running the 'query fragment'. The cgroup CPU 
> enforcement happens to work correctly because of the CPU controller 
> implementation (but the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other. Some 
> requests may be memory only with no CPU, or vice versa. Because a container 
> requires a process, the complete absence of memory or CPU is not possible: 
> even if the dummy process is 'sleep', a minimal amount of memory and CPU is 
> required for the dummy process.
> Because of this, it is desirable to be able to have a container without a 
> backing process.


