[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1404:
-------------------------------------

    Description: 
Currently, Hadoop Yarn expects to manage the lifecycle of the processes in
which its applications run their workload. External frameworks/systems could
benefit from sharing resources with other Yarn applications while running their
workload within long-running processes owned by the external framework (in
other words, running their workload outside of the context of a Yarn container
process).

Because Yarn provides robust and scalable resource management, it is desirable 
for some external systems to leverage the resource governance capabilities of 
Yarn (queues, capacities, scheduling, access control) while supplying their own 
resource enforcement.

Impala is an example of such a system. Impala uses Llama
(http://cloudera.github.io/llama/) to request resources from Yarn.

Impala runs an impalad process on every node of the cluster. When a user
submits a query, the processing is broken into 'query fragments', which are run
in multiple impalad processes, leveraging data locality (similar to MapReduce
mappers processing a collocated HDFS block of input data).

The execution of a 'query fragment' requires an amount of CPU and memory in the
impalad. The impalad shares the host with other services (HDFS DataNode, Yarn
NodeManager, HBase RegionServer) and with Yarn applications (MapReduce tasks).

To ensure that cluster utilization follows the Yarn scheduler policies and does
not overload the cluster nodes, Impala requests the required amount of CPU and
memory from Yarn before running a 'query fragment' on a node. Once the
requested CPU and memory have been allocated, Impala starts running the 'query
fragment', taking care that it does not use more resources than have been
allocated. Memory is tracked per 'query fragment', and the threads used for
processing the 'query fragment' are placed under a cgroup to contain CPU
utilization.
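
The per-fragment memory bookkeeping described above could look roughly like the
following Python sketch. It is an illustration only: the class and method names
(FragmentMemoryBudget, reserve, release) are hypothetical and are not taken
from Impala or Llama.

```python
class FragmentMemoryBudget:
    """Tracks memory used by one query fragment against its Yarn allocation.

    Hypothetical helper for illustration; not Impala's actual accounting code.
    """

    def __init__(self, allocated_mb: int) -> None:
        if allocated_mb < 0:
            raise ValueError("allocation must be non-negative")
        self.allocated_mb = allocated_mb
        self.used_mb = 0

    def reserve(self, mb: int) -> bool:
        """Reserve `mb` if it fits within the allocation; otherwise refuse."""
        if mb < 0:
            raise ValueError("reservation must be non-negative")
        if self.used_mb + mb > self.allocated_mb:
            return False
        self.used_mb += mb
        return True

    def release(self, mb: int) -> None:
        """Return `mb` to the budget, never dropping below zero."""
        self.used_mb = max(0, self.used_mb - mb)


budget = FragmentMemoryBudget(allocated_mb=1024)
first = budget.reserve(768)    # fits within the 1024 MB grant
second = budget.reserve(512)   # would exceed the grant, so it is refused
```

The point of the sketch is that enforcement happens inside the long-running
process itself: a request that would exceed the Yarn grant is refused rather
than policed by the NodeManager.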

Today, for all resources requested from the Yarn RM, a (container) process
must be started via the corresponding NodeManager. Failing to do so results in
the cancellation of the container allocation, relinquishing the acquired
resource capacity back to the pool of available resources. To avoid this,
Impala starts a dummy container process running 'sleep 10y'.
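
The cancellation behavior described above can be pictured with a small toy
model in Python. This is a sketch under stated assumptions, not the Yarn
scheduler: the class NodeResourcePool, its methods, and the 600-second timeout
are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NodeResourcePool:
    """Toy model of one node's memory pool; not the real Yarn scheduler."""

    available_mb: int
    expiry_s: float = 600.0  # assumed timeout for this sketch
    # container id -> (memory in MB, time the allocation was granted)
    pending: Dict[str, Tuple[int, float]] = field(default_factory=dict)
    # container id -> memory in MB, for containers with a started process
    running: Dict[str, int] = field(default_factory=dict)

    def allocate(self, cid: str, mb: int, now: float) -> bool:
        """Grant an allocation if the pool has enough free memory."""
        if mb > self.available_mb:
            return False
        self.available_mb -= mb
        self.pending[cid] = (mb, now)
        return True

    def launch(self, cid: str) -> None:
        """Record that a container process was started for the allocation."""
        mb, _ = self.pending.pop(cid)
        self.running[cid] = mb

    def expire(self, now: float) -> List[str]:
        """Cancel allocations with no process started within the timeout."""
        expired = [c for c, (_, t) in self.pending.items()
                   if now - t > self.expiry_s]
        for cid in expired:
            mb, _ = self.pending.pop(cid)
            self.available_mb += mb  # capacity goes back to the pool
        return expired


pool = NodeResourcePool(available_mb=4096)
pool.allocate("c1", 1024, now=0.0)
pool.allocate("c2", 1024, now=0.0)
pool.launch("c1")                    # e.g. the dummy 'sleep 10y' process
cancelled = pool.expire(now=601.0)   # c2 never got a process
```

In this model only "c1" keeps its capacity, which is why an external system
currently has to start some process, however trivial, for every allocation it
wants to retain.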

Using a dummy container process has its drawbacks:

* The dummy container process is in a cgroup with a given number of CPU shares
that are not used, and Impala is re-issuing those CPU shares to another cgroup
for the thread running the 'query fragment'. The cgroup CPU enforcement works
correctly because of the CPU controller implementation (but the formally
specified behavior is actually undefined).
* Impala may ask for CPU and memory independently of each other. Some requests
may be memory only with no CPU, or vice versa. Because a container requires a
process, a complete absence of memory or CPU is not possible: even if the dummy
process is 'sleep', it requires a minimal amount of memory and CPU.

Because of this it is desirable to be able to have a container without a 
backing process.
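
The first drawback above can be illustrated with a simplified Python model of
proportional CPU shares. It assumes a flat set of sibling cgroups in which only
the groups with runnable threads compete for CPU; the function name, the group
names, and the share values are hypothetical, and real cgroup hierarchies are
more involved.

```python
from typing import Dict, Set


def cpu_fractions(shares: Dict[str, int], busy: Set[str]) -> Dict[str, float]:
    """Split CPU among busy groups in proportion to their shares.

    Idle groups receive nothing, so their shares are effectively unused.
    """
    total = sum(shares[name] for name in busy)
    return {
        name: (shares[name] / total if name in busy else 0.0)
        for name in shares
    }


shares = {
    "dummy_container": 512,   # sleeping process, never runnable
    "impala_fragment": 512,   # same shares re-issued by Impala
    "mapreduce_task": 1024,
}
fractions = cpu_fractions(shares, busy={"impala_fragment", "mapreduce_task"})
```

Under this model the idle dummy group does not dilute the others, so issuing
the same 512 shares twice happens to yield the intended split. That outcome
depends on how the controller treats idle groups rather than on a guaranteed
contract.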

  was:
Currently, a container allocation requires starting a container process on the
corresponding NodeManager's node.

For applications that need to use the allocated resources out of band from Yarn 
this means that a dummy container process must be started.

Impala/Llama is an example of such an application, which currently starts a
'sleep 10y' (10 years) process as the container process. The resource
capabilities are used out of band by the Impala process collocated on the node.
The Impala process ensures that the processing associated with those resources
does not exceed the capabilities of the container. Also, if the container is
lost/preempted/killed, Impala stops using the corresponding resources.

In addition, in the case of Llama, the current requirement of having a
container process gets complicated when hard resource enforcement (memory
-ContainersMonitor- or cpu -via cgroups-) is enabled, because Impala/Llama
requests resources with CPU and memory independently of each other. Some
requests are CPU only and others are memory only. Unmanaged containers solve
this problem as there is no underlying process with zero CPU or zero memory.



        Summary: Enable external systems/frameworks to share resources with 
Hadoop leveraging Yarn resource scheduling  (was: Add support for unmanaged 
containers)

Updated the summary and the description to better describe the use case driving 
this JIRA.

I've closed YARN-951 as "won't fix" as it is a workaround for the problem this
JIRA is trying to address.

I don't think there is a need for an umbrella JIRA as this is the only change 
we need.


> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1404
>                 URL: https://issues.apache.org/jira/browse/YARN-1404
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: YARN-1404.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)
