[
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844954#comment-13844954
]
Sandy Ryza commented on YARN-1404:
----------------------------------
bq. The thing is to enable only central scheduling, YARN has to give up its
control over liveliness & enforcement and needs to create a new level of trust.
I'm not sure I entirely understand what you mean by "create a new level of
trust." We are a long way from YARN managing all resources on a Hadoop cluster.
YARN implicitly understands that other trusted processes will be running
alongside it. The proposed change does not grant any users the ability to use
any resources without going through a framework trusted by the cluster
administrator.
bq. Like I said, we do have an implicit liveliness report - process liveliness.
And NodeManager depends on that today to inform the app of container-finishes.
It depends on that or on the AM releasing the resources. Process liveliness is
a very imperfect signifier - a process can stick around because of an
accidentally unfinished thread even when all its work is done. I have seen
clusters where every MR task process is killed by the AM rather than exiting
naturally, and everything works fine.
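For concreteness, this is roughly what the AM-release path looks like with the
AMRMClient API. The snippet below is just an illustrative, untested sketch,
with 'conf' and 'container' standing in for the AM's Configuration and a
previously allocated Container:
{code:java}
import org.apache.hadoop.yarn.client.api.AMRMClient;

// Illustrative sketch only: 'conf' is the AM's Configuration and 'container'
// is a Container the RM previously handed back in an AllocateResponse.
AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
rmClient.init(conf);
rmClient.start();
rmClient.registerApplicationMaster("", 0, "");  // host/port/tracking URL omitted

// Hand the allocation back to the RM; no container process needs to have
// started or exited on the NodeManager for this to work.
rmClient.releaseAssignedContainer(container.getId());
rmClient.allocate(0.0f);  // this heartbeat carries the release to the RM
{code}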
I've tried to think through situations where this could be harmful:
* Malicious application intentionally sits on cluster resources: it can already
do this by running a process that calls sleep(infinity) (see the sketch below).
* Application unintentionally sits on cluster resources: this can already
happen if a container process forgets to terminate a non-daemon thread.
In both cases, preemption will prevent an application from sitting on
resources above its fair share.
Is there a scenario I'm missing here?
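To make the first scenario concrete: holding resources with a do-nothing
process is trivial today. Here is an illustrative, untested sketch using
ContainerLaunchContext and NMClient, with 'conf' and 'container' again standing
in for the AM's Configuration and an allocation the RM has already granted (the
ten-year sleep mirrors the 'sleep 10y' workaround described in this JIRA):
{code:java}
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.client.api.NMClient;

// Illustrative sketch: 'container' is an allocation the RM has already granted
// to this AM. The launched process does no work; it just pins the capacity.
ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
    Collections.<String, LocalResource>emptyMap(),  // no local resources
    Collections.<String, String>emptyMap(),         // no environment
    Collections.singletonList("sleep 315360000"),   // roughly 10 years
    null, null, null);                              // no service data/tokens/ACLs

NMClient nmClient = NMClient.createNMClient();
nmClient.init(conf);
nmClient.start();
nmClient.startContainer(container, ctx);
{code}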
bq. If there are alternative architectures that will avoid losing that control,
YARN will chose those options.
YARN is not a power-hungry conscious entity that gets to make decisions for us.
We as YARN committers and contributors get to decide what use cases we want to
support, and we don't need to choose a single one. We should of course be
careful with what we choose to support, but we should be restrictive when there
are concrete consequences of doing otherwise, not simply when a use case
violates the abstract idea of YARN controlling everything.
If the deeper concern is that Impala and similar frameworks will opt not to run
fully inside YARN when that functionality is available, I think we would be
happy to switch over when YARN supports this in a stable manner. However, I
believe this is a long way away and depending on that work is not an option for
us.
> Enable external systems/frameworks to share resources with Hadoop leveraging
> Yarn resource scheduling
> -----------------------------------------------------------------------------------------------------
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 2.2.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes in
> which its applications run their workload. External frameworks/systems could
> benefit from sharing resources with other Yarn applications while running
> their workload within long-running processes owned by the external framework
> (in other words, running their workload outside of the context of a Yarn
> container process).
> Because Yarn provides robust and scalable resource management, it is
> desirable for some external systems to leverage the resource governance
> capabilities of Yarn (queues, capacities, scheduling, access control) while
> supplying their own resource enforcement.
> Impala is an example of such system. Impala uses Llama
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster. When a user
> submits a query, the processing is broken into 'query fragments' which are
> run in multiple impalad processes, leveraging data locality (similar to
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in
> the impalad, and the impalad shares its host with other services (HDFS
> DataNode, Yarn NodeManager, HBase RegionServer) and Yarn applications
> (MapReduce tasks).
> To ensure that cluster utilization follows the Yarn scheduler policies and
> that the cluster nodes are not overloaded, before running a 'query fragment'
> on a node Impala requests the required amount of CPU and memory from Yarn.
> Once the requested CPU and memory have been allocated, Impala starts running
> the 'query fragment', taking care that the 'query fragment' does not use more
> resources than the ones that have been allocated. Memory is tracked per
> 'query fragment', and the threads used for the processing of the 'query
> fragment' are placed under a cgroup to contain CPU utilization.
> Today, for all resources that have been requested from the Yarn RM, a
> (container) process must be started via the corresponding NodeManager.
> Failing to do this will result in the cancellation of the container
> allocation, relinquishing the acquired resource capacity back to the pool of
> available resources. To avoid this, Impala starts a dummy container process
> doing 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU
> shares that are not used, and Impala re-issues those CPU shares to another
> cgroup for the threads running the 'query fragment'. The cgroup CPU
> enforcement happens to work correctly because of the CPU controller
> implementation (but the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other. Some
> requests may be memory only with no CPU, or vice versa. Because a container
> requires a process, the complete absence of memory or CPU is not possible;
> even if the dummy process is 'sleep', a minimal amount of memory and CPU is
> still required for it.
> Because of this, it is desirable to be able to have a container without a
> backing process.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)