[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845953#comment-13845953
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

I just caught up with YARN-1197. It seems like part of that solution is very 
relevant to this JIRA. For example:
bq. Some daemon-based applications may want to start exactly one daemon in each 
allocated node (like OpenMPI); such a daemon will launch/monitor workers (like 
MPI processes) itself. We can first allocate some containers for daemons and 
adjust their size as the application requires. This will make YARN support 
two-staged scheduling. Described in YARN-1197

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes in which 
> its applications run their workload. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster. When a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes, leveraging data locality (similar to 
> Map-Reduce mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad, and the impalad shares the host with other services (HDFS 
> DataNode, Yarn NodeManager, HBase RegionServer) and Yarn applications 
> (MapReduce tasks).
> To ensure that cluster utilization follows the Yarn scheduler policies and does 
> not overload the cluster nodes, before running a 'query fragment' on a 
> node, Impala requests the required amount of CPU and memory from Yarn. Once 
> the requested CPU and memory have been allocated, Impala starts running the 
> 'query fragment', taking care that it does not use more 
> resources than the ones that have been allocated. Memory is bookkept per 
> 'query fragment', and the threads used for processing the 'query 
> fragment' are placed under a cgroup to contain CPU utilization.
> Today, for all resources requested from the Yarn RM, a (container) 
> process must be started via the corresponding NodeManager. Failing to do 
> this will result in the cancellation of the container allocation, 
> relinquishing the acquired resource capacity back to the pool of available 
> resources. To avoid this, Impala starts a dummy container process doing 
> 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * The dummy container process is in a cgroup with a given number of CPU 
> shares that are not used, and Impala re-issues those CPU shares to another 
> cgroup for the threads running the 'query fragment'. The cgroup CPU 
> enforcement happens to work correctly because of the CPU controller 
> implementation (but the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other. Some requests 
> may be memory only with no CPU, or vice versa. Because a container requires a 
> process, complete absence of memory or CPU is not possible: even if the dummy 
> process is 'sleep', a minimal amount of memory and CPU is required for the 
> dummy process.
> Because of this, it is desirable to be able to have a container without a 
> backing process.
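
For reference, the 'dummy container' workaround described above boils down to starting a 
no-op process in every allocated container just so the allocation is not reclaimed. A 
minimal sketch using the public YARN client API might look like the following (illustrative 
only; this is not Llama's actual code, and the class and method names are made up for the 
example, with the 'sleep 10y' command taken verbatim from the description above):

{code:title=DummyContainerLauncher.java|borderStyle=solid}
import java.util.Arrays;
import java.util.HashMap;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.client.api.NMClient;

public class DummyContainerLauncher {
  // Starts a do-nothing process in an already-allocated container so the
  // allocation is not reclaimed, while the real work runs inside the impalad.
  public static void holdAllocation(NMClient nmClient, Container container) throws Exception {
    ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
        new HashMap<String, LocalResource>(),   // nothing to localize
        new HashMap<String, String>(),          // no environment needed
        Arrays.asList("sleep 10y"),             // dummy command that only occupies the container
        null, null, null);                      // no service data, tokens or ACLs
    nmClient.startContainer(container, ctx);
  }
}
{code}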





[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-11 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845667#comment-13845667
 ] 

Arun C Murthy commented on YARN-1404:
-

bq. I have no technical reservations with the overall approach.

Since we agree on the approach and the direction we want to go, perhaps we can 
now discuss how to get there?

We don't have to implement everything in the first go; we just need to 
implement enough to meet your goal of quick integration while staying on the 
long-term path we want to get to.

Does that make sense?



[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845261#comment-13845261
 ] 

Sandy Ryza commented on YARN-1404:
--

bq. Other than saying you don't want to wait for impala-under-YARN integration, 
I haven't heard any technical reservations against this approach.
I have no technical reservations with the overall approach. In fact, I'm in 
favor of it. My points are:
* We will not see this happen for a while, and the original approach on 
this JIRA supports a workaround that has no consequences for clusters not 
running Impala on YARN.
* I'm sure many who would love to take advantage of centrally resource-managed 
HDFS caching will be unwilling to deploy HDFS through YARN. The same will go for 
all sorts of legacy applications. If, besides the changes Arun 
proposed, we can expose YARN's central scheduling independently from its 
deployment/enforcement, there is a lot to gain. If this is within easy 
reach, I don't find it satisfying to argue that YARN is philosophically opposed 
to it or that the additional freedom would allow cluster configurers to shoot 
themselves in the foot.

I realize that we are rehashing many of the same arguments, so I'm not sure how 
to make progress on this. I'll wait until Tucu returns from vacation to push 
further.


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845011#comment-13845011
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

bq. I'm not sure I entirely understand what you mean by create a new level of 
trust.
I thought that was already clear to everyone. See my comment 
[here|https://issues.apache.org/jira/browse/YARN-1404?focusedCommentId=13840905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13840905].
 "YARN depends on the ability to enforce resource-usage restrictions".

YARN enables both resource scheduling and enforcement of those scheduling 
decisions. If resources sit outside of YARN, YARN cannot enforce the limits on 
their usage. For example, YARN cannot enforce the memory usage of a DataNode. 
People may work around it by setting up cgroups on these daemons, but that 
defeats the purpose of YARN in the first place. That is why I earlier proposed 
that impala/datanode run under YARN. When I couldn't find a solution otherwise, 
I revised my proposal to restrict it to be used with a special ACL so that 
other apps don't abuse the cluster by requesting unmanaged containers and not 
using those resources.

bq. It depends on that or the AM releasing the resources. Process liveliness is 
a very imperfect signifier ...
We cannot trust AMs to always release containers. If it were so imperfect, we 
should change YARN as it is today to not depend on liveliness. I'd leave it as 
an exercise to see how, once we remove process-liveliness in general, apps will 
release containers and how clusters get utilized. Bonus points for trying it on 
a shared multi-tenant cluster with user-written YARN apps.

My point is that process liveliness plus accounting based on it is a 
well-understood model in Hadoop land. The proposal for leases is to continue 
that.

bq. Is there a scenario I'm missing here?
One example that illustrates this. Today AMs can go away without releasing 
containers and YARN can kill the corresponding containers(as they are managed). 
If we don't have some kind of leases, and AMs that are unmanaged resources go 
away without explicit container-release, those resources are leaked.

bq. YARN is not a power-hungry conscious entity that gets to make decisions for 
us. Not simply when a use case violates the abstract idea of YARN controlling 
everything. [...]
Of course, when I say YARN, I mean the YARN community. You take it too 
literally.

I was pointing out your statements that "Impala currently has little tangible 
to gain by doing deployment and enforcement inside YARN" and "However, making 
Impala-YARN integration depend on this fairly involved work would unnecessarily 
set it back". The YARN community doesn't make decisions based on those things.

Overall, I didn't originally have a complete solution for making it happen - so 
I came up with ACLs and leases. But delegation as proposed by Arun seems like 
one that solves all the problems. Other than saying you don't want to wait for 
impala-under-YARN integration, I haven't heard any technical reservations 
against this approach.


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844954#comment-13844954
 ] 

Sandy Ryza commented on YARN-1404:
--

bq. The thing is, to enable only central scheduling, YARN has to give up its 
control over liveliness & enforcement and needs to create a new level of trust.
I'm not sure I entirely understand what you mean by create a new level of 
trust.  We are a long way from YARN managing all resources on a Hadoop cluster. 
 YARN implicitly understands that other trusted processes will be running 
alongside it.  The proposed change does not grant any users the ability to use 
any resources without going through a framework trusted by the cluster 
administrator.

bq. Like I said, we do have an implicit liveliness report - process liveliness. 
And NodeManager depends on that today to inform the app of container-finishes.
It depends on that or the AM releasing the resources. Process liveliness is a 
very imperfect signifier - a process can stick around due to an 
accidentally-not-terminated thread even when all its work is done. I have seen 
clusters where all MR task processes are killed by the AM without exiting 
naturally, and everything works fine.

I've tried to think through situations where this could be harmful:
* Malicious application intentionally sits on cluster resources: they can do this 
already by running a process with sleep(infinity).
* Application unintentionally sits on cluster resources: this can already happen 
if a container process forgets to terminate a non-daemon thread.
In both cases, preemption will prohibit an application from sitting on 
resources above its fair share.

Is there a scenario I'm missing here?

bq. If there are alternative architectures that will avoid losing that control, 
YARN will choose those options.
YARN is not a power-hungry conscious entity that gets to make decisions for us. 
We as YARN committers and contributors get to decide what use cases we want to 
support, and we don't need to choose a single one. We should of course be 
careful with what we choose to support, but we should be restrictive when there 
are concrete consequences of doing otherwise, not simply when a use case 
violates the abstract idea of YARN controlling everything.

If the deeper concern is that Impala and similar frameworks will opt not to run 
fully inside YARN when that functionality is available, I think we would be 
happy to switch over when YARN supports this in a stable manner.  However, I 
believe this is a long way away and depending on that work is not an option for 
us.


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844781#comment-13844781
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

{quote}
Stepping back a little, YARN does three things:
* Central Scheduling - decides who gets to run and when and where they get to 
do so
* Deployment - ships bits across the cluster and runs container processes
* Enforcement - monitors container processes to make sure they stay within 
scheduled limits
The central scheduling part is the most valuable to a framework like Impala 
because it allows it to truly share resources on a cluster with other 
processing frameworks. The second two are helpful - they allow us to 
standardize the way work is deployed on a Hadoop cluster - but they aren't 
enabling anything that's fundamentally impossible without them. While these will 
simplify things in the long term and create a more cohesive platform, Impala 
currently has little tangible to gain by doing deployment and enforcement 
inside YARN.
{quote}

I don't agree with that characterization. The thing is, to enable only central 
scheduling, YARN has to give up its control over liveliness & enforcement and 
needs to create a new level of trust. If there are alternative architectures 
that will avoid losing that control, YARN will choose those options. The 
question is whether external systems want to take that option or not.


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844774#comment-13844774
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

bq. In this scenario, I think explicitly allowing for delegation of a container 
would solve the problem in a first-class manner.
This is an interesting solution that avoids the problems around trust, 
liveliness reporting and enforcement of resource limits. +1 for considering 
something like this.



[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844772#comment-13844772
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

Re Tucu's reply

bq. Regarding ACLs and an on/off switch: IMO they are not necessary for the 
following reason. You need an external system installed and running on the node 
to use the resources of an unmanaged container. If you have direct access to 
the node to start the external system, you are 'trusted'. If you don't have 
direct access you cannot use the resources of an unmanaged container.
Unfortunately that is not enough. We are exposing an API on the NodeManager that 
anybody can use. The ACL prevents that.

bq. In the case of managed containers we don't have a liveliness 'report' and 
the container process could very well be hung. In such a scenario it is the 
responsibility of the AM to detect the liveliness of the container process 
and react if it is considered hung.
Like I said, we do have an implicit liveliness report - process liveliness. And 
the NodeManager depends on that today to inform the app of container finishes.

bq. Regarding 'NM assumes a whole lot of things about containers' (3 bullet 
items): for my current use case none of this is needed. It could be relatively 
easy to enable such functionality if a use case that needs it arises.
So, then we start off with the assumption that they are not needed? That 
creates two very different code paths for managed and unmanaged containers. If 
possible we should avoid that.


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844740#comment-13844740
 ] 

Sandy Ryza commented on YARN-1404:
--

Arun, I think I agree with most of the above and your proposal makes a lot of 
sense to me.

There are numerous issues to tackle.  On the YARN side:
* YARN has assumed since its inception that a container's resources belong to a 
single application - we are likely to come across many subtle issues when 
rethinking this assumption.
* While YARN has promise as a platform for deploying long-running services, 
that functionality currently isn't stable in the way that much of the rest of 
YARN is.
* Currently preemption means killing a container process - we would need to 
change the way this mechanism works.

On the Datanode/Impala side:
* Rethink the way we deploy these services to allow them to run inside YARN 
containers.

Stepping back a little, YARN does three things:
* Central Scheduling - decides who gets to run and when and where they get to 
do so
* Deployment - ships bits across the cluster and runs container processes
* Enforcement - monitors container processes to make sure they stay within 
scheduled limits

The central scheduling part is the most valuable to a framework like Impala 
because it allows it to truly share resources on a cluster with other 
processing frameworks. The second two are helpful - they allow us to 
standardize the way work is deployed on a Hadoop cluster - but they aren't 
enabling anything that's fundamentally impossible without them. While these will 
simplify things in the long term and create a more cohesive platform, Impala 
currently has little tangible to gain by doing deployment and enforcement 
inside YARN.

So, to summarize, I like the idea and would be happy both to see YARN move in 
this direction and to help it do so. However, making Impala-YARN integration 
depend on this fairly involved work would unnecessarily set it back. In the 
short term, we have proposed a minimally invasive change (making it possible to 
launch containers without starting processes) that would allow YARN to satisfy 
our use case. I am confident that the change poses no risk from a security 
perspective, from a stability perspective, or in terms of detracting from the 
longer-term vision.



[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844437#comment-13844437
 ] 

Arun C Murthy commented on YARN-1404:
-

Yes, agreed. Sorry, I thought it was clear that that was what I was proposing with:

{quote}
The implementation of this API would notify the NodeManager to change its 
monitoring of the recipient container, i.e. Impala or the Datanode, by modifying 
the cgroup of the recipient container.
Similarly, the NodeManager could be instructed by the ResourceManager to 
preempt the resources of the source container to continue serving the 
global SLAs of the queues - again, this is implemented by modifying the cgroup 
of the recipient container. This will allow the ResourceManager/NodeManager to 
be explicitly in control of resources, even in the face of misbehaving AMs etc.
{quote}



[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844398#comment-13844398
 ] 

Bikas Saha commented on YARN-1404:
--

Is the scenario one where containers from multiple users ask for resources 
within their quota and then delegate them to a shared service to use on their 
behalf? The above would imply that the datanode/impala/others would be running 
as YARN containers so that they can be targets for delegation.



[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844352#comment-13844352
 ] 

Arun C Murthy commented on YARN-1404:
-

I've opened YARN-1488 to track delegation of container resources.



[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844344#comment-13844344
 ] 

Arun C Murthy commented on YARN-1404:
-

I've spent time thinking about this in the context of running a myriad of 
external systems in YARN such as Impala, HDFS Caching (HDFS-4949) and some 
others.

The overarching goal is to allow YARN to act as a ResourceManager for the 
overall cluster *and* a Workload Manager for external systems, i.e. this way 
Impala or HDFS can rely on YARN's queues for workload management, SLAs via 
preemption, etc.

Is that a good characterization of the problem at hand?

I think it's a good goal to support - this will allow other external systems to 
leverage YARN's capabilities for both resource sharing and workload management.

Now, if we all agree on this, we can figure out the best way to support it in a 
first-class manner.



Ok, the core requirement is for an external system (Impala, HDFS, others) to 
leverage YARN's workload management capabilities (queues etc.) to acquire 
resources (cpu, memory) *on behalf* of a particular entity (user, queue) for 
completing a user's request (run a query, cache a dataset in RAM). 

The *key* is that these external systems need to acquire resources on behalf of 
the user and ensure that the chargeback is applied to the correct user, queue 
etc.

This is a *brand new requirement* for YARN... so far, we have assumed that the 
entity acquiring the resource would also be actually utilizing the resource by 
launching a container etc. 

Here, it's clear that the requirement is that the entity acquiring the resource 
would like to *delegate* the resource to an external framework. For example:
# A user query would like to acquire cpu, memory etc. for appropriate 
accounting chargeback and then delegate it to Impala.
# A user request for caching data would like to acquire memory for appropriate 
accounting chargeback and then delegate it to the DataNode.



In this scenario, I think explicitly allowing for *delegation* of a container 
would solve the problem in a first-class manner.

We should add a new API to the NodeManager which would allow an application to 
*delegate* a container's resources to a different container:

{code:title=ContainerManagementProtocol.java|borderStyle=solid}
public interface ContainerManagementProtocol {
  // ...
  public DelegateContainerResponse delegateContainer(DelegateContainerRequest request);
  // ...
}
{code}

{code:title=DelegateContainerRequest.java|borderStyle=solid}
public abstract class DelegateContainerRequest {
  // ...
  public abstract ContainerLaunchContext getSourceContainer();

  public abstract ContainerId getTargetContainer();
  // ...
}
{code}
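
For illustration only, an AM-side usage sketch of the proposed API could look as 
follows; the DelegateContainerRequest.newInstance() factory and the wrapper 
class are assumptions, since only the interfaces above have been sketched so far:

{code:title=DelegationClient.java (hypothetical sketch)|borderStyle=solid}
import org.apache.hadoop.yarn.api.ContainerManagementProtocol;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

// Sketch only: delegateContainer(), DelegateContainerRequest/Response and the
// newInstance() factory below are the *proposed* additions, not existing APIs.
public class DelegationClient {
  private final ContainerManagementProtocol cmProxy;

  public DelegationClient(ContainerManagementProtocol cmProxy) {
    this.cmProxy = cmProxy;
  }

  /**
   * Delegate the resources acquired via 'source' (e.g. for a user's query) to
   * the long-running 'target' container (e.g. an impalad or a DataNode).
   */
  public DelegateContainerResponse delegate(ContainerLaunchContext source,
      ContainerId target) throws Exception {
    DelegateContainerRequest request =
        DelegateContainerRequest.newInstance(source, target);
    return cmProxy.delegateContainer(request);
  }
}
{code}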


The implementation of this API would notify the NodeManager to change its 
monitoring of the recipient container (i.e. Impala or the DataNode) by 
modifying the cgroup of the recipient container.

Similarly, the NodeManager could be instructed by the ResourceManager to 
preempt the resources of the source container in order to continue serving the 
global SLAs of the queues - again, this is implemented by modifying the cgroup 
of the recipient container. This will allow the ResourceManager/NodeManager to 
be explicitly in control of resources, even in the face of misbehaving AMs etc.
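
Purely to illustrate the cgroup mechanics (this is not existing NodeManager 
code), adjusting the recipient container's CPU weight could amount to rewriting 
its cpu.shares file; the cgroup mount point and per-container directory layout 
below are assumptions:

{code:title=CgroupAdjuster.java (illustrative sketch)|borderStyle=solid}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupAdjuster {
  // Assumed mount point/hierarchy; the real values come from the NM configuration.
  private static final String CPU_CGROUP_ROOT = "/sys/fs/cgroup/cpu/hadoop-yarn";

  /**
   * Move 'shares' CPU shares from the source container's cgroup to the
   * recipient container's cgroup (on delegation); preemption would reverse it.
   */
  public static void moveCpuShares(String sourceContainerId,
      String recipientContainerId, int shares) throws IOException {
    adjust(sourceContainerId, -shares);
    adjust(recipientContainerId, shares);
  }

  private static void adjust(String containerId, int delta) throws IOException {
    Path sharesFile = Paths.get(CPU_CGROUP_ROOT, containerId, "cpu.shares");
    int current = Integer.parseInt(
        Files.readAllLines(sharesFile, StandardCharsets.UTF_8).get(0).trim());
    Files.write(sharesFile,
        String.valueOf(current + delta).getBytes(StandardCharsets.UTF_8));
  }
}
{code}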



The result of the above proposal is very similar to what is already being 
discussed; the only difference is that this is explicit (the NodeManager knows 
both the source and the recipient containers), which allows all existing 
features such as preemption, over-allocation of resources to YARN queues etc. 
to continue to work as they do today.



Thoughts?


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840977#comment-13840977
 ] 

Alejandro Abdelnur commented on YARN-1404:
--

[~vinodkv], thanks for summarizing our offline chat.

Regarding *ACLs and an on/off switch*:

IMO they are not necessary for the following reason.

You need an external system installed and running on the node to use the 
resources of an unmanaged container. If you have direct access to the node to 
start the external system, you are 'trusted'. If you don't have direct access, 
you cannot use the resources of an unmanaged container.

I think this is already a strong enough requirement, and it avoids adding a new 
ACL and an on/off switch.

Regarding *Liveliness*:

In the case of managed containers we don't have a liveliness 'report' either, 
and the container process could very well be hung. In such a scenario it is the 
responsibility of the AM to detect the liveliness of the container process and 
react if it is considered hung.

In the case of unmanaged containers, the AM would have the same responsibility.

The only difference is that, in the case of managed containers, the NM detects 
when the process exits, while in the case of unmanaged containers this 
responsibility would fall on the AM.

Because of this I think we could do without a leaseRenewal/liveliness call.

Regarding the 3 bullet items under *NM assume a whole lot of things about 
containers*:

For my current use case none of this is needed. It would be relatively easy to 
enable such functionality if a use case that needs it arises.

Regarding *Can such trusted application mix and match managed and unmanaged 
containers?*:

The way I envision this working, when an AM asks for a container and gets an 
allocation from the RM, the RM does not know whether the AM will start a 
managed or an unmanaged container. This is only known between the AM and the 
NM: the ContainerLaunchContext being NULL is what marks the container as 
unmanaged.
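
As a rough sketch of that AM/NM interaction (assuming the behavior proposed in 
this JIRA - today the NM would reject a null ContainerLaunchContext), using the 
existing NMClient API:

{code:title=UnmanagedContainerStarter.java (sketch of the proposal)|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.NMClient;

public class UnmanagedContainerStarter {
  /**
   * Hold on to the resources of an allocated container without launching any
   * process. Passing a null ContainerLaunchContext is the *proposed* signal for
   * an unmanaged container; the RM never sees the difference.
   */
  public static void holdResources(NMClient nmClient, Container allocated)
      throws Exception {
    nmClient.startContainer(allocated, null);
  }
}
{code}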

Regarding *YARN-1040 should enable starting unmanaged containers*:

If YARN-1040 were implemented, yes, it would enable unmanaged containers. 
However, the scope of YARN-1040 is much bigger than unmanaged containers.

It should also be possible to implement unmanaged containers as being discussed 
here and to implement YARN-1040 later.

Does this make sense?





[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-11-21 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829640#comment-13829640
 ] 

Alejandro Abdelnur commented on YARN-1404:
--

The proposal to address this JIRA is:

* Allow a NULL ContainerLaunchContext in the startContainer() call; this 
signals that there is no process to be started for the container.
* The ContainerLaunch logic would use a latch to block when there is no 
associated process. The latch would be released on container completion 
(preemption or termination by the AM).

The changes to achieve this are minimal and they do not alter the lifecycle of 
a container at all, neither in the RM nor in the NM.
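
A minimal sketch of the latch idea (not the actual ContainerLaunch code; the 
class and method names below are assumptions):

{code:title=UnmanagedContainerLaunch.java (sketch)|borderStyle=solid}
import java.util.concurrent.CountDownLatch;

public class UnmanagedContainerLaunch {
  private final CountDownLatch completion = new CountDownLatch(1);

  /** Runs in the launcher thread in place of exec'ing a container process. */
  public Integer call() throws InterruptedException {
    completion.await();  // block until the container is released or preempted
    return 0;            // report a clean exit code, as no process ever ran
  }

  /** Invoked when the AM releases the container or the RM preempts it. */
  public void containerCompleted() {
    completion.countDown();
  }
}
{code}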

As previously mentioned by Bikas, this can be seen as a special case of the 
functionality that YARN-1040 is proposing for managing multiple processes with 
the same container. 

The scope of work of YARN-1040 is significantly larger and requires API 
changes, while this JIRA does not require API changes; the two sets of changes 
are not incompatible with each other.





