[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821657#comment-13821657 ]

Alejandro Abdelnur commented on YARN-1404:
------------------------------------------

[[email protected]]

bq. 1. I'd be inclined to treat this as a special case of YARN-1040

I've just commented in YARN-1040 following Bikas' comment on this 
https://issues.apache.org/jira/browse/YARN-1040?focusedCommentId=13821597&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13821597

bq. It's dangerously easy to leak containers here; I know llama keeps an eye on 
things, but I worry about other people's code -though admittedly, any 
long-lived command line app "yes" could do the same.

We can have NM configs to disable no-process or multi-process containers, but 
you can still work around that by starting a dummy process. This is how Llama 
is doing things today, but it is not ideal for several reasons; a sketch of 
that workaround follows below.
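
For context, here is a minimal sketch of the dummy-process workaround against 
the Yarn 2.x client API. The class and method names are mine for illustration, 
not Llama's actual code:

{code:java}
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;

// Illustrative only: starts a placeholder process so the NM keeps the
// allocation alive while the real work happens out of band.
public class DummyContainerLauncher {

  public static void launchDummy(NMClient nmClient, Container container)
      throws Exception {
    ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
        null,                                    // no local resources
        null,                                    // no environment
        Collections.singletonList("sleep 10y"),  // dummy 10-year sleep
        null,                                    // no service data
        null,                                    // no tokens
        null);                                   // no ACLs
    nmClient.startContainer(container, ctx);
  }
}
{code}

This is exactly the kind of placeholder process that unmanaged containers 
would make unnecessary.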

IMO, from a Yarn perspective, we need to allow AMs to do sophisticated things 
within the Yarn programming model (like what you are trying to do with 
long-lived containers, or what I'm doing with Llama).

bq. For the multi-process (and that includes processes=0), we really do need 
some kind of lease renewal option to stop containers being retained forever. It 
would become the job of the AM to do the renewal

As I've mentioned above, I don't think we need a special lease for this: 
https://issues.apache.org/jira/browse/YARN-1404?focusedCommentId=13820200&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820200
 (look for 'The reason I've taken the approach of leaving the container leases 
out of band is:')

[~vinodkv]

bq. -1 for this...

I think you are jumping too fast here.

bq. As I repeated on other JIRAs, please change the title with the problem 
statement instead of solutions.

IMO that makes complete sense for bugs; for improvements/new features, a 
description of the change communicates more, as it will become the commit 
message. The shortcomings the JIRA is trying to address should be captured in 
the description.

Take, for example, the following JIRA summaries; would you change them to 
describe a problem?

* AHS should support application-acls and queue-acls
* AM's tracking URL should be a URL instead of a string
* Add support for zipping/unzipping logs while in transit for the NM logs 
web-service
* YARN should have a ClusterId/ServiceId

bq. I indicated offline about llama with others. I don't think you need 
NodeManagers either to do what you want, forget about containers. All you need 
is use the ResourceManager/scheduler in isolation using MockRM/LightWeightRM 
(YARN-1385) - your need seems to be using the scheduling logic in YARN and not 
use the physical resources.

The whole point of Llama is to allow Impala to share resources in a real Yarn 
cluster that is running other workloads, like Map-Reduce. In other words, 
Impala/Llama and other AMs must share cluster resources.


> Add support for unmanaged containers
> ------------------------------------
>
>                 Key: YARN-1404
>                 URL: https://issues.apache.org/jira/browse/YARN-1404
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: YARN-1404.patch
>
>
> Currently, a container allocation requires starting a container process on 
> the corresponding NodeManager's node.
> For applications that need to use the allocated resources out of band from 
> Yarn, this means a dummy container process must be started.
> Impala/Llama is an example of such an application; it currently starts a 
> 'sleep 10y' (10 years) process as the container process, and the resource 
> capabilities are used out of band by the Impala process collocated on the 
> node. The Impala process ensures that the processing associated with those 
> resources does not exceed the capabilities of the container. Also, if the 
> container is lost/preempted/killed, Impala stops using the corresponding 
> resources.
> In addition, in the case of Llama, the current requirement of having a 
> container process gets complicated when hard resource enforcement (memory, 
> via ContainersMonitor, or CPU, via cgroups) is enabled, because Impala/Llama 
> requests CPU and memory independently of each other. Some requests are CPU 
> only and others are memory only. Unmanaged containers solve this problem, as 
> there is no underlying process to run with zero CPU or zero memory.



