[
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822034#comment-13822034
]
Vinod Kumar Vavilapalli commented on YARN-1404:
-----------------------------------------------
bq. Vinod Kumar Vavilapalli, a lightweight RM is not sufficient because the
goal of llama is to be able to run frameworks that use unmanaged containers
alongside frameworks that don't. While Impala does its own resource
enforcement, it wants to coexist on a YARN instance with MR and other
frameworks that fit more naturally with the YARN model.
Well, this has been my problem, and I'm sure others will agree. Proposing
unmanaged containers before explaining your key requirements keeps folks who
only follow JIRA in the dark.
bq. Are you saying YARN should never support containers that don't launch a
process? Is there anything gained by this?
If that need arises, and if there are no other first-class solutions, then yes.
Otherwise no.
bq. I think you are jumping too fast here
That's because I see multiple JIRAs all trying to achieve a common goal, and
instead of discussing that design, we are shoehorned into debating
individual tickets that don't by themselves make up the overall goal.
bq. IMO that makes complete sense for bugs; for improvements/new features a
description of it communicates more, as it will be the commit message. The
shortcomings the JIRA is trying to address should be captured in the
description.
Agree that it is subjective. But for some of the tickets that potentially have a
solution-space > 1, I'd suggest renaming them. For example, this one can be
renamed to "support running a service that doesn't want to use YARN containers
but still co-exists with YARN".
bq. Take for example the following JIRA summaries, would you change them to
describe a problem?
bq. AM's tracking URL should be a URL instead of a string
bq. YARN should have a ClusterId/ServiceId
Yes, I'd change the above two. The other two are apt summaries. The goal should
be to indicate the problem one is attacking. And my point here is not that you
or someone else is making that mistake and others are not.
bq. The whole point of Llama is to allow Impala to share resources in a real
Yarn cluster doing other workloads like Map-Reduce. In other words,
Impala/Llama and other AMs must share cluster resources.
Well, you should have started with this requirement so that we can all discuss
and come up with a solution, instead of putting in approaches that you think are
best. This was the same discussion we had in YARN-689, where it took a while
for the rest of us to understand the real requirements. Similarly, YARN-789 was
put in FairScheduler without giving consideration to the rest of the system.
bq. The AM that started the unmanaged container gets the
early-preemption/preemption/lost notification from the RM and notifies the out
of band process in the corresponding node to release the corresponding
resources. (Impala/Llama is doing this today with the dummy sleep containers)
That won't work for cases where the RM wants to forcefully terminate containers
in emergency situations.
bq. A NM plugin notifies the collocated out of band process that the unmanaged
container has ended. This prompts the out of band process to release the
corresponding resources. (We are working on getting this in Impala/Llama).
This again is a new proposal that was never discussed.
Re this problem, I think you should create a ticket about supporting services
that want to use cluster and node level scheduling without using containers.
Then if you follow up with a requirement list, we can discuss solutions and an
end-to-end design. I can already come up with more solutions, which may or may
not work depending on your requirements.
- Use the dynamic NM resource stuff that just went in and use signalling
between YARN NM and some outside component to dynamically adjust NM resources
- Run a long running service under YARN with containers that dynamically grow
and shrink
> Add support for unmanaged containers
> ------------------------------------
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 2.2.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently, a container allocation requires starting a container process on
> the corresponding NodeManager's node.
> For applications that need to use the allocated resources out of band from
> Yarn this means that a dummy container process must be started.
> Impala/Llama is an example of such an application, which is currently starting
> a 'sleep 10y' (10 years) process as the container process. The resource
> capabilities are used out of band by the Impala process collocated in the
> node. The Impala process ensures the processing associated with those resources
> does not exceed the capabilities of the container. Also, if the container is
> lost/preempted/killed, Impala stops using the corresponding resources.
> In addition, in the case of Llama, the current requirement of having a
> container process gets complicated when hard resource enforcement (memory
> -ContainersMonitor- or cpu -via cgroups-) is enabled because Impala/Llama
> requests resources with CPU and memory independently of each other. Some
> requests are CPU only and others are memory only. Unmanaged containers solve
> this problem as there is no underlying process with zero CPU or zero memory.