[ 
https://issues.apache.org/jira/browse/YARN-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151477#comment-15151477
 ] 

Wangda Tan commented on YARN-4692:
----------------------------------

Thanks [~vinodkv] and the other folks working on this. The document is already 
pretty comprehensive. Some thoughts/suggestions:

1) For running containers, instead of classifying them into service/batch, I 
would prefer to tag them by application priority. For example, 0 is production 
service tasks, 5 is batch jobs, etc. The reasons are:
- A service container is not always more important than other containers.
- One important service can preempt containers from less important services.
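
To make the priority idea concrete, here is a rough sketch (plain Python for 
brevity, not YARN code). It assumes the convention from my example above, where 
a lower number means a more important application; the Container class and 
select_preemption_victims() are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


# Assumption: lower app_priority = more important (0 = production service,
# 5 = batch job), following the example in this comment.
@dataclass(frozen=True)
class Container:
    container_id: str
    app_priority: int
    memory_mb: int


def select_preemption_victims(running, requester_priority, needed_mb):
    """Pick containers to preempt, least important first.

    Only containers strictly less important than the requester are eligible,
    regardless of whether they are "service" or "batch". Returns an empty
    list if not enough memory can be freed.
    """
    candidates = sorted(
        (c for c in running if c.app_priority > requester_priority),
        key=lambda c: c.app_priority,
        reverse=True,
    )
    victims, freed = [], 0
    for c in candidates:
        if freed >= needed_mb:
            break
        victims.append(c)
        freed += c.memory_mb
    return victims if freed >= needed_mb else []
```

With this, a priority-0 service could preempt a priority-3 service, while a 
priority-3 service could only preempt priority-5 batch containers.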

2) Whether a container is service or batch depends on the duration of the 
task; we have had lots of discussions on YARN-1039 already.

3) For 3.2.2 container auto restart, beyond restarting a container when it 
dies, we could let the framework check the health of running tasks. For example, 
support an embedded REST API to get the health status of containers. With this, 
the framework can restart malfunctioning containers.
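
A minimal sketch of what such a check loop might look like (plain Python, not 
YARN code). The HealthMonitor class, the probe callback and the max_failures 
threshold are all assumptions for illustration; in practice the probe would 
wrap a call to whatever REST endpoint the container exposes.

```python
from typing import Callable, Dict, List


class HealthMonitor:
    """Tracks consecutive failed health probes per container (hypothetical)."""

    def __init__(self, probe: Callable[[str], bool], max_failures: int = 3):
        self.probe = probe              # returns True if the container is healthy
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {}

    def check(self, container_ids: List[str]) -> List[str]:
        """Probe each container once; return the ids that should be restarted."""
        to_restart = []
        for cid in container_ids:
            try:
                healthy = self.probe(cid)
            except Exception:
                healthy = False         # an unreachable endpoint is a failed probe
            if healthy:
                self.failures[cid] = 0
                continue
            self.failures[cid] = self.failures.get(cid, 0) + 1
            if self.failures[cid] >= self.max_failures:
                to_restart.append(cid)
                self.failures[cid] = 0  # count afresh after the restart
        return to_restart
```

Requiring several consecutive failures before restarting avoids bouncing a 
container on a single slow response.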

4) For 3.2.7 Scheduling / Queue model:
Beyond the queue model, we should consider long-running containers when 
reserving a large container on a node.

5) Debuggability for service containers is also very important:
- Tools similar to [cAdvisor|https://github.com/google/cadvisor] could be very 
helpful for figuring out issues with service tasks.
- We also need a tool to show aggregated scheduling-related information for 
apps/queues/cluster.

*For comments from [~asuresh]:*
bq. we can give applications the ability to specify Preemptability of 
containers in a particular role...
Instead of adding a new field, I think we can reuse container priority and 
application priority to describe preemptability.

bq. Allow LR Applications to specify peak, min and variance/mean (also many 
transient and steady-state) of a Resource request to allow schedulers to make 
better allocation decisions.
I think this is hard for end users to know. Our framework should be able to 
figure out such metrics for running containers. For newly requested containers, 
we'd better assume they will use 100% of the requested resources.
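
As a rough illustration of that rule (plain Python, not YARN code): use the 
observed figure for running containers and fall back to the full request when 
nothing has been observed yet. The function names, the use of observed peak 
memory as the metric, and the cap at the requested size are my assumptions.

```python
from typing import List, Optional, Tuple


def estimated_usage_mb(requested_mb: int, observed_peak_mb: Optional[int]) -> int:
    """Estimate how much memory a container will actually use.

    A new container has no history (observed_peak_mb is None), so assume it
    uses 100% of its request. A running container uses its observed peak,
    capped at the request (assumption: usage above the request is not allowed).
    """
    if observed_peak_mb is None:
        return requested_mb
    return min(observed_peak_mb, requested_mb)


def node_headroom_mb(capacity_mb: int,
                     containers: List[Tuple[int, Optional[int]]]) -> int:
    """Capacity left on a node given (requested_mb, observed_peak_mb) pairs."""
    used = sum(estimated_usage_mb(req, obs) for req, obs in containers)
    return capacity_mb - used
```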

bq. In YARN-4597 Chris Douglas proposed ...
In my mind, YARN-4597 is targeted at low-latency batch tasks. If service 
tasks run for one hour or more, it's not a big deal to take several minutes 
to set them up.

I also agree that the reservation system (YARN-1051) is the ultimate solution 
for the queue model and container allocation for services.

> [Umbrella] Simplified and first-class support for services in YARN
> ------------------------------------------------------------------
>
>                 Key: YARN-4692
>                 URL: https://issues.apache.org/jira/browse/YARN-4692
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: 
> YARN-First-Class-And-Simplified-Support-For-Services-v0.pdf
>
>
> YARN-896 focused on getting the ball rolling on the support for services 
> (long running applications) on YARN.
> I’d like to propose the next stage of this effort: _Simplified and 
> first-class support for services in YARN_.
> The chief rationale for filing a separate new JIRA is threefold:
>  - Do a fresh survey of all the things that are already implemented in the 
> project
>  - Weave a comprehensive story around what we further need and attempt to 
> rally the community around a concrete end-goal, and
>  - Additionally focus on functionality that YARN-896 and friends left for 
> higher layers to take care of and see how much of that is better integrated 
> into the YARN platform itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
