[jira] [Commented] (YARN-4080) Capacity planning for long running services on YARN

2015-08-28 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720538#comment-14720538
 ] 

Subru Krishnan commented on YARN-4080:
--

[~mding], your proposal looks interesting, and thanks for taking a look at
YARN-1051. You are right that the main use case of the reservation system is to
address SLAs, but it can also be used for capacity planning for long-running
services by specifying the start time as now and the deadline as infinity. This
should make long-running services more predictable, because YARN-1051 allows
expressing time-varying capacity and so can capture the dynamic resource
requirements of a service. Additionally, in combination with YARN-2877, you
should be able to achieve the dynamic host-based reservation mechanics you
have proposed.
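
To make the "start time as now, deadline as infinity" idea concrete, here is a
minimal sketch of a time-varying reservation. It is a standalone toy model and
does not use the YARN-1051 API: the names Stage, reserved_memory and
service_plan, and the 2 GB / 14 GB figures, are assumptions chosen only for
illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical model (not the YARN-1051 API): a reservation is a list of
# stages, each asking for some memory over a half-open window [start, end).
@dataclass(frozen=True)
class Stage:
    start: float      # seconds from "now"
    end: float        # math.inf stands for "no deadline"
    memory_gb: int

def reserved_memory(stages, t):
    """Return the memory reserved at time t, summed over all active stages."""
    return sum(s.memory_gb for s in stages if s.start <= t < s.end)

HOUR = 3600

# A long-running service: a 2 GB baseline that starts now and never expires,
# plus 14 GB of extra capacity during an expected peak window (hours 9 to 17).
service_plan = [
    Stage(start=0, end=math.inf, memory_gb=2),
    Stage(start=9 * HOUR, end=17 * HOUR, memory_gb=14),
]

if __name__ == "__main__":
    for hour in (0, 10, 20):
        print(hour, reserved_memory(service_plan, hour * HOUR))
```

The baseline stage carries the "now until infinity" part, and the second stage
carries the time-varying part; a real plan would presumably repeat or update
the peak stages as the usage prediction changes.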

 Capacity planning for long running services on YARN
 ---

 Key: YARN-4080
 URL: https://issues.apache.org/jira/browse/YARN-4080
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Reporter: MENG DING

 YARN-1197 addresses the functionality of container resource resize. One major
 use case of this feature is for long-running services managed by Slider to
 dynamically flex the resource allocation of individual components up and down
 (e.g., an HBase region server), based on application metrics/alerts obtained
 through a third-party monitoring and policy engine.
 One key issue with increasing a container's resources at an arbitrary point in
 time is that the additional resource needed by the application component may
 not be available *on the specific node*. In this case, we need to rely on
 preemption logic to reclaim the required resource from other (preemptable)
 applications running on the same node. But this may not be possible today
 because:
 * preemption doesn't consider the constraints of pending resource requests,
 such as hard locality requirements, user limits, etc. (being addressed in
 YARN-2154 and possibly in YARN-3769?)
 * there may not be any preemptable container available, because no queue is
 over its guaranteed capacity.
 What we need, ideally, is a way for YARN to support future capacity planning
 for long-running services. At a minimum, we need a way to let YARN know about
 the predicted resource usage pattern of a long-running service. Given this
 knowledge, YARN should be able to preempt resources from other applications
 to accommodate the resource needs of the long-running service.
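
 The node-local problem described above can be illustrated with a small
 sketch. This is a simplified toy model, not YARN scheduler code: Container,
 Node, plan_increase and the queue figures are hypothetical names and numbers,
 and the only preemption rule modelled is "a container is preemptable only
 while its queue is over its guaranteed capacity".

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    cid: str
    queue: str
    memory_gb: int

@dataclass
class Node:
    capacity_gb: int
    containers: list = field(default_factory=list)

    def free_gb(self):
        return self.capacity_gb - sum(c.memory_gb for c in self.containers)

def plan_increase(node, cid, extra_gb, used_gb, guaranteed_gb):
    """Decide how an increase of extra_gb for container cid could be met.

    Returns ("fits", []) if the node has room, ("preempt", victims) if
    preempting containers of over-capacity queues on this node frees enough,
    and ("blocked", []) otherwise.
    """
    shortfall = extra_gb - node.free_gb()
    if shortfall <= 0:
        return "fits", []
    used = dict(used_gb)  # work on a copy of the per-queue usage
    victims = []
    for c in node.containers:
        if c.cid == cid:
            continue
        # Assumed rule: only queues above their guarantee can be preempted.
        if used.get(c.queue, 0) <= guaranteed_gb.get(c.queue, 0):
            continue
        victims.append(c.cid)
        used[c.queue] -= c.memory_gb
        shortfall -= c.memory_gb
        if shortfall <= 0:
            return "preempt", victims
    return "blocked", []

if __name__ == "__main__":
    node = Node(16, [Container("hbase-rs", "services", 2),
                     Container("batch-1", "batch", 8),
                     Container("batch-2", "batch", 6)])
    usage = {"services": 2, "batch": 14}
    # The batch queue is at its guarantee, so nothing on the node is preemptable.
    print(plan_increase(node, "hbase-rs", 6, usage, {"services": 8, "batch": 14}))
```

 The example call shows the second failure mode from the list: the node is
 full, yet no queue is over its guarantee, so the increase stays blocked.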



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4080) Capacity planning for long running services on YARN

2015-08-25 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711622#comment-14711622
 ] 

MENG DING commented on YARN-4080:
-

I am not sure the title accurately reflects the problem. If you can think of a
better way to describe it, please suggest one.

For the use case presented in the description, one possible direction to
consider is something like a dynamic host-based reservation (note that this is
not the same as the current container reservation in YARN). For example:
* when asking for resources, one can specify both the initial resource
capability and a reserved resource capability on whatever host the container
is launched on. For example, I can say I want 2GB of initial resource for a
container and, once that container is launched, reserve up to 16GB of resource
for the container on that host, as I expect the resource usage of the
container to fluctuate over time and sometimes peak at 16GB.
* if this reserved resource is not fully utilized, it can still be allocated to
other applications, but the scheduler will indicate that the allocated resource
is revocable, so that no critical service should use this chunk of resource
* when the scheduler is allocating new resource, it should first consider
resource that has not been reserved
* preemption logic should also preempt this kind of revocable resource if
needed
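
The four points above can be sketched as a small bookkeeping model. This is
only an illustration under simplifying assumptions (memory only, a single
reserving container per host); Host and its methods are hypothetical names,
not part of YARN or Mesos.

```python
from dataclasses import dataclass

@dataclass
class Host:
    capacity_gb: int
    firm_gb: int = 0        # non-revocable allocations
    headroom_gb: int = 0    # reserved for the service but not yet used by it
    revocable_gb: int = 0   # part of the headroom currently lent to others

    def free_unreserved(self):
        return self.capacity_gb - self.firm_gb - self.headroom_gb

    def launch_with_reservation(self, initial_gb, reserved_gb):
        """Launch a container at initial_gb and reserve up to reserved_gb."""
        if initial_gb > reserved_gb:
            raise ValueError("initial capability exceeds the reservation")
        if reserved_gb > self.free_unreserved():
            return False
        self.firm_gb += initial_gb
        self.headroom_gb += reserved_gb - initial_gb
        return True

    def allocate(self, gb):
        """Serve another application: unreserved resource first, then idle
        headroom, which is handed out as revocable. Returns (firm, revocable)
        or None if the request does not fit."""
        firm = min(gb, self.free_unreserved())
        revocable = gb - firm
        if revocable > self.headroom_gb - self.revocable_gb:
            return None
        self.firm_gb += firm
        self.revocable_gb += revocable
        return firm, revocable

    def grow(self, gb):
        """Grow the reserving container into its headroom. Returns how much
        revocable resource had to be preempted to make room."""
        if gb > self.headroom_gb:
            raise ValueError("growth exceeds the reservation")
        idle = self.headroom_gb - self.revocable_gb
        preempted = max(0, gb - idle)
        self.revocable_gb -= preempted
        self.headroom_gb -= gb
        self.firm_gb += gb
        return preempted

if __name__ == "__main__":
    host = Host(capacity_gb=24)
    host.launch_with_reservation(initial_gb=2, reserved_gb=16)
    print(host.allocate(10))  # split between unreserved and revocable resource
    print(host.grow(13))      # amount of revocable resource preempted
```

In this model a request is only marked revocable once the unreserved resource
on the host is exhausted, and growth of the service preempts revocable
resource only when the idle part of its reservation is too small.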

The above is similar to the dynamic reservation feature being implemented in 
Mesos: https://issues.apache.org/jira/browse/MESOS-2018

I also took a look at YARN-1051 to see if the current reservation system in 
YARN could help with this situation, but to the best of my knowledge, it seems 
to mainly address applications with a future start time and a predictable 
deadline. Please correct me if I am wrong.

Let me know if you have any thoughts, comments or ideas.



