[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100749#comment-14100749
 ] 

Karthik Kambatla commented on YARN-415:
---------------------------------------

[~eepayne] - Sorry again for coming in so late. I am not completely sure myself 
(yet) how we can use the timeline server or if it makes sense to do that. I 
guess I need to first understand what we are trying to accomplish here. Could 
you please correct me or comment on the following items?
# The goal is to capture memory utilization at the app-level for chargeback. I 
like the goal, but would like to understand the use cases we have in mind. Is 
the chargeback simply to track usage and maybe financially charge the 
users? Or is it to influence future scheduling decisions? I agree that the RM 
should facilitate collecting this information, but should the collected info be 
available to the RM for future use? If not, do we want the RM to serve this 
info? 
# Do we want to charge the app only for the resources used to do meaningful 
work or do we also want to include failed/preempted containers? If we don't 
charge the app for failed containers, who are they charged to? Are we okay with 
letting some resources go uncharged? 
# How soon do we want this usage information? It might make sense to 
collect/expose this once the app is finished for certain kinds of applications. 
What is our story for long-running applications? 

As Jian suggested, I would be up for getting in the parts that we are clear 
about and filing follow-up JIRAs for those that need more discussion. 

> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.
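
The formula in the description above (reserved RAM times lifetime, summed over 
all containers) can be sketched roughly as follows. This is only an 
illustration: the class and method names (MemoryChargeback, ContainerUsage, 
memoryMbSeconds) are hypothetical and are not taken from any of the attached 
patches, and the sketch assumes each container's reserved memory (in MB) and 
lifetime (in seconds) are already known.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from the YARN-415 patches. */
public class MemoryChargeback {

    /** One container's reservation: reserved memory in MB and lifetime in seconds. */
    public static final class ContainerUsage {
        final long reservedMb;
        final long lifetimeSeconds;

        public ContainerUsage(long reservedMb, long lifetimeSeconds) {
            this.reservedMb = reservedMb;
            this.lifetimeSeconds = lifetimeSeconds;
        }
    }

    /** Sums reservedMb * lifetimeSeconds over all containers of one application. */
    public static long memoryMbSeconds(List<ContainerUsage> containers) {
        long total = 0;
        for (ContainerUsage c : containers) {
            // Charge for the reserved amount, not for what the container actually used.
            total += c.reservedMb * c.lifetimeSeconds;
        }
        return total;
    }

    public static void main(String[] args) {
        List<ContainerUsage> app = Arrays.asList(
            new ContainerUsage(1024, 60),   // 1 GB reserved for 60 s
            new ContainerUsage(2048, 30));  // 2 GB reserved for 30 s
        System.out.println(memoryMbSeconds(app)); // prints 122880
    }
}
```

Whether failed or preempted containers belong in the input list is exactly the 
open question raised in the comment; the sketch simply sums whatever 
containers it is given.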



