Kendall Thrapp commented on YARN-415:
1. Is the chargeback simply to track the usage and maybe financially charge
the users? Or is it to influence future scheduling decisions? I agree that the RM
should facilitate collecting this information, but should the collected info be
available to the RM for future use? If not, do we want the RM to serve this
In addition to the goals [~eepayne] listed, another goal is to make it easier
for users to compare how code changes to a particular recurring Hadoop job
affect its resource usage. Assuming input data size didn't significantly
change, it'd be much more apparent to the user after a code change if
there was a resulting significant change in the resource usage for their job.
Even without charging, I'm hoping that having the resource usage shown to the
user, without any extra work on their part, will make more people think about
their overall grid resource usage, instead of just run times.
> Capture memory utilization at the app-level for chargeback
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 0.23.6
> Reporter: Kendall Thrapp
> Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch,
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch,
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch,
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt,
> YARN-415.201406262136.txt, YARN-415.201407042037.txt,
> YARN-415.201407071542.txt, YARN-415.201407171553.txt,
> YARN-415.201407172144.txt, YARN-415.201407232237.txt,
> YARN-415.201407242148.txt, YARN-415.201407281816.txt,
> YARN-415.201408062232.txt, YARN-415.201408080204.txt,
> YARN-415.201408092006.txt, YARN-415.201408132109.txt,
> YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.patch
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage. To start out, I'd like to
> get the memory utilization of an application. The unit should be MB-seconds
> or something similar and, from a chargeback perspective, the memory amount
> should be the memory reserved for the application, as even if the app didn't
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web
> Services REST API.
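
As a rough illustration of the formula quoted above, here is a minimal sketch in Python. It is not the code from any of the attached patches; the names ContainerUsage and memory_mb_seconds are hypothetical, and it assumes each container's reserved memory (in MB) and its start and finish times (in seconds) are already known.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ContainerUsage:
    """Hypothetical record of one container's reservation (not a YARN class)."""
    reserved_mb: int   # memory reserved for the container, in MB
    start_s: float     # time the container was allocated, in seconds
    finish_s: float    # time the container was released, in seconds


def memory_mb_seconds(containers: Iterable[ContainerUsage]) -> float:
    """Sum (reserved memory * lifetime) over all containers of one application."""
    return sum(
        c.reserved_mb * max(0.0, c.finish_s - c.start_s)  # clamp negative lifetimes to 0
        for c in containers
    )


# Example: a 1024 MB container that lived 60 s and a 2048 MB container that lived 30 s.
usage = [
    ContainerUsage(reserved_mb=1024, start_s=0, finish_s=60),
    ContainerUsage(reserved_mb=2048, start_s=10, finish_s=40),
]
total = memory_mb_seconds(usage)
print(total)
```

The sketch charges for reserved memory rather than memory actually used, matching the reasoning in the description that reserved memory was unavailable to anyone else.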