Eric Payne updated YARN-415:

    Attachment: YARN-415.201408181938.txt

[~jianhe], thank you for your continuing reviews and comments.

{quote}
Particularly in work-preserving AM restart, the current AM is actually the one 
managing the containers that were running under the previous attempt. Running 
containers in the scheduler are already transferred to the current AM, so the 
running containers' metrics are transferred as well. I think it'll be confusing 
if finished containers are still charged back against the previous dead 
attempt. Btw, YARN-1809 will add the attempt web page, where we could also show 
attempt-specific metrics.
{quote}
You are correct. In the work-preserving AM restart case, the live containers 
are transferred to the new attempt for the remaining lifetime of the container, 
and then, when the container completes, the original attempt gets the 
CONTAINER_FINISHED event. But I see your point about being consistent in the 
work-preserving AM restart case. So I have attached a patch that charges 
container usage to the current attempt, whether the container is running or 
finished.
{quote}
Regarding the problem of metrics persistence: agreed that it doesn't solve the 
problem for running apps in general. Maybe we can make the state store changes 
in a separate JIRA and discuss them there, so that we can get this in first.
{quote}
Yes, I would appreciate it if we could continue this discussion on a separate 
JIRA.

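A minimal sketch of the attribution rule described above for the new patch: usage of a container carried over by a work-preserving AM restart is charged to the attempt that is current when the container finishes, not to the attempt that launched it. Only the finished-container path is shown. The class and methods (`CurrentAttemptCharging`, `startNewAttempt`, `containerFinished`, `mbSecondsFor`) are hypothetical illustrations, not ResourceManager APIs; attempt ids are assumed to be plain integers, and usage is expressed in MB-seconds as in the issue description.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not actual RM classes): charge container usage to the
 * application's current attempt, even for containers launched by an earlier
 * attempt and kept alive across a work-preserving AM restart.
 */
class CurrentAttemptCharging {

    // Accumulated MB-seconds per attempt id (hypothetical bookkeeping).
    private final Map<Integer, Long> mbSecondsByAttempt = new HashMap<>();
    private int currentAttemptId = 1;

    // A new AM attempt starts; live containers are transferred to it.
    void startNewAttempt(int attemptId) {
        currentAttemptId = attemptId;
    }

    // On CONTAINER_FINISHED: the launching attempt is deliberately ignored,
    // and the whole usage goes to whichever attempt is current right now.
    void containerFinished(int launchingAttemptId, long reservedMb, long lifetimeSeconds) {
        mbSecondsByAttempt.merge(currentAttemptId, reservedMb * lifetimeSeconds, Long::sum);
    }

    long mbSecondsFor(int attemptId) {
        return mbSecondsByAttempt.getOrDefault(attemptId, 0L);
    }

    public static void main(String[] args) {
        CurrentAttemptCharging ledger = new CurrentAttemptCharging();
        ledger.containerFinished(1, 1024, 10);       // finished under attempt 1
        ledger.startNewAttempt(2);                   // AM restarts, containers kept
        ledger.containerFinished(1, 2048, 30);       // launched by 1, finished under 2
        System.out.println(ledger.mbSecondsFor(1));  // 10240
        System.out.println(ledger.mbSecondsFor(2));  // 61440
    }
}
```

The alternative the reply describes (charging a finished container back to its original attempt) would amount to using `launchingAttemptId` as the map key instead of `currentAttemptId`.
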
> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.patch
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved RAM for container 1 * lifetime of container 1)
>   + (reserved RAM for container 2 * lifetime of container 2)
>   + ... + (reserved RAM for container n * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.
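The chargeback formula in the issue description sums, over all of an application's containers, the reserved memory times the container's lifetime. A minimal sketch under assumed inputs: `ContainerUsage` is a hypothetical holder for one container's reserved MB and start/finish timestamps in milliseconds (not a YARN type), and each container's contribution is converted from MB-milliseconds to MB-seconds by integer truncation.

```java
import java.util.List;

/**
 * Sketch of the issue's chargeback formula:
 *   sum over containers of (reserved MB * lifetime in seconds).
 * ContainerUsage is a hypothetical type, not part of YARN.
 */
class MemorySecondsChargeback {

    // Reserved memory and start/finish timestamps (ms) for one container.
    static final class ContainerUsage {
        final long reservedMb;
        final long startMillis;
        final long finishMillis;

        ContainerUsage(long reservedMb, long startMillis, long finishMillis) {
            if (finishMillis < startMillis) {
                throw new IllegalArgumentException("finish time precedes start time");
            }
            this.reservedMb = reservedMb;
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }
    }

    // Total MB-seconds; each container's MB-milliseconds are truncated to seconds.
    static long memorySeconds(List<ContainerUsage> containers) {
        long total = 0;
        for (ContainerUsage c : containers) {
            long lifetimeMillis = c.finishMillis - c.startMillis;
            total += c.reservedMb * lifetimeMillis / 1000;
        }
        return total;
    }

    public static void main(String[] args) {
        List<ContainerUsage> app = List.of(
            new ContainerUsage(2048, 0, 60_000),        // 2048 MB for 60 s
            new ContainerUsage(1024, 5_000, 125_000));  // 1024 MB for 120 s
        System.out.println(memorySeconds(app) + " MB-seconds"); // 245760 MB-seconds
    }
}
```

Because the inputs are per-container reservations and lifetimes rather than job-history data, the same calculation applies to crashed and non-MR applications, which is the motivation the issue gives for app-level accounting.
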

This message was sent by Atlassian JIRA