[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-415:
----------------------------

    Attachment: YARN-415.201408150030.txt

{quote}
- Can you please elaborate on the scenario in which we need the following extra check?
{code}
    // Only add in the running containers if this is the active attempt.
    RMAppAttempt currentAttempt = rmContext.getRMApps()
                   .get(attemptId.getApplicationId()).getCurrentAppAttempt();
    if (currentAttempt != null &&
        currentAttempt.getAppAttemptId().compareTo(attemptId) == 0) {
      ApplicationResourceUsageReport appResUsageReport = rmContext
            .getScheduler().getAppResourceUsageReport(attemptId);
      if (appResUsageReport != null) {
        memorySeconds += appResUsageReport.getMemorySeconds();
        vcoreSeconds += appResUsageReport.getVcoreSeconds();
      }
    }
{code}
{quote}

An app can have multiple attempts if, for example, the first attempt dies
partway through and the RM starts a second attempt for the app. In that
situation, when RMAppAttemptMetrics#getRMAppMetrics is called for the first
attempt, we only want to report the usage of its completed containers, and when
it is called for the second (running) attempt, we want to report the usage of
both completed and running containers. This is a little misleading when
work-preserving restart is enabled, because the running containers don't die
with the first attempt: while they are running, their usage is reported as part
of the second attempt's metrics, but when they complete, their metrics go back
into the first attempt. Since these metrics are only reported at the app level,
I think this should be okay. The important thing is that the usage of running
containers is reported only once and never double-counted.
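The accounting rule above can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not the RM code: the AttemptMetricsSketch, Attempt, and RunningUsage types are invented for illustration, attempt IDs are plain strings, and the running-container usage is a single stored value standing in for what the scheduler's getAppResourceUsageReport would return. It shows that, because only the active attempt adds in running usage, summing over all attempts counts that usage exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of per-attempt accounting.
// These are NOT the real YARN classes; attempt IDs are plain strings here.
public class AttemptMetricsSketch {

    /** Usage accumulated by one attempt's completed containers. */
    static final class Attempt {
        final String id;
        long finishedMemorySeconds;
        long finishedVcoreSeconds;

        Attempt(String id) {
            this.id = id;
        }
    }

    /** Usage of containers that are still running (stand-in for the scheduler's report). */
    static final class RunningUsage {
        final long memorySeconds;
        final long vcoreSeconds;

        RunningUsage(long memorySeconds, long vcoreSeconds) {
            this.memorySeconds = memorySeconds;
            this.vcoreSeconds = vcoreSeconds;
        }
    }

    final List<Attempt> attempts = new ArrayList<>();
    String currentAttemptId;   // the active attempt, or null
    RunningUsage runningUsage; // usage of currently running containers, or null

    /** Returns {memorySeconds, vcoreSeconds} for a single attempt. */
    long[] attemptMetrics(Attempt attempt) {
        long memorySeconds = attempt.finishedMemorySeconds;
        long vcoreSeconds = attempt.finishedVcoreSeconds;
        // Only add in the running containers if this is the active attempt,
        // using equals() rather than compareTo() == 0.
        if (currentAttemptId != null && currentAttemptId.equals(attempt.id)
                && runningUsage != null) {
            memorySeconds += runningUsage.memorySeconds;
            vcoreSeconds += runningUsage.vcoreSeconds;
        }
        return new long[] {memorySeconds, vcoreSeconds};
    }

    /** App-level totals: the sum over all attempts. */
    long[] appMetrics() {
        long memorySeconds = 0;
        long vcoreSeconds = 0;
        for (Attempt attempt : attempts) {
            long[] m = attemptMetrics(attempt);
            memorySeconds += m[0];
            vcoreSeconds += m[1];
        }
        return new long[] {memorySeconds, vcoreSeconds};
    }

    public static void main(String[] args) {
        AttemptMetricsSketch app = new AttemptMetricsSketch();
        Attempt first = new Attempt("attempt_1");   // died partway through
        first.finishedMemorySeconds = 1000;
        first.finishedVcoreSeconds = 10;
        Attempt second = new Attempt("attempt_2");  // currently active
        second.finishedMemorySeconds = 500;
        second.finishedVcoreSeconds = 5;
        app.attempts.add(first);
        app.attempts.add(second);
        app.currentAttemptId = "attempt_2";
        app.runningUsage = new RunningUsage(2048, 20);

        long[] total = app.appMetrics();
        System.out.println(total[0] + " MB-seconds, " + total[1] + " vcore-seconds");
    }
}
```

Running main prints `3548 MB-seconds, 35 vcore-seconds`: the 2048 MB-seconds of running usage appear only under attempt_2, so the app-level total does not double-count them.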

{quote}
- Also, instead of currentAttempt.getAppAttemptId().compareTo(attemptId) == 0,
we can use equals, which is more intuitive.
{quote}
Good point. I made the change.

{quote}
- getFinishedMemorySeconds and getFinishedVcoreSeconds methods are not used.
- For setFinishedVcoreSeconds and setFinishedMemorySeconds, we can just use 
updateResourceUtilization
{quote}
I used updateResourceUtilization as you suggested, and removed the getters and 
setters.

{quote}
- RMStateStore#removeApplication: there is no need to calculate the memory
utilization when removing the app; skipping it saves the cost of looping over
the attempts.
{quote}
Good catch. I removed this calculation.


> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application. The unit should be MB-seconds
> or something similar. From a chargeback perspective, the memory amount should
> be the memory reserved for the application: even if the app didn't use all of
> that memory, no one else was able to use it.
> (reserved RAM for container 1 * lifetime of container 1) +
> (reserved RAM for container 2 * lifetime of container 2) + ... +
> (reserved RAM for container n * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.
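The chargeback formula in the description can be sketched as a short computation. This is a minimal Java sketch under stated assumptions, not the RM's implementation: the MbSecondsSketch and Container types are hypothetical, each finished container carries its reserved memory in MB and its start and finish times in milliseconds, and lifetimes are truncated to whole seconds.

```java
import java.util.List;

// Hypothetical sketch of the formula from the issue description:
// sum over containers of (reserved memory in MB * container lifetime in seconds).
public class MbSecondsSketch {

    /** A finished container (hypothetical type): reserved MB plus start/finish times in ms. */
    static final class Container {
        final long reservedMb;
        final long startMillis;
        final long finishMillis;

        Container(long reservedMb, long startMillis, long finishMillis) {
            this.reservedMb = reservedMb;
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }
    }

    static long memoryMbSeconds(List<Container> containers) {
        long total = 0;
        for (Container c : containers) {
            // Integer division truncates the lifetime to whole seconds.
            long lifetimeSeconds = (c.finishMillis - c.startMillis) / 1000;
            total += c.reservedMb * lifetimeSeconds;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Container> containers = List.of(
                new Container(1024, 0, 60_000),       // 1024 MB reserved for 60 s
                new Container(2048, 10_000, 40_000)); // 2048 MB reserved for 30 s
        // 1024 * 60 + 2048 * 30 = 61440 + 61440 = 122880
        System.out.println(memoryMbSeconds(containers) + " MB-seconds");
    }
}
```

Because the charge is based on reserved memory rather than memory actually used, an idle container still accrues MB-seconds for its full lifetime. `List.of` requires Java 9 or later.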



--
This message was sent by Atlassian JIRA
(v6.2#6252)
