[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066145#comment-14066145
 ] 

Wangda Tan commented on YARN-415:
---------------------------------

Hi [~eepayne],
I've spent some time reviewing and thinking about this JIRA. I have a few
comments:

1. Revert the changes to SchedulerAppReport. We have already changed
ApplicationResourceUsageReport, and memory utilization should be part of the
resource usage report.

2. Remove getMemory(VCore)Seconds from RMAppAttempt, and modify
RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return the
completed+running resource utilization.

3. Move
{code}
         ._("Resources:",
            String.format("%d MB-seconds, %d vcore-seconds", 
                app.getMemorySeconds(), app.getVcoreSeconds()))
{code}
from "Application Overview" to "Application Metrics", and rename it to
"Resource Seconds". It should be considered part of the application metrics
rather than the overview.

4. Change finishedMemory/VCoreSeconds to AtomicLong in RMAppAttemptMetrics so
that they can be accessed efficiently by multiple threads.

5. I think it's better to add a new method to SchedulerApplicationAttempt, such
as getMemoryUtilization, that returns only memory/CPU seconds. This avoids
locking the scheduling thread when showing application metrics on the web UI.
getMemoryUtilization would be used by
RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return the
completed+running resource utilization, and by
SchedulerApplicationAttempt#getResourceUsageReport as well.

The MemoryUtilization class could contain two fields:
runningContainerMemorySeconds and runningContainerVCoreSeconds.

6. Since computing the resource utilization of running containers is not O(1)
(we need to scan all containers under an application), I think it's better to
cache the previous result and recompute it only after several seconds have
elapsed (1-3 seconds should be enough).

You can also change SchedulerApplicationAttempt#liveContainers to a
ConcurrentHashMap. Combined with the caching in #6, fetching memory
utilization to show metrics on the web UI will not lock the scheduling thread
at all.
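
To make suggestions #4 to #6 concrete, here is a rough sketch of how the
pieces could fit together. It is not the actual YARN code: AttemptUsageTracker,
LiveContainer, the 2-second cache window, and all method signatures are
hypothetical stand-ins, and time is passed in explicitly only to keep the
example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// All names below are hypothetical stand-ins, not the actual YARN classes.

/** Immutable snapshot of the resource-seconds used by running containers. */
final class MemoryUtilization {
    final long runningContainerMemorySeconds;
    final long runningContainerVCoreSeconds;
    final long computedAtMillis;

    MemoryUtilization(long memorySeconds, long vcoreSeconds, long computedAtMillis) {
        this.runningContainerMemorySeconds = memorySeconds;
        this.runningContainerVCoreSeconds = vcoreSeconds;
        this.computedAtMillis = computedAtMillis;
    }
}

/** Reservation and start time of one live container. */
final class LiveContainer {
    final long memoryMb;
    final long vcores;
    final long startMillis;

    LiveContainer(long memoryMb, long vcores, long startMillis) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.startMillis = startMillis;
    }
}

class AttemptUsageTracker {
    // Assumed cache window, picked from the suggested 1-3 second range.
    private static final long CACHE_TTL_MILLIS = 2000;

    // #6: a concurrent map lets readers iterate without a scheduler lock.
    private final Map<String, LiveContainer> liveContainers = new ConcurrentHashMap<>();
    // #4: lock-free accumulators for completed containers.
    private final AtomicLong finishedMemorySeconds = new AtomicLong();
    private final AtomicLong finishedVCoreSeconds = new AtomicLong();
    // #6: last computed snapshot, published through a volatile reference.
    private volatile MemoryUtilization cached;

    void containerStarted(String id, long memoryMb, long vcores, long nowMillis) {
        liveContainers.put(id, new LiveContainer(memoryMb, vcores, nowMillis));
    }

    void containerFinished(String id, long nowMillis) {
        LiveContainer c = liveContainers.remove(id);
        if (c == null) {
            return; // unknown or already finished
        }
        long seconds = (nowMillis - c.startMillis) / 1000;
        finishedMemorySeconds.addAndGet(c.memoryMb * seconds);
        finishedVCoreSeconds.addAndGet(c.vcores * seconds);
    }

    // #5: returns only resource-seconds; rescans at most once per cache window.
    MemoryUtilization getMemoryUtilization(long nowMillis) {
        MemoryUtilization snapshot = cached;
        if (snapshot != null && nowMillis - snapshot.computedAtMillis < CACHE_TTL_MILLIS) {
            return snapshot;
        }
        long memorySeconds = 0;
        long vcoreSeconds = 0;
        for (LiveContainer c : liveContainers.values()) {
            long seconds = (nowMillis - c.startMillis) / 1000;
            memorySeconds += c.memoryMb * seconds;
            vcoreSeconds += c.vcores * seconds;
        }
        snapshot = new MemoryUtilization(memorySeconds, vcoreSeconds, nowMillis);
        cached = snapshot;
        return snapshot;
    }

    // Completed + running totals, as suggested in #2.
    long getMemorySeconds(long nowMillis) {
        return finishedMemorySeconds.get()
            + getMemoryUtilization(nowMillis).runningContainerMemorySeconds;
    }

    long getVCoreSeconds(long nowMillis) {
        return finishedVCoreSeconds.get()
            + getMemoryUtilization(nowMillis).runningContainerVCoreSeconds;
    }
}
```

One trade-off of this approach: because the running part is cached, a container
that finishes inside the cache window can be counted in both the finished
total and the stale snapshot until the next recompute. For a web UI metric that
is probably acceptable, but the final report for a finished application should
not rely on the cached value.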

Please let me know if you have any comments here,

Thanks,
Wangda


> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
