[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786577#comment-13786577
 ] 

Jason Lowe commented on YARN-415:
---------------------------------

The latest patch no longer applies to trunk.  Could you please refresh it?  
Some review comments:

General:
* Nit: the extra plural in VirtualCoresSeconds sounds a bit odd; I wonder 
whether it should be VirtualCoreSeconds or VcoreSeconds in the various places 
it appears.

ApplicationCLI:
* UI wording: in the code it's vcore-seconds, but the UI says CPU-seconds.  I'm 
wondering if users will interpret CPU to mean a hardware core, and I'm not sure 
a vcore maps to a hardware core in the typical case.  The configuration 
properties refer to vcores, so we should probably use vcore-seconds here for 
consistency.  Curious what others think about this, as I could be convinced to 
leave it as CPU.

RMAppAttempt:
* Just a spurious whitespace change.

RMAppAttemptImpl:
* Nit: containerAllocated and containerFinished are private and always called 
from transitions, so acquiring the write lock is unnecessary.
* ContainerFinishedTransition.transition does not call containerFinished when 
the finished container is the AM container.  We "leak" the AM container and 
consider it running forever if the AM crashes.
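
To illustrate the second point, here's a minimal toy model of the accounting 
(not RM code; class, method, and field names are just illustrative).  If the 
AM container never gets a containerFinished() call, it stays in the running 
set and its reserved memory accrues without bound:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-app container accounting.  Finished containers contribute a
// fixed (reserved MB * lifetime); anything still "running" keeps accruing.
public class ContainerAccounting {
    private final Map<String, Long> reservedMb = new HashMap<>(); // id -> MB
    private final Map<String, Long> startTime = new HashMap<>();  // id -> start
    private long finishedMbSeconds = 0;

    public void containerAllocated(String id, long mb, long now) {
        reservedMb.put(id, mb);
        startTime.put(id, now);
    }

    public void containerFinished(String id, long now) {
        Long mb = reservedMb.remove(id);
        if (mb != null) {
            finishedMbSeconds += mb * (now - startTime.remove(id));
        }
    }

    public long memoryMbSeconds(long now) {
        long total = finishedMbSeconds;
        for (Map.Entry<String, Long> e : reservedMb.entrySet()) {
            total += e.getValue() * (now - startTime.get(e.getKey()));
        }
        return total;
    }

    public static void main(String[] args) {
        ContainerAccounting app = new ContainerAccounting();
        app.containerAllocated("am-container", 1024, 0);
        // AM crashes at t=10 but containerFinished() is never invoked for it:
        System.out.println(app.memoryMbSeconds(100)); // 102400, still growing
        System.out.println(app.memoryMbSeconds(200)); // 204800, the "leak"
        app.containerFinished("am-container", 10);    // the fix: finish the AM too
        System.out.println(app.memoryMbSeconds(200)); // 10240, correct
    }
}
```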

RMContainerEvent:
* Nit: whitespace between the constructor definitions would be nice.

TestRMAppAttemptTransitions:
* Nit: it would be cleaner and easier to read if we add a new 
allocateApplicationAttemptAtTime method and have the existing 
allocateApplicationAttempt method simply call it with -1 rather than change all 
those places to pass -1.
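
The suggested refactoring could look something like this (a sketch; the -1 
sentinel and the recorded field are assumptions for illustration):

```java
// Sketch of the test-helper delegation: existing callers keep using
// allocateApplicationAttempt(), and only time-sensitive tests call the
// new overload directly.
public class AttemptAllocationHelper {
    // Sentinel meaning "no explicit allocation time" (assumed from the comment).
    static final long NO_TIME = -1;

    // Records the time passed through, purely for illustration.
    static long lastAllocatedAt;

    // Existing helper keeps its signature and simply delegates with -1.
    static void allocateApplicationAttempt() {
        allocateApplicationAttemptAtTime(NO_TIME);
    }

    // New helper that time-aware tests can call directly.
    static void allocateApplicationAttemptAtTime(long allocationTime) {
        lastAllocatedAt = allocationTime;
        // ... body of the original allocateApplicationAttempt would live here ...
    }

    public static void main(String[] args) {
        allocateApplicationAttempt();
        System.out.println(lastAllocatedAt); // -1
        allocateApplicationAttemptAtTime(42L);
        System.out.println(lastAllocatedAt); // 42
    }
}
```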

Speaking of "leaking" containers, is there something we can do to audit/assert 
that applications that have completed don't have running containers?  If we 
lose track of a container finished event, the consumed resources will keep 
increasing indefinitely.  It's a bug in the RM either way, but I'm wondering if 
there's some warning/sanity checking we can do to keep the metric from becoming 
utterly useless when it occurs.  Capping it at the end of the application would 
at least prevent it from growing beyond the application lifetime.  Then again, 
letting it grow continuously at least makes it more obvious that something went 
terribly wrong with the accounting and therefore that the metric can't be 
trusted.  Just thinking out loud; I'm not sure what the best solution is.
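
One possible shape for the warn-and-cap idea (a sketch only, not the patch; 
it assumes a per-app running-container count and a snapshot of the metric at 
finish time are available, and all names are hypothetical):

```java
// Sanity check run when an application transitions to a completed state:
// if containers are still marked running, log a warning and freeze the
// usage metric at its value when the app finished.
public class UsageSanityCheck {
    static long checkAndCap(int runningContainers,
                            long usageAtFinishMbSeconds,
                            long currentUsageMbSeconds) {
        if (runningContainers > 0) {
            // A completed app should have no running containers; the metric
            // can't be trusted, so cap it at the application's lifetime.
            System.err.println("WARN: app finished with " + runningContainers
                + " container(s) still marked running; capping usage metric");
            return usageAtFinishMbSeconds;
        }
        return currentUsageMbSeconds;
    }

    public static void main(String[] args) {
        // Healthy app: metric passes through unchanged.
        System.out.println(checkAndCap(0, 5000, 5000)); // 5000
        // Leaked container: metric is capped at the finish-time value.
        System.out.println(checkAndCap(2, 5000, 9999)); // 5000
    }
}
```

This implements the capping side of the trade-off discussed above; the 
warning keeps the "something went terribly wrong" signal visible in the logs.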

> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
