[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields

2015-02-06 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310136#comment-14310136
 ] 

Kendall Thrapp commented on YARN-3143:
--

Thanks [~jlowe] for debugging and the super quick patch and thanks [~eepayne] 
and [~kihwal] for reviewing.

> RM Apps REST API can return NPE or entries missing id and other fields
> --
>
> Key: YARN-3143
> URL: https://issues.apache.org/jira/browse/YARN-3143
> Project: Hadoop YARN
> Issue Type: Bug
> Components: webapp
> Affects Versions: 2.5.2
> Reporter: Kendall Thrapp
> Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-3143.001.patch
>
>
> I'm seeing intermittent null pointer exceptions being returned by
> the YARN Apps REST API.
> For example:
> {code}
> http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED
> {code}
> JSON Response was:
> {code}
> {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}
> {code}
> At a glance, this appears to happen only when we query for unfinished apps 
> (i.e. finalStatus=UNDEFINED).
> Possibly related, when I do get back a list of apps, sometimes one or more of 
> the apps will be missing most of the fields, like id, name, user, etc., and 
> the fields that are present all have zero for the value.  
> For example:
> {code}
> {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}
> {code}
> Let me know if there's any other information I can provide to help debug.
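
Both symptoms in the report above can be detected on the client side. The following is a minimal Python sketch, not part of the original report. It assumes the Cluster Applications response has the shape {"apps": {"app": [...]}} (with "apps" possibly null); the name classify_apps_response and the REQUIRED_FIELDS list are hypothetical and chosen only for illustration.

```python
import json

# Fields the report says were missing from malformed entries.
REQUIRED_FIELDS = ("id", "name", "user")


def classify_apps_response(body):
    """Split a raw apps response into (error, complete_apps, suspect_apps)."""
    payload = json.loads(body)
    # Symptom 1: the server answered with a RemoteException instead of a list.
    if "RemoteException" in payload:
        return payload["RemoteException"].get("exception"), [], []
    # Assumed shape: {"apps": {"app": [...]}}, with "apps" possibly null.
    apps = (payload.get("apps") or {}).get("app") or []
    complete, suspect = [], []
    for app in apps:
        # Symptom 2: entries lacking id, name, user, etc.
        if all(field in app for field in REQUIRED_FIELDS):
            complete.append(app)
        else:
            suspect.append(app)
    return None, complete, suspect
```

A caller could retry the request when an error name is returned and log the suspect entries rather than processing them as real applications.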



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields

2015-02-04 Thread Kendall Thrapp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kendall Thrapp updated YARN-3143:
-
Description: 
I'm seeing intermittent null pointer exceptions being returned by
the YARN Apps REST API.

For example:
{code}
http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED
{code}

JSON Response was:
{code}
{"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}
{code}

At a glance, this appears to happen only when we query for unfinished apps 
(i.e. finalStatus=UNDEFINED).

Possibly related, when I do get back a list of apps, sometimes one or more of 
the apps will be missing most of the fields, like id, name, user, etc., and the 
fields that are present all have zero for the value.  

For example:
{code}
{"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}
{code}

Let me know if there's any other information I can provide to help debug.

  was:
I'm seeing intermittent null pointer exceptions being returned by
the YARN Apps REST API.

For example:
http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED

JSON Response was:
{"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}

At a glance, this appears to happen only when we query for unfinished apps 
(i.e. finalStatus=UNDEFINED).

Possibly related, when I do get back a list of apps, sometimes one or more of 
the apps will be missing most of the fields, like id, name, user, etc., and the 
fields that are present all have zero for the value.  

For example:
{"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}

Let me know if there's any other information I can provide to help debug.


> RM Apps REST API can return NPE or entries missing id and other fields
> --
>
> Key: YARN-3143
> URL: https://issues.apache.org/jira/browse/YARN-3143
> Project: Hadoop YARN
> Issue Type: Bug
> Components: webapp
> Affects Versions: 2.5.2
> Reporter: Kendall Thrapp
>
> I'm seeing intermittent null pointer exceptions being returned by
> the YARN Apps REST API.
> For example:
> {code}
> http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED
> {code}
> JSON Response was:
> {code}
> {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}
> {code}
> At a glance, this appears to happen only when we query for unfinished apps 
> (i.e. finalStatus=UNDEFINED).
> Possibly related, when I do get back a list of apps, sometimes one or more of 
> the apps will be missing most of the fields, like id, name, user, etc., and 
> the fields that are present all have zero for the value.  
> For example:
> {code}
> {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}
> {code}
> Let me know if there's any other information I can provide to help debug.





[jira] [Created] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields

2015-02-04 Thread Kendall Thrapp (JIRA)
Kendall Thrapp created YARN-3143:


Summary: RM Apps REST API can return NPE or entries missing id and other fields
Key: YARN-3143
URL: https://issues.apache.org/jira/browse/YARN-3143
Project: Hadoop YARN
Issue Type: Bug
Components: webapp
Affects Versions: 2.5.2
Reporter: Kendall Thrapp


I'm seeing intermittent null pointer exceptions being returned by
the YARN Apps REST API.

For example:
http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED

JSON Response was:
{"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}

At a glance, this appears to happen only when we query for unfinished apps 
(i.e. finalStatus=UNDEFINED).

Possibly related, when I do get back a list of apps, sometimes one or more of 
the apps will be missing most of the fields, like id, name, user, etc., and the 
fields that are present all have zero for the value.  

For example:
{"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}

Let me know if there's any other information I can provide to help debug.





[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback

2014-09-18 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139336#comment-14139336
 ] 

Kendall Thrapp commented on YARN-415:
-

Thanks [~eepayne], [~aklochkov], [~jianhe], [~wangda], [~kasha], [~sandyr] and 
[~jlowe] for all your effort on this!  Looking forward to being able to use 
this feature.

> Capture aggregate memory allocation at the app-level for chargeback
> ---
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: Kendall Thrapp
> Assignee: Eric Payne
> Fix For: 2.6.0
>
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
> YARN-415.201408181938.txt, YARN-415.201408212033.txt, 
> YARN-415.201409040036.txt, YARN-415.201409092204.txt, 
> YARN-415.201409102216.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.
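
The sum in the description above (reserved RAM times lifetime, added up over all containers) can be sketched in a few lines of Python. This is only an illustration of the requested metric, not the implementation from any attached patch; the ContainerUsage type, its field names, and memory_mb_seconds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    reserved_mb: int   # memory reserved for the container, in MB
    start_s: float     # container start time, in seconds
    end_s: float       # container end time, in seconds


def memory_mb_seconds(containers):
    """Sum reserved memory multiplied by lifetime over all containers."""
    return sum(c.reserved_mb * (c.end_s - c.start_s) for c in containers)


# One container holding 1024 MB for 60 s and one holding 2048 MB for 30 s.
usage = [ContainerUsage(1024, 0.0, 60.0), ContainerUsage(2048, 10.0, 40.0)]
total = memory_mb_seconds(usage)  # 1024*60 + 2048*30 = 122880 MB-seconds
```

Because the metric uses reserved rather than actually used memory, an application is charged the same whether or not it touches everything it reserved, which matches the chargeback rationale in the description.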





[jira] [Updated] (YARN-415) Capture memory allocation at the app-level for chargeback

2014-08-21 Thread Kendall Thrapp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kendall Thrapp updated YARN-415:


Summary: Capture memory allocation at the app-level for chargeback  (was: 
Capture memory utilization at the app-level for chargeback)

> Capture memory allocation at the app-level for chargeback
> -
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: Kendall Thrapp
> Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
> YARN-415.201408181938.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory allocation at the app-level for chargeback

2014-08-21 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105569#comment-14105569
 ] 

Kendall Thrapp commented on YARN-415:
-

Updated the JIRA title to say allocation instead of utilization.

> Capture memory allocation at the app-level for chargeback
> -
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: Kendall Thrapp
> Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
> YARN-415.201408181938.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.





[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-08-18 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101445#comment-14101445
 ] 

Kendall Thrapp commented on YARN-415:
-

{quote}
1. Is the chargeback simply to track the usage and maybe financially charge 
the users? Or is it to influence future scheduling decisions? I agree that the RM 
should facilitate collecting this information, but should the collected info be 
available to the RM for future use? If not, do we want the RM to serve this 
info?
{quote}
In addition to the goals [~eepayne] listed, another goal is to make it easier 
for users to compare how code changes to a particular recurring Hadoop job 
affect its resource usage.  Assuming the input data size didn't change 
significantly, it'd be much more apparent to the user after a code change if 
there was a resulting significant change in the resource usage for their job.  
Even without charging, I'm hoping that having the resource usage shown to the 
user, without any extra work on their part, will make more people think about 
their overall grid resource usage, instead of just run times.



> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 0.23.6
> Reporter: Kendall Thrapp
> Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-04-23 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978384#comment-13978384
 ] 

Kendall Thrapp commented on YARN-415:
-

Hi Andrey, any update on this?  Thanks!

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 0.23.6
> Reporter: Kendall Thrapp
> Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.





[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-01-08 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865739#comment-13865739
 ] 

Kendall Thrapp commented on YARN-415:
-

Thanks Andrey for all your work on this!  I'm looking forward to being able to 
use this.  Any updates?

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 0.23.6
> Reporter: Kendall Thrapp
> Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798511#comment-13798511
 ] 

Kendall Thrapp commented on YARN-415:
-

Thanks Andrey for implementing this.  I'm looking forward to being able to use 
it.  Just a reminder to also update the REST API docs 
(http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API).

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 0.23.6
> Reporter: Kendall Thrapp
> Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-691) Invalid NaN values in Hadoop REST API JSON response

2013-05-16 Thread Kendall Thrapp (JIRA)
Kendall Thrapp created YARN-691:
---

Summary: Invalid NaN values in Hadoop REST API JSON response
Key: YARN-691
URL: https://issues.apache.org/jira/browse/YARN-691
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp


I've been occasionally coming across instances where Hadoop's Cluster 
Applications REST API 
(http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API)
 has returned JSON that PHP's json_decode function failed to parse.  I've 
tracked the syntax error down to the presence of the unquoted word NaN 
appearing as a value in the JSON.  For example:

"progress":NaN,

NaN is not part of the JSON spec, so its presence renders the whole JSON string 
invalid.  Hadoop needs to return something other than NaN in this case -- 
perhaps an empty string or the quoted string "NaN".
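
The failure can be reproduced outside PHP. The sketch below uses Python's standard json module in place of json_decode, purely for illustration: json.loads accepts the bare word NaN by default, so strict_loads uses the parse_constant hook to reject it the way a strict parser would, and loads_nan_as_null shows one possible client-side workaround that maps the value to null. Both function names are hypothetical.

```python
import json


def strict_loads(text):
    """Parse JSON, rejecting NaN/Infinity/-Infinity as a strict parser would."""
    def reject(name):
        raise ValueError("non-standard JSON constant: " + name)
    return json.loads(text, parse_constant=reject)


def loads_nan_as_null(text):
    """Client-side workaround: treat NaN (and +/-Infinity) as null."""
    return json.loads(text, parse_constant=lambda _name: None)


sample = '{"id":"app_1","progress":NaN}'
app = loads_nan_as_null(sample)
# Re-serializing with allow_nan=False now succeeds and yields valid JSON.
clean = json.dumps(app, allow_nan=False)
```

Mapping NaN to null on the client only hides the symptom; the request in this issue is for the server to stop emitting the invalid token in the first place.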

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2013-03-19 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606949#comment-13606949
 ] 

Kendall Thrapp commented on YARN-462:
-

Karthik, I think your suggestion for transparent project queues under the leaf 
queues is an interesting idea and would also meet my requirements.

> Project Parameter for Chargeback
> 
>
> Key: YARN-462
> URL: https://issues.apache.org/jira/browse/YARN-462
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 0.23.6
> Reporter: Kendall Thrapp
>
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need 
> to be able to associate applications with "projects", e.g. "pipeline X", 
> "property Y".  This would allow us to aggregate on this property, thereby 
> helping us compute grid resource usage for the entire "project".  Currently, 
> for a given application, two things we know about it are the user that 
> submitted it and the queue it was submitted to.  Below, I'll explain why 
> neither of these is adequate for enterprise-level chargeback and 
> understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects.  When one 
> of our real users submits an application on a Hadoop grid, they're presumably 
> not usually doing it for themselves.  They're doing work for some project or 
> team effort, so it's that team or project that should be "charged" for all its 
> users' applications.  Maintaining outside lists of associations between users 
> and projects is error-prone because it is time-sensitive and requires 
> continued ongoing maintenance.  New users join organizations, users leave and 
> users even change projects.  Furthermore, users may split their time between 
> multiple projects, making it ambiguous as to which of a user's projects a 
> given application should be charged.  Also, there can be headless users, 
> which can be even more difficult to link to a project and can be shared 
> between teams or projects.
> Why Not Queues?
> The purpose of queues is for scheduling.  Overloading the queues concept to 
> also mean who should be "charged" for an application can have a detrimental 
> effect on the primary purpose of queues.  It could be manageable in the case 
> of a very small number of projects sharing a cluster, but doesn't scale to 
> tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
> between 50 projects, creating 50 separate queues will result in inefficient 
> use of the cluster resources.  Furthermore, a given project may desire more 
> than one queue for different types or priorities of applications.  
> Proposed Solution
> Rather than relying on external tools to infer through the user and/or queue 
> who to "charge" for a given application, I propose a straightforward approach 
> where that information be explicitly supplied when the application is 
> submitted, just like we do with queues.  Let's use a charge card analogy: 
> when you buy something online, you don't just say who you are and how to ship 
> it, you also specify how you're paying for it.  Similarly, when submitting an 
> application in YARN, you could explicitly specify with whom its resource usage 
> should be associated (a project, team, cost center, etc.).
> This new configuration parameter should default to being optional, so that 
> organizations not interested in chargeback or project-level resource tracking 
> can happily continue on as if it wasn't there.  However, it should be 
> configurable at the cluster level such that a given cluster could elect 
> to make it required, so that all applications would have an associated 
> project.  The value of this new parameter should be exposed via the Resource 
> Manager UI and Resource Manager REST API, so that users and tools can make 
> use of it for chargeback, utilization metrics, etc.
> I'm undecided on what to name the new parameter, as I like the flexibility in 
> the ways it could be used.  It is essentially just an additional party other 
> than user or queue that an application can be associated with, so its use is 
> not just limited to a chargeback scenario.  For example, an organization not 
> interested in chargeback could still use this parameter to communicate useful 
> information about an application (e.g. pipelineX.stageN) and aggregate like 
> applications.
> Enforcement
> Couldn't users just specify this information as a prefix for their job names? 
> Yes, but the missing piece this could provide is enforcement.  Ideally, I'd 
> like this parameter to work very much like how the queues work.  Like already 
> exists with queues, it'd be ideal if a given user couldn't just specify any 
> old value for this pa

[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications

2013-03-13 Thread Kendall Thrapp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kendall Thrapp updated YARN-473:


Description: 
The Capacity Scheduler REST API 
(http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
 is not returning the correct number of pending applications.  
numPendingApplications is almost always zero, even if there are dozens of 
pending apps.

In investigating this, I discovered that the Resource Manager's Scheduler 
webpage is also showing an incorrect but different number of pending 
applications.  For example, the cluster I'm looking at right now currently has 
15 applications in the ACCEPTED state, but the Cluster Metrics table near the 
top of the page says there are only 2 pending apps.  The REST API says there 
are zero pending apps.

  was:
The Capacity Scheduler REST API 
(http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
 is not returning the correct number of pending applications.  
numPendingApplications is almost always zero, even if there are dozens of 
pending apps.

In investigating this, I discovered that the Resource Manager's Scheduler 
webpage is als showing an incorrect but different number of pending 
applications.  For example, the cluster I'm looking at right now currently has 
15 applications in the ACCEPTED state, but the Cluster Metrics table near the 
top of the page says there are only 2 pending apps.  The REST API says there 
are zero pending apps.


> Capacity Scheduler webpage and REST API not showing correct number of pending 
> applications
> --
>
> Key: YARN-473
> URL: https://issues.apache.org/jira/browse/YARN-473
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>
> The Capacity Scheduler REST API 
> (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
>  is not returning the correct number of pending applications.  
> numPendingApplications is almost always zero, even if there are dozens of 
> pending apps.
> In investigating this, I discovered that the Resource Manager's Scheduler 
> webpage is also showing an incorrect but different number of pending 
> applications.  For example, the cluster I'm looking at right now currently 
> has 15 applications in the ACCEPTED state, but the Cluster Metrics table near 
> the top of the page says there are only 2 pending apps.  The REST API says 
> there are zero pending apps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications

2013-03-13 Thread Kendall Thrapp (JIRA)
Kendall Thrapp created YARN-473:
---

 Summary: Capacity Scheduler webpage and REST API not showing 
correct number of pending applications
 Key: YARN-473
 URL: https://issues.apache.org/jira/browse/YARN-473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.6
Reporter: Kendall Thrapp


The Capacity Scheduler REST API 
(http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
 is not returning the correct number of pending applications.  
numPendingApplications is almost always zero, even if there are dozens of 
pending apps.

In investigating this, I discovered that the Resource Manager's Scheduler 
webpage is also showing an incorrect but different number of pending 
applications.  For example, the cluster I'm looking at right now currently has 
15 applications in the ACCEPTED state, but the Cluster Metrics table near the 
top of the page says there are only 2 pending apps.  The REST API says there 
are zero pending apps.
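
One rough way to check for the mismatch described above is to compare the number of applications in the ACCEPTED state with the pending counts the scheduler reports. The sketch below is illustrative only and is not part of the original report. It works on already-parsed JSON, treats ACCEPTED apps as pending (as the report does), and assumes that pending counts appear under numPendingApplications keys somewhere in the scheduler response and that each app record carries a state field. The helper names are made up.

```python
def sum_pending(node):
    """Recursively sum every numPendingApplications value in parsed JSON."""
    if isinstance(node, dict):
        total = 0
        for key, value in node.items():
            if key == "numPendingApplications" and isinstance(value, int):
                total += value
            else:
                total += sum_pending(value)
        return total
    if isinstance(node, list):
        return sum(sum_pending(item) for item in node)
    return 0


def count_accepted(apps):
    """Count app records whose state is ACCEPTED (assumed field name)."""
    return sum(1 for app in apps if app.get("state") == "ACCEPTED")


def pending_mismatch(scheduler_json, apps):
    """Return ACCEPTED apps minus scheduler-reported pending apps.

    A non-zero result suggests the two views disagree.
    """
    return count_accepted(apps) - sum_pending(scheduler_json)
```

With 15 ACCEPTED apps and a scheduler response reporting zero pending apps, pending_mismatch would return 15, which is the discrepancy described in this issue.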



[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2013-03-13 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601258#comment-13601258
 ] 

Kendall Thrapp commented on YARN-462:
-

And yes, the case where entity A (real or headless user) is part of two other 
entities (teams or projects) B and C and submits jobs to both queues is one of 
the tricky issues I'm hoping to solve.  Another case is where last week user A 
was part of team B, but this week is now part of team C, and not wanting any 
ambiguity in attributing user A's resource usage to the correct team, no matter 
what day's metrics I'm looking at.  In large enough organizations, that's not 
necessarily a rare occurrence.  

> Project Parameter for Chargeback
> 
>
> Key: YARN-462
> URL: https://issues.apache.org/jira/browse/YARN-462
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need 
> to be able to associate applications with "projects", e.g. "pipeline X", 
> "property Y".  This would allow us to aggregate on this property, thereby 
> helping us compute grid resource usage for the entire "project".  Currently, 
> for a given application, two things we know about it are the user that 
> submitted it and the queue it was submitted to.  Below, I'll explain why 
> neither of these is adequate for enterprise-level chargeback and 
> understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects.  When one 
> of our real users submits an application on a Hadoop grid, they're presumably 
> not usually doing it for themselves.  They're doing work for some project or 
> team effort, so it's that team or project that should be "charged" for all its 
> users' applications.  Maintaining outside lists of associations between users 
> and projects is error-prone because it is time-sensitive and requires 
> continued ongoing maintenance.  New users join organizations, users leave and 
> users even change projects.  Furthermore, users may split their time between 
> multiple projects, making it ambiguous as to which of a user's projects a 
> given application should be charged.  Also, there can be headless users, 
> which can be even more difficult to link to a project and can be shared 
> between teams or projects.
> Why Not Queues?
> The purpose of queues is for scheduling.  Overloading the queues concept to 
> also mean who should be "charged" for an application can have a detrimental 
> effect on the primary purpose of queues.  It could be manageable in the case 
> of a very small number of projects sharing a cluster, but doesn't scale to 
> tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
> between 50 projects, creating 50 separate queues will result in inefficient 
> use of the cluster resources.  Furthermore, a given project may desire more 
> than one queue for different types or priorities of applications.  
> Proposed Solution
> Rather than relying on external tools to infer through the user and/or queue 
> who to "charge" for a given application, I propose a straightforward approach 
> where that information be explicitly supplied when the application is 
> submitted, just like we do with queues.  Let's use a charge card analogy: 
> when you buy something online, you don't just say who you are and how to ship 
> it, you also specify how you're paying for it.  Similarly, when submitting an 
> application in YARN, you could explicitly specify to whom its resource usage 
> should be associated (a project, team, cost center, etc).
> This new configuration parameter should default to being optional, so that 
> organizations not interested in chargeback or project-level resource tracking 
> can happily continue on as if it wasn't there.  However, it should be 
> configurable at the cluster level such that a given cluster could elect 
> to make it required, so that all applications would have an associated 
> project.  The value of this new parameter should be exposed via the Resource 
> Manager UI and Resource Manager REST API, so that users and tools can make 
> use of it for chargeback, utilization metrics, etc.
> I'm undecided on what to name the new parameter, as I like the flexibility in 
> the ways it could be used.  It is essentially just an additional party other 
> than user or queue that an application can be associated with, so its use is 
> not just limited to a chargeback scenario.  For example, an organization not 
> interested in chargeback could still use this parameter to communicate useful 
> information about an application (e.g. pipelineX.stageN) and aggregate like 
> applicatio

[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2013-03-13 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601245#comment-13601245
 ] 

Kendall Thrapp commented on YARN-462:
-

Thanks for the questions and feedback.  Yes, first I should clarify what I 
intended by chargeback.  I'm looking to be able to quantify cluster resource usage 
(memory, CPU, HDFS, etc.) for every application, and then roll that up to the 
project level.  This would allow us to accurately charge the customer (i.e. 
team/project) for their grid usage (either literally or just informatively).  I 
want to provide incentive for more efficient coding, as well as make it easier 
for teams to compare their resource usage across different software versions of 
their Hadoop applications, config parameter changes, etc.
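
As a minimal sketch of the roll-up described above, assuming each application record already carries an explicit project tag and a per-application usage figure: the field names project and memory_mb_seconds are hypothetical and do not correspond to any existing YARN field.

```python
from collections import defaultdict


def usage_by_project(apps):
    """Sum per-application usage by the project named at submission time.

    Each app is a dict with hypothetical keys "project" and
    "memory_mb_seconds". Apps without a project are grouped under
    "(unassigned)".
    """
    totals = defaultdict(int)
    for app in apps:
        project = app.get("project") or "(unassigned)"
        totals[project] += app["memory_mb_seconds"]
    return dict(totals)
```

Because the project is recorded on the application itself, two applications submitted by the same user can be attributed to different projects without consulting any external user-to-team mapping.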

I had originally hoped that hierarchical queues could serve this purpose as 
well, but have since run into several issues with this approach.  The first is 
that it doesn't scale for clusters with large numbers of projects.  I've seen 
large clusters shared between over a hundred different projects, each with 
their own teams of users.  If I recall correctly, queues can't be assigned less 
than 1% of the total capacity, so it wouldn't be possible to give each of these 
projects their own queue.  Even if we could, I suspect this could result in too 
much overhead for the scheduler and too much fragmentation of the cluster 
resources, which could result in poorer overall utilization.

The second issue is that the project-per-queue approach conflicts with how I 
see users wanting to use our queues.  In many cases I see queues being used to 
distinguish application priorities, ensuring that high priority time-sensitive 
jobs get the resources they need to finish on time, while big but lower 
priority and less time-sensitive jobs are constrained by being in a smaller 
queue.  I'd expect a lot of pushback from our users for any chargeback-focused 
queue configuration that had a negative impact on job run times and meeting 
SLAs.  The idea of the project/chargeback parameter decouples the two.

> Project Parameter for Chargeback
> 
>
> Key: YARN-462
> URL: https://issues.apache.org/jira/browse/YARN-462
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need 
> to be able to associate applications with "projects", e.g. "pipeline X", 
> "property Y".  This would allow us to aggregate on this property, thereby 
> helping us compute grid resource usage for the entire "project".  Currently, 
> for a given application, two things we know about it are the user that 
> submitted it and the queue it was submitted to.  Below, I'll explain why 
> neither of these is adequate for enterprise-level chargeback and 
> understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects.  When one 
> of our real users submits an application on a Hadoop grid, they're presumably 
> not usually doing it for themselves.  They're doing work for some project or 
> team effort, so it's that team or project that should be "charged" for all its 
> users' applications.  Maintaining outside lists of associations between users 
> and projects is error-prone because it is time-sensitive and requires 
> continued ongoing maintenance.  New users join organizations, users leave and 
> users even change projects.  Furthermore, users may split their time between 
> multiple projects, making it ambiguous as to which of a user's projects a 
> given application should be charged.  Also, there can be headless users, 
> which can be even more difficult to link to a project and can be shared 
> between teams or projects.
> Why Not Queues?
> The purpose of queues is for scheduling.  Overloading the queues concept to 
> also mean who should be "charged" for an application can have a detrimental 
> effect on the primary purpose of queues.  It could be manageable in the case 
> of a very small number of projects sharing a cluster, but doesn't scale to 
> tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
> between 50 projects, creating 50 separate queues will result in inefficient 
> use of the cluster resources.  Furthermore, a given project may desire more 
> than one queue for different types or priorities of applications.  
> Proposed Solution
> Rather than relying on external tools to infer through the user and/or queue 
> who to "charge" for a given application, I propose a straightforward approach 
> where that information be explicitly supplied when the application is 
> submitted, just like we do with queues.  Let's use a cha

[jira] [Updated] (YARN-462) Project Parameter for Chargeback

2013-03-08 Thread Kendall Thrapp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kendall Thrapp updated YARN-462:


Issue Type: New Feature  (was: Improvement)

> Project Parameter for Chargeback
> 
>
> Key: YARN-462
> URL: https://issues.apache.org/jira/browse/YARN-462
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need 
> to be able to associate applications with "projects", e.g. "pipeline X", 
> "property Y".  This would allow us to aggregate on this property, thereby 
> helping us compute grid resource usage for the entire "project".  Currently, 
> for a given application, two things we know about it are the user that 
> submitted it and the queue it was submitted to.  Below, I'll explain why 
> neither of these is adequate for enterprise-level chargeback and 
> understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects.  When one 
> of our real users submits an application on a Hadoop grid, they're presumably 
> not usually doing it for themselves.  They're doing work for some project or 
> team effort, so it's that team or project that should be "charged" for all its 
> users' applications.  Maintaining outside lists of associations between users 
> and projects is error-prone because it is time-sensitive and requires 
> continued ongoing maintenance.  New users join organizations, users leave and 
> users even change projects.  Furthermore, users may split their time between 
> multiple projects, making it ambiguous as to which of a user's projects a 
> given application should be charged.  Also, there can be headless users, 
> which can be even more difficult to link to a project and can be shared 
> between teams or projects.
> Why Not Queues?
> The purpose of queues is for scheduling.  Overloading the queues concept to 
> also mean who should be "charged" for an application can have a detrimental 
> effect on the primary purpose of queues.  It could be manageable in the case 
> of a very small number of projects sharing a cluster, but doesn't scale to 
> tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
> between 50 projects, creating 50 separate queues will result in inefficient 
> use of the cluster resources.  Furthermore, a given project may desire more 
> than one queue for different types or priorities of applications.  
> Proposed Solution
> Rather than relying on external tools to infer through the user and/or queue 
> who to "charge" for a given application, I propose a straightforward approach 
> where that information be explicitly supplied when the application is 
> submitted, just like we do with queues.  Let's use a charge card analogy: 
> when you buy something online, you don't just say who you are and how to ship 
> it, you also specify how you're paying for it.  Similarly, when submitting an 
> application in YARN, you could explicitly specify to whom its resource usage 
> should be associated (a project, team, cost center, etc).
> This new configuration parameter should default to being optional, so that 
> organizations not interested in chargeback or project-level resource tracking 
> can happily continue on as if it wasn't there.  However, it should be 
> configurable at the cluster level such that a given cluster could elect 
> to make it required, so that all applications would have an associated 
> project.  The value of this new parameter should be exposed via the Resource 
> Manager UI and Resource Manager REST API, so that users and tools can make 
> use of it for chargeback, utilization metrics, etc.
> I'm undecided on what to name the new parameter, as I like the flexibility in 
> the ways it could be used.  It is essentially just an additional party other 
> than user or queue that an application can be associated with, so its use is 
> not just limited to a chargeback scenario.  For example, an organization not 
> interested in chargeback could still use this parameter to communicate useful 
> information about an application (e.g. pipelineX.stageN) and aggregate like 
> applications.
> Enforcement
> Couldn't users just specify this information as a prefix for their job names? 
>  Yes, but the missing piece this could provide is enforcement.  Ideally, I'd 
> like this parameter to work very much like how the queues work.  Like already 
> exists with queues, it'd be ideal if a given user couldn't just specify any 
> old value for this parameter.  It could be configurable such that a given 
> user only has permission to submit applications for specific "projects".  
> Submitting an application with this

[jira] [Created] (YARN-462) Project Parameter for Chargeback

2013-03-08 Thread Kendall Thrapp (JIRA)
Kendall Thrapp created YARN-462:
---

 Summary: Project Parameter for Chargeback
 Key: YARN-462
 URL: https://issues.apache.org/jira/browse/YARN-462
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp


Problem Summary

For the purpose of chargeback and better understanding of grid usage, we need 
to be able to associate applications with "projects", e.g. "pipeline X", 
"property Y".  This would allow us to aggregate on this property, thereby 
helping us compute grid resource usage for the entire "project".  Currently, 
for a given application, two things we know about it are the user that 
submitted it and the queue it was submitted to.  Below, I'll explain why 
neither of these is adequate for enterprise-level chargeback and understanding 
resource allocation needs.

Why Not Users?

It's not individual users that are paying the bill -- it's projects.  When one of 
our real users submits an application on a Hadoop grid, they're presumably not 
usually doing it for themselves.  They're doing work for some project or team 
effort, so it's that team or project that should be "charged" for all its users' 
applications.  Maintaining outside lists of associations between users and 
projects is error-prone because it is time-sensitive and requires continued 
ongoing maintenance.  New users join organizations, users leave and users even 
change projects.  Furthermore, users may split their time between multiple 
projects, making it ambiguous as to which of a user's projects a given 
application should be charged.  Also, there can be headless users, which can be 
even more difficult to link to a project and can be shared between teams or 
projects.

Why Not Queues?

The purpose of queues is for scheduling.  Overloading the queues concept to 
also mean who should be "charged" for an application can have a detrimental 
effect on the primary purpose of queues.  It could be manageable in the case of 
a very small number of projects sharing a cluster, but doesn't scale to tens or 
hundreds of projects sharing a cluster.  If a given cluster is shared between 
50 projects, creating 50 separate queues will result in inefficient use of the 
cluster resources.  Furthermore, a given project may desire more than one queue 
for different types or priorities of applications.  

Proposed Solution

Rather than relying on external tools to infer through the user and/or queue 
who to "charge" for a given application, I propose a straightforward approach 
where that information be explicitly supplied when the application is 
submitted, just like we do with queues.  Let's use a charge card analogy: when 
you buy something online, you don't just say who you are and how to ship it, 
you also specify how you're paying for it.  Similarly, when submitting an 
application in YARN, you could explicitly specify to whom its resource usage 
should be associated (a project, team, cost center, etc).

This new configuration parameter should default to being optional, so that 
organizations not interested in chargeback or project-level resource tracking 
can happily continue on as if it wasn't there.  However, it should be 
configurable at the cluster level such that a given cluster could elect to 
make it required, so that all applications would have an associated project.  
The value of this new parameter should be exposed via the Resource Manager UI 
and Resource Manager REST API, so that users and tools can make use of it for 
chargeback, utilization metrics, etc.

I'm undecided on what to name the new parameter, as I like the flexibility in 
the ways it could be used.  It is essentially just an additional party other 
than user or queue that an application can be associated with, so its use is 
not just limited to a chargeback scenario.  For example, an organization not 
interested in chargeback could still use this parameter to communicate useful 
information about an application (e.g. pipelineX.stageN) and aggregate like 
applications.

Enforcement

Couldn't users just specify this information as a prefix for their job names?  
Yes, but the missing piece this could provide is enforcement.  Ideally, I'd 
like this parameter to work very much like how the queues work.  Like already 
exists with queues, it'd be ideal if a given user couldn't just specify any old 
value for this parameter.  It could be configurable such that a given user only 
has permission to submit applications for specific "projects".  Submitting an 
application with this parameter set to anything other than what the given user 
is allowed would cause the application to be rejected in the same manner as if 
the user had specified an invalid queue.
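
To make the intended behavior concrete, here is a minimal sketch of the check at submission time. Everything in it is hypothetical: the function, the exception, and the per-user mapping of allowed projects are made up for illustration and are not an existing YARN API. The required and enforce flags stand in for the cluster-level settings described in this proposal, and both default to off.

```python
class ProjectRejected(Exception):
    """Raised when a submission names a project the user may not use."""


def check_project(user, project, allowed_projects, required=False, enforce=False):
    """Validate the project named on an application submission.

    allowed_projects maps a user name to the set of project names that
    user may submit under (a hypothetical ACL). Returns the project if
    the submission is acceptable, otherwise raises ProjectRejected.
    """
    if project is None:
        if required:
            raise ProjectRejected("a project is required on this cluster")
        return None
    if enforce and project not in allowed_projects.get(user, set()):
        raise ProjectRejected(
            f"user {user!r} may not submit applications for project {project!r}"
        )
    return project
```

With both flags left at their defaults the check accepts any value, including none at all, so clusters that do not use the feature see no change in behavior.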

Again, so as to have no effect on organizations not interested in this feature, 
this enforcement should be off by default, but config

[jira] [Created] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-02-22 Thread Kendall Thrapp (JIRA)
Kendall Thrapp created YARN-415:
---

 Summary: Capture memory utilization at the app-level for chargeback
 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp


For the purpose of chargeback, I'd like to be able to compute the cost of an
application in terms of cluster resource usage.  To start out, I'd like to get 
the memory utilization of an application.  The unit should be MB-seconds or 
something similar and, from a chargeback perspective, the memory amount should 
be the memory reserved for the application, as even if the app didn't use all 
that memory, no one else was able to use it.

(reserved ram for container 1 * lifetime of container 1) + (reserved ram for
container 2 * lifetime of container 2) + ... + (reserved ram for container n * 
lifetime of container n)
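
The sum above can be sketched as follows. This is only an illustration of the arithmetic, assuming the reserved memory (in MB) and lifetime (in seconds) of each container are already known; the ContainerUsage type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    reserved_mb: int          # memory reserved for the container, in MB
    lifetime_seconds: float   # how long the container held that reservation


def memory_mb_seconds(containers):
    """Return the sum of reserved_mb * lifetime_seconds over all containers."""
    return sum(c.reserved_mb * c.lifetime_seconds for c in containers)
```

For example, an application with one 1024 MB container that ran for 60 seconds and one 2048 MB container that ran for 30 seconds would be charged 1024 * 60 + 2048 * 30 = 122880 MB-seconds.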

It'd be nice to have this at the app level instead of the job level because:
1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
appear on the job history server).
2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).

This new metric should be available both through the RM UI and RM Web Services 
REST API.
