[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945933#comment-14945933 ]
Ruslan Dautkhanov commented on YARN-462:
----------------------------------------
It's probably related to https://issues.apache.org/jira/browse/YARN-415
> Project Parameter for Chargeback
> --------------------------------
>
> Key: YARN-462
> URL: https://issues.apache.org/jira/browse/YARN-462
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 0.23.6
> Reporter: Kendall Thrapp
>
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need
> to be able to associate applications with "projects", e.g. "pipeline X",
> "property Y". This would allow us to aggregate on this property, thereby
> helping us compute grid resource usage for the entire "project". Currently,
> for a given application, two things we know about it are the user that
> submitted it and the queue it was submitted to. Below, I'll explain why
> neither of these is adequate for enterprise-level chargeback and
> understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects. When one
> of our real users submits an application on a Hadoop grid, they're presumably
> not usually doing it for themselves. They're doing work for some project or
> team effort, so it's that team or project that should be "charged" for all its
> users' applications. Maintaining outside lists of associations between users
> and projects is error-prone because the associations change over time and
> require ongoing maintenance. New users join organizations, users leave, and
> users even change projects. Furthermore, users may split their time between
> multiple projects, making it ambiguous which of a user's projects a given
> application should be charged to. Also, there can be headless users, which
> can be even more difficult to link to a project and can be shared between
> teams or projects.
> Why Not Queues?
> The purpose of queues is scheduling. Overloading the queues concept to
> also mean who should be "charged" for an application can have a detrimental
> effect on the primary purpose of queues. It could be manageable in the case
> of a very small number of projects sharing a cluster, but doesn't scale to
> tens or hundreds of projects sharing a cluster. If a given cluster is shared
> between 50 projects, creating 50 separate queues will result in inefficient
> use of the cluster resources. Furthermore, a given project may desire more
> than one queue for different types or priorities of applications.
> Proposed Solution
> Rather than relying on external tools to infer from the user and/or queue
> whom to "charge" for a given application, I propose a straightforward approach
> in which that information is explicitly supplied when the application is
> submitted, just like we do with queues. Let's use a charge card analogy:
> when you buy something online, you don't just say who you are and how to ship
> it, you also specify how you're paying for it. Similarly, when submitting an
> application in YARN, you could explicitly specify to whom its resource usage
> should be associated (a project, team, cost center, etc).
> This new configuration parameter should default to being optional, so that
> organizations not interested in chargeback or project-level resource tracking
> can happily continue on as if it weren't there. However, it should be
> configurable at the cluster level such that a given cluster could elect
> to make it required, so that all applications would have an associated
> project. The value of this new parameter should be exposed via the Resource
> Manager UI and Resource Manager REST API, so that users and tools can make
> use of it for chargeback, utilization metrics, etc.
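To make the optional-versus-required behavior concrete, here is a minimal sketch of how a submission-time check might treat the proposed parameter. It assumes a single cluster-level boolean switch; the class name ProjectParameterPolicy, the projectRequired flag, and the use of IllegalArgumentException for rejection are all hypothetical and do not exist in YARN.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: validates the proposed "project" submission parameter
 * against a cluster-level switch that makes it optional (default) or required.
 * None of these names exist in YARN; they only illustrate the proposal.
 */
final class ProjectParameterPolicy {
    private final boolean projectRequired; // hypothetical cluster-level setting

    ProjectParameterPolicy(boolean projectRequired) {
        this.projectRequired = projectRequired;
    }

    /** Returns the trimmed project, or empty if none was supplied and none is required. */
    Optional<String> validate(String appName, String project) {
        boolean missing = project == null || project.trim().isEmpty();
        if (missing) {
            if (projectRequired) {
                // Required mode: reject the submission outright.
                throw new IllegalArgumentException(
                    "Application " + appName + " rejected: a project is required on this cluster");
            }
            // Optional mode (the proposed default): behave as if the parameter did not exist.
            return Optional.empty();
        }
        return Optional.of(project.trim());
    }
}
```

With projectRequired left false, clusters that ignore chargeback see no change; flipping it to true makes every submission carry a project.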
> I'm undecided on what to name the new parameter, as I like the flexibility in
> the ways it could be used. It is essentially just an additional party other
> than user or queue that an application can be associated with, so its use is
> not just limited to a chargeback scenario. For example, an organization not
> interested in chargeback could still use this parameter to communicate useful
> information about an application (e.g. pipelineX.stageN) and aggregate like
> applications.
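The aggregation this parameter enables is a simple group-by over per-application usage. The sketch below assumes each finished application is reported with its project label and a single usage figure; the AppUsage type and the memorySeconds metric are illustrative assumptions, not YARN API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: rolls up per-application resource usage by the proposed
 * project label, as a chargeback or utilization report might.
 * AppUsage and memorySeconds are illustrative names, not YARN API.
 */
final class ProjectUsageReport {
    static final class AppUsage {
        final String project;     // value of the proposed parameter, e.g. "pipelineX.stageN"
        final long memorySeconds; // assumed usage metric for the application

        AppUsage(String project, long memorySeconds) {
            this.project = project;
            this.memorySeconds = memorySeconds;
        }
    }

    /** Sums usage per project; applications without a project are grouped under "(none)". */
    static Map<String, Long> totalsByProject(List<AppUsage> apps) {
        Map<String, Long> totals = new TreeMap<>();
        for (AppUsage app : apps) {
            String key = app.project == null ? "(none)" : app.project;
            totals.merge(key, app.memorySeconds, Long::sum);
        }
        return totals;
    }
}
```

Grouping under "(none)" keeps unlabeled applications visible in the report when the parameter is left optional.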
> Enforcement
> Couldn't users just specify this information as a prefix for their job names?
> Yes, but the missing piece this could provide is enforcement. Ideally, I'd
> like this parameter to work very much like how the queues work. Like already
> exists with queues, it'd be ideal if a given user couldn't just specify any
> old value for this parameter. It could be configurable such that a given
> user only has permission to submit applications for specific "projects".
> Submitting an application with this parameter set to anything other than
> what the given user is allowed would cause the application to be rejected in
> the same manner as if the user had specified an invalid queue.
> Again, so as to have no effect on organizations not interested in this
> feature, this enforcement should be off by default, but configurable at the
> cluster level such that it could be turned on for clusters wanting to use it.
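The proposed enforcement amounts to a per-user allow-list consulted at submission time, and the check is skipped entirely when the cluster-level switch is off. This is a minimal sketch under those assumptions. The class name, the user-to-projects map, and the use of SecurityException are hypothetical. SecurityException stands in for whatever rejection path an invalid queue actually takes.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed enforcement: when enabled at the cluster
 * level, a user may only submit under projects they are permitted to use, and
 * any other value is rejected, much as an invalid queue would be.
 * Class name, map layout, and exception type are assumptions.
 */
final class ProjectAccessChecker {
    private final boolean enforcementEnabled;              // off by default per the proposal
    private final Map<String, Set<String>> projectsByUser; // user -> allowed projects

    ProjectAccessChecker(boolean enforcementEnabled, Map<String, Set<String>> projectsByUser) {
        this.enforcementEnabled = enforcementEnabled;
        this.projectsByUser = projectsByUser;
    }

    /** Throws if enforcement is on and the user may not charge this project. */
    void checkSubmission(String user, String project) {
        if (!enforcementEnabled) {
            return; // feature off: accept any value, exactly as today
        }
        // Users with no entry get an empty allow-list, so every project is denied.
        Set<String> allowed = projectsByUser.getOrDefault(user, Collections.emptySet());
        if (!allowed.contains(project)) {
            throw new SecurityException(
                "User " + user + " may not submit applications for project " + project);
        }
    }
}
```

Denying users who have no entry keeps the check fail-closed once an administrator turns enforcement on.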
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)