There are two types of data that can be collected. The first is Graphite-style metrics (i.e. KEY, VALUE, TIMESTAMP), which can be nicely put into a dashboard. The second, as Steven mentioned, is metadata (i.e. TIMESTAMP, framework id, task id, executor, resources, slave id, etc.), which will help in organizing resource consumption by app, by team, or by department. The metadata gives more of an organizational view of the apps that run in your cluster.
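To make the two kinds of data concrete, here is a minimal sketch in Python. The first function formats a sample in the "path value timestamp" form that Graphite's plaintext protocol accepts; the second builds a metadata record with the fields listed above. The metric path, the IDs, and the resource values are made-up examples, not anything Mesos emits under those names.

```python
import json


def graphite_line(key, value, timestamp):
    """Format one sample as a Graphite plaintext line: 'path value timestamp'."""
    return f"{key} {value} {int(timestamp)}\n"


def metadata_record(timestamp, framework_id, task_id, executor_id, slave_id, resources):
    """Bundle the descriptive fields for one task into a plain dict."""
    return {
        "timestamp": int(timestamp),
        "framework_id": framework_id,
        "task_id": task_id,
        "executor_id": executor_id,
        "slave_id": slave_id,
        "resources": dict(resources),
    }


# Hypothetical sample values, for illustration only.
line = graphite_line("mesos.tasks.web-1.cpus_used", 0.75, 1418935680)
record = metadata_record(
    timestamp=1418935680,
    framework_id="framework-0001",
    task_id="web-1",
    executor_id="executor-web-1",
    slave_id="slave-0007",
    resources={"cpus": 1.0, "mem_mb": 512},
)

print(line, end="")
print(json.dumps(record, sort_keys=True))
```

The Graphite line is what feeds a dashboard; the metadata record is what lets you join a task back to an app, team, or department later.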
Looks like a nice holiday project ;)

-Pablo

On Thu, Dec 18, 2014 at 12:48 PM, Thomas Petr <[email protected]> wrote:
>
> We (HubSpot) currently have a cron job that enumerates all tasks running
> on the slaves and pushes resource usage data into OpenTSDB. We then use
> lead.js <http://lead.github.io/> to query / visualize this data. The cron
> job isn't open source, but I could look into releasing it if anyone is
> interested. I've also thought about adding this functionality into our
> Singularity <https://github.com/hubspot/singularity> framework, but if it
> was directly supported by the mesos master (pumping task resource usage
> into graphite / OpenTSDB), that'd be pretty cool.
>
> -Tom
>
> On Thu, Dec 18, 2014 at 3:25 PM, Andrew Ortman <
> [email protected]> wrote:
>>
>> I imagine we will also be running into the same need. Our plan right
>> now was to write a quick service that polls the API the dashboard uses to
>> retrieve metric information for each slave and then pipe that data directly
>> to something like graphite for logging. I haven’t looked too much into this
>> yet though
>>
>>
>> On Dec 18, 2014, at 2:05 PM, Niklas Nielsen <[email protected]> wrote:
>>
>> Hi Steven,
>>
>> Alex Rukletsov and I worked on this as a proof-of-concept piece in the
>> mesos-master last week, providing the same kind of graphs as you describe
>> in the dashboard.
>> We have a good idea about how to implement this now and we can start a
>> discussion on JIRA on how to proceed (I can create it shortly).
>> My first thought is that this should be pluggable; having something
>> similar to "status update decorators"
>> Alongside hanging key-value pairs of the status update, you can keep
>> track of the life-time/size of tasks and do the resource math.
>>
>> There are some interesting problems to solve when it gets to master
>> fail-over, but let's try to enumerate those in the ticket.
>>
>> Thanks,
>> Niklas
>>
>> On Thu, Dec 18, 2014 at 11:56 AM, Steven Schlansker <
>> [email protected]> wrote:
>>>
>>> I am running a corporate Mesos cluster, shared by a number of teams and
>>> projects.
>>> We are looking to get some insight into our usage of precious computing
>>> resources. For example, I'd like to be able to present a report breaking
>>> down CPU-hour and RAM GB-hour utilization by service, team, or other
>>> relevant grouping.
>>>
>>> How I'd imagine this works:
>>>
>>> * Collect Mesos statistics per task (allocated CPU, CPU utilization,
>>> allocated memory, memory utilization, disk utilization) periodically (say,
>>> once a minute)
>>> * Collect task metadata from a pluggable source (mapping from Mesos task
>>> to service name, team name, any other metadata you wish to use to group
>>> tasks)
>>> * Generate dashboard / reports by aggregating task data over axes
>>> provided by metadata input
>>>
>>> Has anyone started on such a project?
>>>
>>> Thanks,
>>> Steven
>>>
>>>
>>
>> --
>> Niklas
>>
>>
>>
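For what it's worth, the aggregation step Steven describes (periodic per-task samples, a task-to-group mapping, then CPU-hour and RAM GB-hour totals per group) could be sketched roughly as below. This is only an illustration under assumptions: the sample tuple layout, the `task_to_group` mapping, and the fixed one-minute interval are all hypothetical, and it treats each sample as holding constant for the whole interval.

```python
from collections import defaultdict


def usage_hours(samples, task_to_group, interval_seconds=60):
    """Sum CPU-hours and RAM GB-hours per group.

    samples: iterable of (task_id, cpus, mem_mb), one entry per task per interval.
    task_to_group: dict mapping task_id to a service/team name (hypothetical source).
    """
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "ram_gb_hours": 0.0})
    hours = interval_seconds / 3600.0  # length of one sampling interval in hours
    for task_id, cpus, mem_mb in samples:
        # Tasks with no metadata are kept visible under an "unknown" bucket.
        group = task_to_group.get(task_id, "unknown")
        totals[group]["cpu_hours"] += cpus * hours
        totals[group]["ram_gb_hours"] += (mem_mb / 1024.0) * hours
    return dict(totals)


# Hypothetical data: one hour of once-a-minute samples for two tasks.
samples = [("web-1", 2.0, 2048)] * 60 + [("batch-9", 0.5, 512)] * 60
task_to_group = {"web-1": "search-team"}

report = usage_hours(samples, task_to_group)
for group, usage in sorted(report.items()):
    print(group, round(usage["cpu_hours"], 3), round(usage["ram_gb_hours"], 3))
```

Keeping unmapped tasks in an "unknown" bucket rather than dropping them makes gaps in the metadata source show up in the report.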

