We (HubSpot) currently have a cron job that enumerates all tasks running on the slaves and pushes resource usage data into OpenTSDB. We then use lead.js <http://lead.github.io/> to query / visualize this data. The cron job isn't open source, but I could look into releasing it if anyone is interested. I've also thought about adding this functionality to our Singularity <https://github.com/hubspot/singularity> framework, but if it were directly supported by the mesos master (pumping task resource usage into graphite / OpenTSDB), that'd be pretty cool.
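Tom's cron-job approach (enumerate the tasks on each slave, push their usage into OpenTSDB) comes down to a formatting step like the one below. This is a hypothetical sketch, not HubSpot's job. It assumes the per-executor samples have already been fetched from each slave (for example from the slave's `/monitor/statistics.json` endpoint) and that each entry has `executor_id`, `framework_id` and a `statistics` object with fields such as `cpus_limit` and `mem_rss_bytes`. Those field names are assumptions, and the `mesos.task.*` metric names and the `METRICS`, `_tag` and `to_opentsdb_puts` helpers are made up for illustration.

```python
import re
from typing import Dict, Iterable, List

# OpenTSDB accepts only letters, digits, '-', '_', '.' and '/' in metric
# names and tag values; anything else is replaced with '_'.
_UNSAFE = re.compile(r"[^A-Za-z0-9_./-]")

# Hypothetical mapping from (assumed) statistics fields to OpenTSDB metrics.
METRICS: Dict[str, str] = {
    "cpus_limit": "mesos.task.cpus.allocated",
    "cpus_user_time_secs": "mesos.task.cpus.user_secs",
    "cpus_system_time_secs": "mesos.task.cpus.system_secs",
    "mem_limit_bytes": "mesos.task.mem.allocated_bytes",
    "mem_rss_bytes": "mesos.task.mem.rss_bytes",
}


def _tag(value: str) -> str:
    """Sanitize a tag value; OpenTSDB rejects empty tag values."""
    return _UNSAFE.sub("_", value) or "unknown"


def to_opentsdb_puts(executors: Iterable[dict]) -> List[str]:
    """Turn per-executor usage samples into OpenTSDB telnet-style 'put' lines.

    Each entry is assumed to look like
    {"executor_id": ..., "framework_id": ..., "statistics": {"timestamp": ..., ...}}.
    """
    lines = []
    for entry in executors:
        stats = entry.get("statistics", {})
        # Send integer epoch seconds as the data point timestamp.
        ts = int(stats.get("timestamp", 0))
        tags = "executor=%s framework=%s" % (
            _tag(entry.get("executor_id", "")),
            _tag(entry.get("framework_id", "")),
        )
        # Emit one data point per known field present in this sample.
        for field, metric in METRICS.items():
            if field in stats:
                lines.append("put %s %d %s %s" % (metric, ts, stats[field], tags))
    return lines
```

Each returned line, newline-terminated, can be written to a TSD's telnet-style port (4242 by default). Tagging by executor and framework keeps the raw series small; team or service tags could be added the same way once a metadata source exists.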
-Tom

On Thu, Dec 18, 2014 at 3:25 PM, Andrew Ortman <[email protected]> wrote:
>
> I imagine we will also be running into the same need. Our plan right now
> was to write a quick service that polls the API the dashboard uses to
> retrieve metric information for each slave and then pipe that data directly
> to something like graphite for logging. I haven't looked too much into this
> yet, though.
>
>
> On Dec 18, 2014, at 2:05 PM, Niklas Nielsen <[email protected]> wrote:
>
> Hi Steven,
>
> Alex Rukletsov and I worked on this as a proof-of-concept piece in the
> mesos-master last week, providing the same kind of graphs as you describe
> in the dashboard.
> We have a good idea about how to implement this now and we can start a
> discussion on JIRA on how to proceed (I can create it shortly).
> My first thought is that this should be pluggable, with something
> similar to "status update decorators": alongside hanging key-value pairs
> on the status update, you can keep track of the lifetime/size of tasks
> and do the resource math.
>
> There are some interesting problems to solve when it comes to master
> fail-over, but let's try to enumerate those in the ticket.
>
> Thanks,
> Niklas
>
> On Thu, Dec 18, 2014 at 11:56 AM, Steven Schlansker <[email protected]> wrote:
>>
>> I am running a corporate Mesos cluster, shared by a number of teams and
>> projects.
>> We are looking to get some insight into our usage of precious computing
>> resources. For example, I'd like to be able to present a report breaking
>> down CPU-hour and RAM GB-hour utilization by service, team, or other
>> relevant grouping.
>>
>> How I'd imagine this works:
>>
>> * Collect Mesos statistics per task (allocated CPU, CPU utilization,
>> allocated memory, memory utilization, disk utilization) periodically (say,
>> once a minute)
>> * Collect task metadata from a pluggable source (mapping from Mesos task
>> to service name, team name, any other metadata you wish to use to group
>> tasks)
>> * Generate dashboards / reports by aggregating task data along the axes
>> provided by the metadata input
>>
>> Has anyone started on such a project?
>>
>> Thanks,
>> Steven
>
> --
> Niklas
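Steven's three steps (periodic per-task samples, a pluggable task-to-group mapping, aggregation by group) and the "resource math" Niklas mentions can be sketched as below. This is a minimal illustration under stated assumptions: samples arrive at a fixed interval, each sample stands for one full interval, and allocated resources (not measured utilization) are what gets reported. `Sample`, `usage_by_group`, `group_of` and `BYTES_PER_GB` are hypothetical names, not part of Mesos.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Tuple

BYTES_PER_GB = 2**30  # GiB; use 10**9 if you report decimal GB


@dataclass
class Sample:
    """One periodic observation of a running task (hypothetical schema)."""
    task_id: str
    cpus: float     # CPUs allocated to the task when sampled
    mem_bytes: int  # memory allocated to the task when sampled


def usage_by_group(
    samples: Iterable[Sample],
    group_of: Mapping[str, str],
    interval_secs: float = 60.0,
) -> Dict[str, Tuple[float, float]]:
    """Return {group: (cpu_hours, ram_gb_hours)}.

    Each sample stands for `interval_secs` of usage, so sampling once a
    minute adds 1/60 of an hour per sample. Tasks with no entry in
    `group_of` (the pluggable metadata source) are grouped as "unknown".
    """
    hours = interval_secs / 3600.0
    cpu_hours: Dict[str, float] = defaultdict(float)
    gb_hours: Dict[str, float] = defaultdict(float)
    for s in samples:
        group = group_of.get(s.task_id, "unknown")
        cpu_hours[group] += s.cpus * hours
        gb_hours[group] += s.mem_bytes / BYTES_PER_GB * hours
    return {g: (cpu_hours[g], gb_hours[g]) for g in cpu_hours}
```

For example, an hour of one-minute samples for a task holding 2 CPUs and 4 GiB adds roughly 2 CPU-hours and 4 GB-hours to its group. Replacing `cpus` with a utilization figure (for instance, derived from CPU-time deltas between samples) would give the utilization view Steven also asks for, using the same aggregation.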

