We (HubSpot) currently have a cron job that enumerates all tasks running on
the slaves and pushes resource usage data into OpenTSDB. We then use lead.js
<http://lead.github.io/> to query / visualize this data. The cron job isn't
open source, but I could look into releasing it if anyone is interested.
I've also thought about adding this functionality to our Singularity
<https://github.com/hubspot/singularity> framework, but if it were directly
supported by the Mesos master (pumping task resource usage into Graphite /
OpenTSDB), that'd be pretty cool.
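
For anyone curious what the "push into OpenTSDB" half of such a job might look like, here is a minimal sketch. It assumes each task sample has already been fetched and parsed into a dict. The metric prefix `mesos.task`, the tag names, and the sample field names (`cpus_limit`, `mem_rss_bytes`) are illustrative assumptions, not what our cron job actually uses. The output is OpenTSDB's line-based `put` format.

```python
import re

# Replace anything outside the characters OpenTSDB accepts in tag values.
_UNSAFE = re.compile(r"[^A-Za-z0-9._/-]")


def to_put_lines(sample, prefix="mesos.task"):
    """Turn one parsed task sample into OpenTSDB `put` lines.

    `sample` is a hypothetical dict with task_id, slave, timestamp
    (Unix seconds), and a `statistics` dict of numeric readings.
    """
    tags = "task={} slave={}".format(
        _UNSAFE.sub("_", sample["task_id"]),
        _UNSAFE.sub("_", sample["slave"]),
    )
    ts = int(sample["timestamp"])
    lines = []
    # One data point per statistic, sorted so the output order is stable.
    for name, value in sorted(sample["statistics"].items()):
        lines.append("put {}.{} {} {} {}".format(prefix, name, ts, value, tags))
    return lines


# Made-up sample, for illustration only.
sample = {
    "task_id": "web:1",
    "slave": "slave-01.example.com",
    "timestamp": 1418934300,
    "statistics": {"cpus_limit": 0.5, "mem_rss_bytes": 134217728},
}
put_lines = to_put_lines(sample)
```

Each resulting line can be written, newline-terminated, to a TSD's telnet-style socket. Fetching the samples from the slaves is left out on purpose.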

-Tom

On Thu, Dec 18, 2014 at 3:25 PM, Andrew Ortman <
[email protected]> wrote:
>
>  I imagine we will also be running into the same need. Our plan right now
> was to write a quick service that polls the API the dashboard uses to
> retrieve metric information for each slave and then pipe that data directly
> to something like Graphite for logging. I haven’t looked too much into this
> yet, though.
>
>
>  On Dec 18, 2014, at 2:05 PM, Niklas Nielsen <[email protected]> wrote:
>
>  Hi Steven,
>
>  Alex Rukletsov and I worked on this as a proof-of-concept piece in the
> mesos-master last week, providing the same kind of graphs as you describe
> in the dashboard.
> We have a good idea about how to implement this now and we can start a
> discussion on JIRA on how to proceed (I can create it shortly).
> My first thought is that this should be pluggable, with something
> similar to "status update decorators".
>  Alongside hanging key-value pairs off the status update, you can keep
> track of the lifetime/size of tasks and do the resource math.
>
>  There are some interesting problems to solve when it gets to master
> fail-over, but let's try to enumerate those in the ticket.
>
>  Thanks,
> Niklas
>
> On Thu, Dec 18, 2014 at 11:56 AM, Steven Schlansker <
> [email protected]> wrote:
>>
>> I am running a corporate Mesos cluster, shared by a number of teams and
>> projects.
>> We are looking to get some insight into our usage of precious computing
>> resources.  For example, I'd like to be able to present a report breaking
>> down CPU-hour and RAM GB-hour utilization by service, team, or other
>> relevant grouping.
>>
>> How I'd imagine this works:
>>
>> * Collect Mesos statistics per task (allocated CPU, CPU utilization,
>> allocated memory, memory utilization, disk utilization) periodically (say,
>> once a minute)
>> * Collect task metadata from a pluggable source (mapping from Mesos task
>> to service name, team name, any other metadata you wish to use to group
>> tasks)
>> * Generate dashboard / reports by aggregating task data over axes
>> provided by metadata input
>>
>> Has anyone started on such a project?
>>
>> Thanks,
>> Steven
>>
>>
>
>  --
> Niklas
>
>
>
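
As a rough illustration of the report Steven describes in the original message (CPU-hours and RAM GB-hours grouped by team), here is a minimal sketch of the aggregation step. It assumes samples are taken at a fixed interval and that each sample stands for one full interval at the sampled allocation. The function name, the sample tuple layout, and the task-to-group mapping are all hypothetical.

```python
from collections import defaultdict


def usage_by_group(samples, task_to_group, interval_secs=60):
    """Sum allocated CPU-hours and RAM GB-hours per group.

    Each sample is (task_id, cpus_allocated, mem_allocated_mb) and is
    assumed to represent `interval_secs` of runtime at that allocation.
    """
    hours = interval_secs / 3600.0
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "ram_gb_hours": 0.0})
    for task_id, cpus, mem_mb in samples:
        # Tasks missing from the metadata source land in an "unknown" bucket.
        group = task_to_group.get(task_id, "unknown")
        totals[group]["cpu_hours"] += cpus * hours
        totals[group]["ram_gb_hours"] += (mem_mb / 1024.0) * hours
    return dict(totals)


# Made-up data: one hour of a 2-CPU / 2 GB task, half an hour of a
# 1-CPU / 512 MB task, sampled once a minute.
samples = [("web-1", 2.0, 2048)] * 60 + [("batch-1", 1.0, 512)] * 30
report = usage_by_group(samples, {"web-1": "frontend", "batch-1": "data"})
```

The same loop works for utilization rather than allocation if the samples carry measured usage instead of limits.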
