We're using collectd (https://collectd.org/) to send system metrics to Graphite, along with the collectd-mesos plugin (https://github.com/rayrod2030/collectd-mesos) to pull stats directly from the Apache Mesos stats endpoint.
This works pretty well for us, and seems kind of similar to the Diamond approach (TIL Diamond, will have to look into that).

On 19 January 2016 at 21:18, Joe Smith <yasumo...@gmail.com> wrote:
> TellApart also has a rather active fork of Diamond (they're working to
> merge it back upstream ~soonish) that you can take a look at
> https://github.com/tellapart/Diamond. They use it to monitor both Apache
> Mesos and Apache Aurora.
>
> Twitter has an internal monitoring system, and we have an agent which is
> installed via RPM/puppet on each host that scrapes the metrics pages and
> pushes data to our time series database. If you wanted to setup an agent
> through Aurora itself, you'd need support to have one-task per machine
> <https://issues.apache.org/jira/browse/AURORA-1075> (which would be cool,
> but could lead to a circular dependency since Aurora or Mesos could go down
> and not launch your monitoring agents).
>
> I'd likely recommend using the same system you use for deploying Mesos as
> that for getting your monitoring agents onto your hosts.
>
> On Tue, Jan 19, 2016 at 12:17 PM, Tomek Janiszewski <jani...@gmail.com>
> wrote:
>
>> Hi
>>
>> In our setup we are using Diamond with default system collectors and one
>> custom collector (based on
>> https://github.com/python-diamond/Diamond/pull/106 but with some
>> improvements). Some other solutions were presented at MesosCon:
>> https://www.youtube.com/watch?v=yLkc17HFEb8
>> https://www.youtube.com/watch?v=zlgAT_xFNzU
>>
>> Tomek
>>
>> On Tue, 19 Jan 2016 at 21:04, Michał Łowicki <mlowi...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've read Mesos Observability Metrics
>>> <http://mesos.apache.org/documentation/latest/monitoring/> which gives
>>> nice overview of cluster's health. What about other parameters like I/O
>>> usage (disk, network), number of processes etc. Maybe there are some tools
>>> or their configurations dedicated for Mesos? (we're mostly using Diamond
>>> and StatsD which reports to Graphite). How to launch such tools -
>>> separately from Mesos or launch as a part of long-running tasks?
>>>
>>> --
>>> BR,
>>> Michał Łowicki
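
P.S. For anyone wondering what the "scrape the stats endpoint and push to Graphite" pattern discussed in this thread boils down to, here is a rough sketch in plain Python. It is not the code of collectd-mesos or Diamond, just an illustration under a few assumptions: the endpoint returns a flat JSON object of metric name to number (as the Mesos `/metrics/snapshot` endpoint does), Graphite is listening on its plaintext port (2003 by default), and the host names, URL, and `mesos.master1` prefix are placeholders I made up.

```python
import json
import re
import socket
import time
import urllib.request


def fetch_snapshot(url, timeout=5.0):
    """Fetch a flat JSON object of metric name -> number from a stats endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_graphite_lines(snapshot, prefix, timestamp):
    """Turn {"master/cpus_total": 8.0} into "<prefix>.master.cpus_total 8.0 <ts>"."""
    lines = []
    for name, value in sorted(snapshot.items()):
        # Skip anything that is not a plain number (bool is an int subclass).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Mesos metric names use "/" between levels; Graphite paths use ".".
        # Any other unexpected character is replaced with "_".
        path = re.sub(r"[^A-Za-z0-9_.-]", "_", name.replace("/", "."))
        lines.append(f"{prefix}.{path} {value} {timestamp}")
    return lines


def send_to_graphite(lines, host, port=2003, timeout=5.0):
    """Send lines over Graphite's plaintext protocol: "path value timestamp\\n"."""
    if not lines:
        return
    payload = ("\n".join(lines) + "\n").encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


def main():
    # Hypothetical addresses and prefix; adjust for your own cluster.
    snapshot = fetch_snapshot("http://localhost:5050/metrics/snapshot")
    lines = to_graphite_lines(snapshot, "mesos.master1", int(time.time()))
    send_to_graphite(lines, "graphite.example.com")
```

Calling `main()` from cron or a small loop on each host would give roughly the agent-per-host setup described above. In practice collectd or Diamond is probably the better choice, since they already handle scheduling, buffering, and reconnects.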