There is also Prometheus. It integrates with the Mesos ecosystem in several ways:
* There are exporters that can be used to scrape Mesos metrics (https://github.com/prometheus/mesos_exporter or https://github.com/mesosphere/mesos_exporter)
* And exporters to scrape metrics of specific frameworks (such as Aurora: https://github.com/tommyulfsparre/aurora_exporter)
* Prometheus also integrates with the service-discovery mechanisms of Marathon, Kubernetes and Aurora. This makes it rather easy to also scrape custom metrics of the individual services running on your cluster (http://prometheus.io/docs/operating/configuration/#marathon-sd-configurations-marathon_sd_configs)

________________________________
From: Tom Arnfeld <[email protected]>
Sent: Tuesday, January 19, 2016 10:27 PM
To: [email protected]
Subject: Re: Monitoring

We're using collectd (https://collectd.org/) to send system metrics to Graphite, and also using the https://github.com/rayrod2030/collectd-mesos collectd plugin to pull stats directly from the Apache Mesos stats endpoint.

This works pretty well for us, and seems kind of similar to the Diamond approach (TIL Diamond, will have to look into that).

On 19 January 2016 at 21:18, Joe Smith <[email protected]> wrote:

TellApart also has a rather active fork of Diamond (they're working to merge it back upstream ~soonish) that you can take a look at: https://github.com/tellapart/Diamond. They use it to monitor both Apache Mesos and Apache Aurora.

Twitter has an internal monitoring system, and we have an agent which is installed via RPM/puppet on each host that scrapes the metrics pages and pushes data to our time series database.

If you wanted to set up an agent through Aurora itself, you'd need support for one task per machine (https://issues.apache.org/jira/browse/AURORA-1075), which would be cool, but could lead to a circular dependency since Aurora or Mesos could go down and not launch your monitoring agents.
I'd likely recommend using the same system you use for deploying Mesos to get your monitoring agents onto your hosts.

On Tue, Jan 19, 2016 at 12:17 PM, Tomek Janiszewski <[email protected]> wrote:

Hi

In our setup we are using Diamond with the default system collectors and one custom collector (based on https://github.com/python-diamond/Diamond/pull/106 but with some improvements).

Some other solutions were presented at MesosCon:
https://www.youtube.com/watch?v=yLkc17HFEb8
https://www.youtube.com/watch?v=zlgAT_xFNzU

Tomek

On Tue, 19 Jan 2016 at 21:04, Michal Lowicki <[email protected]> wrote:

Hi,

I've read Mesos Observability Metrics (http://mesos.apache.org/documentation/latest/monitoring/), which gives a nice overview of the cluster's health. What about other parameters like I/O usage (disk, network), number of processes, etc.? Are there tools, or configurations of existing tools, dedicated to Mesos? (We're mostly using Diamond and StatsD, reporting to Graphite.) How should such tools be launched: separately from Mesos, or as part of long-running tasks?

--
BR,
Michal Lowicki
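
For anyone who wants to try the "agent on each host scrapes the metrics pages and pushes to a time series database" approach described in this thread without installing collectd or Diamond first, here is a rough Python sketch. It is not what any of the posters actually run. It assumes the Mesos master serves its metrics as flat JSON at /metrics/snapshot on port 5050 and that Graphite's Carbon listener accepts the plaintext protocol on port 2003; the function names, the metric prefix and the host names are made up for illustration.

```python
import json
import socket
import time
from urllib.request import urlopen


def snapshot_to_graphite_lines(snapshot, prefix, timestamp):
    """Turn a flat {metric_name: number} dict into Graphite plaintext lines."""
    lines = []
    for name, value in sorted(snapshot.items()):
        # Mesos metric names use "/" as a separator; Graphite paths use ".".
        path = prefix + "." + name.replace("/", ".")
        lines.append(f"{path} {value} {int(timestamp)}")
    return lines


def fetch_snapshot(url):
    """Fetch and decode the JSON metrics snapshot (URL is an assumption)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def send_lines(lines, host, port=2003):
    """Send lines to Carbon using the plaintext protocol over TCP."""
    payload = ("\n".join(lines) + "\n").encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


def run_once(metrics_url, graphite_host, prefix):
    """One scrape-and-push cycle; call this from cron or a simple loop."""
    snapshot = fetch_snapshot(metrics_url)
    lines = snapshot_to_graphite_lines(snapshot, prefix, time.time())
    send_lines(lines, graphite_host)
```

Something like run_once("http://localhost:5050/metrics/snapshot", "graphite.example.com", "mesos.master") from a cron job would do one cycle (both addresses are placeholders). As noted above, it is probably safer to deploy such an agent with the same tooling that installs Mesos rather than as a Mesos task, so it keeps reporting when the cluster itself is down.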

