There is also Prometheus. It integrates with the Mesos ecosystem in
various ways:


* There are exporters for scraping Mesos metrics
(https://github.com/prometheus/mesos_exporter or
https://github.com/mesosphere/mesos_exporter)

* There are also exporters for scraping metrics of specific frameworks (such as
Aurora: https://github.com/tommyulfsparre/aurora_exporter)

* Prometheus also integrates with the service-discovery mechanisms of Marathon,
Kubernetes and Aurora. This makes it fairly easy to also scrape custom metrics
from the individual services running on your cluster
(http://prometheus.io/docs/operating/configuration/#marathon-sd-configurations-marathon_sd_configs)
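
For a rough idea of what such an exporter does, here is a minimal sketch of its core translation step, assuming Python 3 and a flat JSON object of numeric metrics like the one a Mesos master serves at /metrics/snapshot. The function names are hypothetical and not taken from either exporter; the real exporters do considerably more (fetching, labels, metric types).

```python
import re

# Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*
_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def sanitize(name: str) -> str:
    """Map a Mesos key such as 'master/cpus_total' to a valid Prometheus name."""
    clean = _INVALID.sub("_", name)
    # A name may not start with a digit, so prefix an underscore if needed.
    if not re.match(r"[a-zA-Z_:]", clean):
        clean = "_" + clean
    return clean


def to_exposition(snapshot: dict, prefix: str = "mesos_") -> str:
    """Render a flat {key: number} snapshot in the Prometheus text format.

    Simplification (assumption): every value is exported as a gauge.
    """
    lines = []
    for key in sorted(snapshot):
        value = snapshot[key]
        # Skip anything that is not a plain number.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = sanitize(prefix + key)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

For example, a snapshot of {"master/cpus_total": 8} becomes the gauge `mesos_master_cpus_total 8`. Treating everything as a gauge keeps the sketch short; a real exporter would distinguish counters from gauges.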



________________________________
From: Tom Arnfeld <[email protected]>
Sent: Tuesday, January 19, 2016 10:27 PM
To: [email protected]
Subject: Re: Monitoring

We're using collectd (https://collectd.org/) to send system metrics to 
Graphite, and also using the https://github.com/rayrod2030/collectd-mesos 
collectd plugin to pull stats directly from the Apache Mesos stats endpoint.
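
The collectd-mesos plugin does its polling inside collectd. As a standalone illustration of the same idea (not the plugin's code), the sketch below assumes Python 3 and a master on the default port 5050, fetches /metrics/snapshot, and turns the result into lines for Graphite's plaintext protocol (`<path> <value> <timestamp>`). The function names and the "mesos" prefix are hypothetical.

```python
import json
import urllib.request


def fetch_snapshot(host: str = "localhost", port: int = 5050, timeout: float = 5.0) -> dict:
    """Fetch the flat metrics JSON from a Mesos master (default port assumed)."""
    url = f"http://{host}:{port}/metrics/snapshot"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_graphite_lines(snapshot: dict, prefix: str, timestamp: int) -> list:
    """Turn {'master/cpus_total': 8, ...} into Graphite plaintext lines.

    The '/' separators in Mesos keys become '.' path separators in Graphite.
    """
    lines = []
    for key, value in sorted(snapshot.items()):
        # Only plain numbers can be sent to Graphite.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        path = prefix + "." + key.replace("/", ".")
        lines.append(f"{path} {value} {timestamp}")
    return lines
```

On a host, `to_graphite_lines(fetch_snapshot(), "mesos", int(time.time()))` produces lines of the form `mesos.master.cpus_total 8 <timestamp>`, which could then be written to Carbon's plaintext listener (port 2003 by default).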

This works pretty well for us, and seems somewhat similar to the Diamond
approach (TIL about Diamond, will have to look into that).

On 19 January 2016 at 21:18, Joe Smith
<[email protected]> wrote:
TellApart also has a fairly active fork of Diamond at
https://github.com/tellapart/Diamond (they're working to merge it back
upstream soonish). They use it to monitor both Apache Mesos and
Apache Aurora.

Twitter has an internal monitoring system, and we have an agent, installed via
RPM/Puppet on each host, that scrapes the metrics pages and pushes the data to
our time-series database. If you wanted to set up an agent through Aurora
itself, you'd need support for running one task per machine
(https://issues.apache.org/jira/browse/AURORA-1075). That would be cool, but
it could lead to a circular dependency: if Aurora or Mesos went down, your
monitoring agents would not be launched.
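
The scrape-and-push agent described above can be sketched as a small loop. This is not Twitter's agent: `fetch` and `push` are hypothetical callables (fetch might read a /metrics/snapshot page, push might write to a time-series database), and the interval is an arbitrary default. What it illustrates is that a failed scrape, for example while a master is down, is recorded and the agent keeps running.

```python
import time
from typing import Callable, Dict, List


def run_agent(
    fetch: Callable[[], Dict[str, float]],
    push: Callable[[Dict[str, float]], None],
    interval: float = 10.0,
    iterations: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Periodically scrape metrics and push them; iterations=0 runs forever.

    Returns the errors seen, one entry per failed cycle.
    """
    errors = []
    count = 0
    while iterations == 0 or count < iterations:
        try:
            push(fetch())
        except Exception as exc:  # tolerate network/parse errors and keep going
            errors.append(repr(exc))
        count += 1
        # Sleep between cycles, but not after the final bounded cycle.
        if iterations == 0 or count < iterations:
            sleep(interval)
    return errors
```

Because such an agent is installed outside Mesos (via RPM/Puppet, as described above), it keeps reporting even when Mesos itself is unhealthy. Injecting `sleep` lets the loop be exercised without real waiting.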

I'd recommend using the same system you use to deploy Mesos to get your
monitoring agents onto your hosts.

On Tue, Jan 19, 2016 at 12:17 PM, Tomek Janiszewski
<[email protected]> wrote:
Hi

In our setup we use Diamond with the default system collectors and one custom
collector (based on https://github.com/python-diamond/Diamond/pull/106, but with
some improvements). Some other solutions were presented at MesosCon:
https://www.youtube.com/watch?v=yLkc17HFEb8
https://www.youtube.com/watch?v=zlgAT_xFNzU

Tomek

On Tue, 19.01.2016 at 21:04, Michal Lowicki
<[email protected]> wrote:
Hi,

I've read Mesos Observability Metrics
(http://mesos.apache.org/documentation/latest/monitoring/), which gives a
nice overview of the cluster's health. What about other parameters like I/O
usage (disk, network), number of processes, etc.? Are there tools, or
configurations for them, dedicated to Mesos? (We mostly use Diamond and
StatsD, which report to Graphite.) How should such tools be launched:
separately from Mesos, or as long-running tasks?
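
Host-level figures such as disk I/O and process counts come from the operating system rather than from Mesos. As a minimal sketch, assuming a Linux host, they can be read from /proc: /proc/diskstats holds per-device I/O counters, and each running process has a numeric directory under /proc. The helper names are hypothetical; a collector would sample these counters periodically and report the differences as rates.

```python
from typing import Dict, Iterable


def parse_diskstats(text: str) -> Dict[str, Dict[str, int]]:
    """Parse Linux /proc/diskstats into per-device read/write counters.

    After major, minor and device name, the fields are: reads completed,
    reads merged, sectors read, ms reading, writes completed, writes merged,
    sectors written, ... (sectors are 512-byte units).
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:  # skip short or malformed lines
            continue
        stats[parts[2]] = {
            "reads": int(parts[3]),
            "sectors_read": int(parts[5]),
            "writes": int(parts[7]),
            "sectors_written": int(parts[9]),
        }
    return stats


def count_processes(proc_entries: Iterable[str]) -> int:
    """Count the numeric entries (one per process) in a /proc listing."""
    return sum(1 for name in proc_entries if name.isdigit())
```

On a Linux host these would be called as `parse_diskstats(open("/proc/diskstats").read())` and `count_processes(os.listdir("/proc"))`.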

--
BR,
Michal Lowicki

