Alright. You've convinced me I should put this online, but I'll have to clean it up first. I would have done so originally, but I had one weird question.
In my company, we will almost certainly have multiple, potentially large Mesos clusters as different failure domains. The issue I've got is relatively simple; the solution is not.

The collectd daemon runs on every single Mesos node (master or slave) in the firm. If it is running on a master, it gets data from localhost:$master_port/metrics/snapshot. It then checks whether json_data["master/elected"] == 1 and ONLY reports the data if it is the elected master. By default it uses a namespace like mesos.master.*. If you have multiple clusters this obviously won't work, so I added an optional cluster name attribute to the collectd config. That way it reports to mesos.archive.* when your Mesos cluster name is "archive".

How would you propose namespacing master stats that are cluster-wide? I had to lie (in the collectd plugin code) and change the host, since a metric such as collectd.$HOSTNAME.master.* makes absolutely no sense when there is a single elected master per cluster.

On Tuesday, March 10, 2015, Steven Schlansker <sschlans...@opentable.com> wrote:

> We would use (and probably contribute back to!) such an improved plugin as
> well, if you do polish it up be sure to announce to this list :)
>
> On Mar 10, 2015, at 2:05 PM, Dan Dong <dongda...@gmail.com> wrote:
>
> > Hi, Jeff,
> > Thanks, is your plugin working together with collectd? It would be
> > great to publish it!
> >
> > A general question for Mesos: Is there a method to monitor
> > CPU/Memory/Disk usages of jobs from different frameworks(e.g:
> > Hadoop/Mapreduce, Spark etc)? (Not necessarily to generate figures, text
> > format numbers are quite enough.)
> > So e.g for a hadoop job, when it's finished, we can collect the general
> > metrics of it? Ideally although there are many jobs from different
> > frameworks running at the same time on mesos,
> > we still could get their metrics respectively.
> >
> > Cheers,
> > Dan
> >
> > 2015-03-10 15:46 GMT-05:00 Jeff Schroeder <jeffschroe...@computer.org>:
> > I installed it and played with it for a bit but was somewhat
> > underwhelmed with it. It doesn't support slaves and all of the hardcoding
> > with duplication isn't my favorite. I ended up writing a single plugin to
> > support both masters and slaves and putting it on every node in my Mesos
> > cluster.
> >
> > Would it be worth polishing up a bit and throwing on github?
> >
> >
> > On Tuesday, March 10, 2015, Dan Dong <dongda...@gmail.com> wrote:
> > Hi, All,
> > Does anybody use this mesos-collectd-plugin:
> > https://github.com/rayrod2030/collectd-mesos
> >
> > I have installed collectd and this plugin, then configured it as
> > instructions and restarted the collectd daemon, why seems nothing happens
> > on the mesos:5050 web UI( python plugin has been turned on in
> > collectd.conf).
> >
> > My question is:
> > 1. Should I install collectd and this mesos-collectd-plugin on each
> > master and slave nodes and restart collectd daemon? (This is what I have
> > done.)
> > 2. Should the config file mesos-master.conf only configured on master
> > node and
> > mesos-slave.conf only configured on slave node?(This is what I have
> > done.)
> > Or both of them should only appear on master node?
> > 3. Is there an example( or a figure) of what output one is expected to
> > see by this plugin?
> >
> > Cheers,
> > Dan
> >
> > --
> > Text by Jeff, typos by iPhone

--
Text by Jeff, typos by iPhone
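
P.S. For anyone who wants to see the idea concretely, here is a minimal sketch of the elected-master check and the cluster namespace described at the top of this message. It is not the actual plugin code. The function name, the way keys are turned into metric names, and the sample values are all made up for illustration; only the master/elected check and the mesos.<cluster>.* prefix come from the description above.

```python
import json


def elected_master_metrics(snapshot, cluster=None):
    """Return {metric_name: value} if this master is elected, else {}.

    `snapshot` is the parsed JSON from /metrics/snapshot.
    `cluster` is the optional cluster name from the collectd config.
    """
    # Only the elected master reports; standby masters return nothing.
    if snapshot.get("master/elected") != 1:
        return {}

    # Default namespace is mesos.master.*; a cluster name replaces "master".
    prefix = "mesos.%s" % (cluster or "master")

    metrics = {}
    for key, value in snapshot.items():
        # Assumption: "master/foo" becomes "<prefix>.foo", and any other
        # key keeps its full path with "/" turned into ".".
        name = key[len("master/"):] if key.startswith("master/") else key
        metrics["%s.%s" % (prefix, name.replace("/", "."))] = value
    return metrics


# Made-up sample payload in the shape of a /metrics/snapshot response.
sample = json.loads(
    '{"master/elected": 1, "master/cpus_total": 48, "system/load_1min": 0.5}'
)
print(elected_master_metrics(sample, cluster="archive"))
```

Note that nothing here depends on the hostname, which is the whole point: whichever master happens to be elected writes to the same per-cluster names.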