Hi,
This is happening on two completely separate Ganglia headnodes (both deployed
with the same method).
I'm monitoring about 500 VMs on each headnode, grouped into clusters of
different sizes. From time to time, the summary graphs for some clusters stop
reporting, showing zero activity,
The "Error 1 sending" messages are a red herring.
If you are seeing gaps, the storage system is most likely not keeping up.
What version of Ganglia are you using, and are you using rrdcached?
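
One way to confirm that the gaps are in the RRD files themselves, and not just in how the graphs are rendered, is to dump a summary RRD with `rrdtool fetch` and look for runs of rows with no data. The sketch below is only an illustration: it assumes the usual `rrdtool fetch` text layout (a header line, then `<timestamp>: <values>` rows), and the function name `find_gaps` and the sample data are made up for the example.

```python
import math


def find_gaps(fetch_output):
    """Return (first_ts, last_ts) pairs for each run of rows with no data.

    Assumes the text layout printed by `rrdtool fetch`: a header line naming
    the data sources, then rows of "<unix_ts>: <v1> <v2> ...".
    """
    gaps = []
    start = prev = None
    for line in fetch_output.splitlines():
        ts_text, sep, values_text = line.partition(":")
        if not sep or not ts_text.strip().isdigit():
            continue  # header or blank line
        values = [float(v) for v in values_text.split()]
        if not values:
            continue  # row without any value columns
        ts = int(ts_text)
        # A row counts as missing only if every data source is NaN.
        missing = all(math.isnan(v) for v in values)
        if missing and start is None:
            start = ts
        elif not missing and start is not None:
            gaps.append((start, prev))
            start = None
        prev = ts
    if start is not None:
        gaps.append((start, prev))  # gap still open at the end of the dump
    return gaps


# Hypothetical sample, shaped like `rrdtool fetch <file> AVERAGE` output.
sample = """\
                            sum

1400500000: 1.2000000000e+01
1400500015: nan
1400500030: nan
1400500045: 9.0000000000e+00
1400500060: -nan
"""

print(find_gaps(sample))
```

If the reported gaps line up with the times the summary graphs go flat, the data never reached disk, which points at storage and not at the web frontend.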
Vladimir
On 05/19/2014 10:20 AM,
Hi,
I am using Ganglia Web Frontend version 3.5.12 and Ganglia Web Backend (gmetad)
version 3.6.0. The gmond version on the nodes is not consistent, since the nodes
are set up by different users in different environments, but I believe none of
them is below 3.1.7.
No, I am not using
I would definitely consider rrdcached backed by some SSDs. That is what I use.
3.7.0, which is in testing, has some additional performance enhancements, but
I think your issue really is I/O.
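
A rough back-of-the-envelope estimate shows why I/O is the likely suspect and why batching writes helps. Only the host count comes from this thread. The metrics-per-host figure, the 15 second polling step, and the 300 second flush interval are assumptions for illustration, so substitute your own values. The estimate also ignores the per-cluster summary RRDs.

```python
def writes_per_second(hosts, metrics_per_host, interval_s):
    """One RRD file per metric per host, each written once per interval."""
    return hosts * metrics_per_host / interval_s


HOSTS = 500             # from the thread: about 500 VMs per headnode
METRICS_PER_HOST = 30   # assumption: count the files in your own rrds directory
POLL_INTERVAL_S = 15    # assumption: gmetad polling step
FLUSH_INTERVAL_S = 300  # assumption: rrdcached write timeout

# Without caching, every poll touches every RRD file on disk.
direct = writes_per_second(HOSTS, METRICS_PER_HOST, POLL_INTERVAL_S)

# With rrdcached, updates are held in memory and each file is flushed
# roughly once per flush interval.
cached = writes_per_second(HOSTS, METRICS_PER_HOST, FLUSH_INTERVAL_S)

print(f"direct: {direct:.0f} file writes/s, via rrdcached: {cached:.0f}")
```

Under these assumptions the disk sees about 1000 small file writes per second directly versus about 50 through the cache, a 20x reduction.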
Vladimir
On 05/19/2014 10:46 AM, Cristovao Jose