Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-21 Thread Chris Burroughs
Thanks very much Nicholas. Your reply was very helpful and we are going to try out your settings changes and patches. On 09/17/2012 09:03 AM, Nicholas Satterly wrote: Hi Chris, I've discovered there are two contributing factors to problems like this. 1. the number of metrics being sent

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-19 Thread Nicholas Satterly
Hi Peter, Thanks for the feedback. I've added a thread mutex to the hosts hash table as you suggested and will send a pull request in the next day or so. Regards, Nick On Mon, Sep 17, 2012 at 8:25 PM, Peter Phaal peter.ph...@gmail.com wrote: Nicholas, It makes sense to multi-thread gmond,

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-19 Thread Peter Phaal
Nick, I think you probably need two mutexes if you want to avoid blocking the UDP thread unnecessarily. 1. a mutex on the hash table that must be grabbed by the TCP thread when it walks the hash table and the UDP thread would grab it any time it adds or removes an entry from the hash table. 2. a
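
As a rough illustration of the first mutex Peter describes, the sketch below guards a shared table with a single lock: the receiving ("UDP") thread takes it whenever it adds or removes an entry, and the reporting ("TCP") thread takes it while walking the table. gmond itself is written in C; this is Python for brevity, and `HostTable`, `update`, `remove`, `snapshot` and `run_demo` are hypothetical names, not anything from the gmond source or from Nick's patch. The second mutex is cut off in the snippet above, so it is not shown.

```python
import threading


class HostTable:
    """Hypothetical stand-in for a hosts hash table shared by two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hosts = {}

    def update(self, host, metric, value):
        # "UDP thread" path: hold the lock whenever an entry may be added.
        with self._lock:
            self._hosts.setdefault(host, {})[metric] = value

    def remove(self, host):
        # "UDP thread" path: hold the lock when an entry is removed.
        with self._lock:
            self._hosts.pop(host, None)

    def snapshot(self):
        # "TCP thread" path: hold the lock for the whole walk, so the table
        # cannot change size underneath the iteration.
        with self._lock:
            return {host: dict(metrics) for host, metrics in self._hosts.items()}


def run_demo(n_hosts=200):
    """Add hosts from one thread while another repeatedly walks the table."""
    table = HostTable()

    def receiver():
        for i in range(n_hosts):
            table.update(f"host{i}", "load_one", 0.1 * i)

    worker = threading.Thread(target=receiver)
    worker.start()
    walk_sizes = [len(table.snapshot()) for _ in range(50)]
    worker.join()
    return table, walk_sizes
```

Copying the table inside `snapshot` keeps the lock held only for the duration of the walk; a slow reader then works on its own copy and does not hold up the receiving thread.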

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-19 Thread Nicholas Satterly
Hi Peter, I've submitted another pull request covering a mutex for the hostdata hash table. Thanks again for your guidance. Regards, Nick On Wed, Sep 19, 2012 at 5:53 PM, Peter Phaal peter.ph...@gmail.com wrote: Nick, I think you probably need two mutexes if you want to avoid blocking the

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-19 Thread Neil Mckee
In gmond.c:process_tcp_accept_channel(), could those goto statements close the socket and return without relinquishing the mutex? Neil On Sep 19, 2012, at 8:45 AM, Nicholas Satterly wrote: Hi Peter, Thanks for the feedback. I've added a thread mutex to the hosts hash table as you

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-17 Thread Nicholas Satterly
Hi Chris, I've discovered there are two contributing factors to problems like this. 1. the number of metrics being sent (possibly in short bursts) can overflow the UDP receive buffer. 2. the time it takes to process metrics in the UDP receive buffer causes TCP connections from the gmetads to
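
For the first factor, one way to see how much room a UDP socket actually has is to request a larger receive buffer and read back what the kernel granted. The sketch below is generic socket code in Python, not gmond configuration, and it is not the settings change Nick goes on to describe (that part of the message is cut off above). `request_udp_rcvbuf` is a hypothetical helper name, and the 256 KiB figure is an arbitrary example value.

```python
import socket


def request_udp_rcvbuf(requested_bytes):
    """Ask the kernel for a larger UDP receive buffer and report the result.

    Returns (sock, before, after), where before and after are the SO_RCVBUF
    values read from the socket before and after the request.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # The kernel may cap or adjust this request, so always read it back.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, before, after


if __name__ == "__main__":
    sock, before, after = request_udp_rcvbuf(256 * 1024)
    print(f"SO_RCVBUF before={before} after={after}")
    sock.close()
```

The value read back can differ from the value requested because the operating system applies its own limits and accounting, so the granted size is the number to compare against the expected burst of metric packets.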

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-17 Thread Peter Phaal
Nicholas, It makes sense to multi-thread gmond, but looking at your patch, I don't see any locking associated with the hosts hashtable. Isn't there a possible race if new hosts/metrics are added to the hashtable by the UDP thread at the same time the hashtable is being walked by the TCP thread?

[Ganglia-general] Impact of gmond polling on data collection

2012-09-14 Thread Chris Burroughs
We use ganglia to monitor 500 hosts in multiple datacenters with about 90k unique host:metric pairs per DC. We use this data for all of the cool graphs in the web UI and for passive alerting. One of our checks is to measure TN of load_one on every box (we want to make sure gmond is working and