Re: [Ganglia-developers] How do we deal with very large clusters in the webui

2011-03-07 Thread Spike Spiegel
Hi, On Thu, Mar 3, 2011 at 11:11 PM, Jim Greene jim.gre...@gmail.com wrote: -Don't show any individual hosts, only the aggregate and the load/network/etc levels for the whole cluster we did this on the main page for grids by adding one line of php that excluded the bulk of our computing grid.

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-21 Thread Spike Spiegel
On Sun, Dec 20, 2009 at 7:35 PM, Vladimir Vuksan vl...@vuksan.com wrote: If you lose a day or two or even a week of trending data that is not gonna be disaster as long as that data is present somewhere else. sure, but where? how would the ganglia frontend tell? Thus I proposed a simple

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Spike Spiegel
On Mon, Dec 14, 2009 at 2:00 AM, Vladimir Vuksan vli...@veus.hr wrote: I think you guys are complicating much :-). Can't you simply have multiple gmetads in different sites poll a single gmond. That way if one gmetad fails data is still available and updated on the other gmetads. That is what

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Spike Spiegel
On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon care...@sajinet.com.pe wrote: a) you are only concerned with redundancy and not looking for scalability - when I say scalability, I refer to the idea of maybe 3 or more gmetads running in parallel collecting data from huge numbers of

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-13 Thread Spike Spiegel
On Fri, Dec 11, 2009 at 1:34 PM, Daniel Pocock dan...@pocock.com.au wrote: Thanks for sharing this - could you comment on the total number of RRDs per gmetad, and do you use rrdcached? the largest colo has 140175 rrds and we use the tmpfs + cron hack, no rrdcached. I was thinking about
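The snippet does not say what the "tmpfs + cron hack" consists of. One common reading is that gmetad writes its RRDs to a tmpfs mount and a cron job periodically copies them to persistent disk. The Python sketch below assumes exactly that. The function name `sync_rrds` and both directory arguments are hypothetical. Each file is copied to a temporary name and then renamed into place, so the disk copy is never left truncated. A copy taken while gmetad is in the middle of writing a file can still be internally inconsistent.

```python
import os
import shutil


def sync_rrds(src_root, dst_root):
    """Copy .rrd files from a tmpfs tree (src_root) to persistent storage.

    Hypothetical helper: only files whose mtime is newer than the on-disk
    copy are copied. Each copy is written to a temporary name and renamed
    into place with os.replace, so a crash mid-copy never leaves a
    truncated RRD on disk. Returns the number of files copied.
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in filenames:
            if not name.endswith(".rrd"):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(out_dir, name)
            try:
                if os.path.getmtime(dst) >= os.path.getmtime(src):
                    continue  # disk copy is already current
            except FileNotFoundError:
                pass  # first sync of this file
            tmp = dst + ".tmp"
            shutil.copy2(src, tmp)  # copy2 also preserves the mtime
            os.replace(tmp, dst)
            copied += 1
    return copied
```

A cron entry would then run a small script calling `sync_rrds(tmpfs_dir, disk_dir)` every few minutes. On boot, the reverse copy repopulates the tmpfs before gmetad starts.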

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-05 Thread Spike Spiegel
On Wed, Nov 25, 2009 at 4:20 PM, Daniel Pocock dan...@pocock.com.au wrote: One problem I've been wondering about recently is the scalability of gmetad/rrdtool. [cut] In a particularly large organisation, moving around the RRD files as clusters grow could become quite a chore.  Is anyone

Re: [Ganglia-developers] Feeble attempt at gmond aliasing

2009-10-09 Thread Spike Spiegel
On Fri, Oct 2, 2009 at 9:59 PM, Jesse Becker haw...@gmail.com wrote: On Fri, Oct 2, 2009 at 10:35, Brad Nicholes bnicho...@novell.com wrote: How well does this fit into the previous discussions of using a GUID to identify a box rather than an IP or FQDN?  Are aliasing and GUID identifiers

Re: [Ganglia-developers] Feeble attempt at gmond aliasing

2009-10-09 Thread Spike Spiegel
On Fri, Oct 9, 2009 at 9:48 PM, Jesse Becker haw...@gmail.com wrote: The GUID discussion I referred to was whether gmond/gmetad should be rewritten, top-to-bottom, to use GUIDs instead of relying on DNS/IP addresses. My understanding is that everything would have to use them, including the .rrd files

Re: [Ganglia-developers] Another interface for Ganglia stats

2009-09-26 Thread Spike Spiegel
On Tue, Sep 22, 2009 at 9:05 AM, Vladimir Vuksan vli...@veus.hr wrote: I guess a lot of the conversation depends on what you want and expect Ganglia to be used for. For example there are a lot of people out there that are using Ganglia for performance monitoring and using Nagios NRPE to get

Re: [Ganglia-developers] Fwd: [Ganglia-general] Another interface for Ganglia stats

2009-09-17 Thread Spike Spiegel
On Fri, Sep 18, 2009 at 8:32 AM, Bernard Li bern...@vanhpc.org wrote: Forwarding this to ganglia-developers since this is a more -devel related discussion.  Also can get spike's opinions in ;-) remember that you asked for it :P On Wed, Sep 16, 2009 at 11:49 AM, Vladimir Vuksan vli...@veus.hr

[Ganglia-developers] RRD_update illegal attempt to update using time 1252671437 when last update time is 1252671437 (minimum one second step)

2009-09-11 Thread Spike Spiegel
Hi, our gmetad boxes (2 of them) with 12 data sources, 6 of which are gmetad and 6 gmonds, are spamming syslog like mad with the following message: Sep 6 06:33:32 localhost.localdomain /usr/sbin/gmetad[2526]: RRD_update (/var/lib/ganglia/rrds/...metric.rrd): illegal attempt to update using time
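rrdtool rejects any update to an RRD whose timestamp is not later than the last update, and that is what this log line reports. This does not explain why gmetad issues the duplicate write. One caller-side way to stop the log noise is to remember the last accepted timestamp for each file and skip updates that do not advance it. The Python sketch below does that; `RRDUpdateGuard` is a hypothetical name and is not part of gmetad.

```python
class RRDUpdateGuard:
    """Hypothetical guard: drop RRD updates that would not advance time.

    rrdtool requires each update to be at least one second after the
    previous one, so a second update with the same timestamp is rejected.
    """

    def __init__(self):
        # In-memory map of RRD path -> last accepted timestamp (seconds).
        self._last = {}

    def should_update(self, rrd_path, timestamp):
        """Return True and record timestamp if it is newer than the last one."""
        last = self._last.get(rrd_path)
        if last is not None and timestamp <= last:
            return False  # rrdtool would reject this as an illegal update
        self._last[rrd_path] = timestamp
        return True


if __name__ == "__main__":
    guard = RRDUpdateGuard()
    path = "/var/lib/ganglia/rrds/example/metric.rrd"  # illustrative path
    print(guard.should_update(path, 1252671437))  # True
    print(guard.should_update(path, 1252671437))  # False: same second
    print(guard.should_update(path, 1252671438))  # True
```

Dropping the duplicate only hides the symptom. If two data sources really are writing the same file, the second value is silently discarded.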

[Ganglia-developers] gmetad spamming logs with unable to write root epilog

2009-09-11 Thread Spike Spiegel
Hi, recently we added better monitoring for our ganglia infrastructure and one of the checks for gmetad contacts it on port 8651, looks for some XML string and exits (receiving 20+ MBs of xml every time we run the check isn't an option). The 'exits' part means sending a RST before gmetad has
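The check described here can be sketched in Python as follows. It connects to gmetad's XML port (8651), reads only until a marker string appears, and then closes. Closing with the rest of the dump still unread is the early exit that produces the RST described in the message. The sketch assumes the marker is the `<GANGLIA_XML` root element. `stream_contains` and `check_gmetad` are hypothetical helpers.

```python
import socket


def stream_contains(chunks, marker):
    """Return True as soon as marker appears in the concatenated chunks.

    Keeps a short tail of the previous data so a marker split across two
    reads is still found, without buffering the whole multi-MB dump.
    """
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if marker in window:
            return True
        # len(marker) - 1 bytes are enough to complete a split marker.
        tail = window[-(len(marker) - 1):] if len(marker) > 1 else b""
    return False


def check_gmetad(host="localhost", port=8651, marker=b"<GANGLIA_XML",
                 timeout=10.0):
    """Hypothetical health check: True if gmetad starts sending its XML."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # iter(callable, sentinel) yields recv() results until EOF (b"").
        # Reading stops at the marker; the socket is then closed with the
        # remaining XML unread.
        return stream_contains(iter(lambda: sock.recv(65536), b""), marker)
```

Keeping the tail between reads matters because TCP gives no guarantee that the marker arrives within a single `recv` call.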

Re: [Ganglia-developers] metric loss and send channel failures in a multi-channel setup

2009-08-22 Thread Spike Spiegel
On Mon, Aug 17, 2009 at 7:56 PM, Spike Spiegel fsm...@gmail.com wrote: thanks for your input, I've given this a go and there's a patch attached to this email that I'd like to hear comments about. I've never used apr before, but based on the documentation [1] apr_array_push will allocate new

[Ganglia-developers] metric loss and send channel failures in a multi-channel setup

2009-08-17 Thread Spike Spiegel
Hi, we have a setup with 2 unicast channels and we recently ran across an issue where we lost a bunch of metrics submitted with gmetric due to a problem with dns that made one of the two channels unreachable. I traced this back to libgmond.c and Ganglia_udp_send_channels_create(...) where the
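The real code path is C (`Ganglia_udp_send_channels_create` in libgmond.c, using APR). The Python below only illustrates one possible policy, assuming each unicast channel is an independent UDP target. Each target is resolved on its own, and targets that fail DNS resolution are logged and skipped, so the remaining channels still receive metrics. `create_send_channels` and its `resolve` and `log` parameters are hypothetical.

```python
import socket


def create_send_channels(targets, resolve=None, log=print):
    """Build one UDP send channel per (host, port) target.

    Hypothetical sketch: a target whose name cannot be resolved is logged
    and skipped instead of aborting creation of the other channels.
    Returns a list of (socket, address) pairs.
    """
    if resolve is None:
        def resolve(host, port):
            # First IPv4 UDP address returned for the target.
            info = socket.getaddrinfo(host, port, socket.AF_INET,
                                      socket.SOCK_DGRAM)
            return info[0][4]

    channels = []
    for host, port in targets:
        try:
            addr = resolve(host, port)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            log(f"skipping send channel {host}:{port}: {exc}")
            continue
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        channels.append((sock, addr))
    return channels
```

A fuller version would also retry the skipped targets later. Otherwise a transient DNS outage at startup drops that channel until the daemon restarts.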

Re: [Ganglia-developers] Thoughts on host spoofing

2009-02-06 Thread Spike Spiegel
On Fri, Feb 6, 2009 at 2:52 PM, Rick Cobb rc...@quantcast.com wrote: My thought is that the fewer underlying services a monitoring system needs to work, the more likely it is to work. Absolutely, but dns itself is actually a good example of how introducing a dependency was necessary to make a

Re: [Ganglia-developers] gmond python module interface

2009-01-31 Thread Spike Spiegel
Hi, provided that I haven't had the time to look at this part of the code yet and that I agree it would be much nicer to have a gmetric-like behavior, On Sun, Feb 1, 2009 at 12:21 AM, David Stainton dstainton...@gmail.com wrote: I like using gmetric to monitor... so I wrote gmetric-daemon

Re: [Ganglia-developers] gmetad protocol and propagating errors back to the client

2009-01-23 Thread Spike Spiegel
On Thu, Jan 22, 2009 at 6:55 PM, Carlo Marcelo Arenas Belon care...@sajinet.com.pe wrote: the interactive port was designed to mimic the behaviour from the original gmetad port which always returns the whole tree. why's that? if I wanted the whole tree I'd query the non interactive port,

Re: [Ganglia-developers] CVE

2009-01-23 Thread Spike Spiegel
On Fri, Jan 23, 2009 at 11:52 PM, Brad Nicholes bnicho...@novell.com wrote: * http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-0242 Ganglia 3.1.1 allows remote attackers to cause a denial of service via a request to the gmetad service with a path does not exist, which causes Ganglia

Re: [Ganglia-developers] Possible REST interface to the interactive port?

2009-01-21 Thread Spike Spiegel
On Wed, Jan 21, 2009 at 2:52 AM, Brad Nicholes bnicho...@novell.com wrote: Yep, I was also thinking that a RESTful output module for gmetad-python would probably be the easiest solution I haven't used gmetad-python yet so one concern would be performance and how it'd behave having to

[Ganglia-developers] gmetad protocol and propagating errors back to the client

2009-01-21 Thread Spike Spiegel
Hi, right now when gmetad fails an error is logged and in some cases the connection to the client is interrupted, returning invalid XML, or in other cases (item not found or broken request) the entire tree is returned. This imho is bad behavior and code should be added to inform the client of the
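The proposal here is to report a missing item as an error instead of falling back to the whole tree. It can be sketched in Python as follows. The nested dict stands in for gmetad's in-memory tree, and the `<ERROR .../>` reply is a hypothetical format that gmetad does not define. `resolve_path` and `handle_request` are also illustrative names.

```python
from xml.sax.saxutils import quoteattr


def resolve_path(tree, path):
    """Walk a '/cluster/host/metric' style path through nested dicts.

    Returns the matching subtree, or None if any component is missing.
    """
    node = tree
    for part in [p for p in path.strip().split("/") if p]:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def handle_request(tree, path):
    """Return ("ok", subtree) or ("error", xml) instead of the whole tree."""
    node = resolve_path(tree, path)
    if node is None:
        # Hypothetical error element; quoteattr escapes and quotes the value.
        msg = quoteattr("not found: " + path.strip())
        return "error", '<ERROR CODE="404" MSG=' + msg + "/>"
    return "ok", node
```

With an explicit error reply, a client can tell "no such host" apart from "here is everything", which the whole-tree fallback cannot express.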

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Spike Spiegel
On Sun, Jan 18, 2009 at 7:35 PM, Carlo Marcelo Arenas Belon care...@sajinet.com.pe wrote: other than that looks good to me. could you check the simplified one?, this problem was introduced in 2003 and therefore affects all versions of ganglia since then (including 2.5.7 which is not supported

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Spike Spiegel
On Mon, Jan 19, 2009 at 5:44 AM, Carlo Marcelo Arenas Belon care...@sajinet.com.pe wrote: agree, but that is to be done in the context of getting multi-patch committed and backported, but not in fixing this buffer overflow in the interactive port, which is what BUG223 is about. ok, guess I'll

Re: [Ganglia-developers] Possible REST interface to the interactive port?

2009-01-17 Thread Spike Spiegel
Hi, On Sat, Jan 17, 2009 at 5:04 AM, john allspaw jalls...@yahoo.com wrote: Hey all - Wondering if there's ever been any talk about serving up the interactive port info via REST? I am kinda working on this already although not in the form of a ganglia patch, but as an external application
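An external REST front end of the kind described here mainly needs to translate URLs into interactive-port queries and relay the XML back. The Python sketch below assumes that gmetad's interactive port listens on its default 8652 and accepts a newline-terminated path such as `/cluster/host`. The `/clusters/<name>/hosts/<name>/metrics/<name>` URL scheme and both function names are hypothetical.

```python
import socket
from urllib.parse import unquote, urlsplit

# Hypothetical REST collections, in the order they map onto the gmetad path.
ROUTE = ("clusters", "hosts", "metrics")


def rest_to_query(url_path):
    """Map e.g. /clusters/web/hosts/h1 to the query line '/web/h1\\n'."""
    parts = [unquote(p) for p in urlsplit(url_path).path.split("/") if p]
    # Expect alternating collection/name pairs following ROUTE.
    if len(parts) % 2 or len(parts) > 2 * len(ROUTE):
        raise ValueError(f"unsupported path: {url_path}")
    names = []
    for i in range(0, len(parts), 2):
        if parts[i] != ROUTE[i // 2]:
            raise ValueError(f"unexpected segment {parts[i]!r}")
        names.append(parts[i + 1])
    return "/" + "/".join(names) + "\n"


def query_interactive(host, query, port=8652, timeout=10.0):
    """Send one query line and return the raw XML reply (read until EOF)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode())
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Because the translation is a pure function, it can be tested without a running gmetad, and the HTTP layer stays free to add caching or JSON conversion on top.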

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-15 Thread Spike Spiegel
On Fri, Jan 16, 2009 at 7:04 AM, Kostas Georgiou k.georg...@imperial.ac.uk wrote: On Thu, Jan 15, 2009 at 01:41:53PM -0700, Brad Nicholes wrote: On 1/15/2009 at 8:56 AM, in message 496efa2a02ac0003a...@lucius.provo.novell.com, Brad Nicholes bnicho...@novell.com wrote: After taking a