[Ganglia-developers] Change in default RRAs
Hi friends,

I just found out by chance that the default RRAs for gmetad changed some time ago. What was the rationale for this? It is an almost 59x increase in database size. OK, disk is cheap, but it is still a factor, especially for large clusters.

Just curious.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
Re: [Ganglia-developers] gmetric4j and spoofing
Hi Daniel,

so, what is the story? If I look at the code in ./gmetric4j/src/main/java/ganglia/gmetric/Protocolv31x.java, I find:

    public void announce( String name, String value, GMetricType type, String units, GMetricSlope slope, int tmax, int dmax, String groupName) throws Exception {
        Ganglia_metric_id metric_id = new Ganglia_metric_id();
        metric_id.host = InetAddress.getLocalHost().getHostName();
        metric_id.name = name;
        metric_id.spoof = false;
        if ( isTimeToSendMetadata( name ) ) {
            encodeGMetric( metric_id, name, value, type, units, slope, tmax, dmax, groupName );
            send(xdr.getXdrData(),xdr.getXdrLength());
        }
        encodeGValue( metric_id, value );
        send(xdr.getXdrData(),xdr.getXdrLength());
    }

which seems to indicate that spoofing is off by default for the V3.1 protocol?!

Thanks
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Martin Knoblauch kn...@knobisoft.de
To: dan...@pocock.com.au
Cc: ganglia-developers@lists.sourceforge.net
Sent: Friday, September 14, 2012 5:33 PM
Subject: gmetric4j and spoofing

Hi Daniel,

seems you are the master of gmetric4j. Short question: does it support spoofing?

Cheers
Martin
[Ganglia-developers] gmetric4j and spoofing
Hi Daniel,

seems you are the master of gmetric4j. Short question: does it support spoofing?

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] 3.3.5 tagged
From: Bernard Li bern...@vanhpc.org
To: Daniel Pocock dan...@pocock.com.au
Cc: ganglia-developers@lists.sourceforge.net; Carlo Marcelo Arenas Belon care...@sajinet.com.pe
Sent: Tuesday, March 27, 2012 10:24 PM
Subject: Re: [Ganglia-developers] 3.3.5 tagged

I thought the original idea was that the web component was going to be a separate entity and thus could be released on a different cycle from the other components. If we are again releasing web at the same time as ganglia-core, then this is back to how things were originally when the code was in SVN. Just my $0.02.

Cheers,
Bernard

Add my coins to that. I wanted to write the same. Let's just be progressive, split the web part out from the data collection part, and let them move at their own pace. That cross-project link in the repo is most confusing anyway.

Cheers
Martin

On Tuesday, March 27, 2012, Daniel Pocock dan...@pocock.com.au wrote:

On 27/03/2012 16:52, Carlo Marcelo Arenas Belon wrote:

On Mon, Mar 26, 2012 at 04:50:18PM +0100, Daniel Pocock wrote:

Release 3.3.5: The release has now been tagged in git, commit = 9db9beea062c7ce5e5b4d10ed553c9b7cea7642e

wrong bundle:

    carenas@dell ~/src/git/ganglia $ git describe --tags
    3.3.5
    carenas@dell ~/src/git/ganglia $ cd web/
    carenas@dell ~/src/git/ganglia/web $ git describe --tags
    3.3.2-3

while web has since had a lot more fixes added, as shown by:

    carenas@dell ~/src/git/ganglia-web $ git describe --tags
    3.3.4-14-g7383ed8
    carenas@dell ~/src/git/ganglia-web $ git diff --stat 3.3.2-3.. | cat
    Makefile                         |  2 +-
    api/host.php                     |  9 ++---
    cluster_view.php                 |  4 ++--
    functions.php                    | 15 +++
    graph.php                        |  5 +++--
    header.php                       |  1 +
    inspect_graph.php                |  4 ++--
    templates/default/views_view.tpl | 16
    8 files changed, 42 insertions(+), 14 deletions(-)

Does this result in any actual breakage: does 3.3.5 break anything that was working in 3.3.1 or 3.3.0? If the answer to that question is `no', then we ignore this issue and 3.3.5 continues to be the release candidate.

Are these all fixes that belong in the 3.3.x release, or are some of them features that belong in 3.4.x?

I am not automatically advancing the pointer to the web submodule because I think that only crucial things should be accepted in 3.3.x releases from now on. The alternative is that
a) we freeze the web repo against all non-essential commits until the release is finally finished
b) I update the pointer to the web repo on every 3.3.x release attempt
Re: [Ganglia-developers] Ganglia-3.3.1: How to get back the old web interface ??
- Original Message -
From: Jeff Buchbinder rufustfire...@gmail.com
To: Bernard Li bern...@vanhpc.org
Cc: ganglia-developers@lists.sourceforge.net; Martin Knoblauch kn...@knobisoft.de
Sent: Thursday, March 1, 2012 5:22 PM
Subject: Re: [Ganglia-developers] Ganglia-3.3.1: How to get back the old web interface ??

On Thu, Mar 1, 2012 at 1:36 AM, Bernard Li bern...@vanhpc.org wrote:

Hi Vladimir:

On Wed, Feb 29, 2012 at 7:03 AM, Vladimir Vuksan vli...@veus.hr wrote:

If you'd like to rework the templates to reinstate the old behavior, i.e. call it legacy templates, that would be fine.

Hmmm... is it a simple template change or is it more involved? I thought a whole bunch of the PHP files got changed, so would it be possible to have the old-style and new-style GUI co-exist in the same code tree?

Besides templating and caching, the look and feel of the older UI could be accomplished through templating, but the two probably couldn't exist in the same directory. That being said, nothing stops you from dropping a copy of the old UI code elsewhere, since it uses the same gmetad data source. I have run the old Ganglia web interface at the same time as the new one -- they just have to be in different directories.

Jeff

That is simple. But IMHO we should keep the old code in the repository and maybe even build RPMs for it. What speaks against putting the code back as legacy-web? But frankly, I would have preferred that the new code was stored as gweb-2 and the old code kept as web. Or was there a real pressing reason for the reorganization?

Thanks
Martin
Re: [Ganglia-developers] [Ganglia-general] Ganglia gmond memory leak?
Hi,

after running my plain gmond for about 48 hours under valgrind control I get the following summary:

    ==5428== LEAK SUMMARY:
    ==5428== definitely lost: 24,804 bytes in 13 blocks
    ==5428== indirectly lost: 20 bytes in 4 blocks
    ==5428== possibly lost: 352,725 bytes in 141 blocks
    ==5428== still reachable: 1,342,402 bytes in 2,189 blocks
    ==5428== suppressed: 0 bytes in 0 blocks
    ==5428== Reachable blocks (those to which a pointer was found) are not shown.
    ==5428== To see them, rerun with: --leak-check=full --show-reachable=yes

So we may have lost about 400 KB (definitely, indirectly and possibly lost combined), plus 1.3 MB of stuff that we [likely] just fail to free on exit. Not too bad.

    ==5428==
    ==5428== For counts of detected and suppressed errors, rerun with: -v
    ==5428== Use --track-origins=yes to see where uninitialised values come from
    ==5428== ERROR SUMMARY: 97221 errors from 203 contexts (suppressed: 68 from 5)

Those errors are kind of interesting. Most of them just come out of Python with no traces of ganglia code. Not sure how to track/document that in a useful way.

I will rerun the valgrind tracing with the suggested options on a 3.3.1 gmond for some time.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Martin Knoblauch kn...@knobisoft.de
To: Aidan Wong aidanw...@attinteractive.com; Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; ganglia-general ganglia-gene...@lists.sourceforge.net
Sent: Monday, February 27, 2012 8:21 PM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

Hi Aidan,

for what it is worth, I cannot reproduce the growing memory consumption on a small 3.2.0 grid using only standard metrics in unicast mode. It has been running for a few hours now. Will check again tomorrow.

Cheers
Martin

From: Aidan Wong aidanw...@attinteractive.com
To: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; ganglia-general ganglia-gene...@lists.sourceforge.net
Sent: Thursday, February 23, 2012 8:34 AM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

I've restarted the gmond process and memory usage drops, until gmond hogs memory again over time. Are there any Ganglia contributors who may want to chime in on this memory leak issue? I'm on Ganglia 3.2.0. Are there any improvements in version 3.3.1 addressing this issue?

Thanks

From: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com
Date: Wed, 22 Feb 2012 16:31:58 -0600
To: Aidan Wong aidanw...@attinteractive.com, ganglia-general ganglia-gene...@lists.sourceforge.net
Subject: RE: Ganglia gmond memory leak?

I have seen the same behavior in my environment but do not have a solution.

Nathan

From: Aidan Wong [mailto:aidanw...@attinteractive.com]
Sent: Wednesday, February 22, 2012 4:10 PM
To: ganglia-general
Subject: [Ganglia-general] Ganglia gmond memory leak?

Hi,

it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory that the process uses gets pretty high and keeps increasing.

    USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
    root  18647 0.0  9.9  2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid

Is this a bug? Can anyone suggest a solution?

Thank you

___
Ganglia-general mailing list
ganglia-gene...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
Re: [Ganglia-developers] Ganglia-3.3.1: How to get back the old web interface ??
From: Bernard Li bern...@vanhpc.org
To: Martin Knoblauch kn...@knobisoft.de
Cc: ganglia-developers@lists.sourceforge.net
Sent: Wednesday, February 29, 2012 9:06 AM
Subject: Re: [Ganglia-developers] Ganglia-3.3.1: How to get back the old web interface ??

Hi Martin:

On Tue, Feb 28, 2012 at 2:41 AM, Martin Knoblauch kn...@knobisoft.de wrote:

While I think it is an interesting development, I do not think it is ready for general consumption (more later). Big question: is there a way to configure back the old behaviour??

You can just download the old source, build the web RPM and install that instead of what comes with the new version. It should just work.

Hi Bernard,

hey - while correct and obvious, this is not what I asked for :-) I just think it would be good to have a config option that brings back the interface to the old look/simplicity/speed.

Cheers,
Bernard

Martin
[Ganglia-developers] Ganglia-3.3.1: How to get back the old web interface ??
Hi,

in order to stay current with ganglia, I built SLES11 RPMs of 3.3.1 and installed them on one of my clusters. Only then did I realize that the new web interface is now standard. Too late ... :-(

While I think it is an interesting development, I do not think it is ready for general consumption (more later). Big question: is there a way to configure back the old behaviour??

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] Ganglia 3.3.0 released
Hi Vladimir,

congratulations on the new release. Short questions on compatibility: will it work with a 3.1.7/3.2.0 RRD database? Will a 3.3.0 gmetad work with 3.1.7/3.2.0 gmonds?

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Vladimir Vuksan vli...@veus.hr
To: ganglia-developers@lists.sourceforge.net; ganglia-us...@lists.sourceforge.net
Sent: Wednesday, February 1, 2012 11:38 PM
Subject: [Ganglia-developers] Ganglia 3.3.0 released

This was gonna be the 4.0.0 release; however, we received feedback that making a major version bump may cause issues with the packaging policies of various Linux distributions, e.g. Fedora. Therefore it's been rebranded as 3.3.0. The announcement is here:

http://ganglia.info/?p=489

Enjoy,
Vladimir
Re: [Ganglia-developers] Looking for 3.1.7 binaries/rpms for RHEL-5.x on IA64
From: Daniel Pocock dan...@pocock.com.au
To: ganglia-developers@lists.sourceforge.net
Sent: Wednesday, December 21, 2011 7:58 AM
Subject: Re: [Ganglia-developers] Looking for 3.1.7 binaries/rpms for RHEL-5.x on IA64

someone have those available? Species on the extinction list - I know, but a customer has a bunch of those.

I believe I successfully built x86_64 RPMs using the spec file when testing the 3.1.7 release

Hi Daniel,

X86_64 != IA64 :-) IA64 really lacks support on all levels :-( RH does not even have it for EL6. Pity, because it is a great HPC CPU.

I fell back to just building gmond, as this is all I need. That was difficult enough without some of the -devel packages, which the customer did not have. Back to building GNU stuff :-) So, the urgency for me is over.

Cheers
Martin
[Ganglia-developers] Looking for 3.1.7 binaries/rpms for RHEL-5.x on IA64
Hi folks,

does someone have those available? Species on the extinction list - I know, but a customer has a bunch of those.

Thanks in advance
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] [Ganglia-general] revisiting bogus spikes
Hi Cameron,

[adding the developers list]

OK:

1) We write the unmodified data in line 233 to capture the raw counters. That is what we are using in line 227 for the comparison.
2) ns is created and returned by hash_lookup.
3) The ULONG_MAX logic in line 231 is there because we need to ensure that the result is always positive. This is needed because the variables are unsigned.
4) update_ifdata is called once by metric_init and then every time one of the bytes/pkts_in/out collectors fires.

Now this does not solve your problem ...

Question: do you see any of the debug messages that update_ifdata should create in case of something unusual? That should help to get an idea of what the interface counters on your machine(s) look like. Look in /var/log/messages, or just start gmond noninteractive.

Hmm. Another question: do you compile gmond in 64-bit or 32-bit mode? The ULONG_MAX logic may/will fail in 32-bit mode if the kernel is 64-bit. It could even be that the interface counters on 32-bit kernels are written as 64-bit values.

Hope this helps
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Cameron L. Spitzer cspit...@nvidia.com
To: ganglia-gene...@lists.sourceforge.net
Sent: Thu, April 28, 2011 3:21:04 AM
Subject: [Ganglia-general] revisiting bogus spikes

Once again I've been asked to make Ganglia usable on Linux hosts with the Broadcom NIC with the 32-bit byte counters. E.g., HP Proliant 580 G5, a rather popular machine where Ganglia doesn't work out of the box.

So I'm trying to understand ganglia-3.1.7/libmetrics/linux/metrics.c again.

In update_ifdata(), we parse /proc/net/dev for the current bytes and packets in and out. There's a structure ns (declared where?) of type net_dev_stats, representing the previous sample? I'm not sure exactly what ns represents.

There's a sanity check at line 227

    if ( rbi >= ns->rbi )

for whether the counter went up or down. If it went down, we assume the counter rolled around, and guess the value is negative, and invert it, line 231.

    l_bytes_in += ULONG_MAX - ns->rbi + rbi;

(I don't understand how that is supposed to work.)

Then, regardless of whether the sample passed or failed the sanity check, it's saved in the ns structure. Line 233,

    ns->rpi = rpi;

After the parsing is all done, and the crazy value is in ns, an optional reasonableness test (REMOVE_BOGUS_SPIKES) returns early if any of the numbers are extremely large. Otherwise it updates the static running counts and then returns.

On our HP 580G5s, defining REMOVE_BOGUS_SPIKES had no effect. The network traffic graphs become useless within a minute of starting gmond.

The part I don't understand is that when the line 227 check fails, we put the known-bad data in ns anyway. I'd appreciate it if someone familiar with update_ifdata() could explain its logic.

When is this routine called? (I can see modules/network/mod_net.c calls it via bytes_in_func(), but I haven't figured out when net_metric_handler() is called. Maybe that would explain how bogus data in ns doesn't matter.)

Is there any way to keep way out-of-scale data out of these graphs?

Thanks for any help.

-Cameron in Los Gatos
Re: [Ganglia-developers] 3.1 branch backport proposals
Hi Bernard,

for what it is worth... +1 for both.

Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Bernard Li bern...@vanhpc.org
To: ganglia-developers@lists.sourceforge.net
Sent: Tue, January 25, 2011 9:29:52 PM
Subject: [Ganglia-developers] 3.1 branch backport proposals

Hi all:

Could someone please vote on the following two backport proposals for 3.1?

* build: Install manpages in appropriate locations when `make install` is run
  http://sourceforge.net/apps/trac/ganglia/changeset/2299
  http://sourceforge.net/apps/trac/ganglia/changeset/2301
  +1: bernardli

* build: Include BUGS file to distribution tarball
  http://sourceforge.net/apps/trac/ganglia/changeset/2455
  +1: bernardli
  bernardli: depends on Install manpages in appropriate locations when `make install` is run

Thanks!
Bernard
[Ganglia-developers] Fw: [Ganglia-general] How can gmetad be configured for 2 clusters?
really adding the developers ...

- Forwarded Message
From: Martin Knoblauch kn...@knobisoft.de
To: David Birdsong david.birds...@gmail.com; Whit Blauvelt w...@transpect.com
Cc: ganglia-gene...@lists.sourceforge.net
Sent: Sat, November 13, 2010 8:34:43 AM
Subject: Re: [Ganglia-general] How can gmetad be configured for 2 clusters?

- Original Message
From: David Birdsong david.birds...@gmail.com
To: Whit Blauvelt w...@transpect.com
Cc: Martin Knoblauch kn...@knobisoft.de; ganglia-gene...@lists.sourceforge.net
Sent: Fri, November 12, 2010 9:56:26 PM
Subject: Re: [Ganglia-general] How can gmetad be configured for 2 clusters?

On Fri, Nov 12, 2010 at 9:19 AM, Whit Blauvelt w...@transpect.com wrote:

On Fri, Nov 12, 2010 at 08:35:44AM -0800, Martin Knoblauch wrote:

In order to separate the two clusters, they need to run on different ports. In addition: when you list more than one node on the data_source, this does not define the cluster. It just adds failover capability. gmetad will only talk to one of the hosts at a time. If that fails, it will try the next on the list.

Thanks Martin. That was the whole trick. I was making the assumption that gmetad, being meta, would be the gatherer of data from the nodes. Understanding that the gmonds go ahead and consolidate that changes the picture entirely. As my five-year-old sometimes says, Silly me.

Whit

While I can't argue against something that clearly fixed this for you, this doesn't sound correct and it would be nice to hear this clarified. Sure, every host would have info about every other host, but each host's xml tree should have all the nodes nested in their corresponding cluster tags. Gmetad could hit any host and pick up info about both clusters on any host, but it should know to distribute the updates from the xml stream to the correct clusters and not 'cross pollinate' the two.

As far as I know, every gmond just puts all the information it has inside its own cluster tags. It does not care about the cluster tags it receives from other gmonds. It has always been the task of gmetad to build up the correct XML for the complete grid. Therefore it is vital that the gmond configuration for multiple clusters is correct.

One could argue that this behaviour of gmond needs improvement. One solution could be that it aggregates only data coming from its own cluster. On the other hand, the cluster tag is just optional. What should a gmond without such a tag do about data from tagged gmonds? I still favor correct configuration.

In any case, I am adding ganglia developers to CC. But the confusion shows that documentation might be lacking ...

Cheers
Martin
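The setup described in this thread could look roughly like the fragments below. This is only a sketch: the cluster names, host names, multicast address and port numbers are all made up for illustration, and a real configuration will contain further settings. Each cluster gets its own port, and the extra hosts on each data_source line serve as failover candidates rather than defining the cluster.

```
# gmetad.conf (sketch; names, hosts and ports are invented)
# One data_source per cluster. The hosts on a line are tried in order
# and only one of them is polled at a time.
data_source "cluster-a" node-a1:8649 node-a2:8649
data_source "cluster-b" node-b1:8650 node-b2:8650

# gmond.conf on the nodes of cluster-b (sketch; multicast assumed)
cluster {
  name = "cluster-b"
}
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8650
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8650
  bind = 239.2.11.71
}
tcp_accept_channel {
  port = 8650
}
```

The nodes of cluster-a would use the same layout with their own cluster name and port 8649, so that the gmonds of one cluster never receive data from the other.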
Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.7 ready for testing
- Original Message
From: Daniel Pocock dan...@pocock.com.au
To: kn...@knobisoft.de
Cc: ganglia-developers@lists.sourceforge.net; ganglia-gene...@lists.sourceforge.net
Sent: Tue, March 2, 2010 12:23:32 PM
Subject: Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.7 ready for testing

Thanks to those who provided feedback - any objections to making 3.1.7 generally available? I would like to make it GA within the next 1-2 days now.

Unless there is a [severe] regression compared to 3.1.2 - just let it escape. You know, the perfect is the enemy of the good.

Cheers
Martin

Michael Perzl wrote:

I have successfully compiled and tested 3.1.7 on
- AIX 5.1 ML04
- AIX 5.3 ML00
- AIX 5.3 TL07
- AIX 6.1 TL03

Regards,
Michael

On 02/22/2010 12:15 PM, Daniel Pocock wrote:

Just a reminder - any feedback is welcome, or feel free to discuss 3.1.7 on IRC.

It would be good to have positive confirmation of which platforms this has been tested on. So far, I have tested
- Debian lenny,
- RHEL3/4/5,
- CentOS 5,
- Solaris 8 and
- Cygwin.
and Brad has done some testing on SLES10.

Regards,
Daniel

Daniel Pocock wrote:

I've tagged 3.1.7 and built a tarball:

http://ganglia.info/testing/ganglia-3.1.7.tar.gz

The md5sum for 3.1.7 is: 6aa5e2109c2cc8007a6def0799cf1b4c

Since 3.1.6, only two things have changed and may need to be tested again by those who tested 3.1.6:
- the build system (support for commas in CFLAGS)
- the multicpu module - percentages reported differently

This is not confirmation that the release is in GA status - a further notification will be sent when the testing period has elapsed without any serious defect.

Users are invited to test the tarball and submit feedback.

Please do not commit on branches/monitor-core-3.1 until after 3.1.7 goes GA, in case further tweaks are needed to facilitate a successful release.

Below are the release notes from the STATUS file.
Other documentation has also changed since 3.1.2 and should be reviewed:

GANGLIA 3.1 STATUS: -*-text-*-
Last modified at [$Date: 2010-02-17 11:01:08 + (Wed, 17 Feb 2010) $]

The current version of this file can be found at:

* http://ganglia.svn.sourceforge.net/svnroot/ganglia/branches/monitor-core-3.1/STATUS

Release history:

3.1.7           : Tagged: Feb 17, 2010
3.1.6           : Tagged: Feb 4, 2010 (not released for GA)
3.1.5(hargrave) : Tagged: Nov 24, 2009 (not released for GA)
3.1.4(hargrave) : Tagged: Oct 26, 2009 (not released for GA)
3.1.3(avenger)  : Tagged: Sep 19, 2009 (not released for GA)
3.1.2(langley)  : Released: Feb 17, 2009
3.1.1(wien)     : Released: Sep 10, 2008
3.1.0(amelia)   : Released: Jul 30, 2008

Contributors looking for a mission:

* Just do an egrep on TODO, XXX or FIXME in the source.
* Review the bug database at: http://bugzilla.ganglia.info/
* Open bugs in the bug database.
* Implement a feature from the wishlist at: http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list

CURRENT RELEASE NOTES:

(Please update this area with a brief description of bug fixes and enhancements that have been backported for the current release)

Note: 3.1.3, 3.1.4, 3.1.5 and 3.1.6 never became GA, therefore, the release notes for all of them are combined below.

3.1.7:

* Fix build support for RHEL5/issue with commas in CFLAGS
* multicpu module: show CPU utilization as a value between 0-100% for each core

3.1.6:

* Merge commit 1966 from trunk to fix contrib/removespikes.pl
* Bootstrapping with Debian 5.0 (lenny) versions of autotools for this and future releases.
  http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05352.html
  http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04688.html
* Require user to explicitly specify sysconfdir when building from source, due to the fact that the old behavior was not consistent with the documented behavior.
> * Configuration files and scripts are now created during the install phase rather than during configure. This allows values such as @sysconfdir@ to be used in the template configuration files.
> * Abolish the use of release names - only release numbers will be used to distinguish versions in future
> * libmetrics: workaround system header conflict in DFBSD= 2.4 (BUG245)
> * Use PCRE regex matching to configure metrics using the name_match directive
> * rrdcached support
> * gmetad now uses apr and the sleep intervals between polls are randomized in a way that supports shorter polling intervals
> * FreeBSD support: fixes for crashes
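
[Editor's sketch] The announcement quotes an md5sum for the test tarball. A minimal way to check a download against it could look like the following. The checksum is the one quoted above; the helper names and the idea that the tarball sits in a local file are assumptions for illustration, not part of Ganglia.

```python
import hashlib

# Checksum quoted in the 3.1.7 announcement above.
EXPECTED_MD5 = "6aa5e2109c2cc8007a6def0799cf1b4c"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announcement(path, expected=EXPECTED_MD5):
    """True when the file's digest equals the announced checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_announcement("ganglia-3.1.7.tar.gz")` (a hypothetical local path) returns `True` only if the downloaded file has the announced digest.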
Re: [Ganglia-developers] versioning confusion
----- Original Message -----
From: Brad Nicholes bnicho...@novell.com
To: Martin Knoblauch kn...@knobisoft.de; Ramon Bastiaans ramon.bastia...@sara.nl
Cc: ganglia-developers@lists.sourceforge.net
Sent: Thu, February 4, 2010 4:33:31 PM
Subject: Re: [Ganglia-developers] versioning confusion

> On 2/4/2010 at 6:50 AM, in message 4b6ad096.8030...@sara.nl, Ramon Bastiaans wrote:
> Ahh, I see.
> On 02/04/2010 12:11 PM, Martin Knoblauch wrote:
>
> If we were to make release candidates publicly available with a release number other than major.minor.revision (for example 3.1.3rc1), we would also be required to put this same release number in the source code itself to ensure that there is a differentiation between a release candidate and the official release since both would be made public (one during the testing period and the other being an official release). In order to transition the release candidate, in this case to an official release, we would be required to explode the tarball, change the version number, retag SVN with the changed file and revision number, re-bootstrap the source code, recreate the tarball and then finally make the new tarball publicly available under the final release number. All of this leaves the final tarball open to potential problems. It just makes more sense from a testing and release perspective to release the tarball in the exact condition as it was tested. This leaves no possibility for errors or problems creeping into the final released tarball.

So, why not put the rc or pre tag into a GANGLIA_EXTRA_VERSION and embed that into the code? That way there would be no confusion about what is in the tarball. Then we could have as many testing releases before the final one as we need. SVN tags are cheap. What am I missing? I mean, now we are confusing people with skipped releases.

> Another option would be to tag and tar the source code under the final release version number and make it available for testing. Then if bugs are found during testing, fix the bugs, retag and retar under the same version number. The problem with this is that we could end up with multiple different tarballs all with the same version number publicly available. The only way to tell which one was the real release would be by the date on the tarball rather than version number. Much too convoluted and confusing.

Agreed.

> Anyway, you can read more about this process on the Ganglia wiki page at http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works
> This release process was basically patterned after the way that the Apache httpd project produces testing and official tarballs.

As I said in the past, that process may work for Apache. I do not see many skipped releases there. Maybe they have stricter project management. Personally I think Ganglia is too small for that. Watching the discussions here, I see us spend more time on process than on progress. But I may be burnt by my day job. There I am forced to follow a lot of (technically) completely bogus processes, just to make some beancounters and process engineers happy (no, I don't like either).

Cheers
Martin
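
[Editor's sketch] The idea of carrying the rc/pre tag in a separate field can be illustrated roughly as below. GANGLIA_EXTRA_VERSION is the name used in the message; the two functions and the ordering rule (a tagged build sorts before the untagged final release of the same number) are assumptions made for this sketch and do not describe the actual Ganglia build system.

```python
def full_version(base, extra=""):
    """Join a major.minor.revision string with an optional tag such as 'rc1'."""
    return base + extra


def sort_key(base, extra=""):
    """Order builds so that 3.1.3rc1 < 3.1.3rc2 < 3.1.3 (the final has no tag)."""
    numbers = tuple(int(part) for part in base.split("."))
    # An empty tag marks the final release, which must sort after tagged builds.
    return (numbers, 1 if extra == "" else 0, extra)


# Hypothetical set of published tarballs, in arbitrary order.
builds = [("3.1.3", ""), ("3.1.3", "rc2"), ("3.1.2", ""), ("3.1.3", "rc1")]
ordered = [full_version(b, e) for b, e in sorted(builds, key=lambda be: sort_key(*be))]
```

Tags are compared as plain strings here, so a tag like rc10 would sort before rc2; a real scheme would need to parse the tag number as well.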
Re: [Ganglia-developers] versioning confusion
----- Original Message -----
From: Ramon Bastiaans ramon.bastia...@sara.nl
To: ganglia-developers@lists.sourceforge.net
Sent: Thu, February 4, 2010 10:19:03 AM
Subject: [Ganglia-developers] versioning confusion

> Hi,
>
> I haven't been following all the discussions lately, but I'm getting a bit confused on what the latest Ganglia 3.1 release is. I see communications about 3.1.6 on the developer list, while the latest downloadable version on www.ganglia.info is still 3.1.2. What happened with version 3.1.3, 3.1.4 and 3.1.5? Skipping versions is highly confusing to me and I don't really understand the reasoning behind it. Or is the website simply not updated?

3.1.3 .. 3.1.5 were canned during testing. Apparently our process does not allow for fixing bugs/regressions between tagging and final release, so it was decided to never publish the intermediates. One of the reasons might be the lack of good beta testing (which I am guilty of myself :-( ), but I do not really understand why we couldn't just keep 3.1.3 as the name of the release.

Cheers
Martin
[Ganglia-developers] Policy on updating files in 3.1.x/contrib
Hi,

what is the policy for updating files in the contrib directory of 3.0.x and 3.1.x? Do I need to do the backport approval dance (*)? Or can I just go ahead? The removespikes.pl file needs an update in the 3.1.x branch.

Cheers
Martin

(*) No, I never liked the process ...

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] Policy on updating files in 3.1.x/contrib
----- Original Message -----
From: Daniel Pocock dan...@pocock.com.au
To: Martin Knoblauch kn...@knobisoft.de
Cc: ganglia-developers@lists.sourceforge.net
Sent: Wed, February 3, 2010 1:04:51 PM
Subject: Re: [Ganglia-developers] Policy on updating files in 3.1.x/contrib

> > what is the policy for updating files in the contrib directory of 3.0.x and 3.1.x? Do I need to do the backport approval dance (*)? Or can I just go ahead. The removespikes.pl file needs an update in the 3.1.x branch.
>
> Any updates to 3.1 require co-ordination from the release manager (myself) when a release is imminent (as it is now).
>
> Generally, let me know the commit number(s) on trunk and then I will let you know if you can backport it on 3.1.6 or wait for 3.1.7. According to the policies, the release manager has the final say, but I am open to consider anyone who has an opinion for/against a particular patch.
>
> To backport something for 3.0, it needs to meet two criteria:
> - formal approval (vote)
> - it must have already been backported to 3.1

Hi Daniel,

please consider r1966 for inclusion into 3.1.x. Being in contrib, it has no (zero) impact on the core functionality. The current commit in 3.1 (r1699) is plain broken.

It is no issue for 3.0.x, as the file does not exist there. In order to avoid the process and because 3.0.x should only get critical fixes, I will not request inclusion.

Cheers
Martin
Re: [Ganglia-developers] Policy on updating files in 3.1.x/contrib
----- Original Message -----
From: Daniel Pocock dan...@pocock.com.au
To: Martin Knoblauch kn...@knobisoft.de
Cc: ganglia-developers@lists.sourceforge.net
Sent: Wed, February 3, 2010 4:47:23 PM
Subject: Re: [Ganglia-developers] Policy on updating files in 3.1.x/contrib

> Martin Knoblauch wrote:
> > ----- Original Message -----
> > From: Daniel Pocock
> > To: Martin Knoblauch
> > Cc: ganglia-developers@lists.sourceforge.net
> > Sent: Wed, February 3, 2010 1:04:51 PM
> > Subject: Re: [Ganglia-developers] Policy on updating files in 3.1.x/contrib
> >
> > > > what is the policy for updating files in the contrib directory of 3.0.x and 3.1.x? Do I need to do the backport approval dance (*)? Or can I just go ahead. The removespikes.pl file needs an update in the 3.1.x branch.
> > >
> > > Any updates to 3.1 require co-ordination from the release manager (myself) when a release is imminent (as it is now).
> > >
> > > Generally, let me know the commit number(s) on trunk and then I will let you know if you can backport it on 3.1.6 or wait for 3.1.7. According to the policies, the release manager has the final say, but I am open to consider anyone who has an opinion for/against a particular patch.
> > >
> > > To backport something for 3.0, it needs to meet two criteria:
> > > - formal approval (vote)
> > > - it must have already been backported to 3.1
> >
> > Hi Daniel,
> >
> > please consider r1966 for inclusion into 3.1.x. Being in contrib, it has no (zero) impact on the core functionality. The current commit in 3.1 (r1699) is plain broken.
>
> Ok, you can go ahead and apply this patch on monitor-core-3.1
> Please include a note about it in the STATUS file as part of the commit

Done. Unfortunately I botched the commit message, but the commit itself is OK.

Cheers
Martin
Re: [Ganglia-developers] [RFC] status update for removing ganglia release names from the code
Hi folks,

my comment to that thread still stands [1], and I do not think that coming up with a name is so difficult that it actually can block a release (seems 3.1.x has bigger problems than that :-). But it really seems to be used nowhere (I thought the name was displayed on the web page?), so let's come to a closure on this.

So my proposals are:
a) display the name on the web page to make it non-dead, or
b) nuke it

a) of course preferred.

Cheers
Martin

[1] http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg04698.html

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Carlo Marcelo Arenas Belon care...@sajinet.com.pe
To: Jesse Becker haw...@gmail.com
Cc: ganglia-developers@lists.sourceforge.net
Sent: Thu, December 3, 2009 1:44:19 PM
Subject: [Ganglia-developers] [RFC] status update for removing ganglia release names from the code

Jesse

There is a backport request for 3.1 labeled "build: remove ganglia release name from the code" that has a veto from you, which I would like to see reconsidered.

Your objection refers to a thread [1] that includes the explanation of why this backport proposal is consistent with the consensus at that time (and which has since changed [2]), as it only removes the name from the web frontend configuration where it wasn't being used (dead code):

http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg04719.html

It is important to note that since the proposal has been stalled for a long time it won't be able to cleanly be backported from trunk, and so to simplify the reviewing process a conflict-free version of it is attached to this email.

Carlo

[1] http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg04697.html
[2] http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05246.html
Re: [Ganglia-developers] Question on the Ganglia RRD Database
Hi Michael,

yes, your reply is very helpful as it corrects my understanding of how the RRD database is defined. It all makes a lot of sense now. So I am basically defining my RRAs as follows, with a 4 sec polling interval:

RRAs RRA:AVERAGE:0.5:1:315 \
     RRA:AVERAGE:0.5:3:315 \
     RRA:AVERAGE:0.5:24:315 \
     RRA:AVERAGE:0.5:72:315 \
     RRA:AVERAGE:0.5:504:315 \
     RRA:AVERAGE:0.5:1008:315 \
     RRA:AVERAGE:0.5:2016:315 \
     RRA:AVERAGE:0.5:6048:315 \
     RRA:AVERAGE:0.5:12096:315 \
     RRA:AVERAGE:0.5:21600:370

With 5% margin from 20-minutes to 6-month and 370 days for the year.

Cheers
Martin

----- Original Message -----
From: Michael Perzl mich...@perzl.org
To: ganglia-developers@lists.sourceforge.net
Sent: Wed, November 25, 2009 4:55:19 PM
Subject: Re: [Ganglia-developers] Question on the Ganglia RRD Database

Hi Martin,

I think this is how the default monitoring intervals have to be interpreted:

RRAs \
     RRA:AVERAGE:0.5:1:240 \
     RRA:AVERAGE:0.5:24:240 \
     RRA:AVERAGE:0.5:168:240 \
     RRA:AVERAGE:0.5:672:240 \
     RRA:AVERAGE:0.5:5760:370

                                                                  used for display of
Take 240 samples at 15 seconds intervals                          hour
Take 240 samples at 24 × 15 seconds (= 6 minutes) intervals       day
Take 240 samples at 168 × 15 seconds (= 42 minutes) intervals     week
Take 240 samples at 672 × 15 seconds (= 168 minutes) intervals    month
Take 370 samples at 5760 × 15 seconds (= 24 hours) intervals      year

So I think for your case you have to decide how many samples of the chosen sampling rate (20 minutes, 8 hours etc.) you want to collect, which then determines the overall time interval covered by this specific sampling rate.

The main question is: How granular do you want the sampling rate to be for a given time interval? This then determines:
a) the number of multiples of 15 seconds (to get the sampling rate)
b) the total number of samples required (number of samples x sampling rate = time interval)

Hope that helps.

Regards,
Michael

On 11/25/2009 02:24 PM, Martin Knoblauch wrote:
> Hi folks,
>
> currently I am setting up monitoring for a cluster, where the demand is to have additional monitoring intervals. We want to see stuff like 20-minutes, 8-hours, 2-weeks, 3-month and 6-month. Doing so seems easy, but I have a question on the RRA definitions. The default setup seems to be (assuming a 15 second polling interval):
>
> hour  - RRA:AVERAGE:0.5:1:244
> day   - RRA:AVERAGE:0.5:24:244
> week  - RRA:AVERAGE:0.5:168:244
> month - RRA:AVERAGE:0.5:672:244 (more like 4-weeks :-)
> year  - RRA:AVERAGE:0.5:5760:374 (367.86 days)
>
> So from hour to month we have 244 datapoints with nicely increasing steps (1,24*1,7*24*1,4*7*24*1). So why are we doing it differently for the year? I would have expected the year RRA to be RRA:AVERAGE:0.5:8784:244 (366 days). Any particular reasons for this?
>
> Cheers
> Martin
> --
> Martin Knoblauch
> email: k n o b i AT knobisoft DOT de
> www: http://www.knobisoft.de
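
[Editor's sketch] The rule of thumb in Michael's reply (number of samples x sampling rate = time interval) can be checked with a few lines of arithmetic. The RRA strings are the ones quoted in the thread; the function names are made up for this sketch, and the polling intervals (15 s for the defaults, 4 s for Martin's setup) are taken from the messages.

```python
def parse_rra(definition):
    """Split 'RRA:AVERAGE:0.5:<steps>:<rows>' into (steps_per_row, rows)."""
    _tag, _cf, _xff, steps, rows = definition.split(":")
    return int(steps), int(rows)


def coverage_seconds(definition, poll_seconds):
    """Time span one RRA covers: rows * steps per row * polling interval."""
    steps, rows = parse_rra(definition)
    return rows * steps * poll_seconds


# Defaults discussed by Michael, with a 15 second polling interval.
DEFAULT_RRAS = [
    "RRA:AVERAGE:0.5:1:240",
    "RRA:AVERAGE:0.5:24:240",
    "RRA:AVERAGE:0.5:168:240",
    "RRA:AVERAGE:0.5:672:240",
    "RRA:AVERAGE:0.5:5760:370",
]
default_cover = [coverage_seconds(r, 15) for r in DEFAULT_RRAS]

# First and last entries of Martin's list, with a 4 second polling interval.
twenty_min = coverage_seconds("RRA:AVERAGE:0.5:1:315", 4)
one_year = coverage_seconds("RRA:AVERAGE:0.5:21600:370", 4)
```

With these inputs the defaults cover 1 hour, 1 day, 7 days, 28 days and 370 days, and Martin's first RRA covers 1260 seconds, which is 20 minutes plus the 5% margin he mentions.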
[Ganglia-developers] Question on the Ganglia RRD Database
Hi folks,

currently I am setting up monitoring for a cluster, where the demand is to have additional monitoring intervals. We want to see stuff like 20-minutes, 8-hours, 2-weeks, 3-month and 6-month. Doing so seems easy, but I have a question on the RRA definitions. The default setup seems to be (assuming a 15 second polling interval):

hour  - RRA:AVERAGE:0.5:1:244
day   - RRA:AVERAGE:0.5:24:244
week  - RRA:AVERAGE:0.5:168:244
month - RRA:AVERAGE:0.5:672:244 (more like 4-weeks :-)
year  - RRA:AVERAGE:0.5:5760:374 (367.86 days)

So from hour to month we have 244 datapoints with nicely increasing steps (1,24*1,7*24*1,4*7*24*1). So why are we doing it differently for the year? I would have expected the year RRA to be RRA:AVERAGE:0.5:8784:244 (366 days). Any particular reasons for this?

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] Avg Utilization
----- Original Message -----
From: Witham, Timothy D [EMAIL PROTECTED]
To: Brad Nicholes [EMAIL PROTECTED]; [EMAIL PROTECTED]; ganglia-developers@lists.sourceforge.net
Sent: Monday, December 8, 2008 5:48:31 PM
Subject: Re: [Ganglia-developers] Avg Utilization

> Windows users are looking at the figure and thinking that `Avg Utilization' refers to CPU utilization (from the cpu_report graph).
>
> Maybe both are needed:
> cluster_util_load: displayed as `Avg Utilization (Load)'
> cluster_util_cpu: displayed as `Avg Utilization (CPU)'
>
> Can anyone suggest a better way to name these figures, or would this be an acceptable patch?
>
> I would remove all that and put the numbers on the graphs only, like done in trunk.
>
> I think it is clearer to have the % numbers on all of the graphs. That way the user sees average values for all metrics plotted, right there in the graph legend. So they can look at CPU or Load or Memory or anything, instead of wondering what the number off to the side is. I have voted this way in the 3.1 STATUS file.

I can't vote at the moment, but I really like the display of the average(s) on the overview pages. My vote is +1. IMO the graphs are already too cluttered for a quick overview, and I have ideas for adding even more clutter (that matters to me, like the date the graph was produced :-). But if clutter doesn't matter, why not do both? I do not believe that the additional overhead is such a problem.

Cheers
Martin
Re: [Ganglia-developers] printing output with rrdtool
----- Original Message -----
From: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
To: Jesse Becker [EMAIL PROTECTED]
Cc: ganglia-developers ganglia-developers@lists.sourceforge.net
Sent: Saturday, September 13, 2008 8:46:30 PM
Subject: Re: [Ganglia-developers] printing output with rrdtool

> On Fri, Sep 12, 2008 at 03:57:37PM -0400, Jesse Becker wrote:
> > On Fri, Sep 12, 2008 at 13:33, Carlo Marcelo Arenas Belon
> > > the following commit (r1754 in ganglia's svn) seems to be patching the fix proposed by Jason as part of BUG37 and that was committed in r1595 and has been left inconsistent (as not all uses of this feature have been converted to use /dev/null).
> >
> > Then the other instances should be converted, IMO.
>
> Committed revision 1760.

I have seen the commit and the remark about Windows. Please excuse my ignorance, but wouldn't NUL: serve the purpose on the evil OS?

Cheers
Martin
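
[Editor's sketch] The question about NUL: amounts to picking the discard device per platform. A minimal illustration follows; the function name is hypothetical, and whether rrdtool on Windows accepts NUL as an output target is not confirmed anywhere in this thread.

```python
import os


def null_device(os_name=os.name):
    """Return the discard device name: 'NUL' on Windows, '/dev/null' elsewhere."""
    # os.name is 'nt' on Windows and 'posix' on Linux, BSD and macOS.
    return "NUL" if os_name == "nt" else "/dev/null"
```

Python's standard library also exposes `os.devnull`, which holds the equivalent path for the platform the interpreter is running on.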
Re: [Ganglia-developers] [Ganglia-general] Anyone experience petabyte peaks in network metric in ganglia 3.x.y ?
----- Original Message -----
From: Escobio, Roger [EMAIL PROTECTED]
To: Bernard Li [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2008 9:19:47 PM
Subject: Re: [Ganglia-general] Anyone experience petabyte peaks in network metric in ganglia 3.x.y ?

> > -----Original Message-----
> > From: Bernard Li [mailto:[EMAIL PROTECTED]
> > Sent: September 10, 2008 3:05 PM
> > To: Escobio, Roger [CMB-IT]
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [Ganglia-general] Anyone experience petabyte peaks in network metric in ganglia 3.x.y ?
> >
> > Hi Roger:
> >
> > On Wed, Sep 10, 2008 at 11:52 AM, Martin Knoblauch wrote:
> > > I created a patch against linux/metrics.c (3.1.1 version) to add the counterdiff function found in *bsd/metrics.c
> > >
> > > Are you interested in it? Just let me know and I'll send it to the list
> >
> > Yes please. I would definitely like to have a look at your patch. In case the patch is too large to be sent to the mailing-list, you could also file a bug and upload the patch via bugzilla.ganglia.info.
>
> Well, it is not big. As I said, it is just a copy and paste from the *bsd code, so no big deal :-)
> But at least it compiles and does not coredump gmond on our Linux :-)

Roger,

[changed Mailing List to ganglia-developers]

just to understand things right: your patch is only a code cleanup and you still need the #ifdef REMOVE_BOGUS_SPIKES to get rid of the spikes. Correct?

Some comments on the patch:

> --- libmetrics/linux/metrics.c-ori 2008-09-09 18:54:40.0 +
> +++ libmetrics/linux/metrics.c 2008-09-09 19:09:44.0 +
> @@ -222,40 +222,20 @@
>                 if ( !ns ) return;
>
>                 rbi = strtoul( p, &p ,10);
> -               if ( rbi >= ns->rbi ) {
> -                  l_bytes_in += rbi - ns->rbi;
> -               } else {
> -                  debug_msg("update_ifdata(%s) - Overflow in rbi: %lu -> %lu",caller,ns->rbi,rbi);
> -                  l_bytes_in += ULONG_MAX - ns->rbi + rbi;
> -               }
> +               l_bytes_in = counterdiff(rbi,ns->rbi,ULONG_MAX, 0);

Shouldn't that be += counterdiff/...? l_bytes_in is accumulated over all NICs.

>                 ns->rbi = rbi;
>
>                 rpi = strtoul( p, &p ,10);
> -               if ( rpi >= ns->rpi ) {
> -                  l_pkts_in += rpi - ns->rpi;
> -               } else {
> -                  debug_msg("updata_ifdata(%s) - Overflow in rpi: %lu -> %lu",caller,ns->rpi,rpi);
> -                  l_pkts_in += ULONG_MAX - ns->rpi + rpi;
> -               }
> +               l_pkts_in = counterdiff(rpi,ns->rpi,ULONG_MAX, 0);

ditto

>                 ns->rpi = rpi;
>
>                 for (i = 0; i < 6; i++) strtol(p, &p, 10);
>
>                 rbo = strtoul( p, &p ,10);
> -               if ( rbo >= ns->rbo ) {
> -                  l_bytes_out += rbo - ns->rbo;
> -               } else {
> -                  debug_msg("update_ifdata(%s) - Overflow in rbo: %lu -> %lu",caller,ns->rbo,rbo);
> -                  l_bytes_out += ULONG_MAX - ns->rbo + rbo;
> -               }
> +               l_bytes_out = counterdiff(rbo,ns->rbo,ULONG_MAX, 0);

ditto

>                 ns->rbo = rbo;
>
>                 rpo = strtoul( p, &p ,10);
> -               if ( rpo >= ns->rpo ) {
> -                  l_pkts_out += rpo - ns->rpo;
> -               } else {
> -                  debug_msg("update_ifdata(%s) - Overflow in rpo: %lu -> %lu",caller,ns->rpo,rpo);
> -                  l_pkts_out += ULONG_MAX - ns->rpo + rpo;
> -               }
> +               l_pkts_out = counterdiff(rpo,ns->rpo,ULONG_MAX, 0);

ditto

>                 ns->rpo = rpo;
>              }
>           p = index (p, '\n') + 1;// skips a line
> @@ -1305,3 +1285,40 @@
>     val.f = most_full;
>     return val;
>  }
> +
> +static unsigned long
> +counterdiff(unsigned long oldval, unsigned long newval, unsigned long maxval, unsigned long maxdiff)
> +{
> +   unsigned long diff;
> +
> +   if (maxdiff == 0)
> +      maxdiff = maxval;
> +
> +   /* Paranoia */
> +   if (oldval > maxval || newval > maxval)
> +      return 0;

Really cannot happen with maxval being ULONG_MAX. Even the paranoid should feel safe here :-)

> +
> +   /*
> +    * Tackle the easy case. Don't worry about maxdiff here because
> +    * we're SOL if it happens (i.e. assuming a reset just makes
> +    * matters worse).This
> +    */
> +   if (oldval <= newval)
> +      return (newval - oldval);
> +
> +   /*
> +    * Now the tricky part. If we assume counters never get reset,
> +    * this is easy. Unfortunaly, they do get reset on some
> +    * systems, so we need to try and deal with that. Our huristic
> +    * is that if out difference is greater then maxdiff and newval
> +    * is less or equal to maxdiff, then we've probably been reset
> +    * rather then actually wrapping. Obviously, you need to be
> +    * careful to poll often enough
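
[Editor's sketch] The quoted patch is cut off before the end of the helper, so the following is only an illustration of the heuristic described in its comment, together with the point about using += when summing over several NICs. The wrap formula (maxval - old + new) is the one from the removed lines of the diff. Everything after the "easy case" is an assumption reconstructed from the comment text, the 32-bit ULONG_MAX is an assumption, and total_bytes_in is a hypothetical helper, not Ganglia code.

```python
ULONG_MAX = 2**32 - 1  # assumption: 32-bit unsigned long


def counterdiff(oldval, newval, maxval=ULONG_MAX, maxdiff=0):
    """Difference between two counter readings, allowing for wrap or reset."""
    if maxdiff == 0:
        maxdiff = maxval
    # Paranoia check from the quoted code: readings beyond maxval are ignored.
    if oldval > maxval or newval > maxval:
        return 0
    # Easy case: the counter simply advanced.
    if oldval <= newval:
        return newval - oldval
    # Counter went backwards: assume it wrapped, as the removed lines did.
    diff = maxval - oldval + newval
    # Assumed reading of the comment: a huge jump with a small new value
    # looks like a reset, so count only what accumulated since the reset.
    if diff > maxdiff and newval <= maxdiff:
        return newval
    return diff


def total_bytes_in(previous, current):
    """Sum per-interface differences; note the += across interfaces."""
    total = 0
    for nic, newval in current.items():
        total += counterdiff(previous[nic], newval)
    return total
```

Note that this sketch takes its arguments in (old, new) order, as in the declared signature of the quoted helper. With a plain assignment instead of `+=` in `total_bytes_in`, only the last interface would be counted, which is the concern raised in the review comments.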
Re: [Ganglia-developers] Updated patches available for trunk
----- Original Message -----
From: Witham, Timothy D [EMAIL PROTECTED]
To: Martin Knoblauch [EMAIL PROTECTED]
Sent: Wednesday, August 27, 2008 6:20:21 PM
Subject: RE: [Ganglia-developers] Updated patches available for trunk

> > please see my comment on #193. Your proposal to add averages to the graphs is great, but goes over the original request in #193, which also is great.
>
> Ok, I can submit it separately. But IMHO, if the number is displayed on the graph, then the same number doesn't need to be displayed on the HTML. Seems redundant, which is why I put it in that existing bug as an alternate way to reach the goal.

Hi Timothy,

the original #193 proposal makes the number stand out on its own, which I really like. Is the extra call to rrdtool really that expensive?

So, having the average on the load graphs as well is fine. But I kind of fear that the graphs may get cluttered. And if asked, personally I would even more love to see a timestamp - at least on the enlarged versions of the graphs. Good for reporting/documentation purposes.

Cheers
Martin
Re: [Ganglia-developers] Ganglia release name
----- Original Message -----
From: Bernard Li [EMAIL PROTECTED]
To: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]; ganglia-developers ganglia-developers@lists.sourceforge.net
Sent: Friday, August 22, 2008 2:23:13 AM
Subject: [Ganglia-developers] Ganglia release name

> Hi Carlo:
>
> It looks like in this commit, you have removed the release name for Ganglia:
>
> http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1703
>
> I didn't see you re-add the name some place else, so I assume your proposal is to get rid of release name for future releases completely?

if true, I would find it very sad. Even if it has no technical use, it somehow belongs to Ganglia. Some people still wonder what the names stand for :-)

Just my 0.02 €
Martin
Re: [Ganglia-developers] Bugzilla Bug 193: Avg Load percentages and overall cluster utilization.
----- Original Message -----
From: Bernard Li [EMAIL PROTECTED]
To: ganglia-developers ganglia-developers@lists.sourceforge.net
Sent: Wednesday, August 20, 2008 2:33:02 AM
Subject: [Ganglia-developers] Bugzilla Bug 193: Avg Load percentages and overall cluster utilization.

> Dear all:

Hi Bernard,

> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=193
>
> This patch has not been tested with meta-sources (i.e. gmetad aggregating gmetad) and thus the average utilization numbers are incorrect for grid of grids.

thanks for spotting. It just shows that the number of deployment scenarios is just too big for the patch/feature developers. And that we cannot assume that a release will have been completely tested for all scenarios.

> I currently have an incomplete fix, but I need to get consensus as to what average utilization really means for grid of grids: should average utilization for a grid be load average divided by the number of cpus for the *entire* meta-grid or just over the grid in question?
>
> Alternatively, we can rollback this backport and punt it until 3.1.2.

No real strong feelings.

> On a related note, I think we should distinguish between a Grid and a Meta-Grid (i.e. a grid of grids) in the Front End -- do people care?

Definitely a good idea, as it seems to be a more and more common case.

Cheers
Martin
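
[Editor's sketch] The two readings of "average utilization" for a grid of grids give different numbers, which a small calculation makes visible. Utilization here is load average divided by CPU count, as stated in the message; the grid names and figures are invented purely for illustration.

```python
def utilization_percent(load_sum, cpu_count):
    """Load average divided by CPU count, expressed as a percentage."""
    if cpu_count == 0:
        return 0.0
    return 100.0 * load_sum / cpu_count


# Hypothetical child grids with their summed one-minute load and CPU counts.
grids = {
    "grid_a": {"load": 12.0, "cpus": 16},
    "grid_b": {"load": 4.0, "cpus": 64},
}

# Reading 1: each grid is divided by its own CPU count.
per_grid = {name: utilization_percent(g["load"], g["cpus"]) for name, g in grids.items()}

# Reading 2: total load divided by the CPU count of the entire meta-grid.
meta_grid = utilization_percent(
    sum(g["load"] for g in grids.values()),
    sum(g["cpus"] for g in grids.values()),
)
```

With these made-up numbers the per-grid values are 75% and 6.25%, while the meta-grid value is 20%, so the front end has to be explicit about which one it displays.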
Re: [Ganglia-developers] [Ganglia-svn] SF.net SVN: ganglia:[1538]trunk/monitor-core/Makefile.am
----- Original Message -----
From: Brad Nicholes [EMAIL PROTECTED]
To: Jesse Becker [EMAIL PROTECTED]; ganglia-developers ganglia-developers@lists.sourceforge.net
Sent: Thursday, July 10, 2008 9:48:20 PM
Subject: Re: [Ganglia-developers] [Ganglia-svn] SF.net SVN: ganglia:[1538]trunk/monitor-core/Makefile.am

> On 7/10/2008 at 12:52 PM, in message , Jesse Becker wrote:
> > On Thu, Jul 10, 2008 at 13:15, Brad Nicholes wrote:
> > > I'm OK with it either way. If we add contrib/ to the package, then we should still have someplace where we put stuff that we like and think is valuable, but haven't approved yet. Does that make sense? However a download page on the wiki or some other kind of web directory listing might make it easier to reference for the user.
> >
> > This is exactly what contrib directories are for: things that are useful and worth distributing as a courtesy, but are *not* directly supported by the main development team. If something is ever promoted/taken over by main developers, then it gets removed from contrib/, and added into the proper location elsewhere in the project.
>
> So what does that mean? Should contrib/ be part of the tarballs, snapshots, releases or just an SVN repository location for misc. stuff?

I personally would put them into any archive that gets into the hands of developers: tarballs, src-RPMs, ... They don't necessarily have to be part of the binary packages. On the other hand, would it hurt? What about /usr/share/ganglia/contrib/ ?

Cheers
Martin
Re: [Ganglia-developers] relicensing the web frontend as GNU GPL v2
Hi Carlo,

v2/v2+ is fine with me. Nice and clear, almost understandable to a human being (as opposed to a lawyer). Btw. what is the overall licensing status?

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
To: ganglia-developers@lists.sourceforge.net
Sent: Saturday, April 19, 2008 9:14:19 AM
Subject: [Ganglia-developers] relicensing the web frontend as GNU GPL v2

most likely just a formality, as the web frontend templating system was based on the GPLv2+ TemplatePower class from the very beginning (at least as shown from the history in svn).

a quick line count from the files involved says the contributors that will need to consent will be (including number of lines committed from all files in the web directory, including non php files which could as well be discarded as an alternative):

    38 bnicholes
    87 carenas
   410 knobi1
   426 bernardli
   686 hawson
   830 sacerdoti
  3940 massie

the web/COPYING file will need to be updated after that so that the use of class.TemplatePower.inc.php is consistent with the rest of the frontend code.

as stated in the title, GPLv2 only will be my suggestion, but I am also ok with GPLv2+ or GPLv3 if someone has a really good argument for it.

Carlo
Re: [Ganglia-developers] Memory leak with gmetad r1224
----- Original Message -----
From: Brad Nicholes [EMAIL PROTECTED]
To: Kumar vaibhav [EMAIL PROTECTED]; Martin Knoblauch [EMAIL PROTECTED]
Cc: Ganglia Developers <ganglia-developers@lists.sourceforge.net>
Sent: Saturday, April 12, 2008 5:20:54 PM
Subject: Re: [Ganglia-developers] Memory leak with gmetad r1224

> This has been fixed already. Check out r1229

r1229 is for gmetad. Kumar complains about gmond. Right?

Martin

> Brad
>
> On 4/12/2008 at 3:09 AM, in message [EMAIL PROTECTED], Martin Knoblauch wrote:
>> Hi Kumar,
>>
>> any chance to get valgrind snapshots for various runtimes to distinguish between one-time allocations that stay for the entire lifetime and stuff that actually grows?
>>
>> Cheers
>> Martin
>>
>> ----- Original Message -----
>> From: Kumar vaibhav
>> To: Brad Nicholes
>> Cc: Ganglia Developers
>> Sent: Saturday, April 12, 2008 9:09:59 AM
>> Subject: Re: [Ganglia-developers] Memory leak with gmetad r1224
>>
>> Hi,
>> I am seeing memory leaks in gmond also. Its memory footprint is growing with time.
>> Vaibhav
>>
>> Brad Nicholes wrote:
>>> On 4/10/2008 at 5:13 PM, Bernard Li wrote:
>>>> Hi guys:
>>>> Looks like we might have introduced a memory leak in gmetad recently. I don't have the exact numbers, but the memory usage is definitely growing. I left my gmetad running for 2 days, and it was consuming ~500MB and there is only one host.
>>>
>>> You're right, I am seeing it also. I will take a look to see if I can spot what might have caused this.
>>>
>>> Brad
Re: [Ganglia-developers] Memory leak with gmetad r1224
Hi Kumar,

any chance to get valgrind snapshots for various runtimes to distinguish between one-time allocations that stay for the entire lifetime and stuff that actually grows?

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Kumar vaibhav [EMAIL PROTECTED]
To: Brad Nicholes [EMAIL PROTECTED]
Cc: Ganglia Developers <ganglia-developers@lists.sourceforge.net>
Sent: Saturday, April 12, 2008 9:09:59 AM
Subject: Re: [Ganglia-developers] Memory leak with gmetad r1224

Hi,
I am seeing memory leaks in gmond also. Its memory footprint is growing with time.
Vaibhav

Brad Nicholes wrote:
> On 4/10/2008 at 5:13 PM, Bernard Li wrote:
>> Hi guys:
>> Looks like we might have introduced a memory leak in gmetad recently. I don't have the exact numbers, but the memory usage is definitely growing. I left my gmetad running for 2 days, and it was consuming ~500MB and there is only one host.
>
> You're right, I am seeing it also. I will take a look to see if I can spot what might have caused this.
>
> Brad
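The idea of comparing valgrind snapshots from runs of different length can be sketched as follows. This is only an illustration: it assumes the per-call-site byte totals have already been extracted from the two reports into dictionaries, and the site names and numbers are invented, not measured.

```python
def growing_sites(short_run: dict, long_run: dict) -> dict:
    """Return call sites whose live bytes are larger in the longer run.

    Sites with identical totals in both runs are treated as one-time
    allocations; anything that grew is a candidate for a real leak.
    """
    growth = {}
    for site, long_bytes in long_run.items():
        delta = long_bytes - short_run.get(site, 0)
        if delta > 0:
            growth[site] = delta
    return growth


# Made-up totals (bytes still allocated at exit), keyed by allocation site.
short_run = {"config_parse": 5, "metric_lookup": 192}
long_run = {"config_parse": 5, "metric_lookup": 960}

print(growing_sites(short_run, long_run))
```

A site that shows the same total after ten minutes and after ten hours is most likely allocated once and kept; a site whose total scales with run time is where to look first.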
Re: [Ganglia-developers] Install locations of gmetric and gstat
----- Original Message -----
From: Jesse Becker [EMAIL PROTECTED]
To: Bernard Li [EMAIL PROTECTED]
Cc: Ganglia Developers <ganglia-developers@lists.sourceforge.net>
Sent: Wednesday, April 2, 2008 9:17:09 PM
Subject: Re: [Ganglia-developers] Install locations of gmetric and gstat

> On Wed, Apr 2, 2008 at 3:01 PM, Bernard Li wrote:
>> On Wed, Apr 2, 2008 at 11:48 AM, Jesse Becker wrote:
>> Gmetric injects metrics to the collection framework which gmond/gmetad belongs to, so to quote Martin, by logic, they should belong in the same location.
>
> Well, both ssh and sshd are part of a secure communications framework. Would you put ssh in /usr/sbin? :-)

Now, ssh is the *user* tool that is needed to use the ssh service. It needs to be in the standard user PATH. gmetric, on the other hand, is a tool that does not belong in the hands of ordinary users. Usually only administrators define what metrics should go into the ganglia stream. Therefore its place should be both near gmond and out of the standard user PATH.

And of course gstat is a user tool again. It only has read access to the data stream, so it cannot do any harm.

> I'll quote the FHS:
>
>   /usr/sbin : Non-essential standard system binaries
>   /usr/bin  : Most user commands
>
> Based on that, I'll buy the gmetric in /usr/sbin argument.

Actually, to me .../sbin always stood for stuff that the dirty masses should not see by default :-) But then I have never been known for my political correctness :-))

Cheers
Martin
Re: [Ganglia-developers] Install locations of gmetric and gstat
Hi Bernard,

by logic gmetric definitely belongs in .../sbin. Personally I think gstat and gexec belong in .../bin. They are user commands and they are not really part of the collection framework.

Oh, it is also good to see that gstat moved out of gmond. That always irritated me.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Bernard Li [EMAIL PROTECTED]
To: Ganglia Developers <ganglia-developers@lists.sourceforge.net>
Sent: Wednesday, April 2, 2008 8:37:30 PM
Subject: [Ganglia-developers] Install locations of gmetric and gstat

Currently gmetric and gstat are installed in /usr/bin, whereas gmond and gmetad are installed in /usr/sbin. IMHO I think all binaries should be installed to /usr/sbin. One might argue that maybe gstat should be made available to users, but I think gmetric should definitely be confined to /usr/sbin.

Thoughts?

Cheers,
Bernard
[Ganglia-developers] Commit bugfix for bz #76 into 3.0.X
Hi folks,

any objections to committing the attached patch for a longstanding gmetad problem into 3.0.X? I already put it into trunk a few days ago. The fix was developed by Timothy on top of 3.0.6 and is in production use.

Please vote.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

Index: server.c
===================================================================
--- server.c (revision 1102)
+++ server.c (working copy)
@@ -285,8 +285,18 @@
       return 0;
    case SUMMARY:
-      return source_summary((Source_t*) node, client);
+/* use the mutex to avoid reporting incomplete sums -twitham (bug#76) */
+      if (((Source_t*)node)->sum_finished)
+         pthread_mutex_lock(((Source_t*)node)->sum_finished);
+
+      int i = source_summary((Source_t*) node, client);
+
+      if (((Source_t*)node)->sum_finished)
+         pthread_mutex_unlock(((Source_t*)node)->sum_finished);
+
+      return i;
+
    default:
       break;
 }
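The pattern in the patch, taking the same mutex on the reporting side that the summarizing side holds while it rebuilds the sums, can be shown in a small self-contained sketch. It is written in Python rather than C, and the `Source` class with its fields is a stand-in invented for illustration; only the name `sum_finished` is borrowed from the patch.

```python
import threading


class Source:
    """Toy stand-in for a data source whose summary is rebuilt periodically."""

    def __init__(self):
        self.sum_finished = threading.Lock()  # guards the two sums below
        self.hosts_up = 0
        self.cpu_num = 0

    def recompute(self, cpus_per_host):
        # The summarizer resets and rebuilds the sums step by step. Without
        # the lock, a reader could observe a half-built total.
        with self.sum_finished:
            self.hosts_up = 0
            self.cpu_num = 0
            for cpus in cpus_per_host:
                self.hosts_up += 1
                self.cpu_num += cpus

    def summary(self):
        # The reporter takes the same lock, so it only ever sees a
        # finished pair of sums.
        with self.sum_finished:
            return (self.hosts_up, self.cpu_num)


source = Source()
source.recompute([4, 4, 4])
print(source.summary())
```

The design point is that both sides must use the same lock: locking only the writer would still let a reader run in the middle of a rebuild.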
Re: [Ganglia-developers] Memory leak in gmond
----- Original Message -----
From: Kumar Vaibhav [EMAIL PROTECTED]
To: Jesse Becker [EMAIL PROTECTED]
Cc: Martin Knoblauch [EMAIL PROTECTED]; Ganglia Developers <ganglia-developers@lists.sourceforge.net>; Bernard Li [EMAIL PROTECTED]
Sent: Friday, March 21, 2008 8:16:42 AM
Subject: Re: [Ganglia-developers] Memory leak in gmond

> Hi All,
>
> I am still seeing some memory leak in the nodes. Now the problem is not in the deaf mode but in the mute mode. To reduce the debugging complexity I am running 3.0.7 on 2 nodes, one in deaf mode and the other in mute mode. The deaf mode is working fine and the node in mute mode is giving a memory leak. Here is the o/p of valgrind for the node in mute mode.

Hi Kumar,

while I almost assume that some/most of the leaks that you are seeing are one-time allocations that just live until process-end, I am at least confused about the ones from hash_lookup. This is part of a metrics sampling function which should not be called at all in mute mode, unless I am completely wrong.

Could you do the valgrind runs twice, with different total run-times? Just to see which of the leaks accumulate.

> ==21588==
> ==21588== Process terminating with default action of signal 2 (SIGINT)
> ==21588==    at 0x3F810C485F: poll (in /lib64/libc-2.5.so)
> ==21588==    by 0x41D7B1: apr_pollset_poll (poll.c:504)
> ==21588==    by 0x405846: main (gmond.c:1269)
> --21588-- Discarding syms at 0x4D41000-0x4F4C000 in /lib64/libnss_files-2.5.so due to munmap()
> ==21588==
> ==21588== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 1)
> --21588--
> --21588-- supp:5 Fedora-Core-6-hack3-ld25
> ==21588== malloc/free: in use at exit: 740,602 bytes in 1,190 blocks.
> ==21588== malloc/free: 2,574 allocs, 1,384 frees, 946,209 bytes allocated.
> ==21588==
> ==21588== searching for pointers to 1,190 not-freed blocks.
> ==21588== checked 479,904 bytes.
> ==21588==
> ==21588== 5 bytes in 1 blocks are still reachable in loss record 1 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x4111FF: cfg_init (confuse.c:1087)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

I think this is a one-time alloc from reading the config file.

> ==21588==
> ==21588== 19 bytes in 4 blocks are still reachable in loss record 2 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x3F810750E1: strndup (in /lib64/libc-2.5.so)
> ==21588==    by 0x40806A: hash_lookup (metrics.c:151)
> ==21588==    by 0x408D75: bytes_out_func (metrics.c:425)
> ==21588==    by 0x40418C: Ganglia_collection_group_collect (gmond.c:1540)
> ==21588==    by 0x404FC8: process_collection_groups (gmond.c:1662)
> ==21588==    by 0x40600E: main (gmond.c:1913)
> ==21588==

Now, this one is from bytes_out_func. Likely a one-time allocation. How many network interfaces has that system got? What are they named? And I wonder why it is called at all in mute mode.

> ==21588==
> ==21588== 22 bytes in 2 blocks are still reachable in loss record 3 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x406740: gengetopt_strdup (cmdline.c:64)
> ==21588==    by 0x40689E: cmdline_parser (cmdline.c:100)
> ==21588==    by 0x4055BD: main (gmond.c:1780)
> ==21588==

One-time allocation.

> ==21588==
> ==21588== 56 bytes in 1 blocks are still reachable in loss record 4 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x4111D2: cfg_init (confuse.c:1083)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

One-time allocation.

> ==21588==
> ==21588== 192 bytes in 4 blocks are still reachable in loss record 5 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x408057: hash_lookup (metrics.c:144)
> ==21588==    by 0x408D75: bytes_out_func (metrics.c:425)
> ==21588==    by 0x40418C: Ganglia_collection_group_collect (gmond.c:1540)
> ==21588==    by 0x404FC8: process_collection_groups (gmond.c:1662)
> ==21588==    by 0x40600E: main (gmond.c:1913)
> ==21588==

See my comment above. That looks like 4 net_dev_stats structures. Likely one-time allocations. But it should not happen at all in mute mode. Are you running in 32-bit or 64-bit mode? It seems we can save 8 bytes per struct by sorting the members better.

> ==21588==
> ==21588== 192 bytes in 1 blocks are still reachable in loss record 6 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x41BDC1: apr_allocator_create (apr_pools.c:90)
> ==21588==    by 0x41C55C: apr_pool_initialize (apr_pools.c:506)
> ==21588==    by 0x41A7C4: apr_initialize (start.c:55)
> ==21588==    by 0x40EC9F: Ganglia_pool_create (libgmond.c:494)
> ==21588==    by 0x4055DA
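The remark about saving 8 bytes per struct by sorting the members refers to alignment padding. The sketch below shows the general effect with Python's `ctypes`; the field names and types are hypothetical and are not the real layout of net_dev_stats, which is not shown in this thread. It assumes a platform where 64-bit integers are 8-byte aligned.

```python
import ctypes


class Unsorted(ctypes.Structure):
    # Hypothetical layout: each 4-byte member is followed by an 8-byte
    # member, so the compiler inserts 4 bytes of padding after each one.
    _fields_ = [
        ("flag", ctypes.c_uint32),
        ("rbytes", ctypes.c_uint64),
        ("count", ctypes.c_uint32),
        ("wbytes", ctypes.c_uint64),
    ]


class Sorted(ctypes.Structure):
    # Same members, largest first: the two 4-byte members now share one
    # 8-byte slot and no padding is needed.
    _fields_ = [
        ("rbytes", ctypes.c_uint64),
        ("wbytes", ctypes.c_uint64),
        ("flag", ctypes.c_uint32),
        ("count", ctypes.c_uint32),
    ]


print(ctypes.sizeof(Unsorted), ctypes.sizeof(Sorted))
```

Whether the real structure benefits in the same way depends on its actual members and on whether gmond is built as a 32-bit or 64-bit binary, which is why the question about the build mode matters.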
[Ganglia-developers] Sanitized version of linux/metrics.c for 3.1.x
Hi,

I just checked in a sanitized version of linux/metrics.c for the 3.1.x stable branch. The code to remove bogus spikes in networking has been #ifdef-ed, as it uses assumptions that are local to certain setups. I also added some FIXME comments on stuff that should be rewritten in the future:

- per-interface network metrics
- consolidate functions reading /proc/meminfo and /proc/stat

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] [patch] change privateclusters auth header to include clustername
Hi Ramon,

looks harmless enough. Could you make a similar patch against trunk please? From my side +1 for both trunk and 3.0.X

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Ramon Bastiaans [EMAIL PROTECTED]
To: ganglia-developers@lists.sourceforge.net
Sent: Thursday, March 6, 2008 11:59:36 AM
Subject: [Ganglia-developers] [patch] change privateclusters auth header to include clustername

Hi,

I've made a little patch to the webfrontend of 3.0.7.

The problem is that Ganglia always says "Ganglia Private Cluster" for ALL private clusters in the authentication header. This way you can't let Firefox or Internet Explorer remember a different password for each cluster. Since the Firefox password manager, for example, associates the password with the string in the authentication header, you will have to keep on entering your individual private cluster password again and again.

I have now changed it to include the cluster name in the authentication header. This way you can now let your browser save/remember/cache different passwords for each individual cluster.

Cheers,
- Ramon.

--
ing. R. Bastiaans
Systems Programmer / High Performance Computing Visualisation / SARA Computing and Networking Services
Kruislaan 415         PO Box 194613
1098 SJ Amsterdam     1090 GP Amsterdam
P.+31 (0)20 592 3000  F.+31 (0)20 668 3167
---
There are really only three types of people: Those who make things happen, those who watch things happen and those who say, "What happened?"

-----Inline Attachment Follows-----

--- auth.php.org 2008-03-06 11:56:09.542153567 +0100
+++ auth.php 2008-03-06 11:54:27.261229406 +0100
@@ -30,7 +30,11 @@
 #---
 function authenticate()
 {
-  header("WWW-authenticate: basic realm=\"Ganglia Private Cluster\"");
+  global $clustername;
+
+  $auth_header = "WWW-authenticate: basic realm=\"Private Ganglia cluster: " . $clustername . "\"";
+
+  header( $auth_header );
   header("HTTP/1.0 401 Unauthorized");
   #print URL=\../?c=\;
   print You are unauthorized to view the details of this Cluster ;
Re: [Ganglia-developers] [patch] change privateclusters auth header to include clustername
Hi Ramon,

unless someone beats me to it, I will check it into trunk later today. For 3.0.X we need more votes :-)

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Ramon Bastiaans [EMAIL PROTECTED]
To: Martin Knoblauch [EMAIL PROTECTED]
Cc: ganglia-developers@lists.sourceforge.net
Sent: Thursday, March 6, 2008 1:38:38 PM
Subject: Re: [Ganglia-developers] [patch] change privateclusters auth header to include clustername

Hi Martin,

The patch should also work with trunk (just tested), seems that code hasn't changed much.. ;)

- Ramon.

Martin Knoblauch wrote:
> Hi Ramon,
>
> looks harmless enough. Could you make a similar patch against trunk please? From my side +1 for both trunk and 3.0.X
>
> Cheers
> Martin
Re: [Ganglia-developers] [patch] change privateclusters auth header to include clustername
Hi,

what was the exact process? Do we need +2 for checkins into both trunk and 3.0.x, or just for 3.0.x? For now I will abstain from checking Ramon's patch into trunk.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Brad Nicholes [EMAIL PROTECTED]
To: Martin Knoblauch [EMAIL PROTECTED]; ganglia-developers@lists.sourceforge.net; Ramon Bastiaans [EMAIL PROTECTED]
Sent: Thursday, March 6, 2008 4:57:39 PM
Subject: Re: [Ganglia-developers] [patch] change privateclusters auth header to include clustername

-1 for now. My concern is that injecting the name of the cluster as it is pulled from the query string seems a little dangerous. This would allow the realm to be altered in any way just by modifying the query string. Not sure if that is a real issue or not, but it seems dangerous. Can anybody else clarify this more?

Brad

On 3/6/2008 at 5:28 AM, in message [EMAIL PROTECTED], Martin Knoblauch wrote:
> Hi Ramon,
>
> looks harmless enough. Could you make a similar patch against trunk please? From my side +1 for both trunk and 3.0.X
>
> Cheers
> Martin
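One way to address the concern about a realm built from the query string is to restrict the cluster name to a conservative character set before it goes into the header. The sketch below is in Python for illustration only (the frontend itself is PHP), and the function names, the allowed character set and the length limit are assumptions, not part of Ramon's patch or of Ganglia.

```python
import re


def safe_realm(cluster_name: str, max_len: int = 64) -> str:
    """Build a Basic-auth realm from an untrusted cluster name.

    Everything outside letters, digits, space, dot, underscore and hyphen
    is dropped, which removes quotes and CR/LF characters that could
    otherwise break out of the quoted realm or start a new header line.
    """
    cleaned = re.sub(r"[^A-Za-z0-9 ._-]", "", cluster_name)[:max_len]
    return f"Private Ganglia cluster: {cleaned}"


def www_authenticate(cluster_name: str) -> str:
    """Return the complete header line for the given cluster name."""
    return f'WWW-Authenticate: Basic realm="{safe_realm(cluster_name)}"'


print(www_authenticate("alpha"))
print(www_authenticate('prod"\r\nSet-Cookie: x=1'))
```

A stricter alternative would be to accept the name only if it matches one of the clusters the frontend already knows about, and to fall back to the fixed realm string otherwise.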
Re: [Ganglia-developers] Ganglia 3.1 wish list...
Hi,

wouldn't it be time to fork off Ganglia WEB Frontend TNG(tm) and put the old stuff into maintenance? Maybe for 3.1.1? It seems there is a lot of cool stuff that can be done, but it will likely destabilize the frontend for a while.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

----- Original Message -----
From: Ramon Bastiaans [EMAIL PROTECTED]
To: Brad Nicholes [EMAIL PROTECTED]
Cc: ganglia-developers@lists.sourceforge.net
Sent: Friday, February 29, 2008 2:30:36 PM
Subject: Re: [Ganglia-developers] Ganglia 3.1 wish list...

Oh and:

web)
* session usage for storing settings, filters etc - instead of all the ugly GETing and parsing of the URL
* more advanced addon/plugin capabilities (executing custom .php code from within Ganglia's default templates/pages)

- Ramon.

Ramon Bastiaans wrote:
> Unfortunately I can't be there, would be fun to meet some of you.
>
> I would like to suggest the following for the wishlist though:
>
> * License sorting of all the components
>   Since the Debian packages for example are no longer maintained because of licensing conflicts among the different components.
>
> For the web interface:
> * More fancy DHTML and Javascript stuff, we could make it look pretty ;)
> * Ajax - Could only reload graphs etc when really needed, improving performance
>   * Could for example only reload host metric graphs when metric type is changed, leaving the rest, etc
> * PHP5 as new requirement
>
> Hope you guys have fun and someone takes pictures of the meeting. ;)
>
> Cheers,
> - Ramon.
>
> Brad Nicholes wrote:
>> Here is the latest Ganglia 3.1 wish list. We will be discussing this list during the Ganglia meeting.
>>
>> Brad

--
ing. R. Bastiaans
Systems Programmer / High Performance Computing Visualisation / SARA Computing and Networking Services
Kruislaan 415         PO Box 194613
1098 SJ Amsterdam     1090 GP Amsterdam
P.+31 (0)20 592 3000  F.+31 (0)20 668 3167
---
There are really only three types of people: Those who make things happen, those who watch things happen and those who say, "What happened?"
Re: [Ganglia-developers] Ganglia 3.1 wish list...
----- Original Message -----
From: Jesse Becker [EMAIL PROTECTED]
To: Ramon Bastiaans [EMAIL PROTECTED]
Cc: ganglia-developers@lists.sourceforge.net
Sent: Friday, February 29, 2008 7:53:20 PM
Subject: Re: [Ganglia-developers] Ganglia 3.1 wish list...

> On Fri, Feb 29, 2008 at 8:21 AM, Ramon Bastiaans wrote:
>> * PHP5 as new requirement
>
> Are there any particular requirements to move to PHP5? Right now, the existing code works with PHP4 and PHP5. Dropping support for PHP4 would also mean dropping native support for distributions of moderate age (RHEL4, CentOS4, et al).

The main reason would IMO be to prepare for PHP6, which will finally remove some long deprecated PHP4 features (like the global arrays).

Cheers
Martin
[Ganglia-developers] Fix for bogus overflows in linux/metrics.c
Hi, I just checked in a fix to handle bogus overflow events on certain BCM NICs using the bnx2 driver. The fix is to drop any sample where an overflow is detected on any of the four counters. This will work fine in 64-bit mode, as overflow events are relatively rare (once in 5000 years on a fully saturated 1Gbit NIC). But in 32-bit mode they may happen a lot more frequently (like every 40 seconds), so my fix may actually drop too many valid samples. To help with this, one could sample at higher rates, like every 5 or 10 seconds. This definitely needs review and discussion. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
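A minimal sketch of the drop-on-overflow idea described above. The `counter_sample` type and `sample_delta` function are hypothetical names made up for illustration; they are not taken from libmetrics/linux/metrics.c, and the real fix may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four network counters
 * (bytes in/out, packets in/out); not the real libmetrics type. */
enum { NUM_COUNTERS = 4 };

typedef struct {
    uint64_t value[NUM_COUNTERS];
} counter_sample;

/* Compute per-counter deltas between two consecutive samples.
 * If any counter went backwards (a real wrap or a bogus driver value),
 * the whole sample is dropped: returns false and leaves delta untouched. */
bool sample_delta(const counter_sample *prev, const counter_sample *cur,
                  uint64_t delta[NUM_COUNTERS])
{
    assert(prev != NULL && cur != NULL && delta != NULL);

    for (size_t i = 0; i < NUM_COUNTERS; i++) {
        if (cur->value[i] < prev->value[i])
            return false;               /* overflow detected: skip sample */
    }
    for (size_t i = 0; i < NUM_COUNTERS; i++)
        delta[i] = cur->value[i] - prev->value[i];
    return true;
}
```

A caller would keep the previous sample, call `sample_delta` on each poll, and simply not report a rate for that interval when it returns false. The cost is the one mentioned in the message: on 32-bit counters every genuine wrap also discards a valid interval.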
Re: [Ganglia-developers] Ganglia 3.1 wish list...
Hi folks, before I turn off the light, just one or two comments below. See/hear you tomorrow Martin

From: Brad Nicholes [EMAIL PROTECTED] To: Peter Mui [EMAIL PROTECTED]; ganglia-developers@lists.sourceforge.net Sent: Thursday, February 28, 2008 11:43:33 PM Subject: [Ganglia-developers] Ganglia 3.1 wish list...

Here is the latest Ganglia 3.1 wish list. We will be discussing this list during the Ganglia meeting. Brad

-Inline Attachment Follows-

Done
- C module interface as DSO
- mod_python Python module interface
- Dynamically link libraries like expat, apr, libconfuse
- Add TITLE attribute to the XDR data to communicate a human readable name
- Add a GROUP attribute to the XDR data
  This would allow metrics to declare the category that they belong to. The category should be added at the metric definition level and not in the .conf file.
- Reimplement the built in metrics as C interface modules
- A cleaner XDR encoding: The current encoding scheme embeds too much information about which metrics gmond collects. The encoding scheme should treat all metrics the same: as just a metric. The encoding should not care if the metric is metric_cpu_speed, metric_swap_total or a user-defined gmetric one.
- Flexible method of adding extra metric metadata. We could include extra metadata, not just alias/title. For example, some metrics have a natural minimum and maximum value. Perhaps coming up with an extendable way of encoding metric metadata so future changes can be included without losing backwards compatibility.
- Re-organization of RPM packages (libganglia, gmond-python ?)

GMond To Do
- Gmond module repository
- Implement a perl module interface
- Implement a PHP module interface
- Implement a Ruby module interface
- Metric packing: Simply that a UDP packet can contain multiple metrics (using the usual XDR stream decoding) up to the size of a UDP packet. This would help reduce the overheads when sending many metric updates concurrently. It also preserves the current gmond behaviour where it sends metric updates in a single UDP packet.
- Support for counters (metrics with +ve slope) This shouldn't require much work (from memory, make sure the slope-type information is preserved and patch gmetad to create RRD files with the correct options). Currently Ganglia doesn't actually support custom counter metrics, which is an awkward limitation.
- gmond switching to a non-blocking IO model. If there's a large number of metric updates then gmond must process them quickly or they will be lost. If this happens whilst gmond is sending XML data to gmetad there may be a delay, increasing the risk of metric update messages being lost. Switching to a non-blocking IO model would allow gmond to respond preferentially to the incoming UDP messages.
-* Remove the 4T limit on ganglia metric results
-* Modify all byte count metrics to 8-byte ints

GMetad To Do
- Support for new RRDTool which allows graphs to have dynamic sizes
- Gilad's stacked graphs
- Changing the units of default metrics to their base (For example disk_free's base unit should be bytes, not GB as rrdtool will automatically append G,M,K etc.)
- Better support for bigger less frequent updates
  one packet every 20 seconds per host for all data?
- Multi PB disk limit
- Better on disk RRD perf (tmpfs is an OK workaround)
-* Name RRD directories based on UUID generated by client
  gmond hash of MAC address? something else? So that renaming hosts, updating DNS or hosts files don't result in history for the physical gmond client being lost.
- Integration of gexec/authd ?
- Could be interesting as some kind of lightweight queueing system.
- Expand gstat nodelist parameter query options (i.e. return all hosts with 10% iowait, etc.)
- Add some event notification mechanism if metrics go over a limit. But do we want to implement another Nagios?
- Interface stats in bits? Self awareness of interface capability for % util stats for network.
- Link utilization would be a great metric.
- I am not sure about the bit-stats. For the stuff I do, throughput in bytes/sec makes more sense than a bit-rate. But I can see the comms people have a different view
- the network stats should be per interface
- Something like a unique per-gmond instance identifier To help with multi-homing and DNS issues and so the IP address is no longer the index key. There was discussion of this under the subject Overriding hostname on the Ganglia-general list.
- Give some metrics priority and have them updated more frequently in their RRDs than others.
- Allow for some sort of in memory RRD (never written to disk) as an alternative storage for very
Re: [Ganglia-developers] [Ganglia-general] Need a script to remove spikes from network RRDs
- Original Message From: aurbain [EMAIL PROTECTED] To: Martin Knoblauch [EMAIL PROTECTED] Cc: ganglia-developers@lists.sourceforge.net; ganglia general [EMAIL PROTECTED] Sent: Wednesday, February 27, 2008 5:11:48 PM Subject: Re: [Ganglia-general] Need a script to remove spikes from network RRDs Thanks for the info Martin. So it's not a rollover issue after all. By the way, this issue also lives in rhel4u4 32 bit with bnx2 version 1.4.43f. Interesting. From my reading only the 64-bit version was affected. Anyway, I have a fix which just throws away any samples where an overflow, correct or bogus, occurs. That is definitely fine in 64-bit land. Even at full speed, a 1Gbit NIC would overflow only after 5000 years. Nothing that I worry about much :-) Even 5 years for a future 1Tbit NIC is not that bad... But in 32-bit, a 1Gbit NIC could overflow every 40 seconds. And that is very short. Cheers Martin Martin Knoblauch wrote: - Original Message From: aurbain To: Martin Knoblauch Cc: ganglia-developers@lists.sourceforge.net; ganglia general Sent: Tuesday, February 26, 2008 8:25:13 PM Subject: Re: [Ganglia-general] Need a script to remove spikes from network RRDs Happens only on 64-bit systems. Now, my fix kills the generation of the spikes, but my RRD database is now tainted for another 12 months.
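The wrap-around figures quoted in this message can be checked with a few lines of arithmetic. The sketch below assumes the counter in question counts bytes (as the byte columns of /proc/net/dev do) and that the link is fully saturated; `wrap_seconds` and `wrap_years` are illustrative names, not Ganglia functions.

```c
#include <assert.h>

/* Seconds until an unsigned byte counter of `bits` width wraps
 * on a link running flat out at `bits_per_sec`. */
double wrap_seconds(int bits, double bits_per_sec)
{
    assert(bits > 0 && bits <= 64 && bits_per_sec > 0.0);

    double span = 1.0;                      /* becomes 2^bits */
    for (int i = 0; i < bits; i++)
        span *= 2.0;

    double bytes_per_sec = bits_per_sec / 8.0;
    return span / bytes_per_sec;
}

/* The same figure expressed in years of 365.25 days. */
double wrap_years(int bits, double bits_per_sec)
{
    return wrap_seconds(bits, bits_per_sec) / (365.25 * 24.0 * 3600.0);
}
```

With these definitions, `wrap_seconds(32, 1e9)` comes out at roughly 34 seconds, `wrap_years(64, 1e9)` at roughly 4,700 years, and `wrap_years(64, 1e12)` at a little under 5 years, which matches the rounded numbers in the thread.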
[Ganglia-developers] Need a script to remove spikes from network RRDs
Hi, one of my clusters has, due to a flaky hw/driver combination, spikes in the PB/sec range in the network metrics. This makes viewing the larger timescales pretty much useless (for the next week, month, year). Does anybody have a script to repair such rrds? Which of the fields need to be touched? Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] [Ganglia-general] Need a script to remove spikes from network RRDs
- Original Message From: aurbain [EMAIL PROTECTED] To: Martin Knoblauch [EMAIL PROTECTED] Cc: ganglia-developers@lists.sourceforge.net; ganglia general [EMAIL PROTECTED] Sent: Tuesday, February 26, 2008 8:25:13 PM Subject: Re: [Ganglia-general] Need a script to remove spikes from network RRDs I'm getting these spikes in multiple boxes, specifically the ones which do a lot of network traffic. RHEL4u[4,6], ganglia 2.0.6 Perhaps a rollover bug in the network code in gmond? That was the first thing I suspected and I have since modified the overflow mechanism in my development version to just ignore samples when the overflow happens. After instrumentation it showed that the data in /proc/net/dev was bogus. This is due to this: http://www.mail-archive.com/[EMAIL PROTECTED]/msg59062.html Happens only on 64-bit systems. Now, my fix kills the generation of the spikes, but my RRD database is now tainted for another 12 months. Cheers Martin Martin Knoblauch wrote: Hi, one of my clusters has, due to flakey hw/driver combination, spikes in the PB/sec range in the network metrics. This makes viewing the larger timescales pretty much useless (for the next week, month, year) . Does anybody have a script to repair such rrds? Which of the fields need to be touched? Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] [Ganglia-general] Need a script to remove spikes from network RRDs
Original Message From: john allspaw [EMAIL PROTECTED] To: Martin Knoblauch [EMAIL PROTECTED]; ganglia-developers@lists.sourceforge.net Cc: ganglia general [EMAIL PROTECTED] Sent: Tuesday, February 26, 2008 7:38:07 PM Subject: Re: [Ganglia-general] Need a script to remove spikes from network RRDs Here is what comes with rrdtool, I've used it with some success... http://oss.oetiker.ch/rrdtool/pub/contrib/removespikes.tar.gz -john Cool. Almost what I need. It seems to be a bit too smart for my purpose, but making things stupid is easy :-) Cheers Martin - Original Message From: Martin Knoblauch To: ganglia-developers@lists.sourceforge.net Cc: ganglia general Sent: Tuesday, February 26, 2008 10:02:20 AM Subject: [Ganglia-general] Need a script to remove spikes from network RRDs Hi, one of my clusters has, due to flakey hw/driver combination, spikes in the PB/sec range in the network metrics. This makes viewing the larger timescales pretty much useless (for the next week, month, year) . Does anybody have a script to repair such rrds? Which of the fields need to be touched? Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
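For readers wondering what a deliberately "stupid" spike filter of the kind alluded to in this thread might look like, here is a minimal sketch. It is not how the removespikes script works and it does not read RRD files; it assumes the values have already been extracted (for instance from the XML produced by `rrdtool dump`) into an array, and that the caller picks a hard ceiling such as the line rate of the NIC. The function name `clip_spikes` is made up for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Replace every sample above `limit` with NaN ("unknown") and return
 * how many samples were replaced. Samples that are already NaN are
 * left alone and not counted. */
size_t clip_spikes(double *samples, size_t n, double limit)
{
    assert(samples != NULL || n == 0);

    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(samples[i]) && samples[i] > limit) {
            samples[i] = NAN;
            replaced++;
        }
    }
    return replaced;
}
```

For a 1Gbit interface a ceiling of 1.25e8 bytes/sec would be a natural choice, since no real sample can exceed it while a PB/sec spike obviously does. Writing the cleaned values back would be a separate step, for example by editing the dump and feeding it to `rrdtool restore`.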
[Ganglia-developers] 3.0.7?
Hi Bernard, what are your plans for 3.0.7? Any time now ? :-) If not, I would like to commit a small patch to enable syslogging of error messages for gmond. But it can wait for 3.0.8.

diff -up ~/ganglia-3.0.6.200802141157/gmond/gmond.c gmond/
--- /home/ftt5aa7/ganglia-3.0.6.200802141157/gmond/gmond.c Thu Feb 14 20:58:58 2008
+++ gmond/gmond.c Mon Feb 25 13:26:43 2008
@@ -27,6 +27,7 @@
 #include "dtd.h" /* the DTD definition for our XML */
 #include "g25_config.h" /* for converting old file formats to new */
 #include "daemon_init.h"
+#include <syslog.h>

 /* When this gmond was started */
 apr_time_t started;
@@ -191,6 +192,7 @@ process_configuration_file(void)
   cleanup_threshold = cfg_getint( tmp, "cleanup_threshold");
 }

+extern int daemon_proc; /* defined in error.c */
 static void
 daemonize_if_necessary( char *argv[] )
 {
@@ -213,6 +215,8 @@ daemonize_if_necessary( char *argv[] )
   if(!args_info.foreground_flag && should_daemonize && !debug_level)
     {
       apr_proc_detach(1);
+      openlog(argv[0],LOG_PID,LOG_DAEMON);
+      daemon_proc = 1;
     }
 }

Also for 3.0.8, I would like to drop in the trunk version of libmetrics/linux/metrics.c. It [will soon] contain a fix for a nasty overflow problem in some Broadcom NICs (BCM5708, bnx2 driver) that leads to spurious petabyte spikes in the network metrics. The problem is fixed in later driver releases, but is present in some popular enterprise distros like RHEL4. The risk is minimal and I have been running it for more than a week, but it is definitely not for 3.0.7. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
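The patch only opens the syslog connection and sets a flag; the code in error.c that consults `daemon_proc` is not shown in the thread. As a rough sketch of how such a flag is commonly used, assuming nothing about the real error.c, the helpers below (`log_init_daemon`, `log_error` and `daemon_flag` are all hypothetical names) route messages to syslog once the process has detached and to stderr before that.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical stand-in for the daemon_proc flag mentioned in the patch. */
static int daemon_flag = 0;

/* Call once after detaching from the terminal. */
void log_init_daemon(const char *ident)
{
    assert(ident != NULL);
    openlog(ident, LOG_PID, LOG_DAEMON);
    daemon_flag = 1;
}

/* Report an error: via syslog when daemonized, on stderr otherwise.
 * Returns 1 if the message went to syslog, 0 if it went to stderr. */
int log_error(const char *fmt, ...)
{
    assert(fmt != NULL);

    char buf[512];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (daemon_flag) {
        syslog(LOG_ERR, "%s", buf);     /* "%s" avoids format-string bugs */
        return 1;
    }
    fprintf(stderr, "%s\n", buf);
    return 0;
}
```

Without something like this a detached daemon has no terminal to write to, so its error messages simply disappear, which is presumably the motivation for the patch.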
Re: [Ganglia-developers] Memory leak in gmond
- Original Message From: Jesse Becker [EMAIL PROTECTED] To: Martin Knoblauch [EMAIL PROTECTED] Cc: Ganglia Developers ganglia-developers@lists.sourceforge.net Sent: Wednesday, February 20, 2008 4:01:26 PM Subject: Re: [Ganglia-developers] Memory leak in gmond On Feb 19, 2008 7:39 PM, Martin Knoblauch wrote: - Original Message From: Jesse Becker To: Ganglia Developers Sent: Tuesday, February 19, 2008 11:25:54 PM Subject: Re: [Ganglia-developers] Memory leak in gmond I'm not sure if this is right--I've only taken a really quick check in libmetrics/linux/metrics.c, and my C-fu is rusty. It looks like strndup() is called in linux/metrics.c:hash_lookup (about line 131) to duplicate an interface name, which is included in the stats structure as stats->name. The net_dev_stats function will return this struct. The function is called in a number of places: pkts_in_func, pkts_out_func, bytes_out_func and bytes_in_func. The variable *ns is assigned the output of hash_lookup (e.g. the struct). Since the 'name' element is malloc()ed, but not explicitly freed, it will not go away when *ns goes out of scope. This is the leak, isn't it? All four of these functions are very similar, and need to be fixed if this is the case. Or did I miss something obvious? :) Lines 137, 148 and 159 ? :-) I saw those. :-P I meant after the struct has been returned, outside the function, the memory is never freed. Inside that function, it's okay. We actually had a memory leak in that area. The four networking functions would allocate and then leak the device-names. But that has been fixed in both trunk and 3.0.X about 10 days ago. The memory allocated in line 151 is never freed, indeed. But it is only allocated once per interface and stays alive for the entire lifetime of the gmond process. So, it is not leaked. Ah, that makes more sense, especially if those variables exist for the lifetime of the program. Yup. It is really important to know the lifetime of those structures. We actually might have a problem in the case of hot-unplugging network cards. But I guess that the resulting leak might be tolerable :-) So, I've just run gmond under valgrind and duma (a fork of the old Electric Fence memory debugger), and I can't seem to reproduce the problem now. Neither one of them is showing any obvious leaks, at least not in the 15 minute tests I've run. The test system(s) are CentOS4.6 boxes. These things happen. Cheers Martin
Re: [Ganglia-developers] gmond Spoof memory leak fix
Hi, if you resend it as an attachment, I would apply the fix. Cheers Martin PS: How is life at SGI nowadays? -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de - Original Message From: Martin Hicks [EMAIL PROTECTED] To: Ganglia Developers ganglia-developers@lists.sourceforge.net Sent: Wednesday, February 20, 2008 7:06:41 PM Subject: [Ganglia-developers] gmond Spoof memory leak fix Hi, Here's a patch against ganglia-3.0.6.200802141157 that fixes a memory leak when using user defined metrics with spoofing. The problem was that the spmetric was being copied out, ignoring the spheader. The strings that were allocated inside the spheader were dropped. mh

--- ganglia-3.0.6.200802141157/gmond/gmond.c 2008-02-14 14:58:58.0 -0500
+++ ganglia-3.0.6.200802141157.mod/gmond/gmond.c 2008-02-20 11:46:23.0 -0500
@@ -831,11 +831,13 @@ Ganglia_message_save( Ganglia_host *host
   /* Copy in the data */
   // Yemi
   if(message->id == spoof_metric){
-// Store data as regular gmetric in hash table!!
+      /* Store data as regular gmetric in hash table!!
+       * Free the Spoof-related strings.
+       */
-      metric->message.id = metric_user_defined;
+      metric->message.id = metric_user_defined;
       metric->message.Ganglia_message_u.gmetric = message->Ganglia_message_u.spmetric.gmetric;
-
+      xdr_free(xdr_Ganglia_spoof_header, &message->Ganglia_message_u.spmetric.spheader);
   }else{
       memcpy(&(metric->message), message, sizeof(Ganglia_message));
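The leak described here is a general ownership problem: a decoded message holds two groups of heap strings, only one group is copied into long-lived storage, and the other group is forgotten. The sketch below shows the same pattern with plain `malloc`/`free` instead of XDR. All type and function names (`spoof_header_t`, `gmetric_t`, `spoof_metric_msg`, `save_spoofed_metric`, `dup_string`) are simplified, hypothetical stand-ins and not the structures generated from Ganglia's protocol definition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the decoded message parts. */
typedef struct { char *sp_name; char *sp_ip; } spoof_header_t;
typedef struct { char *name; char *value; } gmetric_t;
typedef struct { spoof_header_t spheader; gmetric_t metric; } spoof_metric_msg;

/* Small helper so the sketch does not depend on strdup(). */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Keep the metric part and release the header strings, which would
 * otherwise be leaked when only the metric is stored. */
gmetric_t save_spoofed_metric(spoof_metric_msg *msg)
{
    assert(msg != NULL);

    gmetric_t kept = msg->metric;   /* shallow copy: caller now owns strings */

    free(msg->spheader.sp_name);    /* the part that used to be dropped */
    free(msg->spheader.sp_ip);
    msg->spheader.sp_name = NULL;
    msg->spheader.sp_ip = NULL;
    msg->metric.name = NULL;        /* ownership moved to `kept` */
    msg->metric.value = NULL;
    return kept;
}
```

In the actual patch the release is done with `xdr_free` on the spoof header, which walks the XDR-decoded structure and frees what the decoder allocated; the hand-written `free` calls above only mimic that effect for two fields.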
Re: [Ganglia-developers] gmond Spoof memory leak fix
- Original Message From: Martin Hicks [EMAIL PROTECTED] To: Martin Knoblauch [EMAIL PROTECTED] Cc: Ganglia Developers ganglia-developers@lists.sourceforge.net Sent: Wednesday, February 20, 2008 7:33:32 PM Subject: Re: [Ganglia-developers] gmond Spoof memory leak fix On Wed, Feb 20, 2008 at 10:27:33AM -0800, Martin Knoblauch wrote: Hi, if you resend it as an attachment, I would apply the fix. You can apply it with my blabbering at the beginning. :) patch ignores the stuff before the --- The patch is attached for your convenience. My problem is, that my MUA just garbles the white space. So, I prefer inlined patches. Cheers Martin PS: How is life at SGI nowadays? Seems okay. I just got here recently. :) I left about 10 years ago. Different place at that time, I think.
Re: [Ganglia-developers] gmond Spoof memory leak fix
btw. the fix does not apply to trunk. The code looks quite different there. Someone familiar with the spoofing stuff may want to check whether the leak exists in trunk and needs fixing. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de - Original Message From: Martin Knoblauch [EMAIL PROTECTED] To: Martin Hicks [EMAIL PROTECTED]; Bernard Li [EMAIL PROTECTED] Cc: Ganglia Developers ganglia-developers@lists.sourceforge.net Sent: Wednesday, February 20, 2008 7:58:20 PM Subject: Re: [Ganglia-developers] gmond Spoof memory leak fix Bernard, all, I just committed the fix for the spoofing leak from Martin Hicks. Can you run a [final] snapshot for 3.0.7? I have something brewing to fix the petabyte/sec spikes that one of our customers is seeing, but that needs more testing and can wait for 3.0.8. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de - Original Message From: Martin Knoblauch To: Martin Hicks Cc: Ganglia Developers Sent: Wednesday, February 20, 2008 7:44:04 PM Subject: Re: [Ganglia-developers] gmond Spoof memory leak fix - Original Message From: Martin Hicks To: Martin Knoblauch Cc: Ganglia Developers Sent: Wednesday, February 20, 2008 7:33:32 PM Subject: Re: [Ganglia-developers] gmond Spoof memory leak fix On Wed, Feb 20, 2008 at 10:27:33AM -0800, Martin Knoblauch wrote: Hi, if you resend it as an attachment, I would apply the fix. You can apply it with my blabbering at the beginning. :) patch ignores the stuff before the --- The patch is attached for your convenience. My problem is, that my MUA just garbles the white space. So, I prefer inlined patches. Cheers Martin PS: How is life at SGI nowadays? Seems okay. I just got here recently. :) I left about 10 years ago. Different place at that time, I think.
Re: [Ganglia-developers] Commiting to the maintenance branch (was:Re: 3.0.7 release)
Hi Brad, you are right. 3.0.X should only take [critical] bug fixes by now. Maybe some obvious optimization. New functionality belongs in trunk. Rules for the web-interface might be more relaxed, as changes there do not endanger the monitoring-core framework. But that is my personal feeling. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de - Original Message From: Brad Nicholes [EMAIL PROTECTED] To: Ulf Lange [EMAIL PROTECTED]; ganglia-developers@lists.sourceforge.net Sent: Wednesday, February 20, 2008 4:46:36 PM Subject: [Ganglia-developers] Commiting to the maintenance branch (was:Re: 3.0.7 release) Forgive me if I have missed something here, but are these patches intended for the 3.0.x branch or for trunk? As per Bernard's response below, the 3.0.x branch is in maintenance mode only. All new features should be directed at trunk and submitted as unified diffs rather than modified files. If a patch is determined to be a critical bug fix for a previous version, it will be backported to the maintenance branch at that point. Since I am unable to view the bug in Bugzilla (due to some kind of bugzilla issue), I am not exactly sure what these patches are trying to accomplish. So again, forgive me if I have missed something. Brad On 2/19/2008 at 11:26 PM, in message [EMAIL PROTECTED], Ulf Lange wrote: Hi, here are the patches for the 3.0.x snapshot from last week. It would be okay to apply the patches at 3.0.8. I'm monitoring a lot of AIX servers and they seem to work well with the patch from Michael. Part 1/2 Regards Ulf Jesse Becker wrote: Any chance you could re-post them as .gz or .zip files, instead of .rar? On Feb 19, 2008 2:31 PM, Ulf Lange wrote: Hi, I don't want to get on your nerves, but can somebody check in the patches from Michael (bugid 146)? I included the patched files in my last two mails. Regards, Ulf Ulf Lange wrote: Hi, I've patched the current release from http://therealms.org/oss/ganglia/testing/ with the patches from Michael Perzl. Up to now, I was not able to test them (no time) as for AIX. The problem is that the AIX rpcgen is buggy (see http://www.perzl.org/ganglia/ganglia-p5metrics-v3.0.5.html), so you need to generate protocol_xdr.c and protocol.h manually. One thing I've not applied from the patch was the #define SLEEP_TIME 1 in test-metrics.c. The patched files should work on AIX, as far as the protocol_xdr.c and protocol.h are created. Maybe you can already work with the patch. Compiled with: gcc -v Reading specs from /opt/freeware/lib/gcc-lib/powerpc-ibm-aix5.3.0.0/3.3.2/specs Configured with: ../configure --with-as=/usr/bin/as --with-ld=/usr/bin/ld --disable-nls --enable-languages=c,c++ --prefix=/opt/freeware --enable-threads --enable-version-specific-runtime-libs --host=powerpc-ibm-aix5.3.0.0 Thread model: aix gcc version 3.3.2 # ./configure --disable-shared --enable-static Part 1/2 Regards, Ulf Bernard Li wrote: Hi Ulf: On 2/13/08, Ulf wrote: you know, my never ending wish is the integration of http://wtf.ath.cx/ganglia-dev/custom_graph_addon.tar.gz . The integration with 3.0.6 still works fine. The 3.0.x branch is frozen for new features -- it is a maintenance branch for security/major bugfixes only. All new features/patches should be submitted against trunk. After release of 3.0.7 I'll test the versions with AIX and Solaris, too. The AIX version is probably without these patches http://www.perzl.org/ganglia/ . Well, if someone can ack the patch attached to this bug, I can check it into trunk: http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=146 In about two weeks, I'll try the latest 3.1.x snapshot with AIX and Solaris. Thanks, Bernard
[Ganglia-developers] Consolidation of network metrics functions for Linux
Hi, I just checked into trunk a first cut at removing duplicate code in the linux/metrics.c file. I started working on the network functions, because I am also trying to track down a problem where we are seeing petabyte/sec spikes every few hours, which I attribute to some problem in the overflow handling for the counters. Observed on x86_64 in 64-bit mode. The first measure is to do all important math integer only; next I may decide to just drop samples where counters are overflowing. I also checked in two small fixes to the test-metrics.c code. A missing , and a logically wrong #ifdef CYGWIN. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Memory leak in gmond
- Original Message From: Jesse Becker [EMAIL PROTECTED] To: Ganglia Developers ganglia-developers@lists.sourceforge.net Sent: Tuesday, February 19, 2008 11:25:54 PM Subject: Re: [Ganglia-developers] Memory leak in gmond I'm not sure if this is right--I've only taken a really quick check in libmetrics/linux/metrics.c, and my C-fu is rusty. It looks like strndup() is called in linux/metrics.c:hash_lookup (about line 131) to duplicate an interface name, which is included in the stats structure as stats->name. The net_dev_stats function will return this struct. The function is called in a number of places: pkts_in_func, pkts_out_func, bytes_out_func and bytes_in_func. The variable *ns is assigned the output of hash_lookup (e.g. the struct). Since the 'name' element is malloc()ed, but not explicitly freed, it will not go away when *ns goes out of scope. This is the leak, isn't it? All four of these functions are very similar, and need to be fixed if this is the case. Or did I miss something obvious? :) Lines 137, 148 and 159 ? :-) The memory allocated in line 151 is never freed, indeed. But it is only allocated once per interface and stays alive for the entire lifetime of the gmond process. So, it is not leaked. Cheers Martin
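The distinction drawn in this reply, memory that is never freed but allocated only once per interface versus memory that leaks on every call, can be illustrated with a small lookup-or-create table. This is a simplified sketch under assumed names (`net_stats`, `stats_lookup`, `stats_count`); the real hash_lookup in libmetrics/linux/metrics.c is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-interface stats table. */
typedef struct net_stats {
    char *name;                     /* allocated once, on entry creation */
    unsigned long long rbytes;
    struct net_stats *next;
} net_stats;

static net_stats *table = NULL;
static size_t table_entries = 0;

/* Look up an interface by name and create the entry on first sight.
 * The name is duplicated only on creation, so memory use is bounded by
 * the number of distinct interfaces, not by the number of lookups. */
net_stats *stats_lookup(const char *devname)
{
    assert(devname != NULL);

    for (net_stats *s = table; s != NULL; s = s->next)
        if (strcmp(s->name, devname) == 0)
            return s;               /* existing entry: nothing allocated */

    net_stats *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = strndup(devname, strlen(devname));
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    s->next = table;
    table = s;
    table_entries++;
    return s;
}

/* Number of distinct interfaces seen so far. */
size_t stats_count(void)
{
    return table_entries;
}
```

Calling `stats_lookup("eth0")` a million times still leaves exactly one entry and one name string, which is why a tool like valgrind reports such memory as "still reachable" rather than "definitely lost".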
Re: [Ganglia-developers] Memory leak in gmond
Hi folks, ACK from my side too. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de - Original Message From: Kumar Vaibhav [EMAIL PROTECTED] To: Bernard Li [EMAIL PROTECTED] Cc: Brad Nicholes [EMAIL PROTECTED]; ganglia-developers@lists.sourceforge.net; Carlo Marcelo Arenas Belon [EMAIL PROTECTED]; Martin Knoblauch [EMAIL PROTECTED] Sent: Monday, February 18, 2008 5:07:49 AM Subject: Re: [Ganglia-developers] Memory leak in gmond Hi Bernard, I think the problem is solved. I don't see any rise in memory of gmond for the last three days. Thanks for the fix. I will be waiting for 3.0.7 with this patch. Once again thanks a lot. Vaibhav Bernard Li wrote: Hi Vaibhav: On 2/15/08, Kumar Vaibhav wrote: I am testing the new release on my systems. Initial results are encouraging. I can tell the final words after weekend since I am keeping it for the test over the weekend. Sure, please update us after the weekend, we'll likely release 3.0.7 then. Cheers, Bernard
Re: [Ganglia-developers] Memory leak in gmond
Hi,

after looking at one of my employer's customers' installations, it definitely seems that metrics-collecting/non-mute gmonds are growing (substantially) over time. Pure listeners seem to be unaffected.

If I remember correctly, Kumar's valgrind traces found that strndup might allocate memory that is later leaked. If I look at the 3.0.4 libmetrics/linux/metrics.c, I have the strong feeling that all four network functions are careless about the memory allocated by strndup:

217: char *devname, *src;
228: devname = strndup(src, n);
238: net_dev_stats *ns = hash_lookup(devname, 1,

305: char *devname, *src;
316: devname = strndup(src, n);
326: net_dev_stats *ns = hash_lookup(devname, 1,

393: char *devname, *src;
404: devname = strndup(src, n);
414: net_dev_stats *ns = hash_lookup(devname, 1,

481: char *devname, *src;
492: devname = strndup(src, n);
502: net_dev_stats *ns = hash_lookup(devname, 1,

Have to look at it some more.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Kumar Vaibhav [EMAIL PROTECTED]
To: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
Cc: ganglia-developers@lists.sourceforge.net
Sent: Saturday, February 9, 2008 8:59:18 AM
Subject: Re: [Ganglia-developers] Memory leak in gmond

Carlo Marcelo Arenas Belon wrote:
> On Tue, Jan 22, 2008 at 04:17:07PM +0530, Kumar Vaibhav wrote:
>> I am using ganglia-3.0.5 on a woodcrest processor cluster. and I see that after running for weeks the memory consumption of the gmond process is something about 400 MB.
>
> did you check what was the size 1 hour after all gmond processes in your cluster were started?, if you are using multicast and have a large number of nodes/metrics then that is the amount of memory that is needed to hold all those metrics from all nodes most likely.

I checked it. The memory size increases with time. I tried ps -eo cmd,rss and can see the size of gmond increases with time.

>> ==2381== LEAK SUMMARY:
>> ==2381==definitely lost: 69 bytes in 16 blocks.
>> ==2381== possibly lost: 0 bytes in 0 blocks.
>
> that means there is no memory leak (except for 69 bytes)

This is so because I had run it for few minutes only.

>> ==2381==still reachable: 1,446,276 bytes in 1,463 blocks.
>
> that is the RSS of your process

by memory I mean RSS only.

Here are some new tests I have done. I isolated two nodes of the cluster by changing their multicast address. On one I run gmond in mute mode and on one in deaf mode. The RSS of gmond in deaf node continues to increase. But the RSS of gmond on mute mode stabilises after some time. And it didn't increase for a week.

Hope this will help you to solve the problem.

> Carlo

Vaibhav
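The suspicion in the message above (a strndup copy used only as a lookup key and never released) can be illustrated with a small sketch. This is not the libmetrics code and not the patch discussed in the thread: dev_stats, stats_lookup and update_device are hypothetical names, and the sketch assumes the lookup copies the key rather than taking ownership of it.

```c
#define _POSIX_C_SOURCE 200809L   /* for strndup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-device statistics record. */
typedef struct {
    char name[32];
    unsigned long rbytes;
} dev_stats;

dev_stats table[8];
size_t table_used = 0;

/* Hypothetical lookup: finds or creates an entry. It copies the key,
 * so the caller keeps ownership of `name` (an assumption of this sketch). */
dev_stats *stats_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < table_used; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    if (table_used == sizeof table / sizeof table[0])
        return NULL;                        /* table full */
    dev_stats *ds = &table[table_used++];
    strncpy(ds->name, name, sizeof ds->name - 1);
    ds->name[sizeof ds->name - 1] = '\0';
    ds->rbytes = 0;
    return ds;
}

/* `src` points at a device name of length `n` inside a larger line. */
dev_stats *update_device(const char *src, size_t n, unsigned long rbytes)
{
    char *devname = strndup(src, n);        /* heap copy of the key */
    if (devname == NULL)
        return NULL;
    dev_stats *ds = stats_lookup(devname);
    if (ds != NULL)
        ds->rbytes = rbytes;
    free(devname);   /* without this, every polling cycle leaks one small block */
    return ds;
}
```

If the real hash_lookup kept the pointer instead of copying it, the fix would have to look different; the sketch only shows the ownership pattern under the stated assumption.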
Re: [Ganglia-developers] Memory leak in gmond
Hi,

maybe the attached patch (based on 3.0.4) can fix the leak. The daemon runs and reports metrics. It is of course too early to say.

When looking at the linux metrics file, I just realized how much code duplication there is. Basically all function groups that grok the same /proc/xxx files should be rewritten to use common code. This is true for cpu, load and network. Maybe others.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Martin Knoblauch [EMAIL PROTECTED]
To: Kumar Vaibhav [EMAIL PROTECTED]; Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
Cc: ganglia-developers@lists.sourceforge.net
Sent: Thursday, February 14, 2008 11:36:37 AM
Subject: Re: [Ganglia-developers] Memory leak in gmond

[...]

linux-metrics.diff
Description: Binary data
Re: [Ganglia-developers] Memory leak in gmond
Brad,

definitely, one of the two patches should go into 3.0.X. Both seem to do the same. See other comments elsewhere.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Brad Nicholes [EMAIL PROTECTED]
To: Kumar Vaibhav [EMAIL PROTECTED]; Martin Knoblauch [EMAIL PROTECTED]; Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
Cc: ganglia-developers@lists.sourceforge.net
Sent: Thursday, February 14, 2008 4:40:10 PM
Subject: Re: [Ganglia-developers] Memory leak in gmond

This was already fixed in trunk about a week ago along with several other memory leaks that were more specific to 3.1 rather than 3.0. We should probably just backport the trunk patch to 3.0.7 to maintain consistency.

Brad

On 2/14/2008 at 6:29 AM, in message [EMAIL PROTECTED], Martin Knoblauch wrote:
> Hi, maybe attached patch (based on 3.0.4) can fix the leak. The daemon runs and reports metrics.
> [...]
Re: [Ganglia-developers] ganglia-web-3.0.6-1 on SLES10 SP1. Missing Requirement
--- [EMAIL PROTECTED] wrote:
> Quoting Martin Knoblauch [EMAIL PROTECTED]:
>> Hi Bernard,
>> just by chance I had to install 3.0.6 on SLES10 SP1 this week. I got the same problem and installing the ctype package for php5 solved the issue.
>> Cheers
>> Martin
>
> Is the presence of the php-ctype package in the RPM database enough to confirm that the ctype_* functions are available to PHP?
>
> I'm not familiar with SuSE, but on Red Hat the PHP sub-packages drop bits of configuration in /etc/php.d when they are installed. These contain the needed configuration lines for PHP to install the new modules provided by the sub-package. You could have a situation where a sub-package is installed, but the configuration file has been removed, so it's present in the RPM database but not loaded into PHP. Is there a similar situation in SuSE?
>
> alex

Alex,

in my case the RPM package for php5-ctype was just missing. No broken setup. Installing the RPM solved the issue. As Bernard wrote - we cannot foresee all possible breakages.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] Moving all built-in metrics to metric modules...
Hi Brad,

that seems to be a pretty useful move. Seems it is time that I really start looking closely at 3.1.x

Cheers
Martin

Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Brad Nicholes [EMAIL PROTECTED]
To: ganglia-developers@lists.sourceforge.net; [EMAIL PROTECTED]
Sent: Tuesday, December 18, 2007 11:44:45 PM
Subject: [Ganglia-developers] Moving all built-in metrics to metric modules...

I just committed a rather substantial patch to Ganglia 3.1.0 trunk which will affect the way that gmond 3.1.x is deployed. I am posting this to both the developer list and the general list so that all will be aware of the changes and why they are important. The primary purpose for the patch was to remove all of the built in metrics out of the gmond binary and allow them to be built as loadable modules. The following is a more detailed list of what has changed. Hopefully from a user perspective, gmond will continue to work as it has in the past. But going forward, it will be much more flexible with regards to the core set of metrics.

* All built-in metrics have been removed from the gmond binary
  - A new set of core metric modules have been created that represent the same set of metrics that gmond has always gathered. These new core modules are mod_cpu.so, mod_disk.so, mod_load.so, mod_mem.so, mod_net.so, mod_proc.so and mod_sys.so. Each of these modules is basically a wrapper around the metric functions that exist in libmetrics. Being wrappers, they still make the same metric function calls as have always been made. And since libmetrics contains all of the platform specific metric code, the metric function calls made by the core modules will continue to do the right thing for all of the platforms that have been previously supported.
  - There is also an extra module called core_metrics which contains the heartbeat, location and gexec metrics. Even though this module could be dynamically loaded in the same manner as the others, it is always statically linked simply because gmond would not be able to function properly without these metrics so there is no real reason to allow these metrics to be dynamically loaded.
  - Some additional configuration has been added to the gmond.conf file. Because the core metrics are now implemented as modules, this requires a module configuration block that instructs gmond to load each module. A set of module blocks has been added to the default gmond.conf file.

* All metric specific metadata definitions have been removed from protocol.x
  - With the refactoring of the XDR data and removal of the builtin metrics, there is no longer any need for XDR to have intimate knowledge of the core metrics. Therefore the metric structure array and enum have been removed and are now part of the core metric modules themselves.

* --enable-static-build statically links the core metric modules
  - Building gmond statically will statically link not only APR, expat and libconfuse, it will also statically link all of the core metric modules into the gmond binary. The result should be a gmond binary that looks and feels just like the old 3.0.x statically linked gmond binary. The one exception is that a module statement is still required in the gmond.conf file. The difference between the module configuration block for dynamically loaded modules and the module blocks for statically linked modules is whether or not a path to the .so is included. The configure script and makefiles have been modified to detect --enable-static-build and build the default gmond.conf file appropriately.

* --enable-static-build + --enable-python statically links the python module
  - One of the downsides of building gmond 3.1.x statically was that doing so would disable all of the dynamically loadable module capability. The reason for this is the need for both gmond and the pluggable modules to dynamically link with libapr1. However, if both --enable-static-build and --enable-python are specified during configure, a gmond binary will be built with mod_python statically linked. This provides gmond with the ability to continue to load and run python metric modules in the same manner as the non-static build. In other words, even though statically linking gmond will disable pluggable C interface modules, python pluggable modules will still continue to work.

* All metrics carry a group designation
  - Now that all metrics have been implemented as loadable modules, the metrics have also been assigned to groups. The XML that is produced by gmond and gmetad will carry an tag that defines which group each metric belongs to. This will allow the web front
Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.0.6 (Foss) released
Bernard,

great job from you and the team.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Bernard Li [EMAIL PROTECTED]
To: Ganglia General [EMAIL PROTECTED]; Ganglia Developers ganglia-developers@lists.sourceforge.net; [EMAIL PROTECTED]
Sent: Monday, December 17, 2007 7:35:09 AM
Subject: [Ganglia-general] Ganglia 3.0.6 (Foss) released

The Ganglia development team is pleased to release Ganglia 3.0.6 (Foss) which is available for immediate download from:

http://sourceforge.net/project/showfiles.php?group_id=43021&package_id=35280

This release includes a security fix for a web frontend cross-site scripting vulnerability. All Ganglia web frontend users are strongly recommended to upgrade to this version. In most cases the version of the frontend does not need to match the version of gmetad and/or gmond -- if problems arise, please drop us a note at [EMAIL PROTECTED]

Special thanks to Romain Wartel at CERN for discovering the vulnerability and reporting it to us and to Alex Dean for stepping up with the fix so quickly.

Bernard, on behalf of the Ganglia Development Team

- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
___
Ganglia-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
Re: [Ganglia-developers] Patch to graph.php for bits/sec in network graphs
Hi Bernard,

as far as I remember, there has been no more discussion on the topic. Making the units configurable would definitely be an option, but I think that is 3.1 material.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Bernard Li [EMAIL PROTECTED]
To: Martin Knoblauch [EMAIL PROTECTED]
Cc: Caleb Epstein [EMAIL PROTECTED]; ganglia-developers@lists.sourceforge.net
Sent: Saturday, November 17, 2007 3:22:12 AM
Subject: Re: [Ganglia-developers] Patch to graph.php for bits/sec in network graphs

Caleb, Martin:

Any more discussions regarding this? If not, I would probably just leave it as is and close the ticket (unless there is a strong reason to switch).

P.S. How about having a configuration parameter to switch between the two?

Cheers,
Bernard

On 11/2/07, Martin Knoblauch wrote:
> Hi,
> not sure here. I personally view bytes_in/_out as data throughput, where Bytes/sec makes more sense.
> Cheers
> Martin
>
> - Original Message
> From: Caleb Epstein
> To: ganglia-developers@lists.sourceforge.net
> Sent: Friday, November 2, 2007 9:52:18 PM
> Subject: [Ganglia-developers] Patch to graph.php for bits/sec in network graphs
>
> Attached patch to ganglia-3.0.5 causes the network graphs to be rendered as bits/sec instead of bytes/sec. Seeing as network capacities are usually measured in bits/sec, this seems like a sensible default.
> --
> Caleb Epstein
> [...]
Re: [Ganglia-developers] Patch to graph.php for bits/sec in network graphs
Hi,

not sure here. I personally view bytes_in/_out as data throughput, where Bytes/sec makes more sense.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

- Original Message
From: Caleb Epstein [EMAIL PROTECTED]
To: ganglia-developers@lists.sourceforge.net
Sent: Friday, November 2, 2007 9:52:18 PM
Subject: [Ganglia-developers] Patch to graph.php for bits/sec in network graphs

Attached patch to ganglia-3.0.5 causes the network graphs to be rendered as bits/sec instead of bytes/sec. Seeing as network capacities are usually measured in bits/sec, this seems like a sensible default.

--
Caleb Epstein

-Inline Attachment Follows-

diff -ur ganglia-3.0.5/web/graph.php /pub/www/monitor/ganglia/graph.php
--- ganglia-3.0.5/web/graph.php 2007-10-03 00:48:43.0 -0400
+++ /pub/www/monitor/ganglia/graph.php 2007-11-02 16:26:50.88059 -0400
@@ -217,12 +217,14 @@
 $lower_limit = --lower-limit 0 --rigid;
 $extras = --base 1024;
-$vertical_label = --vertical-label 'Bytes/sec';
+$vertical_label = --vertical-label 'bits/sec';
 $series = DEF:'bytes_in'='${rrd_dir}/bytes_in.rrd':'sum':AVERAGE
 .DEF:'bytes_out'='${rrd_dir}/bytes_out.rrd':'sum':AVERAGE
- .LINE2:'bytes_in'#$mem_cached_color:'In'
- .LINE2:'bytes_out'#$mem_used_color:'Out' ;
+ .CDEF:'bits_in'='bytes_in',8,*
+ .CDEF:'bits_out'='bytes_out',8,*
+ .LINE2:'bits_in'#$mem_cached_color:'In'
+ .LINE2:'bits_out'#$mem_used_color:'Out' ;
 }
 else if ($graph == packet_report)
 {
@@ -285,6 +287,18 @@
 $rrd_file = $rrd_dir/$metricname.rrd;
 $series = DEF:'sum'='$rrd_file':'sum':AVERAGE
 .AREA:'sum'#$default_metric_color:'$subtitle' ;
+
+ // Make network graphs bits/sec
+ if ($metricname == bytes_in or $metricname == bytes_out)
+ {
+ $series = DEF:'sum'='$rrd_file':'sum':AVERAGE
+ .CDEF:'bits'='sum',8,*
+ .AREA:'bits'#$default_metric_color:'$subtitle' ;
+
+ $metricname = network . substr ($metricname, 5);
+ $vertical_label = --vertical-label 'bits/sec';
+ }
+
 if ($jobstart)
 $series .= VRULE:$jobstart#$jobstart_color ;
 }

- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
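The CDEF expressions in the patch above ('bytes_in',8,*) are RPN for "multiply the byte rate by 8". As a minimal sketch of the same arithmetic, and of why bits/sec lines up with how link capacity is usually quoted, consider the following. The function names are hypothetical and are not part of Ganglia or RRDtool.

```c
#include <assert.h>

/* Same arithmetic as the patch's CDEF 'bytes_in',8,* (RPN for bytes_in * 8). */
double bytes_to_bits_per_sec(double bytes_per_sec)
{
    assert(bytes_per_sec >= 0.0);
    return bytes_per_sec * 8.0;
}

/* Fraction of a link in use, with the link capacity given in bits/sec. */
double link_utilisation(double bytes_per_sec, double link_bits_per_sec)
{
    assert(link_bits_per_sec > 0.0);
    return bytes_to_bits_per_sec(bytes_per_sec) / link_bits_per_sec;
}
```

For example, a bytes_in value of 125,000,000 bytes/sec corresponds to 1,000,000,000 bits/sec, that is, a fully loaded 1 Gbit/s link.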
Re: [Ganglia-developers] Lets discuss the wish-list and make 3.1.0 happen (was:Re: [Ganglia-general] 4T limit on memory?)
Hi Matt,

please do not hold back the meeting due to my schedule. Together with my job priorities I now have a personal matter that makes it more or less impossible for me to do any travel planning.

Cheers
Martin

- Original Message
From: Matt Massie [EMAIL PROTECTED]
To: Bernard Li [EMAIL PROTECTED]
Cc: Ganglia Developers ganglia-developers@lists.sourceforge.net; Brad Nicholes [EMAIL PROTECTED]
Sent: Wednesday, October 31, 2007 1:18:04 AM
Subject: Re: [Ganglia-developers] Lets discuss the wish-list and make 3.1.0 happen (was:Re: [Ganglia-general] 4T limit on memory?)

On 10/30/07, Bernard Li [EMAIL PROTECTED] wrote:
> Matt mentioned that GroundWork Open Source has some monies that could be used to fly some developers to the Bay Area and host a meetup -- I wonder if that offer is still on the plate (Matt?)

as far as i know, the offer still stands.

> I am somewhat busy for the next two months (SuperComputing, etc.) so I think the earliest I can attend a meeting would be January. However, if the schedule is right, I could potentially fit it in November/December (the meeting will probably be a day or two I would think).

i think a day or two is what i was thinking as well. it looks like february will be the earliest we could do it given martin's schedule.

-matt
Re: [Ganglia-developers] Moving on with Ganglia 3.1...
- Original Message
From: Bernard Li [EMAIL PROTECTED]
To: Brad Nicholes [EMAIL PROTECTED]
Cc: ganglia-developers@lists.sourceforge.net
Sent: Wednesday, October 31, 2007 11:53:46 PM
Subject: Re: [Ganglia-developers] Moving on with Ganglia 3.1...

> Hi Brad:
>
> Good job in compiling the list! I would like to complete my updates to the spec file before you do any massive check-ins (or modifications to the spec file). So as a group, can we answer the following questions:
>
> 1) Do we want to allow multiple versions of libganglia to be installed on the same server

yes.

> 2) All versions of libganglia 3.1.x (eg.) should be compatible with each other, i.e. 3.1.0 is compatible with 3.1.1 but not 3.2.x

I am not sure. What happens if we have to fix a severe bug in 3.1.X+1 that involves changing some of the APIs exposed by libganglia-3.1.X? Would that force us to do a 3.2.0 release? But it would definitely be *desirable* that version 3.1.X can use libganglia version 3.1.X+

> 3) Do we want to name libganglia package like libganglia_3_1-3.1.0 according to Novell's packaging rules

I have no opinion on this one

> 4) Split python related DSO modules to ganglia-gmond-python -- and hopefully in the future we'll have ganglia-gmond-perl

I am not sure whether a language split is needed or useful. Implementation languages are all the same for me. What I would do is split along the lines of basic-framework vs. core-modules vs. special-modules.

Cheers
Martin

> Let's try to wrap this up within the week, thanks all!
>
> Cheers,
> Bernard
>
> On 10/31/07, Brad Nicholes wrote:
>> I took a quick look over the wish-list items that were proposed on the mailing list and tried to determine which items would break compatibility and therefore must be completed before we release 3.1.0. I have identified three tasks for which I am planning on completing and committing the code to trunk over the next few weeks. These tasks include:
>>
>> 1-* Add TITLE attribute to the XDR data to communicate a human readable name
>> There is another task on the wish list which makes this more general which is:
>> -* Flexible method of adding extra metric metadata. We could include extra metadata, not just alias/title. For example, some metrics have a natural minimum and maximum value. Perhaps coming up with an extendable way of encoding metric metadata so future changes can be included without losing backward compatibility.
>> I would rather implement the more flexible method of adding extra metric metadata but I am not really sure how to do that with XDR. If somebody has a good idea of how that could be done with XDR, please let me know. Otherwise I will probably just add the attribute to the existing set of attributes.
>>
>> 2-* Add a GROUP attribute (comma delimited) to the XDR data
>> This would allow metrics to declare the category that they belong to. The category should be added at the metric definition level within the metric module rather than a directive in the .conf file. Again if there were a more flexible way to add extra metric metadata to the XDR package, that would be the preferred method. Short of that, I just plan to add an attribute that would hold a comma delimited list of group names that a metric can belong to.
>>
>> 3-* Modify all byte count metrics to 8 byte integers
>> At this point I am assuming that this is one of the issues that is causing the 4T limit problem. For now this is just a temporary fix. The real fix would be to move all of the built in metrics out of gmond itself and implement them as C interface modules which define the correct counter size. If somebody wants to tackle porting the built in metrics rather than applying the temporary fix now, please feel free and let me know that you are doing it. Otherwise, I will try to take care of at least getting the sizing right and then port the metrics sometime later.
>>
>> I have attached a rough compilation of the tasks that were identified through the wish list. This list is not very detailed and should probably be used as a jumping off point for adding all of these enhancements into bugzilla. Once in bugzilla, more detail should be added to each enhancement so that we can have a good discussion about each one, prioritize them and get them implemented.
>>
>> Brad
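Item 3 in Brad's list (moving byte count metrics to 8 byte integers) can be illustrated with a small sketch of how a 4-byte field produces a ceiling near 4T. It rests on an assumption the thread does not spell out: that the affected value is a kilobyte count held in a 32-bit unsigned integer. The function names are hypothetical and do not come from the Ganglia source.

```c
#include <assert.h>
#include <stdint.h>

/* Kilobytes in `tib` tebibytes: 1 TiB = 2^30 KiB. */
uint64_t kb_from_tib(unsigned tib)
{
    return (uint64_t)tib << 30;
}

/* Storing a kilobyte count in a 4-byte field silently wraps modulo 2^32,
 * so anything at or above 4 TiB (2^32 KiB) is misreported. */
uint32_t store_kb_32(uint64_t kb)
{
    return (uint32_t)kb;
}

/* An 8-byte field keeps the value intact. */
uint64_t store_kb_64(uint64_t kb)
{
    assert(kb <= UINT64_MAX);
    return kb;
}
```

Under that assumption, a 5 TiB value stored in the 4-byte field reads back as 1 TiB, while the 8-byte field preserves it.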
Re: [Ganglia-developers] Ganglia spec file cleanup
Hi Brad, - Original Message From: Brad Nicholes [EMAIL PROTECTED] To: Marcus Rueckert [EMAIL PROTECTED] Cc: Ganglia Developers ganglia-developers@lists.sourceforge.net; Alex [EMAIL PROTECTED] Sent: Wednesday, October 17, 2007 5:52:57 PM Subject: Re: [Ganglia-developers] Ganglia spec file cleanup On 10/17/2007 at 5:31 AM, in message [EMAIL PROTECTED], Marcus Rueckert wrote: On 2007-10-16 17:06:47 -0600, Brad Nicholes wrote: On 10/11/2007 at 4:11 PM, in message , Bernard Li wrote: Hi Alex: On 10/11/07, Alex wrote: - new subpackage modules-python which contains all the DSO/python modules (not really happy with the naming, so suggestions welcome!) How about extensions-python? Actually I guess I also have concerns about the division of the files, since gmond contains the C modules and the modules-python contains just the python modules -- I wonder if this division is necessary. I guess I'll wait for some feedback from Brad since he's the one who came up with the code. I would rather have all of the metric modules (both C and python) installed with the gmond package. But not all of them have to be enabled. but that would still require to install python although i might not even use it. for the moment i would propose using a subpackage for it. My vision moving forward (just my 2 cents) would be that the ganglia community embrace the python interface as the preferred way to extend gmond with new metric types. To promote this, installing and configuring mod_python by default would encourage the use of the python interface. I've mentioned this idea before on this list, I would also like to see a python metric module repository as part of the ganglia project that would allow the ganglia community to upload and share metric modules similar to the gmetric repository. if an use wants a python based metric type he can easily install the package. In our own internal RPM builds, we have been installing disabled python modules to an extra directory. 
In other words, a disabled python module .pyconf file would be installed to /etc/ganglia/conf.d/extra and the corresponding .py module file would be installed to /usr/lib/ganglia/python_modules/extra. This allows the user to simply move the .pyconf from extra to conf.d and the .py module from extra to python_modules. Then restart gmond and new metrics appear. Another option would be to install the .pyconf as .pyconf.off and the .py to the python_modules directory. With the config file named .pyconf.off, the gmond configuration file parser will ignore it during startup. The downside of this is that the .py module will always be loaded just because it exists in the python_modules directory, even if it isn't being used or referenced by a configuration file. Of course without a corresponding configuration, even if the .py module is loaded, its metrics won't be produced or appear in the -m metric list. you can/should do that even with the python module split out. as the user might not want all python metrics enabled. Now after having said all of that, there is an option that could be adopted later. If I or anybody else enabled gmond with other scripting language modules such as perl, PHP, TCL, etc., then it might make more sense to split the different enabling modules with their associated metric plugins, into separate RPM packages. But for now, including the python enabling module along with the python metric modules with gmond, seems more convenient. from a packager/dependency point of view it makes sense to split it out to give the user the choice if they want python or not. darix Actually the scenario that I am proposing would eliminate all of the built in hardcoded metrics and move them out of gmond as python modules. I definitely agree that the notion of core metrics should go in 3.1. At least they should no longer be hardcoded, but loadable. What I do not agree with (and you probably didn't mean it that way) is replacing everything with Python code. 
C Modules should still be allowed and be first class citizens :-) This would allow gmond to be just the collection and transport daemon as it should be. Then the user would have full control over which metrics they want to allow in their system, which version of a metric they want to use, allow them the ability to easily tweak a metric for their particular platform if necessary without having to get into the guts of gmond to do it. Absolutely. It would also eliminate the need for the PHP interface to have to know about gmond vs gmetric metrics. Everything would just be a metric. So in this case, unless your system is running pure C interface metric modules, python would be a required component.
Re: [Ganglia-developers] ganglia-webfrontend package hidden
Hi Bernard, --- Bernard Li [EMAIL PROTECTED] wrote: Dear all: Just FYI I went ahead and hid the deprecated ganglia-webfrontend package on SF.net: http://sourceforge.net/project/showfiles.php?group_id=43021 The ganglia-web component has been part of the ganglia monitoring core package since 3.0.0 and is integrated in the ganglia-version.tar.gz tarball. Very good. Any objections if I go ahead and rename ganglia monitoring core to just ganglia? Fine with me. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Release notes for 3.0.5
Bernard, good job. Hope this will be a worthy last 3.0.x release. Cheers Martin
--- Bernard Li [EMAIL PROTECTED] wrote: Hi guys: Ganglia 3.0.5 is ready, I have prepared the release notes here:
---
The Ganglia development team is proud to release version 3.0.5 (Louis) of the popular Ganglia monitoring software. Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. The following is a summary of changes in this release. For a detailed changelog please refer to the ChangeLog file in the release distribution tarball:
- [gmetad] Fixed a bug where messages were being discarded on MacOSX, so that data from clients was not consistently and accurately saved to the rrd files (Mike Walker)
- [win32] Include documentation (README.WIN) for building under Windows
- [webfrontend] Enlarge graphs by clicking on them (Ulf)
- [webfrontend] Include RRDTool version in frontend footer (Matthew Chambers)
- [webfrontend] Only set the grid stack cookie if it hasn't been set before (Matt Ryan)
- [webfrontend] New feature to allow sorting by hosts up and hosts down in meta context (Bernard Li, Eli Stair, Timothy D Witham)
- [gstat] New option -n to show numeric addresses instead of hostname (Bernard Li)
- Builds under Yellow Dog Linux on Sony PlayStation 3 ppc64 (Bernard Li)
- Do not automatically start services (gmond, gmetad) after RPM installation (Bernard Li)
- Add y-labels for some metrics. Needed to fix width of RRD images. (Martin Knoblauch)
- Build system (Autotools) enhancements (Carlo Marcelo Arenas Belon)
- Misc bug fixes
Work is underway for the next (3.1.0) release of Ganglia which will allow metrics to be dynamically loaded via DSO. These metrics can be written either in C or in Python, making it extremely easy to create plugins for monitoring metrics not already present by default. Apr, expat and libconfuse will be built dynamically in the new release which will make packaging for distributions easier.
---
I will be releasing this to SourceForge shortly, please let me know if you see any issues with the above wording. Thanks! Bernard
-- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] web site and ganglia 3.1.0
--- Matt Massie [EMAIL PROTECTED] wrote: guys- i hope all of you in the united states enjoyed a non-laborious labor day. for our peers in the rest of the world, i hope your day of the moon was a good one. as you might have noticed, we have an updated web site now ( http://ganglia.info/). i plan to add the wiki and make some updates (thanks to feedback from bernard) soon. please feel free to let me know what changes you'd like to see to the site. my hope is to make it easier for people to find the information they need. thanks again to bernard for the mail-archive idea. lastly, i spoke with groundwork open source and they suggested we talk about having a ganglia 3.1.0 get-together. they offered to help with transportation costs for some of our group (e.g. martin in germany). we should get together and work to push 3.1.0 out. would you guys like to gather in san francisco to meet and release the 3.1.0 release of ganglia? let me know what you think about it.
Hi Matt, first of all, the new web site looks very good. Good job. As for a 3.1.x meeting, I believe that it is a great idea. Some brainstorming on what should happen in the release is really needed. And if your company helps people with traveling it is even better. As for me, it really depends on the when. I am not 100% master of my time. My day-job employer has some say about it and it might be difficult for me to go away for a week before next February. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] gmetad segfaults upgrading from 3.0.3 to 3.0.4
--- Andrea Capriotti [EMAIL PROTECTED] wrote: On Wed, 18/07/2007 at 11.11 -0700, Bernard Li wrote: Well, looks like Charles no longer works for Oracle (his email address bounced). Anyways, another thing I would like you to try is to run gmetad in debug mode and see if it gives us any hints as to why it segfaulted.
# ./gmetad -d 5 Going to run as user nobody Sources are ... Source: [Cray_XD1_Linux_Cluster, step 25] has 1 sources xxx.xxx.xxx.xxx Source: [Front_End_Cluster, step 25] has 1 sources xxx.xxx.xxx.xxx Source: [GNU_Linux_Cluster, step 25] has 1 sources xxx.xxx.xxx.xxx Source: [BCX_Linux_Cluster, step 25] has 1 sources xxx.xxx.xxx.xxx Source: [SP5, step 25] has 1 sources xxx.xxx.xxx.xxx Source: [BCC_Linux_Cluster, step 25] has 1 sources xxx.xxx.xxx.xxx xml listening on port 8651 interactive xml listening on port 8652 Data thread 1090386864 is monitoring [Cray_XD1_Linux_Cluster] data source Data thread 1092488112 is monitoring [Front_End_Cluster] data source xxx.xxx.xxx.xxx Data thread 1094589360 is monitoring [GNU_Linux_Cluster] data source xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx Data thread 1096690608 is monitoring [BCX_Linux_Cluster] data source Data thread 1099959216 is monitoring [SP5] data source xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx Data thread 1102060464 is monitoring [BCC_Linux_Cluster] data source xxx.xxx.xxx.xxx cleanup thread has been started [Front_End_Cluster] is a 2.5 or later data stream hash_create size = 1024 hash-size is 1031 hash_create size = 50 hash-size is 53 hash_create size = 50 hash-size is 53 Updating host node01.fec.cineca.it, metric disk_free Updating host node01.fec.cineca.it, metric bytes_out Updating host node01.fec.cineca.it, metric proc_total [..] Writing Summary data for source Front_End_Cluster, metric swap_total Updating host node057.clx.cineca.it, metric cpu_idle Updating host node028, metric cpu_idle Updating host sp062, metric cpu_num Writing Summary data for source Front_End_Cluster, metric part_max_used Updating host node057.clx.cineca.it, metric cpu_user Updating host ch476-n5.xd1.cineca.it, metric mem_total Updating host sp062, metric load_fifteen Updating host node028, metric cpu_user Updating host node057.clx.cineca.it, metric swap_free Segmentation fault
If I try again it segfaults at a different point:
# ./gmetad -d5 [..] Updating host ch472-n3.xd1.cineca.it, metric cpu_nice hash_create size = 50 hash-size is 53 Updating host node0964.bcx.cineca.it, metric disk_free Updating host sp061, metric mem_cached Writing Summary data for source Front_End_Cluster, metric swap_total Updating host node380.clx.cineca.it, metric mem_cached Updating host sp061, metric load_five Writing Summary data for source Front_End_Cluster, metric part_max_used Updating host node038, metric pkts_out Updating host sp061, metric cpu_num Updating host ch472-n3.xd1.cineca.it, metric cpu_speed Segmentation fault
Let me know if you need the whole log. Best Regards -- Andrea Capriotti System Management Group - Cineca - www.cineca.it [EMAIL PROTECTED] - Tel +39 051 6171890
Andrea, do you have a chance to run gmetad under control of a debugger to see where exactly the segfault happens? Apparently the pointer that is NULLified by the patch for bz#56 gets referenced later on, leading to the problem. Thanks Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Removing the static dependancy on APR from Ganglia...
Folks, I tend to agree with Nick. If we move to use apr-1.2.x we can just upgrade the static code. When we moved from 0.9.2 (or .4) to 0.9.7, I considered going 1.2.x instead, but did not have the bandwidth to make the necessary changes. But now the code changes are done anyway for trunk. Cheers Martin
--- Nick Galbreath [EMAIL PROTECTED] wrote: Hi Brad... RE: Apr 0.9X vs Apr 1.2.X I guess I'm a bit confused. I like the configure switch, but why not nuke the 0.9.7 and put in the 1.2.X in srclib? Then no ifdefs are needed and everyone knows what version to use. To make a patch now, I have to pull two copies of APR and compare differences. Even if we defer linking to dynamic libraries it seems like using the new apr bits (statically) is still a good step. Or what am I missing? thanks! --nickg
On 4/25/07, Brad Nicholes [EMAIL PROTECTED] wrote: I have committed the patches to add --with-libapr to configure.in which allows the project to build against the distro version of libapr 1.2.x or to specify an alternate 1.2.x build. If --with-libapr=some-path-to-apr is specified, it will build and link with the libapr found in the specified path. For now if --with-libapr is not specified at configure time, it will still build and statically link against the 0.9.7 version found in the srclib/apr directory. In order to move from apr 0.9.7 to 1.2.x, I had to add some #ifdef's in gmond.c and apr_net.c to handle the differences. Once we decide to remove apr 0.9.7 completely and only link dynamically to apr 1.2.x, these #ifdef's can be removed. Now that this move to APR 1.2.x has been done, this should pave the way for several things:
- allow any pluggable metrics module to use APR functions as well
- eliminate libexpat and use the expat functions from APR-Util
- replace the multicast functions in apr_net.c with the APR multicast functions
I plan to work on these tasks as I find time, but if somebody else wants to tackle them, please speak up and go ahead. Brad
On 4/24/2007 at 8:44 AM, in message [EMAIL PROTECTED], Brad Nicholes [EMAIL PROTECTED] wrote: FYI, I am working on removing the static dependency on APR from GMOND and other ganglia binaries. In the process I am also moving Ganglia from APR 0.9.7 to APR 1.2.x. This first pass will add a --with-libapr option to configure which will be interpreted as linking dynamically to the distro's version of APR rather than the internal static APR library. In follow on patches, I would like to see the static version of APR removed completely and allow the --with-libapr to specify which APR library to link with if you would rather link with your own built version of APR or use the distro's version. The main reasoning behind this move is so that the metrics modules that are plugged into gmond can also take advantage of APR. Thinking further ahead, I would also like to see libexpat removed in favor of using the expat functionality built into APR-Util. Comments? Brad
-- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Removing the static dependancy on APR from Ganglia...
Hi Brad, GO GO GO :-) Really great that you are looking into it. This has been a complaint from several people for some time now. Cheers Martin --- Brad Nicholes [EMAIL PROTECTED] wrote: FYI, I am working on removing the static dependency on APR from GMOND and other ganglia binaries. In the process I am also moving Ganglia from APR 0.9.7 to APR 1.2.x. This first pass will add a --with-libapr option to configure which will be interpreted as linking dynamically to the distro's version of APR rather than the internal static APR library. In follow on patches, I would like to see the static version of APR removed completely and allow the --with-libapr to specify which APR library to link with if you would rather link with your own built version of APR or use the distro's version. The main reasoning behind this move is so that the metrics modules that are plugged into gmond can also take advantage of APR. Thinking further ahead, I would also like to see libexpat removed in favor of using the expat functionality built into APR-Util. Comments? Brad -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] AIX Consolidation
Hi Michael, I guess the POWER5 extensions would be good candidates for dynamic loading into the gmond stream. In any case, I see no reason not to keep them in the core code, even if they are not enabled by default. One thing that I like more about the current code is the combined functions for retrieving related metrics (get all cpu and network stats) at one point in time. They reduce syscall overhead and keep metrics together (important for CPU usage). Cheers Martin
--- Michael Perzl [EMAIL PROTECTED] wrote: Hi Martin, if possible I would like to somehow take my version (after some reviewing) :-), as it contains all the new POWER5 stuff already. My understanding is - as it would require some changes to protocol.x - that my changes won't have a chance to get into the core Ganglia source code until version 3.1 comes along. This code and everything else (RPMs) can be found on my website http://www.perzl.org/ganglia/. This stuff is actually in use at quite a few customer sites already (runs on AIX 4.3.3, 5.1, 5.2 and 5.3) so I would like to keep that POWER5-stuff in if possible. Actually, an AIX gmond implementation without the POWER5-stuff based on my implementation could be done very easily (just stripping off the POWER5-addons). Regards, Michael
Martin Knoblauch wrote: Michael, Andreas, any chance that you could consolidate the two versions of the AIX metrics that seem to be around? Seems you are the ones who have worked most recently on the AIX implementation. Cheers Martin
--- Michael Perzl [EMAIL PROTECTED] wrote: Andreas, thank you for taking the blame but you are off the hook here. ;-) If I understood David correctly, he is using my AIX Ganglia RPM packages with POWER5 extensions. Here most if not all of the implementation of how the metrics are collected under AIX has been changed. Everything is documented on my homepage (http://www.perzl.org/ganglia/) though. So everything that goes wrong here is entirely my fault :-[ After some investigating and some discussions with Nigel I have come to terms with the following facts regarding the bytes_in/bytes_out problem:
- libperfstat (the library on AIX which obtains all the system performance data) uses u_longlong_t data types (these are definitely 64-bit large).
- The AIX kernel internally, though, may probably not be using 64-bit data types - more realistic is probably unsigned 32-bit - in order not to break compatibility (my personal opinion)
- The consequence now is that integer overrun may occur much more easily with 32-bit data types than with 64-bit data types (we all probably don't live long enough to see that happen).
Please take a look at my implementation of the bytes_in metric (the bytes_out implementation is analogous):

01 g_val_t
02 bytes_in_func( void )
03 {
04    g_val_t val;
05    perfstat_netinterface_total_t n;
06    static u_longlong_t last_bytes_in = 0, bytes_in;
07    static double last_time = 0.0;
08    double now, delta_t;
09    struct timeval timeValue;
10    struct timezone timeZone;
11
12    gettimeofday( &timeValue, &timeZone );
13
14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 100.0);
15
16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
17       val.f = 0.0;
18    else
19    {
20       bytes_in = n.ibytes;
21
22       delta_t = now - last_time;
23
24       if ( delta_t )
25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
26       else
27          val.f = 0.0;
28
29       last_bytes_in = bytes_in;
30    }
31
32    last_time = now;
33
34    return( val );
35 }

In my opinion the overrun occurs in line #25 when bytes_in < last_bytes_in. In my naivety I had assumed that, as both are of type u_longlong_t, an integer overrun would never happen. So to solve the overrun a check for bytes_in < last_bytes_in must be introduced, something like:

   u_longlong_t d;
   d = bytes_in - last_bytes_in;
   if (d < 0) d += ULONG_MAX;

and line #25 would essentially become

25          val.f = (double) d / delta_t;

Comments ? Regards, Michael
PS: David, the reason why you don't see it happen with pkts_in and pkts_out is that probably no overrun so far has occurred but at some point it will also happen.
PPS: David, if this is a solution (I want some comments on that before, though) then I would be building new RPMs with the then hopefully correct code.
Andreas Schoenfeld wrote: Hi David and Martin, I suppose the network code is still the code I wrote, so there are two problems I know of: 1. yes there is a problem with overflows 2. the shown network traffic is the sum of all network interfaces including local
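The wrap check proposed above can be sketched in isolation. Since u_longlong_t is unsigned, a test such as d < 0 after the subtraction can never be true, so this sketch compares the two samples before subtracting. The helper names counter_delta and counter_rate are hypothetical and not part of libmetrics, and the idea that the underlying counter wraps at 32 bits is only the guess voiced in the message, not a confirmed fact; the wrap point is therefore passed in as a parameter.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from libmetrics): how far did a monotonically
 * increasing counter advance between two samples, allowing for one wrap?
 * counter_max is the largest value the underlying counter can hold,
 * e.g. UINT32_MAX if the kernel counter is assumed to be 32 bits wide.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t cur, uint64_t counter_max)
{
    if (cur >= prev)
        return cur - prev;                  /* normal case, no wrap */
    /* wrapped once: steps up to counter_max, one step to 0, then up to cur */
    return (counter_max - prev) + 1 + cur;
}

/* Rate per second over one sampling interval; 0.0 if no time has passed. */
static double counter_rate(uint64_t prev, uint64_t cur,
                           uint64_t counter_max, double delta_t)
{
    if (delta_t <= 0.0)
        return 0.0;
    return (double) counter_delta(prev, cur, counter_max) / delta_t;
}
```

In bytes_in_func this would take the place of the plain subtraction in line #25, with last_bytes_in as prev and bytes_in as cur. A counter that wraps more than once between two samples cannot be recovered this way.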
[Ganglia-developers] AIX Consolidation (was: Re: [Ganglia-general] Help! I have a petabyte/s network (Martin Knoblauch))
Michael, Andreas, any chance that you could consolidate the two versions of the AIX metrics that seem to be around? Seems you are the ones who have worked most recently on the AIX implementation. Cheers Martin
--- Michael Perzl [EMAIL PROTECTED] wrote: Andreas, thank you for taking the blame but you are off the hook here. ;-) If I understood David correctly, he is using my AIX Ganglia RPM packages with POWER5 extensions. Here most if not all of the implementation of how the metrics are collected under AIX has been changed. Everything is documented on my homepage (http://www.perzl.org/ganglia/) though. So everything that goes wrong here is entirely my fault :-[ After some investigating and some discussions with Nigel I have come to terms with the following facts regarding the bytes_in/bytes_out problem:
- libperfstat (the library on AIX which obtains all the system performance data) uses u_longlong_t data types (these are definitely 64-bit large).
- The AIX kernel internally, though, may probably not be using 64-bit data types - more realistic is probably unsigned 32-bit - in order not to break compatibility (my personal opinion)
- The consequence now is that integer overrun may occur much more easily with 32-bit data types than with 64-bit data types (we all probably don't live long enough to see that happen).
Please take a look at my implementation of the bytes_in metric (the bytes_out implementation is analogous):

01 g_val_t
02 bytes_in_func( void )
03 {
04    g_val_t val;
05    perfstat_netinterface_total_t n;
06    static u_longlong_t last_bytes_in = 0, bytes_in;
07    static double last_time = 0.0;
08    double now, delta_t;
09    struct timeval timeValue;
10    struct timezone timeZone;
11
12    gettimeofday( &timeValue, &timeZone );
13
14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 100.0);
15
16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
17       val.f = 0.0;
18    else
19    {
20       bytes_in = n.ibytes;
21
22       delta_t = now - last_time;
23
24       if ( delta_t )
25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
26       else
27          val.f = 0.0;
28
29       last_bytes_in = bytes_in;
30    }
31
32    last_time = now;
33
34    return( val );
35 }

In my opinion the overrun occurs in line #25 when bytes_in < last_bytes_in. In my naivety I had assumed that, as both are of type u_longlong_t, an integer overrun would never happen. So to solve the overrun a check for bytes_in < last_bytes_in must be introduced, something like:

   u_longlong_t d;
   d = bytes_in - last_bytes_in;
   if (d < 0) d += ULONG_MAX;

and line #25 would essentially become

25          val.f = (double) d / delta_t;

Comments ? Regards, Michael
PS: David, the reason why you don't see it happen with pkts_in and pkts_out is that probably no overrun so far has occurred but at some point it will also happen.
PPS: David, if this is a solution (I want some comments on that before, though) then I would be building new RPMs with the then hopefully correct code.
Andreas Schoenfeld wrote: Hi David and Martin, I suppose the network code is still the code I wrote, so there are two problems I know of: 1. yes there is a problem with overflows 2. the shown network traffic is the sum of all network interfaces including local loopback devices (lo0...). Both problems could lead to astonishing data transfer rates in ganglia.
Sorry I had promised to fix the problems, but there was too much other work ... Best regards Andreas
Date: Thu, 29 Mar 2007 08:21:38 -0700 (PDT) From: Martin Knoblauch [EMAIL PROTECTED] Subject: Re: [Ganglia-general] Help! I have a petabyte/s network To: David Wong [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=iso-8859-1
David, good catch. I will have to look at it for a bit. Cheers Martin
--- David Wong [EMAIL PROTECTED] wrote: I don't write much code nowadays, so I'm going to need a lot of help with this. I dug through the ganglia code and I found this interesting tidbit in libmetrics/aix/metrics.c which may be indicative of the problem. There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes, but the types of the two variables are different. net_stat::ibytes is a double:

   struct net_stat{
      double ipackets;
      double opackets;
      double ibytes;
      double obytes;
   } cur_net_stat;

and we have *ninfo declared here:

   perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;

libperfstat.h has perfstat_netinterface_total_t::ibytes as u_longlong_t. Does this code try to do what I think it is doing, i.e. assign an unsigned 64 bit integer to a signed 64bit
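The type mismatch David describes can be illustrated on its own. A double has a 53-bit significand, so storing an unsigned 64-bit counter in a double field is exact only up to 2^53; above that, neighbouring counter values collapse onto the same double. The sketch below is standalone C with a hypothetical function name and does not use libperfstat. Whether this loss of precision has anything to do with the bogus rates is not established in the thread; it only shows what such an assignment can and cannot preserve.

```c
#include <stdint.h>

/*
 * Hypothetical check: does a 64-bit counter value survive being stored in a
 * double (as in an assignment to a "double ibytes" field) and read back?
 * Returns 1 if the round trip is exact, 0 otherwise.
 */
static int survives_double(uint64_t counter)
{
    /* volatile forces a real 64-bit double, even on FPUs with wider registers */
    volatile double as_double = (double) counter;

    /* 2^64 and above cannot be converted back to uint64_t; treat as lossy */
    if (as_double >= 18446744073709551616.0)
        return 0;
    return (uint64_t) as_double == counter;
}
```

Counters below 2^53 (about 9 * 10^15) come back unchanged, so for byte counts of ordinary size the conversion itself is harmless.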
Re: [Ganglia-developers] Adding extensibility to gmond...
Rich, this is actually where I see the future direction for gmond - having all metrics configurable. But if you do this you may end up with slightly different gmonds. We need to find a way to make those work together seamlessly. Cheers Martin
--- Richard Mohr [EMAIL PROTECTED] wrote: On Fri, 2007-03-02 at 07:40 -0800, Martin Knoblauch wrote: My vision for the future would include a completely configurable set of core metrics for gmond. But in a way where different gmonds still can work together in some meaningful way. For example we have to rework the metrics array mechanism in protocol.x to be much more flexible. Something that might introduce incompatibilities to 3.0.x.
Is there any reason why most of the stuff in protocol.x can't be ditched in favor of treating every metric like a user defined metric? In my mind, the proper thing to do is to have gmond operate in much the same way that gmetric does. The gmetric code just parses the command line to determine the metric's name, type, slope, etc. The gmond.conf file could maybe support some syntax like this:

   metric {
      name = cpu_idle
      type = float
      slope = both
      format = %.1f
      value_threshold = 5.0
   }

Essentially this just moves the ganglia_25_metric_array from protocol.x into gmond.conf. But probably a better way would be to change the type of value used by the functions that collect the metrics. Instead of having them return g_val_t (which is nothing more than a union of all possible value types), they could return something a bit more complex. For example, just take struct Ganglia_25metric and replace int key; with g_val_t value; Each metric is now responsible for reporting info about itself, and gmond doesn't have to care about those details. Or maybe I'm just naive, and I don't fully understand some of the XDR related details -- Rick Mohr Systems Developer Ohio Supercomputer Center
-- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
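Rick's idea of a metric that describes itself can be sketched in C. Everything below is hypothetical: demo_val_t and struct demo_metric merely stand in for g_val_t and a modified struct Ganglia_25metric, whose real layouts are not reproduced here, and the field names are borrowed from the gmond.conf example in his message. The point is only that the collector hands back the value together with its metadata, so the daemon needs no built-in table of metrics.

```c
#include <stdio.h>

/* Stand-in for g_val_t: a union of possible value types (assumed subset). */
typedef union {
    float f;
    int   i;
} demo_val_t;

/* Hypothetical self-describing metric: the metadata travels with the value. */
struct demo_metric {
    const char *name;            /* e.g. "cpu_idle" */
    const char *type;            /* e.g. "float" */
    const char *slope;           /* e.g. "both" */
    const char *fmt;             /* printf format for the value, e.g. "%.1f" */
    float       value_threshold; /* minimum change worth reporting */
    demo_val_t  value;           /* takes the place of the integer key */
};

/* A collector fills in everything the daemon needs to know. */
static struct demo_metric cpu_idle_sample(float idle_percent)
{
    struct demo_metric m = { "cpu_idle", "float", "both", "%.1f", 5.0f, { 0 } };
    m.value.f = idle_percent;
    return m;
}

/* Report only when the value moved by at least the metric's own threshold. */
static int should_send(const struct demo_metric *m, float last_sent)
{
    float change = m->value.f - last_sent;
    if (change < 0.0f)
        change = -change;
    return change >= m->value_threshold;
}

/* Render the value with the metric's own format string. */
static int format_value(const struct demo_metric *m, char *buf, size_t len)
{
    return snprintf(buf, len, m->fmt, (double) m->value.f);
}
```

A daemon loop built on this would call the collector, ask should_send whether the change is large enough, and use format_value when producing output, without knowing anything about cpu_idle in particular. How such a struct would be encoded in XDR is left open here, as it is in the message.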
Re: [Ganglia-developers] Getting code into Ganglia
Dear Paul, I have just added the bug to my list. Unfortunately people are a bit lazy in picking from the global bug-list. As for #120 I am inclined to shout no, never :-) What is the need for having / as part of the metrics name? This is so utterly ugly... Cheers Martin
--- Paul Millar [EMAIL PROTECTED] wrote: Hi all, Last December, I asked about getting a bug-fix into Ganglia (bug #120, see [1]). Through the following discussion, I had thought the consensus was that the code was basically OK and was ready to go in. There were a few concerns, that the patch:
- might cause confusion on certain platforms, such as Windows (although I believe Matt's opinion was that this shouldn't prevent the code from going in)
- needed an is-not-NULL test before the free() (adding this test is trivial).
But, apparently nothing has happened; or, at least, the bug hasn't been closed. Is the inclusion of this patch blocked on something? Can I do anything to help out? Cheers, Paul.
[1] http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=120
[2] http://sourceforge.net/mailarchive/forum.php?thread_id=31232602forum_id=9584
-- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] [PATCH] linux/metrics.c - Remove fsusage.c
Hi Matt, what was it doing in the first place? Seems we are not really missing it. Cheers Martin
--- matt massie [EMAIL PROTECTED] wrote: i've just committed brad's libmetric update to subversion. very nice patch indeed... very simple and removes a licensing conflict. i remember this conflict being part of earlier threads on this list. was it Stuart Teasdale who was monitoring ganglia for debian repositories? thanks brian for this patch! i'm reviewing the dso code you submitted as well. my initial impression is that we should set up an account for you to get direct svn access. :)
On Wed, 2007-02-28 at 17:15 -0700, Brad Nicholes wrote: Since fsusage.c is licensed under the GPL, the fact that this file is being linked into libmetrics causes a licensing issue for gmond. The attached patch removes the fsusage.c(.h) which relieves gmond of a licensing conflict between the GPL and BSD. Brad
-- matt massie phone: 415.692.0828 x2843 fax: 415.278.0441 http://archrock.com/
-- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Adding extensibility to gmond...
Brad, thanks for providing this functionality. As others already commented, this is urgently needed. Now we have the question how to move on. It seems to me that we are on the way of opening 3.1 or 4.0. My vision for the future would include a completely configurable set of core metrics for gmond. But in a way where different gmonds still can work together in some meaningful way. For example we have to rework the metrics array mechanism in protocol.x to be much more flexible. Something that might introduce incompatibilities to 3.0.x. Cheers Martin --- Brad Nicholes [EMAIL PROTECTED] wrote: All, I have just added an enhancement request to bugzilla (#129) for adding modular metric extensibility to gmond. I have also attached a patch file and example module to the bug report that add this functionality. Hopefully you will find this enhancement useful and commit it to the ganglia SVN repository. Let me know if you have any questions or issues with the patches. Brad On 2/27/2007 at 10:09 AM, in message [EMAIL PROTECTED], matt massie [EMAIL PROTECTED] wrote: brad- having loadable modules for gmond would be outstanding. one of the reasons i moved gmond onto apr was exactly that. there is no real reason for gmond to be statically linked. it is mostly a historical artifact really. when ganglia was first written (back in 2000), package management was a nightmare. the feedback i got was that statically linking was favorable since it effectively eliminated most library dependencies. package management has come a long way in the last seven years so i think we should make static linking optional. the only other dependencies that ganglia has are expat and libconfuse. the expat dependency could be easily dropped since apr has all the expat xml code. please let us know how we can help. On Tue, 2007-02-27 at 09:36 -0700, Brad Nicholes wrote: I am working on adding metric module extensibility to gmond in much the same way that Apache loads and uses dynamic modules. 
The fact that APR already supports DSO loading for various platforms makes the Apache model an easy fit. While implementing the example metric module, I wondered why APR is linked statically to gmond rather than dynamically. Is there any specific reason? If gmond were to load APR dynamically, it would make it much easier for a metric module to also use APR, especially for memory allocation. Also, once I have extensibility added, is the project interested in committing this feature back into the SVN repository? thanks, Brad -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] How to change ip to hostname in the cluster report
Carlos, the display shows the IPs if it cannot reverse lookup the host-names. So you either need to set up NIS or DNS with the host-name/IP pairs, or just add the hosts to the /etc/hosts files of the gmetad host (if different, also to the host running the source gmond). cheers Martin --- Carlos Fernández [EMAIL PROTECTED] wrote: Hello, I would like to know how to change the boxes showing the node ip instead of the node hostname in the cluster report page. Thank you. -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
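The fallback Martin describes (show the IP when the reverse lookup fails) can be reproduced outside Ganglia to check whether a host resolves. The following is only a sketch of that behaviour, not the code gmond or gmetad actually run; `display_name` and its `resolver` parameter are hypothetical names introduced here.

```python
import socket


def display_name(ip, resolver=socket.gethostbyaddr):
    """Return the host name for ip, or ip itself when the reverse lookup fails.

    `resolver` is a hypothetical hook (not part of Ganglia) that lets the
    lookup be replaced, for example in tests.
    """
    try:
        # gethostbyaddr returns (hostname, alias_list, ip_address_list)
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError
        return ip
```

Calling `display_name` on the gmetad host with the address of a node shown as an IP should tell you whether NIS, DNS or /etc/hosts on that machine has an entry for it.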
Re: [Ganglia-developers] Query From-To
Hi Richard, seems bankers still have money to burn :-) But your grid description definitely sounds impressive. What kind of HPC do you perform on Windows? Cheers Martin --- [EMAIL PROTECTED] wrote: Re: fine grained data - At our site we mostly set 5 or 10 second polls in gmetad, and the same in the gmond confs. This was requested by our users. And yes, we have adjusted our RRD configuration in gmetad.conf to allow a few weeks of 5 second data. Each of our ganglia servers handles up to 3000 hosts, although I once earlier had about 5,000 hosts on one server. The monitored hosts are almost all windows servers doing HPC. The ONLY way that this is even remotely possible is by 1) reducing the number of collected metrics to a bare essential minimum for HPC (cpu, network, and I/O). The cygwin agent is unable to give much more anyway. 2) The RRD storage on servers of up to 100 gigabytes, is located on fast SAN storage. The sustained I/O has been up to 20 gigabytes a second sometimes. And as we all know, RRD stands for Rapid Ruination of Disks. Why did users want such fine grained data? Well some groups wanted fine grained data to do after the fact forensics, some wanted immediacy to the graphs, some wanted to make sure that load spikes did not disappear, and others didn't know what they wanted. But now our user base is experienced with ganglia, some are reducing their poll rate (Some users want capacity trending only, say). Richard Grevis Production Architecture Barclays Capital, Canary Wharf, London, E14 4BB *DDI : +44 (0) 20 7773 4915 * richard.grevis -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Knoblauch Sent: 16 January 2007 09:44 To: Caleb Epstein; Grevis, Richard: IT (LDN) Cc: ganglia-developers@lists.sourceforge.net Subject: Re: [Ganglia-developers] Query From-To Richard, I second the desire to have such a feature. But: where do you get the fine grained data for past periods from? You need to add datapoints to the RRDs. 
How big are your databases compared to the default ones? But yes - useful feature. Cheers Martin --- Caleb Epstein [EMAIL PROTECTED] wrote: On 1/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: The changes have been useful to us internally here, but our HPC environment is not really like most of the HPC clusters in academia and elsewhere, which means I am unsure of the likely level of interest. Also a change like the from/to mod may be too much of a change for people. I've always wished for a feature like this, personally. I think it would be a useful addition. -- Caleb Epstein -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
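Martin's question about database size can be estimated with simple arithmetic. The sketch below assumes that rrdtool stores each data point as an 8-byte double and ignores file headers; the two-week retention and the count of ten metrics per host are made-up illustration values, not figures from Richard's site.

```python
BYTES_PER_VALUE = 8  # assumption: rrdtool keeps each stored value as a double


def rra_rows(retention_seconds, step_seconds):
    """Number of rows a round-robin archive needs for the given retention."""
    return retention_seconds // step_seconds


def archive_bytes(retention_seconds, step_seconds, data_sources=1):
    """Approximate payload size of one archive, ignoring header overhead."""
    return rra_rows(retention_seconds, step_seconds) * data_sources * BYTES_PER_VALUE


if __name__ == "__main__":
    two_weeks = 14 * 24 * 3600
    per_metric = archive_bytes(two_weeks, 5)  # 5-second step, as in the thread
    hosts, metrics_per_host = 3000, 10        # metrics_per_host is hypothetical
    total = per_metric * hosts * metrics_per_host
    print(f"{per_metric} bytes per metric, about {total / 1e9:.0f} GB in total")
```

Under these assumptions a single metric needs 241,920 rows, or roughly 1.9 MB, which is why trimming the metric list matters so much at 5-second resolution.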
Re: [Ganglia-developers] Fixing the font ugliness created by image scaling
Hi Vladimir, almost correct. The main changes are in the web-frontend. I also changed some of the metric units in libmetrics from to in order to have all host metrics display in the same width. Commits: 2006-12-28 21:19 knobi1 * lib/protocol.x: MKN: Add y-labels for some metrics. Needed to fix width of RRD images. 2006-12-28 21:18 knobi1 * web/cluster_view.php, web/functions.php, web/graph.php, web/templates/default/cluster_view.tpl, web/templates/default/host_view.tpl, web/templates/default/meta_view.tpl: MKN: Fix scaling related ugliness of RRD fonts. MKN: Check whether rrd-files exist in function find_limits Btw. I also changed the online color (nohost mode) from yellow/orange to green. Seems to make more sense. Cheers Martin --- Vladimir Vuksan [EMAIL PROTECTED] wrote: Can someone shed some light on where these fixes have been committed to? I assume it is in the web frontend but haven't been able to find it :-(. Vladimir Reply to: some time ago I reported problems with very ugly fonts when using the webfrontend with some newer versions of rrdtools. Some investigation showed that the problem was created by incorrect WIDTH/HEIGHT parameters to the IMG tags in the frontend. I have checked in a few fixes that remove the scaling from the IMG tags. The RRD images are now sized at creation time. As a result the fonts look good again. I also had to add y-labels to some of the core metrics. Without them the images on the host_view would have different width. There are some visual differences, that I hope are acceptable: - the images are now a bit larger - in order to make the outer images the same size, I had to vary the size of the drawing canvases. Some of them are now larger For me both are good, as my eyesight is not what it used to be :-) During the exercise I also found a bug in the find_limits function, where we did not check for the existence of the RRD files. This would fail when using one of the *_report aggregates as default_metric. 
Not fatal, but it fills up the logfiles. -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] updating to a more recent libtool version for the bootstrap?
--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote: I was unable to reproduce this problem, which IMHO should only happen if the Makefile.am file is for some reason found to be newer than the generated files (Makefile.in and Makefile), triggering automake to be invoked again or if you are changing Makefile.am see my previous mail. It seems the newerness of the timestamps for Makefile.am and Makefile.in is randomly depending on the svn checkouts. in that case, just running aclocal and automake with the version of the tools I had in my FC3 workstation fixed the problem as well, and it is anyway needed if Makefile.am was changed so that the changes are applied. again, see my previous e-mail, where it did not help to run aclocal and automake. I agree with you though that we should better come up with a version that is well supported for all development environments (*), and I presume that the current de-facto is FC4? Not sure about FC4 being the standard. It is true for me, but I do not want to dictate that. Maybe something like RHEL4 (Centos-4) or SLES9 would be more appropriate for Linux. But what about the non-Linux platforms? Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Pie Chart not showing
--- [EMAIL PROTECTED] wrote: All, We've added the following to cluster_view.php: /** BEGIN MTB **/ if ($clustername == "AMS GDC" || $clustername == "Expro Lite Site" || $clustername == "SEPNO Lite Site") { $optional_graphs = array('session'); } /** END MTB **/ And now the pie chart is not displaying for the above clusters? In the produced HTML source the URL for the pie chart is: <IMG SRC="./pie.php?" ALT="Pie Chart" BORDER="0"> Any suggestions? Regards, Alexander, are there any errors in the web server's logfiles or in /var/log/messages? Usually a non-showing pie chart means that you are missing GD support. Did the pie-chart ever show up? :-) Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] [Ganglia-general] Windows port issues
--- Vladimir [EMAIL PROTECTED] wrote: Martin Knoblauch wrote: could you be more specific on the error message? Is it compile time, or link time? There is no such thing as xdr_create. Maybe xdrmem_create. Sorry I should have been more precise. It is a linking error. Here is the log gmond.o: In function `Ganglia_collection_group_send': /ganglia-3.0.4/gmond/gmond.c:1633: undefined reference to `_xdrmem_create' gmond.o: In function `main': /ganglia-3.0.4/gmond/gmond.c:897: undefined reference to `_xdrmem_create' /ganglia-3.0.4/gmond/gmond.c:828: undefined reference to `_xdr_free' /ganglia-3.0.4/gmond/gmond.c:912: undefined reference to `_xdr_free' ../lib/.libs/libganglia.a(libgmond.o): In function `Ganglia_gmetric_send': /ganglia-3.0.4/lib/libgmond.c:695: undefined reference to `_xdrmem_create' ../lib/.libs/libganglia.a(libgmond.o): In function `Ganglia_gmetric_send_spoof': /ganglia-3.0.4/lib/libgmond.c:748: undefined reference to `_xdrmem_create' ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_value_types': /ganglia-3.0.4/lib/protocol_xdr.c:13: undefined reference to `_xdr_enum' ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_gmetric_message': /ganglia-3.0.4/lib/protocol_xdr.c:23: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:25: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:27: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:29: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:31: undefined reference to `_xdr_u_int' /ganglia-3.0.4/lib/protocol_xdr.c:33: undefined reference to `_xdr_u_int' /ganglia-3.0.4/lib/protocol_xdr.c:35: undefined reference to `_xdr_u_int' ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_spoof_header': /ganglia-3.0.4/lib/protocol_xdr.c:45: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:47: undefined reference to `_xdr_string' ../lib/.libs/libganglia.a(protocol_xdr.o): In 
function `xdr_Ganglia_message_formats': /ganglia-3.0.4/lib/protocol_xdr.c:69: undefined reference to `_xdr_enum' ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_message': /ganglia-3.0.4/lib/protocol_xdr.c:116: undefined reference to `_xdr_u_int' /ganglia-3.0.4/lib/protocol_xdr.c:124: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:151: undefined reference to `_xdr_float' /ganglia-3.0.4/lib/protocol_xdr.c:156: undefined reference to `_xdr_double' /ganglia-3.0.4/lib/protocol_xdr.c:95: undefined reference to `_xdr_u_short' ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_25metric': /ganglia-3.0.4/lib/protocol_xdr.c:170: undefined reference to `_xdr_int' /ganglia-3.0.4/lib/protocol_xdr.c:172: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:174: undefined reference to `_xdr_int' /ganglia-3.0.4/lib/protocol_xdr.c:178: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:180: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:182: undefined reference to `_xdr_string' /ganglia-3.0.4/lib/protocol_xdr.c:184: undefined reference to `_xdr_int' collect2: ld returned 1 exit status make[3]: *** [gmond.exe] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2 OK, seems ld is unable to find all of the xdr functions. Maybe someone removed a library from the library list. Although under Linux those functions are in libc. Hmm. What package are you referring to? There is no official windows (cygwin) binary distribution. Perhaps it is unofficial but it is on SourceForge e.g. http://downloads.sourceforge.net/ganglia/ganglia-3.0.0-setup.exe?modtime=1107790662big_mirror=0 Ah. I forgot about this one. And I do not recall who donated the work. I am adding the developers list. Apparently, the installer was never updated after the initial release. 
Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)
--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote: On Fri, Dec 22, 2006 at 08:05:02AM -0800, Martin Knoblauch wrote: Hi Folks, in order to fix bz#84 for Linux. http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=84 I think that the fix for this bug should actually include adding 2 more metrics, as the problem as stated isn't really that ganglia isn't reporting the right count of CPUs, but that there is no way to know if it is virtual or real CPUs for inventory and in some sort also scheduling reasons. This way cpu_num could be kept as the number of available CPUs, as is implicitly described to do in the current documentation for this metric and will have cpu_cores and cpu_sockets as the number of available cores or available sockets. of course for HPC, the number of effective CPUs is a function of all those 3 and the type of code that is being run, so we should leave up to the end users to figure that out while giving them all the information they need for that. the advantages of doing it this way, are that the code is greatly simplified, all possible use cases are covered and the metric is kept backward compatible. comments, anyone? Carlo Carlo, modulo the naming of the new metrics, I completely agree with you. In order to make an educated guess, we need all three components. And we should not forget that more and more clusters in use are running non-HPC workloads, where the virtual CPUs may actually be of use. One thing we should at least keep in mind is the fact that the number of CPUs may no longer be a constant - CPU hotplugging is available on Linux and some of the proprietary Unixes. Same for memory. And the cpu-frequency has been variable for years. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
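The three counts discussed above (logical CPUs, cores, sockets) can be derived from the `processor`, `physical id` and `core id` fields of /proc/cpuinfo. The sketch below is an illustration only, not the libmetrics implementation; `count_cpus` is a hypothetical helper, and the fallback for kernels that do not export topology fields (treat every logical CPU as its own core and socket) is an assumption.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, cores, sockets) from /proc/cpuinfo-style text."""
    logical = 0
    sockets = set()
    cores = set()
    # /proc/cpuinfo separates the per-processor blocks with an empty line
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields:
            sockets.add(fields["physical id"])
            if "core id" in fields:
                # a core is only unique within its socket
                cores.add((fields["physical id"], fields["core id"]))
    # Assumption: without topology fields, count each logical CPU separately
    return logical, len(cores) or logical, len(sockets) or logical


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(count_cpus(handle.read()))
```

Because CPUs can be hotplugged, as noted above, a collector built on this idea would have to re-read the file periodically rather than treat the result as constant.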
Re: [Ganglia-developers] Ganglia RPM auto start
Hi Bernard, just change it. This has bitten me many times and I always wondered. Definitely stuff for 3.0.5. In case you build 3.0.4 RPMs, I would already use a modified spec file. Cheers Martin --- Bernard Li [EMAIL PROTECTED] wrote: I remember asking this a few years back, but now I don't remember the reason behind this - is there a particular reason why the spec file automatically starts up the daemon upon installation? For gmond, since the user does not have a chance to modify the configuration, the daemon could be started and may join the wrong group. Any objections to disabling auto-start for gmetad and gmond by default? Thanks, Bernard -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Ganglia 3.0.3 and Linux /proc/net/dev counter overflow
--- Caleb Epstein [EMAIL PROTECTED] wrote: On 10/26/06, Caleb Epstein [EMAIL PROTECTED] wrote: So val.f will be zero when bytes_in < last_bytes_in. Since these counters are known to be 32-bit unsigned integers, can't this code do a better job of calculating diff? Something like: OK, see attached patch against 3.0.3. This seems to fix this overflow problem here, and it eliminates a *lot* of redundant code from the Linux version of metrics.c. -- Caleb Epstein Hi Caleb, your patch no longer applies to the 3.0.4 source. Would it be possible for you to rebase the patch against 3.0.4 or current SVN? I definitely like the code/complexity reduction. We should do the same for the cpu_*_func family. Sorry for the inconvenience. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
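The overflow case Caleb describes can be handled with modular arithmetic. The sketch below is not his patch (which is not reproduced in this archive); it only illustrates the idea, assuming 32-bit counters that wrap at most once between two samples. `counter_delta` and `rate` are hypothetical names.

```python
COUNTER_MODULUS = 2 ** 32  # assumption: counters are unsigned 32-bit values


def counter_delta(current, previous, modulus=COUNTER_MODULUS):
    """Difference between two samples of a wrapping counter.

    Correct as long as the counter wrapped at most once between samples.
    """
    return (current - previous) % modulus


def rate(current, previous, elapsed_seconds):
    """Per-second rate between two samples, or 0.0 for a non-positive interval."""
    if elapsed_seconds <= 0:
        return 0.0
    return counter_delta(current, previous) / elapsed_seconds
```

With this approach a sample that is smaller than the previous one yields the true increment across the wrap instead of zero. On systems where the kernel exports wider counters the modulus would have to be adjusted.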
Re: [Ganglia-developers] [Ganglia-general] Ganglia+OpenBSD?
--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote: On Wed, Dec 27, 2006 at 12:38:00AM -0800, Martin Knoblauch wrote: I see no problem to add OpenBSD support in 3.0.5. Just go on and check it in once you are satisfied with your stuff. checked it in already in revision 697. saw it. Just out of curiosity: how similar are the BSD flavours? We already have NetBSD and FreeBSD support in. I used NetBSD as a base for my port (as it is the closest), sadly they are not so similar that the other source just works, as you can see by the diff. Understood. Btw. you should check the use of the strings NetBSD / FreeBSD in your patch :-) DragonflyBSD will be most likely closer to FreeBSD and the same for MacOS X (AKA Darwin), but I have no interest in adding those yet (DragonFlyBSD could be an interesting option for clusters, but I've heard of no one using it in a cluster yet). You realize that we already have a Darwin port, although I do not know the quality/completeness of the metrics code. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Ganglia 3.0.4 released
--- matt massie [EMAIL PROTECTED] wrote: On Mon, 2006-12-25 at 06:40 -0600, Carlo Marcelo Arenas Belon wrote: On Mon, Dec 25, 2006 at 02:32:30AM -0800, Martin Knoblauch wrote: Ho ho ho, Santa just released version 3.0.4 of Ganglia. This is mainly a bugfix release. See the ChangeLog in the tarball for a complete list of changes. thanks for the gift santa knoblauch! i always wanted a tarball for christmas. :) almost better than snowballs. definitely lasts longer :-) thanks Santa, and I got to be the first kid that went to the sourceforge tree for the nicely wrapped package :) which was far nicer than that Wii that Matt is probably still waiting to get a hold of. hehe. you're right... i just gave up trying to find a wii. funny thing is: i bet in a few months they'll be available everywhere. wish nintendo didn't play it so conservative on the year end sales numbers. I never understood why people would queue up to buy the 1.0- release of a consumer product. At least wait for 1.01. agreed. i did propose the naming scheme just to deal with cvs character limitations. i like simple clean tags like monitor-core-3.0.4. just an fyi martin, the copy command will not make a duplicate of the trunk .. more like a symbolic link. As I understand it, it is actually more like hard linking. Which is fine. You may have seen that I added a monitor-core-3.0.4 tag a few minutes ago. happy holidays everyone... let's hope for a more peaceful year in '07. Good wish, independent of the number. seven is a lucky number right? In the western world yes. Not sure about other cultures. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Ganglia 3.0.4 released
--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote: On Mon, Dec 25, 2006 at 02:32:30AM -0800, Martin Knoblauch wrote: Ho ho ho, Santa just released version 3.0.4 of Ganglia. This is mainly a bugfix release. See the ChangeLog in the tarball for a complete list of changes. thanks Santa, and I got to be the first kid that went to the sourceforge tree for the nicely wrapped package :) which was far nicer than that Wii that Matt is probably still waiting to get a hold of. since I was running tests on the last SVN anyway, I got some more platforms where gmond/gmetric (and therefore libmetrics) were tested (*): * Gentoo Linux 2006.1 (amd64), Fedora Core 6 (i386) * Solaris 9 (sparc), Solaris 10 (i386, amd64 and sparc) * NetBSD 2.0.2 (i386), NetBSD 3.0 (i386), NetBSD 3.1 (i386, amd64) * FreeBSD 6.1 (amd64) Hi Carlo, thanks for the feedback. Could you just tell us which toolchains were used on the non-Linux platforms? Especially which compiler? Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Bugzilla and bugs
--- Erich Focht [EMAIL PROTECTED] wrote: Hi Martin, your suggestion makes kind of sense, so I would agree to incorporate this. thanks, done (guess you've seen it in the commits). So what's the plan now for a release? Hi Erich, what about writing a letter to Father Christmas? Maybe he will put a release under the tree :-) Merry Christmas and a Happy New Year to all Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
[Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)
Hi Folks, in order to fix bz#84 for Linux, I would like to collect some data from different system configurations. Could you please create the file cpu.grep and execute the cat/grep chain below. Please report the results together with the uname -a output and which distro you are running.
# more cpu.grep
processor
vendor
model name
physical id
siblings
core id
cpu cores
# cat /proc/cpuinfo | grep -f cpu.grep
Merry Xmas and a Happy New Year Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Bugzilla and bugs
--- Erich Focht [EMAIL PROTECTED] wrote: Hi all, On Monday 18 December 2006 11:28, Martin Knoblauch wrote: Hi Bernard, no objections from my side. I am using it on several clusters without any serious problems. any objections to changing the part_max_used metric to be limited to RW partitions? If not, I'd like to check it in before you guys freeze a new version... Thanks, Erich Hi Erich, your suggestion makes kind of sense, so I would agree to incorporate this. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Bugzilla and bugs
Hi Paul, no, you are not doing anything wrong. It is just that some people have not been very active the last few months. I am not sure about Matt or Bernard, but my excuse is total overload on my daytime job :-) There are just too few cycles left for ganglia. Maybe we need more people who are allowed to do checkins. Matt? Any suggestions? Cheers Martin --- Paul Millar [EMAIL PROTECTED] wrote: Hi all, I've a quick query about bugs and getting code into ganglia. I spotted a bug in Ganglia and reported this in the Ganglia bugzilla (#120) almost two months ago (2006-10-19). I've included a test-case to demonstrate the problem and a proposed patch to fix it: http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=120 So far I've received no feedback about this bug; apparently, nothing is happening. Am I doing something wrong? How can I help this to progress further? Cheers, Paul. -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] Per processor metrics
--- Paul Millar [EMAIL PROTECTED] wrote: On Thursday 30 November 2006 17:33, [EMAIL PROTECTED] wrote: I am not sure which way I should proceed, and the code setup for metrics is kind of discouraging against putting in new ones with XDR encoding because of the places I need to change stuff - linux/metric.c and gmond.c and ./lib/protocol_xdr.c and protocol.h at the least. Yup, I would avoid hacking protocol.x (from which protocol_xdr.c is derived) unless you want to maintain it: the current setup isn't too friendly towards changing core metrics. Yes, one of the goals in a 4.x series should be to simplify adding new metrics to the gmond stream. The current way is a big hassle. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] gexec segfault
Hi Erich, --- Erich Focht [EMAIL PROTECTED] wrote: Hi, I'd prefer to change this by patching the llist.h of Ganglia. Does anybody have objections to this? Before fixing things either way, we should take a closer look. Could you show us the offending difference between the two versions? I made some changes over a year ago. Those were just cosmetic and did not change the order. Over the long term it would make sense to either build ganglia with libe or maybe even integrate libe into ganglia and produce the libe RPM when building ganglia. I saw that gexec is also integrated into the ganglia SVN tree: is this now the main development tree for gexec? Integrating gexec/libe into standard ganglia builds sounds good to me. As for the main development tree gexec question - what development? Last changes have been over two years ago. And I think it has been broken since then. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
Re: [Ganglia-developers] gexec segfault
Hi Erich, --- Erich Focht [EMAIL PROTECTED] wrote: Hi Martin, On Monday 27 November 2006 12:53, Martin Knoblauch wrote: I'd prefer to change this by patching the llist.h of Ganglia. Does anybody have objections to this? Before fixing things either way, we should take a closer look. Could you show us the offending difference between the two versions? I made some changes over a year ago. Those were just cosmetic and did not change the order. Ganglia has in lib/llist.h: typedef struct _llist_entry { void *val; /* Entry value */ struct _llist_entry *prev; /* Previous entry on list */ struct _llist_entry *next; /* Next entry on list */ } llist_entry; while the original libe package llist.h is: typedef struct _llist_entry { struct _llist_entry *prev; /* Previous entry on list */ struct _llist_entry *next; /* Next entry on list */ void *val; /* Entry value */ } llist_entry; the libe version looks more textbook-like. I think we should take it :-) Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de
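Why the member order matters can be seen by comparing field offsets of the two layouts. The sketch below mirrors both `llist_entry` definitions with Python's ctypes; the class names `GangliaEntry` and `LibeEntry` and the helper `field_offsets` are made up for this illustration, and it is only a plausible explanation of the gexec segfault, not a confirmed diagnosis.

```python
import ctypes


class GangliaEntry(ctypes.Structure):
    """Layout from Ganglia's lib/llist.h: val, prev, next."""


GangliaEntry._fields_ = [
    ("val", ctypes.c_void_p),
    ("prev", ctypes.POINTER(GangliaEntry)),
    ("next", ctypes.POINTER(GangliaEntry)),
]


class LibeEntry(ctypes.Structure):
    """Layout from the original libe llist.h: prev, next, val."""


LibeEntry._fields_ = [
    ("prev", ctypes.POINTER(LibeEntry)),
    ("next", ctypes.POINTER(LibeEntry)),
    ("val", ctypes.c_void_p),
]


def field_offsets(struct_type):
    """Map each field name to its byte offset inside the structure."""
    return {name: getattr(struct_type, name).offset
            for name, _ctype in struct_type._fields_}


if __name__ == "__main__":
    print("ganglia:", field_offsets(GangliaEntry))
    print("libe:   ", field_offsets(LibeEntry))
```

Both structures have the same size, so nothing fails at link time, but code compiled against one header would read `val` from the slot where the other layout keeps `prev`, which is the kind of mismatch that can end in a segfault.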