Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-15 Thread Nicholas Satterly
Hi Devon,

I think now that we have the ability to define exactly which metrics should
and should not be summarised, the issue of slow-downs due to metric
summarisation can be managed.

If we are to look at redoing the XML parsing next then the two contenders
that come to mind are gzipped JSON and Google Protocol Buffers.

PB is meant to be very efficient and therefore faster; however, it seems
people have gotten comparable results with gzipped JSON. An obvious
advantage of gzipped JSON is that it would be simple to make the output
human readable, though we could easily develop a CLI tool that allowed us to
query and decode ganglia PB data for testing.
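
To make the gzipped JSON option a little more concrete, here is a minimal Python sketch of a round trip. The record fields (host, name, val, units, tn) and the helper names are made up for illustration and are not Ganglia's actual schema or API; the point is only that the payload is trivially inspectable once decompressed.

```python
import gzip
import json

# Hypothetical metric records; the field names are illustrative only
# and do not reflect Ganglia's real data model.
metrics = [
    {"host": "web01", "name": "load_one", "val": 0.42, "units": "", "tn": 3},
    {"host": "web01", "name": "mem_free", "val": 123456, "units": "KB", "tn": 7},
]


def encode(records):
    """Serialise records as compact JSON, then gzip the bytes."""
    text = json.dumps(records, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def decode(blob):
    """Reverse of encode(): gunzip, then parse the JSON text."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


blob = encode(metrics)
# Pretty-printing the decoded payload is all a "human readable" view needs.
print(json.dumps(decode(blob), indent=2))
```

A Protocol Buffers equivalent would need a schema file and a generated decoder before anything could be printed, which is the trade-off described above.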

What do others think?

--Nick.



On Tue, Jan 14, 2014 at 4:42 PM, Devon H. O'Dell devon.od...@gmail.com wrote:

 I don't personally have any objections, but if this remains a pain
 point, perhaps this is something we can address differently? I think
 where I left off, XML parsing was taking the most time; is changing
 the data format something that people are comfortable with?

 --dho

 2014/1/14 Nicholas Satterly nfsatte...@gmail.com:
  Given the performance benefits gained by Devon's work I will revert the
  patch that attempted to speed up metric summaries because it's causing
  grid-of-grids to fail (unless there are any objections) ...
 
 
 https://github.com/ganglia/monitor-core/commit/0705a5defa284e289004daf61ea390338719d5fb
 
  --Nick.
 
 
  On Tue, Dec 10, 2013 at 8:00 PM, Chris Burroughs chris.burrou...@gmail.com wrote:
 
  On 12/08/2013 04:43 PM, Devon H. O'Dell wrote:
   This is a simple `perf top -p $PID` on one of our gmetad nodes:

   Samples: 1M of event 'cycles', Event count (approx.): 64115959770
     6.59%  libexpat.so.1.5.2          [.] 0x00011b8d
     4.77%  libganglia-3.6.0.so.0.0.0  [.] hashval
     2.62%  [kernel]                   [k] __d_lookup
     2.21%  [kernel]                   [k] _spin_lock
     2.14%  libc-2.12.so               [.] vfprintf
     1.61%  librrd.so.4.2.0            [.] process_arg
     1.54%  libganglia-3.6.0.so.0.0.0  [.] hash_lookup
     1.46%  [kernel]                   [k] __link_path_walk
     1.16%  libc-2.12.so               [.] __GI_strtod_l_internal
     1.11%  libc-2.12.so               [.] memcpy
     1.08%  libc-2.12.so               [.] _int_malloc
   
   So I suppose my intuition about xml parsing expense is off.  I have not
   used perf as much as I should, if we were seeing similar rrd writing
   contention should I literally see stat near the top?

   Ah, so to see what's really going on:
  
   perf record -e cpu-clock -g -p $PID
  
   Let that run for a minute or two. Then:
  
   perf report --sort=comm,dso,symbol -G
  
   If you don't have cpu-clock, cycles is OK, but you definitely are
   going to want to see the callgraph. The time in XML is mostly writing
   RRDs and you only see that digging down into the chain.
  
 
 
  For the list, Devon and I spoke in #ganglia and the high occurrence of
  libexpat in this sample seems to be an artifact of missing debug symbols.
 
 
 
  ___
  Ganglia-developers mailing list
  Ganglia-developers@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/ganglia-developers
 
 
 
 
 




-- 
gpg: using PGP trust model
pub   4096R/1EE38BD9 2013-01-06 [expires: 2018-01-06]
  Key fingerprint = 3EE9 550D D9D8 DB65 58C2  B58D CE78 EC6C 1EE3 8BD9
uid  Nicholas Satterly (Debian Key) nfsatte...@gmail.com
sub   4096R/23804EE9 2013-01-06 [expires: 2018-01-06]


Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-14 Thread Nicholas Satterly
Given the performance benefits gained by Devon's work I will revert the
patch that attempted to speed up metric summaries because it's causing
grid-of-grids to fail (unless there are any objections) ...

https://github.com/ganglia/monitor-core/commit/0705a5defa284e289004daf61ea390338719d5fb

--Nick.


On Tue, Dec 10, 2013 at 8:00 PM, Chris Burroughs chris.burrou...@gmail.com wrote:

 On 12/08/2013 04:43 PM, Devon H. O'Dell wrote:
  This is a simple `perf top -p $PID` on one of our gmetad nodes:

  Samples: 1M of event 'cycles', Event count (approx.): 64115959770
    6.59%  libexpat.so.1.5.2          [.] 0x00011b8d
    4.77%  libganglia-3.6.0.so.0.0.0  [.] hashval
    2.62%  [kernel]                   [k] __d_lookup
    2.21%  [kernel]                   [k] _spin_lock
    2.14%  libc-2.12.so               [.] vfprintf
    1.61%  librrd.so.4.2.0            [.] process_arg
    1.54%  libganglia-3.6.0.so.0.0.0  [.] hash_lookup
    1.46%  [kernel]                   [k] __link_path_walk
    1.16%  libc-2.12.so               [.] __GI_strtod_l_internal
    1.11%  libc-2.12.so               [.] memcpy
    1.08%  libc-2.12.so               [.] _int_malloc
  
  So I suppose my intuition about xml parsing expense is off.  I have not
  used perf as much as I should, if we were seeing similar rrd writing
  contention should I literally see stat near the top?

  Ah, so to see what's really going on:
 
  perf record -e cpu-clock -g -p $PID
 
  Let that run for a minute or two. Then:
 
  perf report --sort=comm,dso,symbol -G
 
  If you don't have cpu-clock, cycles is OK, but you definitely are
  going to want to see the callgraph. The time in XML is mostly writing
  RRDs and you only see that digging down into the chain.
 


 For the list, Devon and I spoke in #ganglia and the high occurrence of
 libexpat in this sample seems to be an artifact of missing debug symbols.








Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-06 Thread Nicholas Satterly
. They are running on (censored) right now, and we'll
 leave them running for a while to make sure they're good before pushing the
 patches upstream.

 In the process of doing this, I noticed that ganglia used a particularly
 poor method for reading its XML metrics from gmond: It initialized a
 1024-byte buffer, read into it, and if it would overflow, it would realloc
 the buffer with an additional 1024 bytes and try reading again. When
 dealing with XML files many megabytes in size, this caused many unnecessary
 reallocations. I modified this code to start with a 128KB buffer and double
 the buffer size when it runs out of space. (I made a similar change to the
 code for decompressing gzip'ed data that used a similar buffer sizing
 paradigm).
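
 The effect of the two growth strategies described above can be estimated with a short Python sketch. It only counts how many times the buffer would have to grow; the 8 MB payload size and the function name are assumptions chosen for illustration, not figures or code from gmetad.

```python
def count_reallocs(total_bytes, initial, grow):
    """Count how many times a buffer must grow before it can hold total_bytes."""
    size = initial
    reallocs = 0
    while size < total_bytes:
        size = grow(size)
        reallocs += 1
    return reallocs


KB = 1024
MB = 1024 * KB

# Hypothetical size of one XML dump; "many megabytes" in the description above.
payload = 8 * MB

# Old scheme: start at 1024 bytes and add 1024 bytes each time.
linear = count_reallocs(payload, 1 * KB, lambda s: s + 1 * KB)

# New scheme: start at 128 KB and double each time.
doubling = count_reallocs(payload, 128 * KB, lambda s: s * 2)

print(f"fixed 1 KB increments: {linear} reallocations")
print(f"128 KB start, doubling: {doubling} reallocations")
```

 Fixed increments need a number of reallocations proportional to the payload size, while doubling needs only a logarithmic number, which is why the change matters for multi-megabyte XML.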

 After all these changes, both the interactive and RRD-writing processes
 spend most of their time in the hash table. I can continue improving
 Ganglia performance, but most of the low-hanging fruit is now gone; at some
 point it will require:

  * writing a version of librrd (this probably also means changing the rrd
 file format),
  * replacing the hash table in Ganglia with one that performs better,
  * changing the data serialization format from XML to one that is easier /
 faster to parse,
  * using a different data structure than a hash table for metrics
 hierarchies (probably a tree with metrics stored at each level in
 contiguous memory and an index describing each metric at each level)
  * refactoring gmetad and gmond into a single process that shares memory

 These are all longer-term projects, but I think that they'll probably
 eventually be useful.


Re: [Ganglia-developers] [Ganglia-general] Grid of Grids Broken Again in 3.6.0? Is this a different problem?

2013-11-17 Thread Nicholas Satterly
Hi Adam,

Our experience was that the summary RRDs were actually generated but then
rarely updated. Only very occasionally would we see metrics suddenly get
written to the RRD and only for a few intervals and then there would be
large gaps again.

Do graphs based on the RRDs you are getting in your tests look right?

Regards,
Nick


On Fri, Nov 15, 2013 at 8:15 PM, Adam Compton acomp...@quantcast.com wrote:

  Nicholas, I'm the person who submitted #92. I've attempted to replicate
 the problem and I'm still seeing summary RRDs being written for the top
 grid in a grid-of-grids configuration (assuming you mean
 /var/lib/ganglia/rrds/__SummaryInfo__/*.rrd).

 Can you please share the configs you used to reproduce this issue? I'd
 like to fix the bug and submit a patch, but I don't know how to replicate
 the problem.

 Thanks,
 Adam



 On 11/3/13 2:04 PM, Nicholas Satterly wrote:

 Hi Bernard,

  I think this is the bug in federation that you might be thinking of as
 I've mentioned it before. I don't have a fix for this. It's quite a large
 patch and I've never looked at this part of the codebase before.

  Regards,
 Nick


 On Sun, Nov 3, 2013 at 5:10 PM, Bernard Li bern...@vanhpc.org wrote:

 My $0.02 is that Grid of Grids (federation) is still a widely used
 feature so we should attempt to fix it.

  Nick -- do you still have another outstanding pull request to fix a bug
 in federation?  If so, what's the hold up?  Just waiting for someone with
 authorization to accept it?

  Thanks!

  Bernard


  On Sat, Nov 2, 2013 at 5:14 PM, Nicholas Satterly nfsatte...@gmail.com wrote:

  I have confirmed that this patch [1] broke writing of the root
 summaries for the top-level gmetad when in a grid-of-grids setup. What
 should we do? Revert the patch, attempt to debug it, or just log a github
 issue to track it for now?

  Regards,
 Nick

  [1] https://github.com/ganglia/monitor-core/pull/92


 On Tue, Sep 24, 2013 at 12:40 PM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Hi Illydth,

  You might have missed that the pull request that added the break back
 also added more logic to the endElement_GRID() function to fix
 double-writing of the last cluster. So yes, that break is meant to be there
 again. See https://github.com/ganglia/monitor-core/pull/73

  However, what isn't clear is why there is a new grid-of-grids
 problem. I suspect that it relates to this pull request but I haven't been
 able to confirm this yet. See
 https://github.com/ganglia/monitor-core/pull/92

  Regards,
 Nick


  On Fri, Sep 20, 2013 at 7:41 PM, Douglas Wagner dougla...@gmail.com wrote:

 So the last time I tried this upgrade thing (3.1.7 -> 3.4.0) I was
 getting no grid of grids information.  Ran across the fix with the help of
 others on the list and documented it here:

 http://sourceforge.net/apps/phpbb/ganglia/viewtopic.php?f=4t=16p=28

  So now I've upgraded from 3.4.0 to 3.6.0.  I have 2 new clients
 (RHEL6) that I'm implementing.  Went through the build process and built
 out RPMs for RHEL6.

  Turned on GMOND and I'm not seeing either of the two systems
 reporting into the associated GMETAD.  The Web Interface isn't updating
 with the new boxes.

  As I start going back through some of my past issues, I ran back
 across this where in 3.4.0 Grid of Grids was broken.  And when I check the
 reported file and problem again I see the same old code (the break; at
 the end of the first switch block).

  Is this broken again in 3.6?  or is this the correct code and I
 should be looking somewhere else for why my new RHEL6 clients aren't
 reporting to my GMETAD system?

  --Illydth


 ___
 Ganglia-general mailing list
 ganglia-gene...@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/ganglia-general





Re: [Ganglia-developers] Riemann pull request for Ganglia

2013-11-13 Thread Nicholas Satterly
Yep. Done.

 On 12 Nov 2013, at 23:32, Bernard Li bern...@vanhpc.org wrote:

 Hi Nick:

 Cool -- do you think you can add a link to the page you created in the
 main Trac Wiki page?

 http://sourceforge.net/apps/trac/ganglia/wiki

 Thanks,

 Bernard

 On Tue, Nov 12, 2013 at 1:40 PM, Nicholas Satterly nfsatte...@gmail.com wrote:
 And just to close the loop... Ganglia now gets a mention on the Riemann
 website http://riemann.io/clients.html

 --Nick


 On Tue, Nov 12, 2013 at 11:06 AM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Thanks.

 Page added ...
 http://sourceforge.net/apps/trac/ganglia/wiki/riemann_integration

 --Nick.


 On Mon, Nov 11, 2013 at 10:50 PM, Bernard Li bern...@vanhpc.org wrote:

 Fixed.

 Cheers,

 Bernard


 On Mon, Nov 11, 2013 at 8:26 AM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Hi,

 I've written a wiki page for trac/sourceforge but don't seem to have
 edit rights -- I can't see an Edit this Page button on any of the Ganglia
 wiki pages (eg. https://sourceforge.net/apps/trac/ganglia/wiki) even
 though I'm logged in as satterly.

 If someone could fix this that would be great. If that's too hard feel
 free to add the page yourself, if you can (file attached). A link to it
 from the main page would be nice too.

 Thanks,
 Nick


 On Fri, Nov 8, 2013 at 5:29 PM, Jeff Buchbinder rufustfire...@gmail.com wrote:

 On Fri, Nov 8, 2013 at 12:27 PM, Bernard Li bern...@vanhpc.org wrote:

 Jeff:

 I'm talking about the Wiki hosted at SourceForge.  However I'm
 uncertain if that has been deprecated in favour of the new one on GitHub.
 Vlad?


 I had been trying to migrate from the Sourceforge one to the Github
 wiki, but I'm not sure if we're *officially* designating the Github wiki to
 be the authoritative source of Ganglia knowledge.

 Jeff






Re: [Ganglia-developers] Riemann pull request for Ganglia

2013-11-12 Thread Nicholas Satterly
And just to close the loop... Ganglia now gets a mention on the Riemann
website http://riemann.io/clients.html

--Nick


On Tue, Nov 12, 2013 at 11:06 AM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Thanks.

 Page added ...
 http://sourceforge.net/apps/trac/ganglia/wiki/riemann_integration

 --Nick.


 On Mon, Nov 11, 2013 at 10:50 PM, Bernard Li bern...@vanhpc.org wrote:

 Fixed.

 Cheers,

 Bernard


 On Mon, Nov 11, 2013 at 8:26 AM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Hi,

 I've written a wiki page for trac/sourceforge but don't seem to have
 edit rights -- I can't see an Edit this Page button on any of the Ganglia
 wiki pages (eg. https://sourceforge.net/apps/trac/ganglia/wiki) even
 though I'm logged in as satterly.

 If someone could fix this that would be great. If that's too hard feel
 free to add the page yourself, if you can (file attached). A link to it
 from the main page would be nice too.

 Thanks,
 Nick


 On Fri, Nov 8, 2013 at 5:29 PM, Jeff Buchbinder rufustfire...@gmail.com wrote:

 On Fri, Nov 8, 2013 at 12:27 PM, Bernard Li bern...@vanhpc.org wrote:

 Jeff:

 I'm talking about the Wiki hosted at SourceForge.  However I'm
 uncertain if that has been deprecated in favour of the new one on GitHub.
  Vlad?


 I had been trying to migrate from the Sourceforge one to the Github
 wiki, but I'm not sure if we're *officially* designating the Github wiki to
 be the authoritative source of Ganglia knowledge.

 Jeff






[Ganglia-developers] Riemann pull request for Ganglia

2013-11-07 Thread Nicholas Satterly
Hi developers,

I've done some work recently to add Riemann support to Ganglia for which
I've submitted a pull request [1]. We are currently using this in
production at the Guardian to alert in real-time off tens of thousands of
metrics. (You can see our config here
https://github.com/guardian/riemann-config )

It would be great if this were accepted upstream. I know there has been a lot
of interest recently in alerting off real-time metric data, and this is a
solution that scales and makes use of much of the metadata that Ganglia
associates with a metric/host.

Feedback welcome.

Regards,
Nick

[1] https://github.com/ganglia/monitor-core/pull/124


Re: [Ganglia-developers] [Ganglia-general] Grid of Grids Broken Again in 3.6.0? Is this a different problem?

2013-11-03 Thread Nicholas Satterly
Hi Bernard,

I think this is the bug in federation that you might be thinking of as I've
mentioned it before. I don't have a fix for this. It's quite a large patch
and I've never looked at this part of the codebase before.

Regards,
Nick


On Sun, Nov 3, 2013 at 5:10 PM, Bernard Li bern...@vanhpc.org wrote:

 My $0.02 is that Grid of Grids (federation) is still a widely used feature
 so we should attempt to fix it.

 Nick -- do you still have another outstanding pull request to fix a bug in
 federation?  If so, what's the hold up?  Just waiting for someone with
 authorization to accept it?

 Thanks!

 Bernard


 On Sat, Nov 2, 2013 at 5:14 PM, Nicholas Satterly nfsatte...@gmail.com wrote:

 I have confirmed that this patch [1] broke writing of the root summaries
 for the top-level gmetad when in a grid-of-grids setup. What should we do?
 Revert the patch, attempt to debug it, or just log a github issue to track
 it for now?

 Regards,
 Nick

 [1] https://github.com/ganglia/monitor-core/pull/92


 On Tue, Sep 24, 2013 at 12:40 PM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Hi Illydth,

 You might have missed that the pull request that added the break back
 also added more logic to the endElement_GRID() function to fix
 double-writing of the last cluster. So yes, that break is meant to be there
 again. See https://github.com/ganglia/monitor-core/pull/73

 However, what isn't clear is why there is a new grid-of-grids problem. I
 suspect that it relates to this pull request but I haven't been able to
 confirm this yet. See https://github.com/ganglia/monitor-core/pull/92

 Regards,
 Nick


 On Fri, Sep 20, 2013 at 7:41 PM, Douglas Wagner dougla...@gmail.com wrote:

 So the last time I tried this upgrade thing (3.1.7 -> 3.4.0) I was
 getting no grid of grids information.  Ran across the fix with the help of
 others on the list and documented it here:

 http://sourceforge.net/apps/phpbb/ganglia/viewtopic.php?f=4t=16p=28

 So now I've upgraded from 3.4.0 to 3.6.0.  I have 2 new clients (RHEL6)
 that I'm implementing.  Went through the build process and built out RPMs
 for RHEL6.

 Turned on GMOND and I'm not seeing either of the two systems reporting
 into the associated GMETAD.  The Web Interface isn't updating with the new
 boxes.

 As I start going back through some of my past issues, I ran back across
 this where in 3.4.0 Grid of Grids was broken.  And when I check the
 reported file and problem again I see the same old code (the break; at
 the end of the first switch block).

 Is this broken again in 3.6?  or is this the correct code and I should
 be looking somewhere else for why my new RHEL6 clients aren't reporting to
 my GMETAD system?

 --Illydth







Re: [Ganglia-developers] [Ganglia-general] Grid of Grids Broken Again in 3.6.0? Is this a different problem?

2013-11-02 Thread Nicholas Satterly
I have confirmed that this patch [1] broke writing of the root summaries
for the top-level gmetad when in a grid-of-grids setup. What should we do?
Revert the patch, attempt to debug it, or just log a github issue to track
it for now?

Regards,
Nick

[1] https://github.com/ganglia/monitor-core/pull/92


On Tue, Sep 24, 2013 at 12:40 PM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Hi Illydth,

 You might have missed that the pull request that added the break back also
 added more logic to the endElement_GRID() function to fix double-writing of
 the last cluster. So yes, that break is meant to be there again. See
 https://github.com/ganglia/monitor-core/pull/73

 However, what isn't clear is why there is a new grid-of-grids problem. I
 suspect that it relates to this pull request but I haven't been able to
 confirm this yet. See https://github.com/ganglia/monitor-core/pull/92

 Regards,
 Nick


 On Fri, Sep 20, 2013 at 7:41 PM, Douglas Wagner dougla...@gmail.com wrote:

 So the last time I tried this upgrade thing (3.1.7 -> 3.4.0) I was
 getting no grid of grids information.  Ran across the fix with the help of
 others on the list and documented it here:

 http://sourceforge.net/apps/phpbb/ganglia/viewtopic.php?f=4t=16p=28

 So now I've upgraded from 3.4.0 to 3.6.0.  I have 2 new clients (RHEL6)
 that I'm implementing.  Went through the build process and built out RPMs
 for RHEL6.

 Turned on GMOND and I'm not seeing either of the two systems reporting
 into the associated GMETAD.  The Web Interface isn't updating with the new
 boxes.

 As I start going back through some of my past issues, I ran back across
 this where in 3.4.0 Grid of Grids was broken.  And when I check the
 reported file and problem again I see the same old code (the break; at
 the end of the first switch block).

 Is this broken again in 3.6?  or is this the correct code and I should be
 looking somewhere else for why my new RHEL6 clients aren't reporting to my
 GMETAD system?

 --Illydth






 --
 gpg: using PGP trust model
 pub   4096R/1EE38BD9 2013-01-06 [expires: 2018-01-06]
   Key fingerprint = 3EE9 550D D9D8 DB65 58C2  B58D CE78 EC6C 1EE3 8BD9
 uid  Nicholas Satterly (Debian Key) nfsatte...@gmail.com
 sub   4096R/23804EE9 2013-01-06 [expires: 2018-01-06]




Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] Possibility of using different serialization format than XDR

2013-07-30 Thread Nicholas Satterly
 Further improvements could probably  be had in the arena of node
 multi-tenancy and/or arbitrary node
 grouping/clustering.

Could you expand on what you mean by multi-tenancy, please? I'm curious.

--Nick.

On 29 Jul 2013, at 19:21, Dave Rawks d...@pandora.com wrote:

 I'm still trying to figure out what you're trying to improve here? XDR
 seems like a fine, standard, lightweight serialization protocol to use.
 It is already implemented and we've already got some protocol handling
 for backwards compat for really old ganglia monitor clients. What is
 there to gain from switching aside from having some new and shiny that
 needs to be supported in addition to the existing stuff? We aren't
 serializing any custom data types or references or anything aside from
 some floats, ints, and a couple of strings. XDR compute overhead is not
 hurting performance, especially on modern hardware, the payloads aren't
 very big and the tuning of various check timings and metric validity
 timings further reduces the amount of chatter on the wire.
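
For readers unfamiliar with the wire format under discussion: XDR (RFC 4506) encodes integers as 4-byte big-endian words, and strings as a 4-byte length followed by the bytes, zero-padded to a multiple of four. The sketch below hand-rolls both in C purely to illustrate how small the encoding is. The function names are hypothetical and this is not gmond's code, which as far as I know relies on rpcgen-generated XDR routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit unsigned integer in XDR form (4 bytes, big-endian). */
static size_t xdr_put_u32(unsigned char *out, uint32_t v)
{
  out[0] = (unsigned char)(v >> 24);
  out[1] = (unsigned char)(v >> 16);
  out[2] = (unsigned char)(v >> 8);
  out[3] = (unsigned char)v;
  return 4;
}

/* Write a string in XDR form: 4-byte length, the bytes, then zero
 * padding up to a multiple of 4. Returns the number of bytes written,
 * or 0 if `cap` is too small. */
static size_t xdr_put_string(unsigned char *out, size_t cap, const char *s)
{
  size_t len = strlen(s);
  size_t padded = (len + 3) & ~(size_t)3;

  assert(out != NULL);
  if (len > UINT32_MAX || cap < 4 + padded)
    return 0;
  xdr_put_u32(out, (uint32_t)len);
  memcpy(out + 4, s, len);
  memset(out + 4 + len, 0, padded - len);
  return 4 + padded;
}
```

A metric name such as "load_one" therefore costs 12 bytes on the wire, which is in line with the point above that the payloads aren't very big.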

 If you want to introduce some more modern code to ganglia, I think adding
 support for pushing gmond communications into a modern pub/sub message
 queue framework would be a better place to start. I've never heard
 anybody have problems with our serialization, but there is frequent and
 often confusing troubleshooting around multicast vs unicast and the
 various infrastructural/configuration tweaks to make the most out of
 those.
 Further improvements could probably be had in the arena of node
 multi-tenancy and/or arbitrary node grouping/clustering.

 Maybe I'm missing something that you've said or implied already, but
 this just seems like change for the sake of change.

 -Dave


 On 07/28/2013 02:09 PM, Nikhil wrote:

 Hi,

 Thanks for response.

 I see there is no averseness to the idea of considering different
 serialization format/protocol.

 Before we have any contribution in terms of code/specifications, what
 would be the ideal choice among these:
 http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
 for replacing the current XDR implementation in ganglia?

 Some points that come to mind for discussion: what the current XDR
 payload size is (which we should not intend to exceed), the performance
 overhead in processing and storing, the availability of libraries, and
 ease of use.

 As Dave also mentions, platform agnosticism, portability (endianness?)
 and efficiency are also among the critical things to be considered. While
 ASN.1 does offer all of this, some of the others that I wanted to consider
 are MessagePack and UBJson. Format specs are described here for
 MessagePack
 http://wiki.msgpack.org/display/MSGPACK/Format+specification and for
 UBJson http://ubjson.org/

 Let me know what you all think would be the ideal choice.

 Thanks.



 On Sat, Jul 27, 2013 at 6:13 AM, Vladimir Vuksan vli...@veus.hr wrote:

I am not necessarily opposed to it if it's implemented in such a way not
to break backwards compatibility. Someone would need to contribute some
code.

Vladimir

On Fri, 26 Jul 2013, Dave Rawks wrote:

 I'm curious to hear what you think is going to be more efficient,
 platform agnostic and portable than XDR? ASN1 would be the only thing I
 would even consider using instead, but it is arguable whether it would
 be worth the pain of supporting more than one serialization format and
 it certainly doesn't seem sane to break all backwards compatibility to
 switch to something new unilaterally. ASN1 /might/ be a reasonable
 alternative to XDR, but I don't see what advantages this could possibly
 bring.

 -Dave

 On 7/26/13 10:46 AM, Nikhil wrote:
 Hi,

 Considering that there are better, more compute-efficient open binary
 serialization formats out there, how hard would it be to make Ganglia
 use them instead of XDR?
 Can the serialization format engines be pluggable, instead of being
 closely integrated with XDR? Is it still worth continuing to stick with
 XDR?

 The intention is to understand and see the possibility and have a
 discussion about what could be best to go with, if it's appropriate.

 I am really hoping to see a reply from the authors of ganglia core :-)

 Thanks,
 Nikhil




Re: [Ganglia-developers] coverity

2013-07-24 Thread Nicholas Satterly
Hi Chris,

I think it's a good idea. There are definitely some memory leaks that it
would be good to track down. Maybe coverity could help. It's worth a try at
least.

--Nick.


On Tue, Jul 23, 2013 at 11:25 AM, Chris Burroughs chris.burrou...@gmail.com
 wrote:

 coverity offers free scanning for open source projects.  Is there any
 interest in adding the ganglia C code there?  I think all that's
 required is one of the developers clicking 'sign up'.

 http://scan.coverity.com/








Re: [Ganglia-developers] Problems with GMOND leaking memory

2013-07-03 Thread Nicholas Satterly
Please send a copy of your gmond.conf file to the list. Your explanation of
what you changed is difficult to follow.

Regards
Nick

On 3 Jul 2013, at 02:42, Valter Silva valter.si...@movile.com wrote:

I set up *gmond* with rpmbuild ganglia.spec, for CentOS 5.9 and CentOS 6.4,
using ganglia.3.6.tar.gz.

Everything looks fine, but when I don't set *deaf=yes* and don't remove
the related configuration, such as *listen* in *tcp* or *udp*, memory usage
jumps from 9MB to 10GB.

This crashes many of my servers.
Any idea why this happens?

--
Best regards,
Valter Silva
Infrastructure Analyst
Tel: +55 19. 9122-1822
Skype: valter.silva.movile
valter.si...@movile.com
http://www.movile.com/



Re: [Ganglia-developers] Ganglia in the EC2 cloud

2013-06-18 Thread Nicholas Satterly
Hi Demetri,

Could you try building from my personal development branch? It is an
up-to-date merge with Ganglia master with one additional potential bug fix (
https://github.com/satterly/monitor-core/commit/ed3ad9d57b1d582503ef0104e17f7919044c7617
).

If this version runs without segfaulting I'll push it to the ganglia
feature/cloud branch.

And thanks for the pull request. It seems that it needs to be rebased with
master. However, if your testing of the above branch proves successful we
can rebase your patch against that.

Let me know how you get on.

Regards,
Nick


On Mon, Jun 17, 2013 at 11:53 PM, Demetri Mouratis dmour...@gmail.com wrote:

 Nicholas Satterly nfsatterly at gmail.com writes:
 
  [1]
 https://github.com/ganglia/monitor-core/compare/master...feature/cloud
 


 Nick,

 Thanks for your work in implementing this feature.  I'm in the same boat
 with a larg(ish) EC2 (VPC) deployment and sorely missing ganglia in this
 new environment.

 I've found and fixed one bug pertaining to localtime versus GMT in the EC2
 apr request:

 https://github.com/ganglia/monitor-core/pull/112

 Amazon expects all timestamps to be in GMT.  Some of my hosts have non-GMT
 set localtimes (don't ask).
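
To illustrate the kind of fix involved, here is a minimal sketch of producing a request timestamp in GMT regardless of the host's local timezone. The format matches the Timestamp parameter visible in the logged request further down (2013-06-17T22:41:39Z before URL-encoding). The helper name is hypothetical and this is not the code from the pull request, which uses APR's time routines rather than plain libc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format `t` as an ISO 8601 UTC timestamp,
 * e.g. 2013-06-17T22:41:39Z. Returns 0 on success, -1 on failure. */
static int format_utc_timestamp(time_t t, char *buf, size_t buflen)
{
  struct tm tm_utc;

  assert(buf != NULL);
  /* gmtime_r, not localtime_r: the result must not depend on TZ. */
  if (gmtime_r(&t, &tm_utc) == NULL)
    return -1;
  if (strftime(buf, buflen, "%Y-%m-%dT%H:%M:%SZ", &tm_utc) == 0)
    return -1;
  return 0;
}
```

The only design point is the choice of gmtime_r over localtime_r; everything else is formatting.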

 Now I'm facing a consistent segfault when the number of nodes in the
 cluster is large (>= 17).

 The error looks like:

 [discovery.ec2] Found 17 matching instances
 [discovery.ec2] adding i-10ad3c25, udp send channel private_ip 10.10.1.211:8649
 [discovery.ec2] adding i-34296506, udp send channel private_ip 10.10.1.204:8649
 [discovery.ec2] adding i-1894ff2a, udp send channel private_ip 10.10.1.240:8649
 [discovery.ec2] adding i-1a94ff28, udp send channel private_ip 10.10.1.241:8649
 [discovery.ec2] adding i-cc99f2fe, udp send channel private_ip 10.10.1.214:8649
 [discovery.ec2] adding i-c81c8dfd, udp send channel private_ip 10.10.2.115:8649
 [discovery.ec2] adding i-a2d36990, udp send channel private_ip 10.10.1.116:8649
 [discovery.ec2] adding i-24235016, udp send channel private_ip 10.10.1.234:8649
 [discovery.ec2] adding i-2401bc11, udp send channel private_ip 10.10.2.216:8649
 [discovery.ec2] adding i-2a235018, udp send channel private_ip 10.10.1.235:8649
 [discovery.ec2] adding i-3a01bc0f, udp send channel private_ip 10.10.2.217:8649
 [discovery.ec2] adding i-3801bc0d, udp send channel private_ip 10.10.2.218:8649
 [discovery.ec2] adding i-d27015e7, udp send channel private_ip 10.10.2.164:8649
 [discovery.ec2] adding i-2823501a, udp send channel private_ip 10.10.1.238:8649
 [discovery.ec2] adding i-3a07620f, udp send channel private_ip 10.10.2.177:8649
 [discovery.ec2] adding i-422a4f77, udp send channel private_ip 10.10.2.64:8649
 [discovery.ec2] adding i-3890f10a, udp send channel private_ip 10.10.1.102:8649
 .  .  .

 [discovery.ec2] Refreshing node list...
 [discovery.cloud] access key=AKIAJNY4GBUKJRXY4JDA, secret key=DxvJ
 [discovery.ec2] using host_type [private_ip], tags [environment= TEST], groups [], availability_zones []
 [discovery.ec2] using endpoint ec2.us-west-2.amazonaws.com - ec2.us-west-2.amazonaws.com
 [discovery.ec2] URL-encoded API request ec2.us-west-2.amazonaws.com?AWSAccessKeyId=AKIAJNY4GBUKJRXY4JDA&Action=DescribeInstances&Filter.1.Name=instance-state-name&Filter.1.Value=running&Filter.2.Name=tag%3Aenvironment&Filter.2.Value=TEST&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2013-06-17T22%3A41%3A39Z&Version=2012-08-15&Signature=O7qmbgbbZnMk8njNQiEo4YLlDIVhM9NAF4171NoMTj4%3D
 [discovery.ec2] HTTP response code 200, 99664 bytes retrieved
 Segmentation fault

 The crash is reproducible, happens in about 2 minutes after start and can
 be avoided by renaming one of the hosts environment= tags to remove it
 from the cluster.

 I haven't been able to come up with a fix for this issue but I'm
 sufficiently out of my depth at this point to ask for help.

 Thanks.

 -D









[Ganglia-developers] Sending aggregated cluster metrics to Graphite

2013-04-15 Thread Nicholas Satterly
Hi,

We're looking at using the support for sending ganglia metrics to graphite;
however, I've just worked out that aggregated cluster metrics are not sent.

Can anyone explain why this might be the case? Could it be because you
would actually need to send two metrics for every cluster metric, i.e. the
num and sum? Even so, is that an issue?
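
To make the question concrete, here is a minimal sketch of what sending both values could look like. It assumes Graphite's plaintext protocol, where each data point is one "path value timestamp" line. The function name and the ".sum" / ".num" path suffixes are hypothetical and are not something gmetad emits today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the two Graphite plaintext lines for one
 * cluster-level summary metric. Returns the number of characters
 * written, or -1 if the output did not fit in `buf`. */
static int format_summary_lines(char *buf, size_t buflen, const char *prefix,
                                const char *metric, double sum,
                                unsigned int num, long timestamp)
{
  int n;

  assert(buf != NULL && prefix != NULL && metric != NULL);
  n = snprintf(buf, buflen, "%s.%s.sum %f %ld\n%s.%s.num %u %ld\n",
               prefix, metric, sum, timestamp,
               prefix, metric, num, timestamp);
  if (n < 0 || (size_t)n >= buflen)
    return -1;
  return n;
}
```

With both series stored, the cluster average can be derived on the Graphite side by dividing one by the other.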

Thanks,
Nick


[Ganglia-developers] Ganglia in the EC2 cloud

2013-02-10 Thread Nicholas Satterly
Hi,

A few months back I mentioned that I'd modified gmond to dynamically
discover its cluster peers by using the EC2 API to update the udp send
channel configuration. Well, we've been running this in production at
the Guardian for more than 3 months and it's been a great success.

I think this would be a very useful addition to the Ganglia agent so
I'm submitting the code to a separate branch called feature/cloud
for review and feedback.

Changes to gmond.c have been kept to a minimum [1] and it's all
conditionally compiled using --enable-cloud at the moment. The
cloud.c code which does most of the work will need to be refactored to
move the EC2-specific code into a separate function so that it can be
extended to use other (more standards-based) cloud APIs that are
available, e.g. DeltaCloud and CIMI.

I've written a wiki page that explains this stuff in more detail here
... https://github.com/ganglia/monitor-core/wiki/EC2-Discovery

As I said, feedback (and enhancement requests) very welcome.

Regards,
Nick

[1] https://github.com/ganglia/monitor-core/compare/master...feature/cloud



Re: [Ganglia-developers] Ganglia gmetad thread stuck at TCP SYN SENT

2013-02-06 Thread Nicholas Satterly
Thanks Kostas and Jonathan for your suggestions.

I spent quite a few hours on this and in the end decided that the
gmetad was working as designed and that adding a specific timeout on a
socket connection wasn't needed.

This is because the kernel already times out socket connections that
fail, or rather it times failures out and then retries several times
until it finally gives up. The data collection thread then sleeps for
a bit before trying again.

My specific problem was that after sleeping the data thread was just
retrying the same host it failed on last time which was the instance
that had been terminated. This would inevitably fail at some point and
the data thread would appear to hang.

The solution was to modify gmetad to poll the most recently launched
instance by looking at the GMOND_STARTED value which works well.

Hopefully I'll find time to submit this code in a branch in the coming
days/weeks.

--Nick.

On Tue, Feb 5, 2013 at 5:28 PM, Kostas Georgiou
k.georg...@atreides.org.uk wrote:
 On Fri, Jan 25, 2013 at 12:45:10PM +, Nicholas Satterly wrote:

 Does anyone have any ideas of how the connection could at least be
 timed out? Keep in mind that the gmetad is multi-threaded so I'm
 pretty sure that rules out the use of SIGALRM.
 ..,
 How could a 2 second timeout be enforced on this connect()?

 You set O_NONBLOCK on the socket before the connect, run select
 with a 2 sec timeout on the socket from there if you have a connection
 (depending on if select hit the timeout or not and what getsockopt for
 SO_ERROR returns) you set the socket back to blocking.
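
The steps described above can be sketched as follows. connect_with_timeout is a hypothetical helper, not an existing function in lib/tcp.c, and the sketch assumes a plain POSIX environment. Note that select() only works for descriptors below FD_SETSIZE, so a process with many open sockets might prefer poll() for the same job.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect() with an upper bound on the wait.
 * Returns 0 on success, -1 on failure or timeout (errno is set,
 * ETIMEDOUT in the timeout case). */
static int connect_with_timeout(int sockfd, const struct sockaddr *sa,
                                socklen_t salen, int timeout_sec)
{
  int flags, rv, err = 0;
  socklen_t errlen = sizeof(err);
  fd_set wfds;
  struct timeval tv;

  assert(timeout_sec >= 0);

  /* Switch to non-blocking so connect() returns immediately. */
  flags = fcntl(sockfd, F_GETFL, 0);
  if (flags < 0 || fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) < 0)
    return -1;

  rv = connect(sockfd, sa, salen);
  if (rv != 0)
    {
      if (errno != EINPROGRESS)
        return -1;

      /* Wait until the socket becomes writable or the timeout expires. */
      FD_ZERO(&wfds);
      FD_SET(sockfd, &wfds);
      tv.tv_sec = timeout_sec;
      tv.tv_usec = 0;
      rv = select(sockfd + 1, NULL, &wfds, NULL, &tv);
      if (rv == 0)
        errno = ETIMEDOUT;
      if (rv <= 0)
        return -1;

      /* Writable does not mean connected: check the pending error. */
      if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
      if (err != 0)
        {
          errno = err;
          return -1;
        }
    }

  /* Connected: put the socket back into blocking mode. */
  if (fcntl(sockfd, F_SETFL, flags) < 0)
    return -1;
  return 0;
}
```

On failure the caller is still responsible for closing the socket, as the existing error path in g_tcp_socket_new() already does.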

 Did you see any failures when the machine went away after the connect?
 I can't remember if we timeout while we are reading data from the
 socket.






[Ganglia-developers] Ganglia gmetad thread stuck at TCP SYN SENT

2013-01-25 Thread Nicholas Satterly
Hi,

We have a situation here where developers deploy a new version of
their app in EC2 by spinning up instances running the new version,
adding them to the auto-scaling group and once all looks good just
terminating the instances with the old app version.

Works great for them, however the ganglia gmetad's polling that
cluster seem to hang during the socket connect to the old instances in
SYN SENT status if they are in the middle of establishing the TCP
connection just as the instance is being terminated.

Does anyone have any ideas of how the connection could at least be
timed out? Keep in mind that the gmetad is multi-threaded so I'm
pretty sure that rules out the use of SIGALRM.

I think the relevant code block is in the g_tcp_socket_new() function
in lib/tcp.c here...

  /* Connect */
  rv = connect(sockfd, &s->sa, sizeof(s->sa));
  if (rv != 0)
    {
      close (sockfd);
      free (s);
      return NULL;
    }

How could a 2 second timeout be enforced on this connect()?

Thanks in advance.

--Nick.



Re: [Ganglia-developers] override_ip causing gmond to crash

2012-10-22 Thread Nicholas Satterly
I believe this was a problem caused by using the wrong APR pool in the
apr_pstrcat() call.

https://github.com/ganglia/monitor-core/pull/62

--Nick.


Re: [Ganglia-developers] dynamic discovery of hosts in EC2

2012-10-12 Thread Nicholas Satterly
Hi Paul,

Thanks for your feedback. That was the best solution I came up with too
so I've added this in and it seems to work well.

An added side-effect is that the file can also be used to troubleshoot if
you need to know exactly where the gmond is sending its metrics to without
having to run the agent in debug mode.
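
For anyone wanting to consume that file from another tool, a minimal sketch of parsing one destination follows. The actual file format is not spelled out in this thread, so the one "host:port" entry per line layout is purely an assumption, and parse_destination is a hypothetical name rather than anything in gmetric.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse one "host:port" line (assumed format). On success, copies the
 * host into `host`, stores the port in `*port` and returns 0.
 * Returns -1 on malformed input or if `host` is too small. */
static int parse_destination(const char *line, char *host, size_t hostlen,
                             int *port)
{
  const char *colon;
  char *end;
  long p;
  size_t n;

  assert(line != NULL && host != NULL && port != NULL);
  colon = strrchr(line, ':');
  if (colon == NULL || colon == line)
    return -1;
  n = (size_t)(colon - line);
  if (n >= hostlen)
    return -1;

  p = strtol(colon + 1, &end, 10);
  if (end == colon + 1 || (*end != '\0' && *end != '\n'))
    return -1;
  if (p < 1 || p > 65535)
    return -1;

  memcpy(host, line, n);
  host[n] = '\0';
  *port = (int)p;
  return 0;
}
```

Reading a small local file like this on each gmetric invocation is cheap compared with a round trip to the EC2 API.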

Regards,
Nick

On Wed, Oct 10, 2012 at 1:38 PM, Paul Hewlett paul.hewl...@arm.com wrote:

 Hi Nick

 Modify gmond to write a special file /etc/ganglia/ec2.conf with the
 discovered instances and then modify gmetric to read that file, using a
 cmdline option perhaps.

 This change should be lightweight enough for gmetric.

 Regards

 --

 Paul Hewlett  X25250

 http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/

 ARM Ltd

 110 Fulbourn Road, Cambridge, CB1 9NJ

 Tel: +44 (0)1223 405923

 skype: paul-at-arm

 www.arm.com

 From: Nicholas Satterly [mailto:nfsatte...@gmail.com]
 Sent: 10 October 2012 13:06
 To: ganglia-developers@lists.sourceforge.net
 Subject: [Ganglia-developers] dynamic discovery of hosts in EC2

 Hi,

 I've been hacking on the ganglia gmond code to get the agent to
 auto-discover other servers in its cluster when running in EC2 [1]. It
 works a lot like the way elasticsearch does [2].

 Does anyone have any suggestions on how I might get gmetric to work in a
 scalable way if it can't rely on the UDP send destinations being listed in
 the gmond.conf file? It really is a show-stopper for us at the moment which
 is unfortunate because gmond would work brilliantly in EC2 with these
 changes.

 Thanks in advance,
 Nick

 [1] https://github.com/satterly/monitor-core

 [2]
 http://www.elasticsearch.org/guide/reference/modules/discovery/ec2.html
  and
 http://www.elasticsearch.org/tutorials/2011/08/22/elasticsearch-on-ec2.html
 

  



Re: [Ganglia-developers] dynamic discovery of hosts in EC2

2012-10-12 Thread Nicholas Satterly
It's currently writing to /var/lib/ganglia/gmond-ec2.conf but I'm
flexible...

https://github.com/satterly/monitor-core/blob/master/lib/libgmond.c#L614

--Nick.

On Fri, Oct 12, 2012 at 4:02 PM, Paul Hewlett paul.hewl...@arm.com wrote:

 Hi Alex

 You are correct - it should be /var/lib/ganglia/ec2.conf or maybe even
 /tmp/ganglia?

 Also If the data does not need to persist between reboots then it could be
 /dev/shm/ganglia/ec2.conf...

 Regards



 --
 Paul Hewlett  X25250
 http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/
 ARM Ltd
 110 Fulbourn Road, Cambridge, CB1 9NJ
 Tel: +44 (0)1223 405923
 skype: paul-at-arm
 www.arm.com



  -Original Message-
  From: Alex Dean [mailto:a...@crackpot.org]
  Sent: 12 October 2012 15:56
  To: ganglia-developers@lists.sourceforge.net
  Subject: Re: [Ganglia-developers] dynamic discovery of hosts in EC2
 
 
  On Oct 10, 2012, at 7:38 AM, Paul Hewlett wrote:
 
  
   Hi Nick
  
   Modify gmond to write a special file /etc/ganglia/ec2.conf with the
  discovered instances and then modify gmetric to read that file - using a
  cmdline option perhaps
   This change should be lightweight enough for gmetric
 
  I haven't looked at this code specifically, but just a general
  suggestion: A process shouldn't typically be able to write to files in
  /etc. Any data that gmond needs to write out should probably go
  somewhere in /var.
 
  alex
 
 
  


[Ganglia-developers] dynamic discovery of hosts in EC2

2012-10-10 Thread Nicholas Satterly
Hi,

I've been hacking on the ganglia gmond code to get the agent to
auto-discover other servers in its cluster when running in EC2 [1]. It
works a lot like the way elasticsearch does [2].

To get it to work, you add the following stanzas to the gmond.conf...

/* Dynamic discovery for cloud environments */
cloud {
  aws_access_key = "INSERT_YOUR_ACCESS_KEY"
  aws_secret_key = "INSERT_YOUR_SECRET_KEY"
}

discovery {
  type = "ec2" /* only ec2 API supported so far */
# endpoint = "https://ec2.amazonaws.com" /* only required if in us-east-1 */
  tags = { "stage:dev" } /* stage:prod */
  groups = { "quicklaunch-1" } /* security groups */
  availability_zones = { "us-east-1d" } /* eg. eu-west-1a */
  discover_every = 90
  host_type = "public_dns" /* private_ip, public_ip, private_dns, public_dns */
  port = 8649
}

Then at start-up, gmond uses the filter defined by combining the tags,
groups and availability zones that you define in the discovery section to
find the list of matching EC2 instances using the EC2 API.
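
As a rough illustration of how one configured tag could turn into request parameters, here is a minimal sketch. The Filter.N.Name / Filter.N.Value shape and the "tag%3A" prefix mirror the API request shown in the debug log of the later segfault report in this archive; the function name is hypothetical, and how groups and availability zones map to filters is not covered because the thread doesn't show it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn one "key:value" tag from the discovery
 * section into query parameters for filter number `index`.
 * Returns the number of characters written, or -1 on malformed input
 * or truncation. Assumes key and value need no URL escaping; a real
 * implementation would percent-encode them. */
static int append_tag_filter(char *buf, size_t buflen, int index,
                             const char *tag)
{
  const char *colon;
  int n;

  assert(buf != NULL && tag != NULL);
  colon = strchr(tag, ':');
  if (colon == NULL || colon == tag || colon[1] == '\0')
    return -1;

  /* "%%3A" emits the URL-encoded ':' between "tag" and the key. */
  n = snprintf(buf, buflen, "&Filter.%d.Name=tag%%3A%.*s&Filter.%d.Value=%s",
               index, (int)(colon - tag), tag, index, colon + 1);
  if (n < 0 || (size_t)n >= buflen)
    return -1;
  return n;
}
```

In the logged request, filter 1 is the fixed instance-state-name=running filter, so tag filters would start at index 2.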

Whenever a new instance comes up (as part of a scaling group, or whatever)
and sends metrics to existing instances it triggers those gmonds to do
another discovery which should find the new server.

It will also do a rediscovery every so often (by default every 90 seconds)
so that instances that have been terminated are removed from its list of
UDP send destinations.

This all works really well so far. The only thing I can't work out is how
to support gmetric. If I understand gmetric correctly it works out what the
UDP send destinations should be by reading in the gmond.conf file. However,
if gmond is using EC2 discovery there are no static destinations listed.
One solution might be for gmetric to query the EC2 API for the list the
same way gmond does but this would add quite an overhead to a lightweight
CLI.

Also, we use gmetric quite a lot (called 1000's of times a minute) on some
servers which would not scale if each gmetric exec had to query the EC2 API
first.

Does anyone have any suggestions on how I might get gmetric to work in a
scalable way if it can't rely on the UDP send destinations being listed in
the gmond.conf file? It really is a show-stopper for us at the moment which
is unfortunate because gmond would work brilliantly in EC2 with these
changes.

Thanks in advance,
Nick

[1] https://github.com/satterly/monitor-core
[2] http://www.elasticsearch.org/guide/reference/modules/discovery/ec2.html
 and
http://www.elasticsearch.org/tutorials/2011/08/22/elasticsearch-on-ec2.html


Re: [Ganglia-developers] override_ip causing gmond to crash

2012-09-28 Thread Nicholas Satterly
Hi,

The version in "APR instead of homegrown" (#49) is still causing
corruption of the host name field on the server that I was having problems
with before [1]. The current version in github is ...

    cb->msg.Ganglia_value_msg_u.gstr.metric_id.host =
        apr_pstrcat(gm_pool, (char *)( override_ip != NULL ? override_ip :
        override_hostname ), ":", (char *) override_hostname, NULL);

I've slightly modified the above version to the following and it seems to
work ok...

    override_ip = override_ip != NULL ? override_ip : override_hostname;
    cb->msg.Ganglia_value_msg_u.gstr.metric_id.host =
        apr_pstrcat(gm_pool, override_ip, ":", override_hostname, NULL);

I assume there is some subtle difference between the two that someone on
the developer list could explain to me.

Do people think this would be robust enough to work in all cases?

Regards,
Nick

[1] The HOST NAME tag was corrupted as follows...

<HOST NAME="U\xc2\xa69" IP="" REPORTED="1348821943" TN="20" TMAX="20"
DMAX="86400" LOCATION="unspecified" GMOND_STARTED="0" TAGS="os:Linux
datacentre:dev virtual:physical"></HOST>

On Thu, Sep 27, 2012 at 10:23 AM, Nicholas Satterly nfsatte...@gmail.com wrote:

 Paul, thanks for that. However, I'd be more inclined to get the APR
 version working as it should.

 Vladimir, were there specific bug reports for gmond crashing? Or any more
 information to help us narrow down what the root cause may have been?

 --Nick.

 On Wed, Sep 26, 2012 at 9:20 AM, Paul Hewlett paul.hewl...@arm.comwrote:

  Hi Nicholas

 ** **

 The +1 should be +2 in the malloc() call – one for the terminating null
 and one for the ‘:’ character.

 ** **

 Regards

 ** **

 ** **

 --

 Paul Hewlett  X25250

 http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/

 ARM Ltd

 110 Fulbourn Road, Cambridge, CB1 9NJ

 Tel: +44 (0)1223 405923

 skype: paul-at-arm

 www.arm.com

 ** **

 ** **

 *From:* Nicholas Satterly [mailto:nfsatte...@gmail.com]
 *Sent:* 26 September 2012 00:49
 *To:* ganglia-developers@lists.sourceforge.net
 *Subject:* [Ganglia-developers] override_ip causing gmond to crash

 ** **

 Hi,

 ** **

 I've discovered that on some of our systems (perhaps only half a dozen
 out of 500 or so) gmond crashes if the override_ip configuration option
 is set. I've worked out that the problem is something to do with this block
 of code...

 ** **

 #if 1

 char* tmpstr = malloc( strlen(( override_ip != NULL ?
 override_ip : override_hostname )) + strlen( override_hostname ) + 1 );**
 **

 strcpy (tmpstr, (char *)( override_ip != NULL ?
 override_ip : override_hostname ) );

 strcat (tmpstr, :);

 strcat (tmpstr, (char *) override_hostname);

 ** **

 cb-msg.Ganglia_value_msg_u.gstr.metric_id.host = tmpstr;
 

 #endif

 #if 0

 cb-msg.Ganglia_value_msg_u.gstr.metric_id.host =
 apr_pstrcat(gm_pool, (char *)( override_ip != NULL ? override_ip :
 override_hostname ), :, (char *) override_hostname, NULL);

 #endif

 ** **

 What I'm trying to understand at the moment is why the apr_pstrcat
 version is #if 0 commented out when it seems to work OK during my testing.
 

 ** **

 Thanks,

 Nick

 -- IMPORTANT NOTICE: The contents of this email and any attachments are
 confidential and may also be privileged. If you are not the intended
 recipient, please notify the sender immediately and do not disclose the
 contents to any other person, use it for any purpose, or store or copy the
 information in any medium. Thank you.




Re: [Ganglia-developers] override_ip causing gmond to crash

2012-09-27 Thread Nicholas Satterly
Paul, thanks for that. However, I'd be more inclined to get the APR version
working as it should.

Vladimir, were there specific bug reports for gmond crashing? Or any more
information to help us narrow down what the root cause may have been?

--Nick.

On Wed, Sep 26, 2012 at 9:20 AM, Paul Hewlett paul.hewl...@arm.com wrote:

  Hi Nicholas

 The +1 should be +2 in the malloc() call – one for the terminating null
 and one for the ‘:’ character.

 Regards

 --
 Paul Hewlett  X25250
 http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/
 ARM Ltd
 110 Fulbourn Road, Cambridge, CB1 9NJ
 Tel: +44 (0)1223 405923
 skype: paul-at-arm
 www.arm.com


[Ganglia-developers] override_ip causing gmond to crash

2012-09-25 Thread Nicholas Satterly
Hi,

I've discovered that on some of our systems (perhaps only half a dozen out
of 500 or so) gmond crashes if the override_ip configuration option is
set. I've worked out that the problem is something to do with this block of
code...

#if 1
    char* tmpstr = malloc( strlen(( override_ip != NULL ? override_ip : override_hostname ))
                           + strlen( override_hostname ) + 1 );
    strcpy (tmpstr, (char *)( override_ip != NULL ? override_ip : override_hostname ) );
    strcat (tmpstr, ":");
    strcat (tmpstr, (char *) override_hostname);

    cb->msg.Ganglia_value_msg_u.gstr.metric_id.host = tmpstr;
#endif
#if 0
    cb->msg.Ganglia_value_msg_u.gstr.metric_id.host =
        apr_pstrcat(gm_pool, (char *)( override_ip != NULL ? override_ip : override_hostname ),
                    ":", (char *) override_hostname, NULL);
#endif

What I'm trying to understand at the moment is why the apr_pstrcat
version is #if 0 commented out when it seems to work OK during my testing.

Thanks,
Nick
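
For reference, the malloc() branch above reserves
strlen(ip) + strlen(hostname) + 1 bytes, but the string it builds also
contains the ':' separator, so the last strcat() writes the terminating
NUL one byte past the end of the allocation. A minimal sketch of a
correctly sized version follows. join_ip_hostname() is a hypothetical
helper name, not something in the Ganglia tree, and it returns heap
memory that the caller has to free.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Ganglia): builds "ip:hostname",
 * falling back to the hostname when no override IP is configured.
 * Returns malloc()ed memory that the caller must free(), or NULL. */
char *join_ip_hostname(const char *override_ip, const char *override_hostname)
{
    const char *ip;
    size_t len;
    char *out;

    if (override_hostname == NULL)
        return NULL;
    ip = (override_ip != NULL) ? override_ip : override_hostname;

    /* +2: one byte for the ':' separator, one for the terminating NUL. */
    len = strlen(ip) + strlen(override_hostname) + 2;
    out = malloc(len);
    if (out == NULL)
        return NULL;

    /* snprintf() writes at most len bytes, including the NUL. */
    snprintf(out, len, "%s:%s", ip, override_hostname);
    return out;
}
```

With override_ip unset, join_ip_hostname(NULL, "web01") yields
"web01:web01", mirroring the fallback in the original expression.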