Re: [Ganglia-developers] gmond -m shows Python error, sometimes

2008-07-07 Thread Carlo Marcelo Arenas Belon
On Mon, Jul 07, 2008 at 10:18:09AM +0200, Ulf wrote:
 Hi,
 
 on my Linux testbox SLES10 SP2 64Bit (python-2.4.2-18.13), I get the 
 following error.
 gmond 3.1.0.1527
 # gmond -m
 [...]
 dev-rootvg-usr-disk_used   Used disk space (module python_module)
 swap_free   Amount of available swap memory (module mem_module)
 Exception in thread Thread-1:
 Traceback (most recent call last):
   File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
     self.run()
   File "/usr/lib64/ganglia/python_modules/tcpconn.py", line 260, in run
     self.popenChild.wait()
   File "/usr/lib64/python2.4/popen2.py", line 94, in wait
     pid, sts = os.waitpid(self.pid, 0)
 OSError: [Errno 10] No child processes

So the netstat command that was started by that module got killed somehow.
What version of Python 2.4 do you have installed? And could it be that you
have a lot of connections open on that server?
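
For reference, Errno 10 on Linux is ECHILD, which os.waitpid() raises when the given pid is not (or is no longer) a child that can be waited for, for instance because it has already been reaped. A minimal sketch of tolerating that case is below. It is not the tcpconn.py code: the names wait_tolerating_echild and demo are made up for illustration, and a POSIX system is assumed.

```python
import errno
import os
import subprocess
import sys


def wait_tolerating_echild(pid):
    """Reap pid and return its raw wait status, or None if there is no such child."""
    try:
        _, status = os.waitpid(pid, 0)
        return status
    except OSError as exc:
        # ECHILD ("No child processes") means the child was already reaped
        # or never belonged to this process; anything else is re-raised.
        if exc.errno == errno.ECHILD:
            return None
        raise


def demo():
    """Spawn a trivial child, try to reap it twice, and return both results."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    first = wait_tolerating_echild(child.pid)   # reaps the child
    second = wait_tolerating_echild(child.pid)  # nothing left to reap
    return first, second
```

Swallowing ECHILD only hides the symptom; it does not explain why the child was gone before wait() was called.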

 The problem seems to be a timing problem, as the error occurs only on every
 second or third call of gmond -m. When the error is not shown, gmond -m
 waits some time after the last line, swap_free.

That is a surprise (at least for me): gmond -m shouldn't need to wait, as it is
just collecting available metrics. In this case it is probably
blocked waiting for the thread that started that netstat call to finish, so
both issues are related.

As a quick workaround to speed up that call (and therefore reduce the latency
and the probability of timeouts), try the attached patch.

Carlo
---
Index: gmond/python_modules/network/tcpconn.py
===
--- gmond/python_modules/network/tcpconn.py (revision 1527)
+++ gmond/python_modules/network/tcpconn.py (working copy)
@@ -246,7 +246,7 @@
 
 #Call the netstat utility and split the output into separate lines
 fd_poll = select.poll()
-self.popenChild = popen2.Popen3("netstat -t -a")
+self.popenChild = popen2.Popen3("netstat -t -a -n")
 fd_poll.register(self.popenChild.fromchild)
 
 poll_events = fd_poll.poll()
-
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] [RFT]: build: linux 64bit biarch support

2008-07-07 Thread Jarod Wilson
On Monday 07 July 2008 04:46:05 am Carlo Marcelo Arenas Belon wrote:
 On Wed, Jul 02, 2008 at 09:27:02AM -0400, Jarod Wilson wrote:
  On Wednesday 02 July 2008 07:36:41 am Carlo Marcelo Arenas Belon wrote:
   The following proposed patch for stable 3.1, replaces the configure
   routine that tried to guess the libdir directory by assuming biarch
   rules from fedora linux (breaking all amd64 BSD and x64 Solaris) and
   overriding the libdir parameter passed at configure time (breaking
   fedora linux ppc64).
  
   Contains changes from r1452, r1467, r1468, r1475 and r1487
 
  At a glance, yeah, this looks like it should indeed finally Do The Right
  Thing(tm) on all Linux architectures for both 32-bit and 64-bit.

 It didn't, I found it was broken in an ia64 SuSE Enterprise Server 10
 server.

D'oh. How so?

 A fix has been committed to trunk already, but it is so interlinked with
 another patch to remove the need to hardcode libdir in the configurations
 from Brad that it might as well require it (or both patches together)
 backported as a solution.

Never mind, I can just take a look at trunk and spin up one of my own ia64 
boxes...


-- 
Jarod Wilson
[EMAIL PROTECTED]



Re: [Ganglia-developers] Python rewrite of Gmetad...

2008-07-07 Thread Brad Nicholes
 On 7/3/2008 at 3:23 PM, in message
[EMAIL PROTECTED], Bernard Li
[EMAIL PROTECTED] wrote:
 Hi Brad:
 
 On Fri, May 16, 2008 at 11:01 AM, Brad Nicholes [EMAIL PROTECTED] wrote:
 
   With this new rewrite in python, we now have the ability to plug in new 
 metric storage modules to support storage mechanisms other than 
 RRD and also the ability to plug in any type of analysis functionality that 
 would make sense at the gmetad level.  Try it out, fix a bug or two and try 
 writing a module.  If you have any idea on how best to package and install 
 the python version of gmetad, please comment or better yet, just do it.
 
 Right now the plugin modules for gmetad-python can only be written in
 Python, can we support other languages in the future?
 

Good question.  I don't know.  I guess it depends on the ability of Python to 
interface with other languages.  

Brad




Re: [Ganglia-developers] gmond -m shows Python error, sometimes

2008-07-07 Thread Ulf
Hi,

I use # python -V
Python 2.4.2
It is a 64-bit build of Python.

The -n is a good idea anyway,
but it didn't fix the problem,
and I don't have many connections open.
# time netstat -t -a
[...]
real    0m0.071s
user    0m0.012s
sys     0m0.056s

# time netstat -t -a -n
real    0m0.054s
user    0m0.000s
sys     0m0.052s


Ulf


Re: [Ganglia-developers] Modules: per-disk I/O and per-NIC interface stats

2008-07-07 Thread Brad Nicholes
 On 7/4/2008 at 8:59 AM, in message
[EMAIL PROTECTED] 
, [EMAIL PROTECTED] wrote:

 
 
 
 Hi,
 
 I've just had a look in trunk, and I get the impression that no one else
 is working on these features:
 
 - per-disk I/O stats

Since the multidisk.py python module is already doing the disk discovery, you 
might just want to add the disk I/O metrics to the multidisk module rather than 
creating a new module. 

 - per-NIC interface stats

I looked into doing this module but never really got around to it.  This would be 
a good set of metrics to add.

 
 I'm thinking about contributing these things myself, but would like to
 make sure I'm not duplicating work that somebody else is already doing.
 

Looking forward to your contributions.

Brad





Re: [Ganglia-developers] gmond -m shows Python error, sometimes

2008-07-07 Thread Brad Nicholes
 On 7/7/2008 at 9:46 AM, in message [EMAIL PROTECTED], Ulf
[EMAIL PROTECTED] wrote:
 Hi,
 
 I use # python -V
 Python 2.4.2
 It is a 64Bit version python.
 
 The -n is a good idea anyway,
 but it didn't fix the problem,
 and I don't have many connections open.
 # time netstat -t -a
 [...]
 real    0m0.071s
 user    0m0.012s
 sys     0m0.056s
 
 # time netstat -t -a -n
 real    0m0.054s
 user    0m0.000s
 sys     0m0.052s
 
 
 Ulf


A lot of this has to do with the change that we made to the way that netstat 
was being exec'd from python in order to support older python versions.  We 
might want to consider using subprocess.Popen() in newer versions of python and 
popen2 for older versions.  Take a look at this diff from the original 
version of tcpconn.py, starting from the diff at line #179.  The subprocess way 
of exec'ing netstat is cleaner and doesn't carry the popenChild.wait() problems.

Brad




Re: [Ganglia-developers] gmond -m shows Python error, sometimes

2008-07-07 Thread Brad Nicholes
 On 7/7/2008 at 10:45 AM, in message [EMAIL PROTECTED], Brad
Nicholes [EMAIL PROTECTED] wrote:
 On 7/7/2008 at 9:46 AM, in message [EMAIL PROTECTED], Ulf
 [EMAIL PROTECTED] wrote:
 Hi,
 
 I use # python -V
 Python 2.4.2
 It is a 64Bit version python.
 
 The -n is a good idea anyway,
 but it didn't fix the problem,
 and I don't have many connections open.
 # time netstat -t -a
 [...]
 real    0m0.071s
 user    0m0.012s
 sys     0m0.056s
 
 # time netstat -t -a -n
 real    0m0.054s
 user    0m0.000s
 sys     0m0.052s
 
 
 Ulf
 
 
 A lot of this has to do with the change that we made to the way that 
 netstat was being exec'd from python in order to support older python 
 versions.  We might want to consider using subprocess.Popen() in newer 
 versions of python and popen2 for older versions.  Take a look at this 
 diff from the original version of tcpconn.py, starting from the diff at line 
 #179.  The subprocess way of exec'ing netstat is cleaner and doesn't carry 
 the popenChild.wait() problems.
 

Sorry, forgot to include the link to the diff 

http://ganglia.svn.sourceforge.net/viewvc/ganglia/trunk/monitor-core/gmond/python_modules/network/tcpconn.py?view=diff&r1=851&r2=1528&diff_format=h
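
For readers who cannot reach the diff, the subprocess approach can be sketched roughly as follows. This is an illustration only, not the code from either revision of tcpconn.py: the helper names read_command_lines and count_tcp_states are hypothetical, and the parsing assumes the usual Linux netstat column layout with the connection state in the sixth column. The popen2 fallback for Python versions older than 2.4 is left out.

```python
import subprocess
import sys


def read_command_lines(argv):
    """Run argv without a shell and return its stdout as a list of text lines."""
    child = subprocess.Popen(argv, stdout=subprocess.PIPE)
    # communicate() drains the pipe and then reaps the child itself, so
    # there is no separate wait() call that could fail afterwards.
    out, _ = child.communicate()
    return out.decode("utf-8", "replace").splitlines()


def count_tcp_states(lines):
    """Count connections per TCP state in netstat-style output lines."""
    counts = {}
    for line in lines:
        fields = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            state = fields[5]
            counts[state] = counts.get(state, 0) + 1
    return counts


# Example call, assuming netstat is on the PATH:
#   count_tcp_states(read_command_lines(["netstat", "-t", "-a", "-n"]))
```

Passing the command as an argument list avoids going through a shell, and communicate() keeps reading and reaping in one place.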




Re: [Ganglia-developers] mod_multicpu issue?

2008-07-07 Thread Brad Nicholes
 On 7/7/2008 at 10:41 AM, in message
[EMAIL PROTECTED] 
, [EMAIL PROTECTED] wrote:

 
 I notice that the ex_metric_info array is normally NULL terminated:
 
 static Ganglia_25metric ex_metric_info[] =
 {
 {0, NULL}
 };
 
 
 The function  ex_metric_init (apr_pool_t *p) populates a new array,
 metric_info, but doesn't add a NULL entry at the end.
 
 If the memory area contains 0, then this is not a problem.  However, if
 the area after the end of the array doesn't contain 0 (either because it
 is uninitialised or because it has been allocated to something else)
 could this be a problem?
 

Yes it is a problem.  I fixed this in mod_python but forgot about mod_multicpu 
and mod_status.  All of the other modules should be OK since they use static 
metric definitions.  I will get this fixed and proposed for backport to 3.1.

Brad




Re: [Ganglia-developers] [RFT] gmond: drop C/C++ references as a supported language for building DSO metrics

2008-07-07 Thread Brad Nicholes
 On 7/4/2008 at 12:58 AM, in message [EMAIL PROTECTED], Carlo
Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
 On Thu, Jul 03, 2008 at 10:42:06AM -0700, Bernard Li wrote:
 -1
 
 I agree with Brad on this.
 
 Not sure what to make of no one replying to the code on this thread; after
 all, this is a developer mailing list and I would expect all debate to be
 made on a technical basis.
 

The technical reason for not accepting this proposal at this time is the risk 
of destabilizing the 3.1 branch before releasing 3.1.1 without solving any real 
issue.  The goal right now is to stabilize the 3.1 branch so that we can 
release 3.1.1.  Adding support for a C++ compiler is closer to being an 
enhancement similar to adding support for perl or ruby, than it is a bug fix.  
I have no problem at this point with the backport proposal and patches for 
adding support for a C++ compiler except for the fact that they touch a 
significant amount of code and have a much greater potential for destabilizing 
the 3.1 branch than they do for fixing a real issue.  For the 3.1.1 release, 
this is simply a matter of documentation and noting that the unsupported C++ 
portion of the language label C/C++ is a known issue that will be resolved in 
the next release. 

 So C++ support is incomplete
 
 C++ support doesn't exist in the 3.1 branch at all.
 
 we should document this and keep going.
 
 Instead, we should fix this and keep going, a starting point to add that
 support has been implemented in trunk for a couple of days, and will
 be posting it for review/testing for backport to 3.1
 

I would agree if the patch didn't touch so much code in relation to the issue 
that it is trying to solve.  There is no need to have C++ compiler support in 
Ganglia 3.1.1.  Once we release 3.1.1, we can risk destabilizing the branch 
briefly while we add C++ compiler support and move toward a release of 3.1.2.  
Let's do what is necessary to get 3.1.1 out the door and then worry about 
enhancing Ganglia to support more languages or compilers.  There is a lot of 
needed functionality in Ganglia 3.1 that has been waiting around for a long 
time.  Let's not delay it any more than we have to for functionality that to my 
knowledge, is not needed yet by anybody.

As I mentioned in a previous email thread, this is Open Source.  Release early, 
release often.  There is nothing that says that we can't release a stable 3.1.1 
with current functionality and then a month later release 3.1.2 with new 
functionality that includes support for a C++ compiler.  It really comes down 
to weighing current stable functionality against known issues.  IMO, releasing 
the current stable functionality of 3.1.1 far outweighs the risk of 
destabilizing the 3.1 branch just to add support for a C++ compiler.  

We are going on another month now.  Let's get 3.1.1 out the door and move on.

Brad 




Re: [Ganglia-developers] [RFT] gmond: drop C/C++ references as a supported language for building DSO metrics

2008-07-07 Thread Brad Nicholes
 On 7/7/2008 at 2:37 PM, in message [EMAIL PROTECTED], Carlo Marcelo
Arenas Belon [EMAIL PROTECTED] wrote:
 On Mon, Jul 07, 2008 at 12:31:13PM -0600, Brad Nicholes wrote:
  On 7/4/2008 at 12:58 AM, in message [EMAIL PROTECTED], Carlo
 Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
  On Thu, Jul 03, 2008 at 10:42:06AM -0700, Bernard Li wrote:
  -1
  
  I agree with Brad on this.
  
  Not sure what to make of no one replying to the code on this thread; after
  all, this is a developer mailing list and I would expect all debate to be
  made on a technical basis.
 
 The technical reason for not accepting this proposal at this time is the 
 risk of destabilizing the 3.1 branch before releasing 3.1.1 without solving 
 any real issue.
 
 This thread code (which has been long removed from the conversation) was
 mainly documentation changes and code removal in 1 line of code which has no
 chance of destabilizing anything.
 
 I agree with you that the alternative (which is all we had left after this
 one has been rejected) has a risk of destabilizing the 3.1 branch, and that is
 why it was delayed for next release originally with an intermediate solution
 posted in the interim.
 

OK, let's go back and rehash.  There were two proposals:

1) Removing the /C++ portion of the language label C/C++.  The technical
reason for rejecting this is basically that it doesn't solve anything, yet adds
more confusion to the configuration of a module in the next release.  The exact
same code path would be followed by gmond for either a C or a C++ module,
therefore combining both languages into a single language label makes things
less confusing for the user, especially since it would be very difficult for a
user to distinguish between a compiled C DSO and a C++ DSO.  C and C++ would
both produce a DSO module that would need to be loaded by apr_dso_load().  Both
types of module would interface with gmond in exactly the same way.  However, if
we apply this patch for 3.1.1, it would then require a code change for 3.1.2
when the second patch is applied.

2) Making all of the changes to the source code in gmond, the module headers and
the modules to support the C++ compiler.  The technical reason for not accepting
this patch has already been discussed on this thread.

  The goal right now is to stabilize the 3.1 branch so that we can release 
 3.1.1.  Adding support for a C++ compiler is closer to being an enhancement 
 similar to adding support for perl or ruby, than it is a bug fix.
 
 Agree, just wanted to clarify though that the added support is not for a
 compiler but for a language.  In the line of compiler support, Sun Studio 12
 support was added by me to trunk long ago as an alternative to GCC in
 OpenSolaris but, as I agree with you on the need to keep 3.1 stable for
 release, it wasn't even proposed for backport yet.
 
 As it is now, you can use the C gmond modular interface (which was labeled
 C/C++) to write C modules or Objective C modules but C++ is not
 supported yet.
 

Correct.  This can easily be noted in the README as a known issue that will be 
fixed in 3.1.2 when the C++ backport proposal (#2) is accepted in the 3.1 
branch.  In the meantime (which is a very short time), C++ is not supported for 
developing a gmond module.  When the C++ backport proposal is accepted into 
3.1.2 and 3.1.2 is released, the language label C/C++ will be fully supported 
and make complete sense to the user as well as the developer.  

 For the 3.1.1 release, this is simply a matter of documentation and noting 
 that the unsupported C++ portion of the language label C/C++ is a known 
 issue that will be resolved in the next release.
 
 There are 2 parts of it which seem to be confusing most :
 
 1) the documentation and use of C/C++ as a label for DSO modules is
 incorrect because technically speaking the interface exported by gm_metric.h
 is a C interface which can be used to build DSO modules in almost anything
 that can compile to it, including using it directly by C, Objective C or
 C++ (once the support is committed) source modules and because those other
 languages are mostly C compatible at the source level.
 

Ok, but the user doesn't care. Basically to the user, the language that was 
used to produce the DSO doesn't matter.  To gmond, the language that was used 
to produce the DSO doesn't matter.  So all we really need to do is give it a 
label.  The easiest label to give it that will be understood by both users and 
developers is C/C++.  This label also makes sense when you are talking in 
terms of alternate languages when referring to Python, perl, ruby, etc. modules.

 2) the documentation and use of C/C++ is misleading because C++ can't be
 used with it yet (until support is committed).
 
 the solution you propose only mitigates 2; the last patch I proposed
 mitigates both.
 

Absolutely, however the last patch is too destabilizing for 3.1.1. So rather 
than applying a label change to 3.1.1 which would have to be