Re: [Ganglia-developers] gmond -m shows Python error, sometimes
On Mon, Jul 07, 2008 at 10:18:09AM +0200, Ulf wrote:
> Hi,
>
> on my Linux testbox SLES10 SP2 64Bit (python-2.4.2-18.13), I get the following error.
>
> gmond 3.1.0.1527
>
> # gmond -m
> [...]
> dev-rootvg-usr-disk_used  Used disk space (module python_module)
> swap_free  Amount of available swap memory (module mem_module)
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
>     self.run()
>   File "/usr/lib64/ganglia/python_modules/tcpconn.py", line 260, in run
>     self.popenChild.wait()
>   File "/usr/lib64/python2.4/popen2.py", line 94, in wait
>     pid, sts = os.waitpid(self.pid, 0)
> OSError: [Errno 10] No child processes

So the netstat command that was started by that module got killed somehow. What version of Python 2.4 do you have installed? And could it be that you have a lot of connections open on that server?

> The problem seems to be a timing problem, as the error occurs only every second or third call of gmond -m. When the error is not shown, gmond -m waits some time after the last line swap_free.

That is a surprise (at least to me): gmond -m shouldn't need to wait, as it is just collecting available metrics. In this case it is probably blocked waiting for the thread that started that netstat call to finish, so both issues are related.

As a quick workaround to speed up that call (and therefore reduce the latency and the probability of timeouts), try the attached patch.

Carlo
---
Index: gmond/python_modules/network/tcpconn.py
===================================================================
--- gmond/python_modules/network/tcpconn.py (revision 1527)
+++ gmond/python_modules/network/tcpconn.py (working copy)
@@ -246,7 +246,7 @@
 #Call the netstat utility and split the output into separate lines
 fd_poll = select.poll()
-self.popenChild = popen2.Popen3("netstat -t -a")
+self.popenChild = popen2.Popen3("netstat -t -a -n")
 fd_poll.register(self.popenChild.fromchild)
 poll_events = fd_poll.poll()

Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project, along with a healthy diet, reduces your potential for chronic lameness and boredom. Vote Now at http://www.sourceforge.net/community/cca08
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
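Editor's note on the "OSError: [Errno 10] No child processes" in the gmond -m traceback: errno 10 on Linux is ECHILD, which os.waitpid() raises when there is no child left to wait for, for example because the child has already been reaped elsewhere. The sketch below is only a minimal illustration of that mechanism on a POSIX system; it is not taken from tcpconn.py, the helper name wait_twice is made up for the example, and nothing in the thread establishes that this is what actually happened to the netstat child.

```python
import errno
import os
import subprocess
import sys


def wait_twice():
    """Reap a child once, then wait for it again.

    Returns the errno raised by the second wait, or None if it succeeded.
    Hypothetical helper for illustration only (POSIX assumed).
    """
    # Stand-in child: a Python process that exits immediately.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    os.waitpid(child.pid, 0)        # first wait collects the exit status
    try:
        os.waitpid(child.pid, 0)    # nothing left to collect
    except OSError as exc:
        return exc.errno
    return None


second_wait_errno = wait_twice()
print(second_wait_errno == errno.ECHILD)
```

If something other than the waiting thread collects the child's status first, a later wait() on the same pid fails in this way, which would be consistent with the error showing up only on some runs.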
Re: [Ganglia-developers] [RFT]: build: linux 64bit biarch support
On Monday 07 July 2008 04:46:05 am Carlo Marcelo Arenas Belon wrote:
> On Wed, Jul 02, 2008 at 09:27:02AM -0400, Jarod Wilson wrote:
> > On Wednesday 02 July 2008 07:36:41 am Carlo Marcelo Arenas Belon wrote:
> > > The following proposed patch for stable 3.1, replaces the configure routine that tried to guess the libdir directory by assuming biarch rules from fedora linux (breaking all amd64 BSD and x64 Solaris) and overriding the libdir parameter passed at configure time (breaking fedora linux ppc64).
> > >
> > > Contains changes from r1452, r1467, r1468, r1475 and r1487
> >
> > At a glance, yeah, this looks like it should indeed finally Do The Right Thing(tm) on all Linux architectures for both 32-bit and 64-bit.
>
> It didn't, I found it was broken in an ia64 SuSE Enterprise Server 10 server.

D'oh. How so?

> A fix has been committed to trunk already, but it is so interlinked with another patch to remove the need to hardcode libdir in the configurations from Brad that it might as well require it (or both patches together) backported as a solution.

Never mind, I can just take a look at trunk and spin up one of my own ia64 boxes...

--
Jarod Wilson
[EMAIL PROTECTED]
Re: [Ganglia-developers] Python rewrite of Gmetad...
On 7/3/2008 at 3:23 PM, in message [EMAIL PROTECTED], Bernard Li [EMAIL PROTECTED] wrote:
> Hi Brad:
>
> On Fri, May 16, 2008 at 11:01 AM, Brad Nicholes [EMAIL PROTECTED] wrote:
> > With this new rewrite in python, we now have the ability to plug in new metric storage modules to support other type of storage mechanisms other than RRD and also the ability to plug in any type of analysis functionality that would make sense at the gmetad level. Try it out, fix a bug or two and try writing a module. If you have any idea on how best to package and install the python version of gmetad, please comment or better yet, just do it.
>
> Right now the plugin modules for gmetad-python can only be written in Python, can we support other languages in the future?

Good question. I don't know. I guess it depends on the ability for python to interface with other languages.

Brad
Re: [Ganglia-developers] gmond -m shows Python error, sometimes
Hi,

I use

# python -V
Python 2.4.2

It is a 64-bit build of Python.

The -n is a good idea anyway, but it didn't fix the problem. And I don't have many connections open.

# time netstat -t -a
[...]
real 0m0.071s
user 0m0.012s
sys  0m0.056s

# time netstat -t -a -n
real 0m0.054s
user 0m0.000s
sys  0m0.052s

Ulf
Re: [Ganglia-developers] Modules: per-disk I/O and per-NIC interface stats
On 7/4/2008 at 8:59 AM, in message [EMAIL PROTECTED], [EMAIL PROTECTED] wrote:
> Hi,
>
> I've just had a look in trunk, and I get the impression that no one else is working on these features:
>
> - per-disk I/O stats

Since the multidisk.py python module is already doing the disk discovery, you might just want to add the disk I/O metrics to the multidisk module rather than creating a new module.

> - per-NIC interface stats

I looked into doing this module but never really got around to it. This would be a good set of metrics to add.

> I'm thinking about contributing these things myself, but would like to make sure I'm not duplicating work that somebody else is already doing.

Looking forward to your contributions.

Brad
Re: [Ganglia-developers] gmond -m shows Python error, sometimes
On 7/7/2008 at 9:46 AM, in message [EMAIL PROTECTED], Ulf [EMAIL PROTECTED] wrote:
> Hi,
>
> I use
>
> # python -V
> Python 2.4.2
>
> It is a 64Bit version python.
>
> The -n is a good idea, anyway. But didn't fix the problem. But I've not many connections open.
>
> # time netstat -t -a
> [...]
> real 0m0.071s
> user 0m0.012s
> sys  0m0.056s
>
> # time netstat -t -a -n
> real 0m0.054s
> user 0m0.000s
> sys  0m0.052s
>
> Ulf

A lot of this has to do with the change that we made to the way that netstat was being exec'd from python in order to support older python versions. We might want to consider using subprocess.Popen() in newer versions of python and popen2 for older versions. Take a look at this diff from the original version of tcpconn.py, starting from the diff at line #179. The subprocess way of exec'ing netstat is cleaner and doesn't carry the popenChild.wait() problems.

Brad
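Editor's note: for readers who have not used the subprocess approach Brad mentions, here is a minimal sketch of running a command, waiting for it, and splitting its output into lines. It is an illustration under stated assumptions and is not the code from the tcpconn.py diff: the helper name run_and_split is hypothetical, and a small Python one-liner stands in for netstat so the example runs on any machine. In the module itself the argument list would presumably be something like ["netstat", "-t", "-a", "-n"].

```python
import subprocess
import sys


def run_and_split(argv):
    """Run argv, wait for it to exit, and return (exit status, output lines).

    Hypothetical helper for illustration; not part of tcpconn.py.
    """
    child = subprocess.Popen(argv,
                             stdout=subprocess.PIPE,
                             universal_newlines=True)  # text instead of bytes
    # communicate() reads stdout to EOF and then reaps the child,
    # so no separate wait() call is needed afterwards.
    out, _ = child.communicate()
    return child.returncode, out.splitlines()


# Stand-in for netstat: prints two fake connection lines.
fake_netstat = [sys.executable, "-c",
                "print('tcp LISTEN'); print('tcp ESTABLISHED')"]
status, lines = run_and_split(fake_netstat)
print(status, lines)
```

Because communicate() both drains the pipe and collects the exit status, the reading and the waiting happen in one place, which is one reason this style is less prone to the wait() trouble discussed in this thread.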
Re: [Ganglia-developers] gmond -m shows Python error, sometimes
On 7/7/2008 at 10:45 AM, in message [EMAIL PROTECTED], Brad Nicholes [EMAIL PROTECTED] wrote:
> On 7/7/2008 at 9:46 AM, in message [EMAIL PROTECTED], Ulf [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I use
> >
> > # python -V
> > Python 2.4.2
> >
> > It is a 64Bit version python.
> >
> > The -n is a good idea, anyway. But didn't fix the problem. But I've not many connections open.
> >
> > # time netstat -t -a
> > [...]
> > real 0m0.071s
> > user 0m0.012s
> > sys  0m0.056s
> >
> > # time netstat -t -a -n
> > real 0m0.054s
> > user 0m0.000s
> > sys  0m0.052s
> >
> > Ulf
>
> A lot of this has to do with the change that we made to the way that netstat was being exec'd from python in order to support older python versions. We might want to consider using subprocess.Popen() in newer versions of python and popen2 for older versions. Take a look at this diff from the original version of tcpconn.py, starting from the diff at line #179. The subprocess way of exec'ing netstat is cleaner and doesn't carry the popenChild.wait() problems.

Sorry, forgot to include the link to the diff:

http://ganglia.svn.sourceforge.net/viewvc/ganglia/trunk/monitor-core/gmond/python_modules/network/tcpconn.py?view=diff&r1=851&r2=1528&diff_format=h
Re: [Ganglia-developers] mod_multicpu issue?
On 7/7/2008 at 10:41 AM, in message [EMAIL PROTECTED], [EMAIL PROTECTED] wrote:
> I notice that the ex_metric_info array is normally NULL terminated:
>
> static Ganglia_25metric ex_metric_info[] =
> {
>     {0, NULL}
> };
>
> The function ex_metric_init (apr_pool_t *p) populates a new array, metric_info, but doesn't add a NULL entry at the end.
>
> If the memory area contains 0, then this is not a problem. However, if the area after the end of the array doesn't contain 0 (either because it is uninitialised or because it has been allocated to something else) could this be a problem?

Yes, it is a problem. I fixed this in mod_python but forgot about mod_multicpu and mod_status. All of the other modules should be OK since they use static metric definitions. I will get this fixed and proposed for backport to 3.1.

Brad
Re: [Ganglia-developers] [RFT] gmond: drop C/C++ references as a supported language for building DSO metrics
On 7/4/2008 at 12:58 AM, in message [EMAIL PROTECTED], Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
> On Thu, Jul 03, 2008 at 10:42:06AM -0700, Bernard Li wrote:
> > -1
> >
> > I agree with Brad on this.
>
> Not sure what to make of no one replying to the code on this thread, after all this is a developer mailing list and I would expect all debate to be made in a technical basis.

The technical reason for not accepting this proposal at this time is the risk of destabilizing the 3.1 branch before releasing 3.1.1 without solving any real issue.

The goal right now is to stabilize the 3.1 branch so that we can release 3.1.1. Adding support for a C++ compiler is closer to being an enhancement, similar to adding support for perl or ruby, than it is a bug fix. I have no problem at this point with the backport proposal and patches for adding support for a C++ compiler, except for the fact that they touch a significant amount of code and have a much greater potential for destabilizing the 3.1 branch than they do for fixing a real issue.

For the 3.1.1 release, this is simply a matter of documentation and noting that the unsupported C++ portion of the language label C/C++ is a known issue that will be resolved in the next release.

> > So C++ support is incomplete
>
> C++ support doesn't exist in the 3.1 branch at all.
>
> > we should document this and keep going.
>
> Instead, we should fix this and keep going, a starting point to add that support has been implemented in trunk for a couple of days, and will be posting it for review/testing for backport to 3.1

I would agree if the patch didn't touch so much code in relation to the issue that it is trying to solve. There is no need to have C++ compiler support in Ganglia 3.1.1. Once we release 3.1.1, we can risk destabilizing the branch briefly while we add C++ compiler support and move toward a release of 3.1.2. Let's do what is necessary to get 3.1.1 out the door and then worry about enhancing Ganglia to support more languages or compilers.

There is a lot of needed functionality in Ganglia 3.1 that has been waiting around for a long time. Let's not delay it any more than we have to for functionality that, to my knowledge, is not needed yet by anybody. As I mentioned in a previous email thread, this is Open Source. Release early, release often. There is nothing that says that we can't release a stable 3.1.1 with current functionality and then a month later release 3.1.2 with new functionality that includes support for a C++ compiler. It really comes down to weighing current stable functionality against known issues. IMO, releasing the current stable functionality of 3.1.1 far outweighs the risk of destabilizing the 3.1 branch just to add support for a C++ compiler.

We are going on another month now. Let's get 3.1.1 out the door and move on.

Brad
Re: [Ganglia-developers] [RFT] gmond: drop C/C++ references as a supported language for building DSO metrics
On 7/7/2008 at 2:37 PM, in message [EMAIL PROTECTED], Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
> On Mon, Jul 07, 2008 at 12:31:13PM -0600, Brad Nicholes wrote:
> > On 7/4/2008 at 12:58 AM, in message [EMAIL PROTECTED], Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
> > > On Thu, Jul 03, 2008 at 10:42:06AM -0700, Bernard Li wrote:
> > > > -1
> > > >
> > > > I agree with Brad on this.
> > >
> > > Not sure what to make of no one replying to the code on this thread, after all this is a developer mailing list and I would expect all debate to be made in a technical basis.
> >
> > The technical reason for not accepting this proposal at this time is the risk of destabilizing the 3.1 branch before releasing 3.1.1 without solving any real issue.
>
> This thread code (which has been long removed from the conversation) was mainly documentation changes and code removal in 1 line of code which has no chance of destabilizing anything.
>
> I agree with you that the alternative (which is all we had left after this one has been rejected) has a risk of destabilizing the 3.1 branch, and that is why it was delayed for next release originally with an intermediate solution posted in the interim.

OK, let's go back and rehash. There were two proposals:

1) Removing the /C++ portion of the language label C/C++. The technical reason for rejecting this is basically that it doesn't solve anything yet adds more confusion to the configuration of a module in the next release. The exact same code path would be followed by gmond for either a C or a C++ module, therefore combining both languages into a single language label makes things less confusing for the user, especially since it would be very difficult for a user to distinguish between a compiled C DSO and a C++ DSO. C and C++ would both produce a DSO module that would need to be loaded by apr_dso_load(). Both types of module would interface with gmond in exactly the same way. However, if we apply this patch for 3.1.1, it would then require a code change for 3.1.2 when the second patch is applied.

2) Make all of the changes to the source code, in gmond, the module headers and the modules, to support the C++ compiler. The technical reason for not accepting this patch has already been discussed on this thread.

> > The goal right now is to stabilize the 3.1 branch so that we can release 3.1.1. Adding support for a C++ compiler is closer to being an enhancement similar to adding support for perl or ruby, than it is a bug fix.
>
> Agree, just wanted to clarify though that the added support is not for a compiler but for a language. In the line of compiler support, Sun Studio 12 support was added by me to trunk long ago as an alternative to GCC in OpenSolaris but as I agree with you on the need to keep 3.1 stable for release wasn't even proposed to backport yet.
>
> As it is now, you can use the C gmond modular interface (which was labeled C/C++) to write C modules or Objective C modules but C++ is not supported yet.

Correct. This can easily be noted in the README as a known issue that will be fixed in 3.1.2 when the C++ backport proposal (#2) is accepted in the 3.1 branch. In the meantime (which is a very short time), C++ is not supported for developing a gmond module. When the C++ backport proposal is accepted into 3.1.2 and 3.1.2 is released, the language label C/C++ will be fully supported and make complete sense to the user as well as the developer.

> > For the 3.1.1 release, this is simply a matter of documentation and noting that the unsupported C++ portion of the language label C/C++ is a known issue that will be resolved in the next release.
>
> There are 2 parts of it which seem to be confusing most:
>
> 1) the documentation and use of C/C++ as a label for DSO modules is incorrect because technically speaking the interface exported by gm_metric.h is a C interface which can be used to build DSO modules in almost anything that can compile to it, including using it directly by C, Objective C or C++ (once the support is committed) source modules and because those other languages are mostly C compatible at the source level.

OK, but the user doesn't care. Basically, to the user, the language that was used to produce the DSO doesn't matter. To gmond, the language that was used to produce the DSO doesn't matter. So all we really need to do is give it a label. The easiest label to give it that will be understood by both users and developers is C/C++. This label also makes sense when you are talking in terms of alternate languages when referring to Python, perl, ruby, etc. modules.

> 2) the documentation and use of C/C++ is misleading because C++ can't be used with it yet (until support is committed).
>
> the solution you propose only mitigates 2; the last patch I proposed mitigates both.

Absolutely, however the last patch is too destabilizing for 3.1.1. So rather than applying a label change to 3.1.1 which would have to be