[Ganglia-developers] gmond dead but subsys locked

2011-02-18 Thread Calin Floare
Hello,

 I'm new to the list, and I have what I suppose is a difficult problem.
 I maintain a cluster which used to work nicely.

 After a reboot, made necessary by the installation of a UPS, I cannot start
gmond on the server, although it is working well on the nodes.
 I have Scientific Linux SL release 5.3 (Boron) installed on all machines.

 I have the following kernel on all the nodes:
 [root@cn-smpi sbin]# uname -a
 Linux cn-smpi.itim-cj.ro 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 16:05:53 EDT 2010 
x86_64 x86_64 x86_64 GNU/Linux

 Below you can see some details:

 [root@cn-smpi sbin]# /sbin/service gmond start
 Starting GANGLIA gmond: [ OK ]
 [root@cn-smpi sbin]# /sbin/service gmond status
 gmond dead but subsys locked
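
 For readers unfamiliar with this status message: on Red Hat style init
 scripts it generally means that no gmond process is running while the lock
 file left by a successful "start" still exists, which fits a daemon that
 starts and then crashes. A minimal sketch of that order of checks follows.
 The function name subsys_state is hypothetical, and the default paths are
 assumptions based on the usual SysV layout, not taken from the gmond init
 script itself.

```shell
# Hypothetical helper: reports a daemon's state using the usual SysV order
# of checks (process, then pid file, then subsys lock file).
# usage: subsys_state NAME [PIDFILE] [LOCKFILE]
subsys_state() {
    name=$1
    pidfile=${2:-/var/run/$name.pid}         # assumed default location
    lockfile=${3:-/var/lock/subsys/$name}    # assumed default location

    if pidof "$name" >/dev/null 2>&1; then
        echo "running"
    elif [ -f "$pidfile" ]; then
        echo "dead but pid file exists"
    elif [ -f "$lockfile" ]; then
        echo "dead but subsys locked"
    else
        echo "stopped"
    fi
}

# Example call (commented out so that sourcing this file has no side effects):
# subsys_state gmond
```

 If this prints "dead but subsys locked" for gmond, the stale lock file is
 only a symptom; the crash itself still has to be found.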

 It is working on all the nodes:
 [root@cn-smpi sbin]# cexec /sbin/service gmond status
 * cn-sge-1 *
 - cn-mpi01-
 gmond (pid 13623) is running...
 - cn-mpi02-
 gmond (pid 11971) is running...

 [root@cn-smpi log]# grep segfault messages
 Feb 18 12:42:37 cn-smpi kernel: gmond[14493]: segfault at 0008 rip 
003c2d0b54dd rsp 7de116d8 error 4
 Feb 18 12:51:18 cn-smpi kernel: gmond[14952]: segfault at 0008 rip 
003c2d0b54dd rsp 7fff393f13e8 error 4

 As far as I can see, the fault is always at the same address, rip 003c2d0b54dd.
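
 One way to check that the faulting address really is the same across
 crashes is to count the distinct rip values in the kernel log. This is only
 a sketch: the function name segfault_rips is hypothetical, and it assumes
 the unwrapped log format shown above, where the address is the field that
 follows the word "rip".

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints
# "<count> <rip address>" for each distinct faulting instruction pointer.
segfault_rips() {
    awk '/segfault at/ {
        # the address is the field right after the literal word "rip"
        for (i = 1; i < NF; i++)
            if ($i == "rip") count[$(i + 1)]++
    }
    END { for (addr in count) print count[addr], addr }'
}

# Example call (commented out; the log path is the one used in this message):
# grep gmond /var/log/messages | segfault_rips
```

 A single output line means every recorded crash hit the same instruction,
 which points to a reproducible bug rather than random memory corruption.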

 I'll debug it to see where the problem is:

 [root@cn-smpi cfloare]# ulimit -a
 core file size (blocks, -c) 0
 data seg size (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size (blocks, -f) unlimited
 pending signals (-i) 134143
 max locked memory (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files (-n) 1024
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority (-r) 0
 stack size (kbytes, -s) 10240
 cpu time (seconds, -t) unlimited
 max user processes (-u) 134143
 virtual memory (kbytes, -v) unlimited
 file locks (-x) unlimited
 [root@cn-smpi cfloare]# ulimit -c unlimited
 [root@cn-smpi cfloare]# ulimit -a
 core file size (blocks, -c) unlimited
 data seg size (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size (blocks, -f) unlimited
 pending signals (-i) 134143
 max locked memory (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files (-n) 1024
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority (-r) 0
 stack size (kbytes, -s) 10240
 cpu time (seconds, -t) unlimited
 max user processes (-u) 134143
 virtual memory (kbytes, -v) unlimited
 file locks (-x) unlimited
 [root@cn-smpi cfloare]#


 [root@cn-smpi sbin]# ./gmond

 [root@cn-smpi sbin]# ls co
 convertquota core.14952 cossdump

 [root@cn-smpi sbin]# ll core*
 -rw--- 1 root root 2457600 Feb 18 12:51 core.14952

 [root@cn-smpi sbin]# date
 Fri Feb 18 12:52:02 EET 2011

 [root@cn-smpi sbin]# gdb --core=./core.14952 ./gmond
 GNU gdb Fedora (6.8-27.el5)
 Copyright (C) 2008 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law. Type show copying
 and show warranty for details.
 This GDB was configured as x86_64-redhat-linux-gnu...
 Reading symbols from /lib64/libresolv.so.2...done.
 Loaded symbols for /lib64/libresolv.so.2
 Reading symbols from /usr/lib64/libganglia-3.1.7.so.0...done.
 Loaded symbols for /usr/lib64/libganglia-3.1.7.so.0
 Reading symbols from /lib64/libdl.so.2...done.
 Loaded symbols for /lib64/libdl.so.2
 Reading symbols from /lib64/libnsl.so.1...done.
 Loaded symbols for /lib64/libnsl.so.1
 Reading symbols from /lib64/libpcre.so.0...done.
 Loaded symbols for /lib64/libpcre.so.0
 Reading symbols from /lib64/libexpat.so.0...done.
 Loaded symbols for /lib64/libexpat.so.0
 Reading symbols from /usr/lib64/libconfuse.so.0...done.
 Loaded symbols for /usr/lib64/libconfuse.so.0
 Reading symbols from /usr/lib64/libapr-1.so.0...done.
 Loaded symbols for /usr/lib64/libapr-1.so.0
 Reading symbols from /lib64/libpthread.so.0...done.
 Loaded symbols for /lib64/libpthread.so.0
 Reading symbols from /lib64/libc.so.6...done.
 Loaded symbols for /lib64/libc.so.6
 Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
 Loaded symbols for /lib64/ld-linux-x86-64.so.2
 Reading symbols from /lib64/libuuid.so.1...done.
 Loaded symbols for /lib64/libuuid.so.1
 Reading symbols from /lib64/libcrypt.so.1...done.
 Loaded symbols for /lib64/libcrypt.so.1
 Reading symbols from /usr/lib64/ganglia/modcpu.so...done.
 Loaded symbols for /usr/lib64/ganglia/modcpu.so
 Reading symbols from /usr/lib64/ganglia/moddisk.so...done.
 Loaded symbols for /usr/lib64/ganglia/moddisk.so
 Reading symbols from /usr/lib64/ganglia/modload.so...done.
 Loaded symbols for /usr/lib64/ganglia/modload.so
 Reading symbols from /usr/lib64/ganglia/modmem.so...done.
 Loaded symbols for /usr/lib64/ganglia/modmem.so
 Reading symbols from /usr/lib64/ganglia/modnet.so...done.
 

[Ganglia-developers] Google Summer of Code 2011

2011-02-18 Thread Bernard Li
Hi all:

It is finally upon us!

http://www.google-melange.com/

We currently have a pretty good wish-list going on, so I think this
would be a good start for project ideas:

https://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list

Of course, the web-2.0 effort could always use additional help.

So, who would like to be a mentor?

Cheers,

Bernard
