"Loong, Andreas" <[email protected]> writes: > Hello, > > I've recently installed SoGE 8.1.2 on a multihomed qmaster. It's a clean > install, using the packages from SoGE project page. This is meant to > replace the existing SGE 6.2u5 version we have running today.
[It shouldn't have needed a new install -- you can do a live upgrade.]

> However, it segfaults every few minutes:
> Oct 30 09:51:06 srvname kernel: sge_qmaster[13013]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 00000000470caf00 error 6
> Oct 30 09:53:10 srvname kernel: sge_qmaster[14421]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 000000004842ef00 error 6
> Oct 30 09:56:06 srvname kernel: sge_qmaster[21056]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 0000000047339f00 error 6
> Oct 30 09:58:10 srvname kernel: sge_qmaster[21307]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 0000000046f57f00 error 6
>
> What could I do to get some more information? The qmaster itself hasn't
> had any chance of writing something to the logfiles as of yet. Any help
> would be appreciated.

What OS/architecture is this?

If it hasn't produced any messages, I'd suspect some sort of resource
problem, probably allocating on the stack early on (see "ulimit -a").
It's possibly useful to run the daemon interactively after "dl 3" -- see
sge_dl(8) -- and check the messages produced, but I don't know if it
will show anything in this case.

If you can generate a gdb backtrace it should help, especially if
qmaster has been built with debugging support, e.g.

  SGE_ND=1 gdb -batch -ex run -ex 'bt full' sge_qmaster

If you can get useful output, you could put it on the issue tracker.
Mailing to the sge-bugs address is fine.

-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/
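For what it's worth, the kernel lines quoted in the report already carry some information. Below is a minimal Python sketch, not part of Grid Engine, that pulls the fields out of one such line and decodes the low three bits of the x86 page-fault error code. The names `parse_segfault` and `decode_error` are hypothetical helpers, and the regular expression assumes the older `rip`/`rsp` log format shown in the report.

```python
import re

# Assumed log format: "<prog>[<pid>]: segfault at <addr> rip <rip> rsp <rsp> error <code>"
# (the older x86_64 kernel wording used in the report; all numbers are hexadecimal).
PATTERN = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0: no page mapped, 1: protection violation
        "write": bool(code & 2),         # 0: read access, 1: write access
        "user_mode": bool(code & 4),     # 0: kernel mode, 1: user mode
    }


def parse_segfault(line):
    """Return the fields of one kernel segfault line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    info = {
        "prog": m.group("prog"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
    }
    info.update(decode_error(int(m.group("err"), 16)))
    return info


if __name__ == "__main__":
    sample = ("Oct 30 09:51:06 srvname kernel: sge_qmaster[13013]: segfault at "
              "0000000000000068 rip 000000000049e49c rsp 00000000470caf00 error 6")
    print(parse_segfault(sample))
```

For the lines in the report this gives a user-mode write to the unmapped address 0x68, at the same instruction pointer every time, which would be consistent with a write through a null pointer at one fixed place in the code. With a binary that carries debugging information, that instruction pointer could probably be mapped to a source line with addr2line from binutils.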
