"Loong, Andreas" <[email protected]> writes:

> Hello,
>
> I've recently installed SoGE 8.1.2 on a multihomed qmaster. It's a clean
> install, using the packages from the SoGE project page. This is meant to
> replace the existing SGE 6.2u5 version we have running today.

[It shouldn't have needed a new install -- you can do a live upgrade.]

> However, it segfaults every few minutes:
> Oct 30 09:51:06 srvname kernel: sge_qmaster[13013]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 00000000470caf00 error 6
> Oct 30 09:53:10 srvname kernel: sge_qmaster[14421]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 000000004842ef00 error 6
> Oct 30 09:56:06 srvname kernel: sge_qmaster[21056]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 0000000047339f00 error 6
> Oct 30 09:58:10 srvname kernel: sge_qmaster[21307]: segfault at
> 0000000000000068 rip 000000000049e49c rsp 0000000046f57f00 error 6
>
> What could I do to get some more information? The qmaster itself hasn't
> had any chance of writing something to the logfiles as of yet. Any help
> would be appreciated.

What OS/architecture is this?  If qmaster hasn't produced any messages,
I'd suspect some sort of resource problem, probably allocating on the
stack early on (check "ulimit -a" in the environment that starts the
daemon).  It may be useful to run the daemon interactively after
"dl 3" -- see sge_dl(8) -- and check the messages produced, but I don't
know if that will show anything in this case.  A gdb backtrace should
help, especially if qmaster has been built with debugging support, e.g.
(SGE_ND=1 keeps the daemon in the foreground)
  SGE_ND=1 gdb -batch -ex run -ex 'bt full' sge_qmaster
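
As a rough sketch of the limit check mentioned above, the following
prints the stack and core limits that a daemon started from the current
shell would inherit, and tries to allow core files.  The function name
show_limits is made up for illustration, and whether your init script
applies the same limits is an assumption you would need to verify.

```shell
#!/bin/sh
# Print the limits a process started from this shell would inherit.
# (show_limits is a hypothetical helper, not part of Grid Engine.)
show_limits() {
    echo "stack limit: $(ulimit -s)"
    echo "core limit: $(ulimit -c)"
}

show_limits

# Allow a core file to be written if the daemon crashes outside gdb.
# This fails if the hard limit is lower, hence the fallback message.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2
```

Running the gdb command from the same shell afterwards means the daemon
sees exactly the limits printed here.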

If you get useful output, please put it on the issue tracker; mailing
it to the sge-bugs address is fine.
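
In the meantime, the kernel lines quoted above can be decoded a little.
The sketch below assumes an x86 system (the rip/rsp fields suggest
x86_64) and that each log entry is on one line, as it would be in the
actual syslog rather than wrapped as in the mail.  The helper names
segv_field and decode_pf_error are invented for this example.  It
extracts a field from the line and interprets the low three bits of the
page-fault error code.

```shell
#!/bin/sh
# Pull one field ("rip", "rsp", "error", ...) out of a kernel segfault line.
# $1 = field name, $2 = log line.  (Hypothetical helper.)
segv_field() {
    printf '%s\n' "$2" | sed -n "s/.* $1 \([0-9a-fA-F]*\).*/\1/p"
}

# Decode the low three bits of an x86 page-fault error code.
# The kernel prints the code in hex, so it is read as hex here.
decode_pf_error() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then p="protection violation"; else p="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then w="write"; else w="read"; fi
    if [ $((code & 4)) -ne 0 ]; then u="user mode"; else u="kernel mode"; fi
    echo "$w, $u, $p"
}

line='Oct 30 09:51:06 srvname kernel: sge_qmaster[13013]: segfault at 0000000000000068 rip 000000000049e49c rsp 00000000470caf00 error 6'
echo "rip:   $(segv_field rip "$line")"
echo "error: $(decode_pf_error "$(segv_field error "$line")")"
```

Error 6 therefore means a user-mode write to an unmapped page.  The
fault address 0x68 is very small, which would be consistent with a
write through a null pointer at a small offset, though that is only a
guess without a backtrace.  Since rip is the same in every crash, and
assuming the binary is not stripped and not position-independent,
something like "addr2line -f -e /path/to/sge_qmaster 0x49e49c" (path to
be filled in) may name the function involved.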

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/