Apparently the sys admin did not include (at least) that package on the server build. It is included on our client builds though. I've never used it before, but I ran it as Rayson instructed. In the terminal where it was running, after it crashed, all it said was:
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x47df6940 (LWP 2193)]
0x000000000055c698 in lCopySwitchPack ()
(gdb)

which I guess we already knew.

I'll look into monit and into getting a core dump. We had added 'ulimit -c unlimited' to the master init.d script in hopes of getting a dump that way, but no such luck. It's probably all a moot point anyway, since it appears to be a confirmed bug with 6.2u5.

From my perspective (grid admin) I hope my management's meeting with Univa on Friday will be fruitful :-)

Thanks to all for the help.

--murph

-----Original Message-----
From: Dave Love [mailto:[email protected]]
Sent: Monday, May 02, 2011 6:49 PM
To: Murphy, Brian (E IT F 45)
Cc: [email protected]
Subject: Re: [gridengine users] Qmaster Failing

"Murphy, Brian (E IT F 45)" <[email protected]> writes:

> Thanks Rayson.
> Downloaded, compiled and installed GDB 7.2.

[What's wrong with the system gdb?]

Running under gdb isn't so useful in this situation, where you're trying to keep qmaster up and running. (I used monit, rather than cron, in the same situation.)

You can get a post mortem core dump using http://arc.liv.ac.uk/repos/darcs/sge/source/libs/libcore/, per https://arc.liv.ac.uk/trac/SGE/ticket/507. However, it's probably not very useful anyway unless what you're running was compiled with debugging information, and it would be easier just to try a patched version. I have RH 5 rpms that have been in production, but you probably won't want to run binaries from random sources, and presumably you'll get a fix from Univa.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
