On Tue, Sep 25, 2012 at 10:50:28AM -0400, Brodie, Kent wrote:
OK, I'll back off a bit and generalize... where else would I look to see hints of processes dying? Would I see anything anywhere else in the grid engine environment other than the exec host 'messages' file in the sge spool, or the qmaster 'messages' file? (Is there some other sge logging I'm missing, I guess is what I'm asking.)
I don't know of anything off the top of my head that tracks this. Perhaps part of the execd/shepherd logging? I'm not sure. If a job process dies, the shepherd will get a SIGCHLD, but not much else...
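To illustrate how little a parent process learns when a child dies, here is a minimal, generic Unix sketch in Python. It is not Grid Engine code and does not show how the shepherd is actually implemented; the helper names `describe_child_exit` and `run_child` are made up for this example. All the parent gets back is the wait status, which says either "exited with code N" or "killed by signal N", and nothing about why.

```python
import subprocess
import sys


def describe_child_exit(returncode):
    """Summarize a subprocess return code the way a parent sees it."""
    # subprocess reports death-by-signal as a negative return code.
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exited with code %d" % returncode


def run_child(code):
    """Run a short Python snippet as a child process and reap it."""
    proc = subprocess.run([sys.executable, "-c", code])
    return describe_child_exit(proc.returncode)


if __name__ == "__main__":
    # A child that exits on its own, then one that is killed outright.
    print(run_child("raise SystemExit(3)"))
    print(run_child("import os, signal; os.kill(os.getpid(), signal.SIGKILL)"))
```

Anything beyond that status (who sent the signal, what the process was doing) has to come from somewhere else, such as the job's own output or system logs.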
So far, the above file(s) aren't telling me much at all, other than lots of these:
09/22/2012 16:56:09| main|rome|W|reaping job "22139" ptf complains: Job does not exist
I've seen errors like this when there was a discrepancy between the (classic) spool files and what jobs are actually running. Specifically, the files indicate there should be a job, but nothing is actually running.
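If it helps to see which job IDs keep showing up in those warnings, something like the following Python sketch could pull them out of a messages file so they can be compared against what is really running. The field layout (timestamp|component|host|level|message) is an assumption inferred from the single sample line quoted above, and the names `parse_line` and `reaped_jobs` are invented for this example.

```python
import re
from collections import Counter

# Assumed message shape, taken from the one warning quoted in this thread.
REAP_RE = re.compile(r'reaping job "(\d+)" ptf complains: (.*)')


def parse_line(line):
    """Split one messages-file line into its five pipe-separated fields."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None  # not in the assumed format; skip it
    timestamp, component, host, level, message = (p.strip() for p in parts)
    return {
        "timestamp": timestamp,
        "component": component,
        "host": host,
        "level": level,
        "message": message,
    }


def reaped_jobs(lines):
    """Count 'reaping job' warnings per job ID."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["level"] != "W":
            continue
        match = REAP_RE.search(rec["message"])
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it, open the exec host's messages file (the path depends on your spool setup) and pass the file object to `reaped_jobs`; the IDs it returns are the ones to check against the spool directory and the list of running jobs.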
_______________________________________________ users mailing list [email protected] https://gridengine.org/mailman/listinfo/users
-- Jesse Becker NHGRI Linux support (Digicon Contractor)
