On Tue, Sep 25, 2012 at 10:50:28AM -0400, Brodie, Kent wrote:
OK, I'll back off a bit and generalize... where else would I look to see 
hints of processes dying? Would I see anything anywhere else in the grid 
engine environment other than the exec host 'messages' file in the sge spool, 
or the qmaster 'messages' file? (Is there some other sge logging I'm 
missing, I guess is what I'm asking.)

I don't know of anything off the top of my head that tracks this.
Perhaps part of the execd/shepherd logging?  I'm not sure.

If a job process dies, the shepherd will get a SIGCHLD, but not much
else...
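
To illustrate how little a parent learns from SIGCHLD, here is a minimal
Python sketch of a parent reaping a child. It is a generic POSIX example,
not the shepherd's actual code; the names describe(), on_sigchld() and
run_child_and_wait() are made up for illustration.

```python
import os
import signal
import time

exited = []  # (pid, description) pairs recorded by the handler


def describe(status):
    """Turn a raw wait status into a readable string."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown status %d" % status


def on_sigchld(signum, frame):
    # Reap every finished child; WNOHANG keeps the call non-blocking.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none have exited yet
        exited.append((pid, describe(status)))


def run_child_and_wait(exit_code):
    """Fork a child that exits at once; return what the parent learned."""
    old = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: terminate immediately
        deadline = time.time() + 5
        while not any(p == pid for p, _ in exited) and time.time() < deadline:
            time.sleep(0.01)
        return [d for p, d in exited if p == pid]
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGCHLD, old if old is not None else signal.SIG_DFL)
```

The wait status carries only an exit code or the number of the signal that
killed the child. Anything more (why it died, what it was doing) has to come
from the job's own output or from system logs.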


So far, the above file(s) aren't telling me much at all, other than lots of 
these:

09/22/2012 16:56:09|  main|rome|W|reaping job "22139" ptf complains: Job does not exist

I've seen errors like this when there was a discrepancy between the
(classic) spool files and what jobs are actually running.  Specifically,
the files indicate there should be a job, but nothing is actually running.
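
If the 'messages' files are large, a short script can help pick out the
repeated warnings. The sketch below assumes every entry has the same five
'|'-separated fields as the sample line quoted above; the field names
(time, component, host, level, text) are my own guesses from that one line,
and parse_line() and summarize() are hypothetical helpers.

```python
from collections import Counter


def parse_line(line):
    """Split one messages line into five '|'-separated fields."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None  # continuation line or unexpected format
    stamp, component, host, level, text = parts
    return {
        "time": stamp.strip(),
        "component": component.strip(),
        "host": host.strip(),
        "level": level.strip(),
        "text": text,
    }


def summarize(lines, levels=("W",)):
    """Count identical messages at the given levels, most frequent first.

    Only 'W' appears in the sample line; pass other level letters if
    your files use them.
    """
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] in levels:
            counts[(rec["level"], rec["text"])] += 1
    return counts.most_common()
```

To use it, open the messages file from your own spool directory and pass the
file object to summarize(); it iterates line by line, so the whole file is
never held in memory.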




_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

--
Jesse Becker
NHGRI Linux support (Digicon Contractor)