All,

I have a user running qmake jobs. Intermittently, a job fails and SGE reports 
that it was killed with signal 9 (SIGKILL). Neither the user nor we (the 
sysadmins) killed it. How can I figure out what is going on? The worst part is 
that exactly the same command sometimes works and sometimes fails. I have 
appended the message from SGE below. Any suggestions would be greatly 
appreciated.

Thanks,
        Eric Peskin

From: root [root@local]
Sent: Saturday, September 24, 2011 9:04 PM
To: Tang, Zuojian
Subject: Job 311263 (qmake) Aborted

Job 311263 (qmake) Aborted
Exit Status      = 137
Signal           = KILL
User             = tangz01
Queue            = [email protected]
Host             = compute-0-13.local
Start Time       = 09/24/2011 19:03:31
End Time         = 09/24/2011 21:04:10
CPU              = 00:00:29
Max vmem         = 2.579G
failed assumedly after job because:
job 311263.1 died through signal KILL (9)
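
For reference, the exit status of 137 in the report above is consistent with the KILL signal. By the usual shell convention, a process terminated by a signal is reported as 128 plus the signal number, so 137 = 128 + 9. The following is a minimal Python sketch of that arithmetic, assuming a POSIX system; `decode_exit_status` is a hypothetical helper name used for illustration and is not part of SGE.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Describe a shell-style exit status (hypothetical helper, not an SGE API)."""
    # Convention: statuses above 128 mean "terminated by signal (status - 128)".
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(decode_exit_status(137))
```

This only decodes the number; it does not say which process sent the signal.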




_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
