Is the hard run-time limit (h_rt) being reached on some runs but not others?
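
The abort notice below seems consistent with that. The job ran from 19:03:31 to 21:04:10, which is just over two hours of wall-clock time, yet it used only 29 seconds of CPU. Exit status 137 is 128 + 9, meaning the process was killed by signal 9 (SIGKILL), and SIGKILL is how SGE enforces a hard limit like h_rt. The quick Python check below computes the elapsed time from those two timestamps. The 2-hour limit in it is only an assumption for illustration, because the message doesn't show the h_rt the job actually requested. It's worth checking what the qmake job was submitted with, and whether a default h_rt is set on the queue.

```python
from datetime import datetime

# Start/End timestamps copied from the SGE abort notice for job 311263.
FMT = "%m/%d/%Y %H:%M:%S"
start = datetime.strptime("09/24/2011 19:03:31", FMT)
end = datetime.strptime("09/24/2011 21:04:10", FMT)

wallclock = end - start
print("wall clock:", wallclock)

# Exit status 137 = 128 + signal number, so the signal was 9 (KILL).
print("signal:", 137 - 128)

# ASSUMPTION: a hypothetical 2-hour h_rt, for illustration only.
# The job's real h_rt request is not shown in the message.
assumed_h_rt_seconds = 2 * 60 * 60
print("over assumed h_rt:", wallclock.total_seconds() > assumed_h_rt_seconds)
```

If the runs that succeed finish in under two hours and the ones that fail take longer, that would point strongly at h_rt.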

> -----Original Message-----
> From: [email protected] [mailto:users-
> [email protected]] On Behalf Of Peskin, Eric
> Sent: Monday, October 03, 2011 11:14 AM
> To: [email protected]
> Subject: [gridengine users] jobs getting killed (failed assumedly after
> job because: job 311263.1 died through signal KILL (9))
> 
> All,
> 
> I have a user running qmake jobs.  Intermittently, the job fails and
> SGE says it was killed with signal 9.  The user did not kill it.  We
> (the sysadmins) did not kill it.  How can I figure out what is going
> on?  The worst part is that this problem is intermittent.  Exactly the
> same command works sometimes but fails sometimes.  I have appended the
> message from SGE below.  Any suggestions would be greatly appreciated.
> 
> Thanks,
>       Eric Peskin
> 
> From: root [root@local]
> Sent: Saturday, September 24, 2011 9:04 PM
> To: Tang, Zuojian
> Subject: Job 311263 (qmake) Aborted
> 
> Job 311263 (qmake) Aborted
> Exit Status      = 137
> Signal           = KILL
> User             = tangz01
> Queue            = [email protected]
> Host             = compute-0-13.local
> Start Time       = 09/24/2011 19:03:31
> End Time         = 09/24/2011 21:04:10
> CPU              = 00:00:29
> Max vmem         = 2.579G
> failed assumedly after job because:
> job 311263.1 died through signal KILL (9)
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
