Hi,
On 04.10.2011 at 00:47, Peskin, Eric wrote:
Is the hard run time limit (h_rt) getting reached some times but
not others?
No, we do not have any limits set:
[root@fen1 ~]# qconf -sq `qconf -sql` | grep [hs]_|sort -u
h_core INFINITY
h_cpu INFINITY
h_data INFINITY
h_fsize INFINITY
h_rss INFINITY
h_rt INFINITY
h_stack INFINITY
h_vmem INFINITY
s_core INFINITY
s_cpu INFINITY
s_data INFINITY
s_fsize INFINITY
s_rss INFINITY
s_rt INFINITY
s_stack INFINITY
s_vmem INFINITY
[root@fen1 ~]#
Is there anything in /var/log/messages about the oom-killer? Or in the
SGE messages files in the exechost's spool directory?
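A minimal sketch of the first check, to be run on the execution host that ran the job. The function name oom_hits is hypothetical, and the two search patterns are an assumption about how Linux kernels of this era typically log an OOM kill; adjust them if your kernel words it differently.

```shell
# Hypothetical helper: print syslog lines that look like OOM-killer activity.
# Usage: oom_hits [logfile]   (defaults to /var/log/messages)
oom_hits() {
    logfile="${1:-/var/log/messages}"
    # Assumed patterns: "<process> invoked oom-killer" and "Out of memory: ..."
    grep -i -E 'oom-killer|out of memory' "$logfile"
}
```

Calling oom_hits without arguments reads /var/log/messages, which usually requires root. Any hits near the job's end time would point to the kernel, not SGE, as the sender of the KILL.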
-- Reuti
On Oct 3, 2011, at 1:41 PM, Mike Hanby wrote:
Is the hard run time limit (h_rt) getting reached some times but
not others?
-----Original Message-----
From: [email protected] [mailto:users-
[email protected]] On Behalf Of Peskin, Eric
Sent: Monday, October 03, 2011 11:14 AM
To: [email protected]
Subject: [gridengine users] jobs getting killed (failed assumedly after job because: job 311263.1 died through signal KILL (9))
All,
I have a user running qmake jobs. Intermittently, the job fails and
SGE says it was killed with signal 9. The user did not kill it. We
(the sysadmins) did not kill it. How can I figure out what is going
on? The worst part is that this problem is intermittent. Exactly the
same command works sometimes but fails sometimes. I have appended the
message from SGE below. Any suggestions would be greatly appreciated.
Thanks,
Eric Peskin
From: root [root@local]
Sent: Saturday, September 24, 2011 9:04 PM
To: Tang, Zuojian
Subject: Job 311263 (qmake) Aborted
Job 311263 (qmake) Aborted
Exit Status = 137
Signal = KILL
User = tangz01
Queue = [email protected]
Host = compute-0-13.local
Start Time = 09/24/2011 19:03:31
End Time = 09/24/2011 21:04:10
CPU = 00:00:29
Max vmem = 2.579G
failed assumedly after job because:
job 311263.1 died through signal KILL (9)
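The Exit Status = 137 above agrees with Signal = KILL: by shell convention, a process that dies from a signal is reported with status 128 plus the signal number, and 137 - 128 = 9. A small sketch of that decoding follows; the function name exit_to_signal is hypothetical, and it relies on the shell's built-in kill -l to map a signal number to its name.

```shell
# Hypothetical helper: map a shell-style exit status to a signal name.
# Statuses above 128 are treated as "128 + signal number".
exit_to_signal() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        kill -l "$((status - 128))"
    else
        echo "none"
    fi
}
```

For example, exit_to_signal 137 names signal 9, while an ordinary failure such as status 1 yields "none".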
------------------------------------------------------------
This email message, including any attachments, is for the sole use
of the intended recipient(s) and may contain information that is
proprietary, confidential, and exempt from disclosure under
applicable law. Any unauthorized review, use, disclosure, or
distribution is prohibited. If you have received this email in error
please notify the sender by return email and delete the original
message. Please note, the recipient should check this email and any
attachments for the presence of viruses. The organization accepts no
liability for any damage caused by any virus transmitted by this
email.
=================================
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users