[gridengine users] Jobs died through signal XCPU while not exceeding limit

Jérémie Dubois-Lacoste Fri, 19 Oct 2012 10:03:40 -0700

Dear all,

One user on our cluster is having a problem that I have never
seen before. According to him it is somewhat random: the
same job may succeed or fail from one run to the next.
When the job aborts, he gets this e-mail:


 Start Time       = 10/19/2012 15:25:17
 End Time         = 10/19/2012 17:07:20
 CPU              = 01:40:35
 Max vmem         = 433.707M
failed assumedly after job because:
job 5433573.1 died through signal XCPU (24)

So the job ran for about 1h40 and then got killed.

But the queue that he submitted to has a CPU time limit
of one week. From the output of "qconf -sq <queue>":
s_cpu                 168:00:00
h_cpu                 169:00:00
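
To make the gap explicit, here is a minimal sketch of the arithmetic, assuming the CPU figure in the e-mail and the queue limits are all in hours:minutes:seconds. The helper name to_seconds is only illustrative and is not part of Grid Engine.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the job e-mail and from "qconf -sq <queue>".
cpu_used = to_seconds("01:40:35")
s_cpu = to_seconds("168:00:00")
h_cpu = to_seconds("169:00:00")

print(f"CPU used: {cpu_used} s")
print(f"s_cpu:    {s_cpu} s")
print(f"h_cpu:    {h_cpu} s")
print(f"Below the soft limit: {cpu_used < s_cpu}")
```

The reported CPU time is roughly 1% of the queue's soft limit, so the queue-level s_cpu and h_cpu values alone do not seem to explain the XCPU signal.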

Any ideas?

Jérémie

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
