Dear all,

One user on our cluster is hitting a problem that I have never
seen before. According to him it is somewhat random: the
same job may succeed or fail from one run to the next.
When the job aborts, he gets this e-mail:

 Start Time       = 10/19/2012 15:25:17
 End Time         = 10/19/2012 17:07:20
 CPU              = 01:40:35
 Max vmem         = 433.707M
failed assumedly after job because:
job 5433573.1 died through signal XCPU (24)

So the job had been running for about 1h40 when it was killed.

But the queue he submitted to has a CPU time limit
of one week. From the output of "qconf -sq <queue>":
s_cpu                 168:00:00
h_cpu                 169:00:00
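
To make the mismatch concrete, here is a quick check of the numbers quoted above. It is only a small Python sketch: the helper name hms_to_seconds is mine, and it assumes the values are plain HH:MM:SS strings as shown in the e-mail and in the qconf output.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


cpu_used = hms_to_seconds("01:40:35")  # "CPU" line of the failure e-mail
s_cpu = hms_to_seconds("168:00:00")    # soft CPU limit from qconf -sq
h_cpu = hms_to_seconds("169:00:00")    # hard CPU limit from qconf -sq

print(f"CPU used: {cpu_used} s")
print(f"s_cpu:    {s_cpu} s ({s_cpu / 86400:.0f} days)")
print(f"h_cpu:    {h_cpu} s")
print(f"used / s_cpu = {cpu_used / s_cpu:.2%}")
```

The CPU time used is far below either queue limit, so the queue's s_cpu and h_cpu values alone do not seem to explain the XCPU signal.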

Any idea?

Jérémie

