Re: [gridengine users] Jobs died through signal XCPU while not exceeding limit

Reuti Fri, 19 Oct 2012 10:18:27 -0700

On 19.10.2012 at 19:01, Jérémie Dubois-Lacoste wrote:

> One user on our cluster is having this problem, which I've never
> seen before. According to him there is some randomness: the
> same job may succeed or fail from time to time.
> When the job aborts he gets this e-mail:
> 
> Start Time       = 10/19/2012 15:25:17
> End Time         = 10/19/2012 17:07:20
> CPU              = 01:40:35
> Max vmem         = 433.707M


It's also sent if s_vmem is exceeded.
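
Since XCPU is only the soft-limit notification, a job can trap it to see that a soft limit was hit before anything harder follows. A minimal sketch, assuming the job is a Python script on Linux and that the signal reaches the Python process itself; the handler name and the `received` list are hypothetical, and the job sends the signal to itself only to simulate what the scheduler would do:

```python
import os
import signal

# Names of soft-limit signals seen so far (hypothetical bookkeeping).
received = []

def on_soft_limit(signum, frame):
    # A real job would checkpoint or write partial results here.
    received.append(signal.Signals(signum).name)

# Install the handler for SIGXCPU (signal 24 on Linux).
signal.signal(signal.SIGXCPU, on_soft_limit)

# Simulate the notification by signalling our own process.
os.kill(os.getpid(), signal.SIGXCPU)
```

The s_vmem value for the queue should show up in the same "qconf -sq <queue>" output quoted below, next to s_cpu and h_cpu.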

-- Reuti


> failed assumedly after job because:
> job 5433573.1 died through signal XCPU (24)
> 
> So the job was running for 1h40, then got killed.
> 
> But the queue that he submitted to has a CPU time limit
> of one week. Among the output of "qconf -sq <queue>":
> s_cpu                 168:00:00
> h_cpu                 169:00:00
> 
> Any idea?
> 
> Jérémie
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
> 


