On 19.10.2012 at 19:01, Jérémie Dubois-Lacoste wrote:

> One user on our cluster is having this problem, which I've never
> seen before. According to him there is some randomness: the
> same job may succeed or fail from time to time.
> When the job aborts he gets this e-mail:
>
> Start Time = 10/19/2012 15:25:17
> End Time = 10/19/2012 17:07:20
> CPU = 01:40:35
> Max vmem = 433.707M

SIGXCPU is also sent if s_vmem is exceeded.

-- Reuti

> failed assumedly after job because:
> job 5433573.1 died through signal XCPU (24)
>
> So the job ran for 1h40 and then got killed.
>
> But the queue that he submitted to has a CPU time limit
> of one week. Among the output of "qconf -sq <queue>":
> s_cpu 168:00:00
> h_cpu 169:00:00
>
> Any idea?
>
> Jérémie
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
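
To check which of these soft limits could be behind an XCPU kill, the output of "qconf -sq <queue>" can be scanned for s_cpu and s_vmem entries that are not INFINITY. The following is only a minimal sketch: the function name finite_sigxcpu_limits is made up for illustration, the s_cpu and h_cpu lines are the ones quoted above, and the s_vmem value of 400M is a hypothetical example, not something reported in this thread.

```python
# Soft limits that, according to this thread, end in signal XCPU when exceeded.
SIGXCPU_SOFT_LIMITS = ("s_cpu", "s_vmem")


def finite_sigxcpu_limits(qconf_output: str) -> dict:
    """Return the s_cpu / s_vmem entries that are set to a finite value.

    qconf_output is the text printed by `qconf -sq <queue>`, which lists
    one "name value" pair per line.
    """
    limits = {}
    for line in qconf_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts[0], parts[1].strip()
        if name in SIGXCPU_SOFT_LIMITS and value.upper() != "INFINITY":
            limits[name] = value
    return limits


# s_cpu and h_cpu are the values quoted in the thread;
# the s_vmem line is HYPOTHETICAL and only there to show the idea.
sample = """\
s_cpu                 168:00:00
h_cpu                 169:00:00
s_vmem                400M
"""

print(finite_sigxcpu_limits(sample))
```

On a real cluster the sample text would be replaced by the actual "qconf -sq <queue>" output. A limit may also come from the job's own resource request rather than from the queue, so a clean queue configuration does not rule this out by itself.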
