Dear all,

One user on our cluster is running into a problem that I have never seen before. According to him it is somewhat random: the same job may succeed or fail from one run to the next. When the job aborts, he gets this e-mail:
    Start Time = 10/19/2012 15:25:17
    End Time   = 10/19/2012 17:07:20
    CPU        = 01:40:35
    Max vmem   = 433.707M
    failed assumedly after job because:
    job 5433573.1 died through signal XCPU (24)

So the job ran for about 1h40 and was then killed with XCPU (CPU time limit exceeded). But the queue he submitted to has a CPU time limit of one week. From the output of "qconf -sq <queue>":

    s_cpu                 168:00:00
    h_cpu                 169:00:00

Any idea?

Jérémie
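
P.S. To double-check the arithmetic, here is a minimal sketch that converts the values quoted above to seconds and compares them. The helper name to_seconds is hypothetical, introduced only for this illustration; the input strings are copied from the job mail and the qconf output. It only confirms that the reported CPU time is far below both queue limits, and says nothing about where the XCPU signal actually came from.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


cpu_used = to_seconds("01:40:35")  # CPU value from the job mail
s_cpu = to_seconds("168:00:00")    # soft CPU limit from qconf -sq
h_cpu = to_seconds("169:00:00")    # hard CPU limit from qconf -sq

print(cpu_used, s_cpu, h_cpu)  # 6035 604800 608400
print(cpu_used < s_cpu)        # True: usage is well under the soft limit
```

So the job used roughly 1% of the queue's soft CPU limit when it was killed.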
