On 19.10.2012 at 19:43, Jérémie Dubois-Lacoste wrote:

> afair, when vmem is passed, the abort message says KILL,
> not XCPU. But anyway 433M is below the limit (soft 450,
> hard 480), so I don't think the memory is involved here.

Defined by M or m?

M = base 1024
m = base 1000

-- Reuti

(man sge_types)

> 2012/10/19 Reuti <[email protected]>:
>> On 19.10.2012 at 19:01, Jérémie Dubois-Lacoste wrote:
>>
>>> One user on our cluster is having this problem, that I've never
>>> seen before. According to him there is some randomness, the
>>> same job may succeed or fail from time to time.
>>> When the job aborts he gets this e-mail:
>>>
>>> Start Time = 10/19/2012 15:25:17
>>> End Time = 10/19/2012 17:07:20
>>> CPU = 01:40:35
>>> Max vmem = 433.707M
>>
>> It's also sent if s_vmem is passed.
>>
>> -- Reuti
>>
>>
>>> failed assumedly after job because:
>>> job 5433573.1 died through signal XCPU (24)
>>>
>>> So the job was running for 1h40, then got killed.
>>>
>>> But the queue that he submitted to has a CPU time limit
>>> of one week. Among the output of "qconf -sq <queue>":
>>> s_cpu 168:00:00
>>> h_cpu 169:00:00
>>>
>>> Any idea?
>>>
>>> Jérémie
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
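
To make the M versus m question concrete, here is a minimal Python sketch of the arithmetic. It assumes the multipliers listed in sge_types (lowercase suffix = powers of 1000, uppercase = powers of 1024) and that the reported "Max vmem = 433.707M" uses the 1024-based M. The helper name `to_bytes` and the two candidate limit spellings ("450M", "450m") are hypothetical, since the thread does not say how the limit was actually entered.

```python
# Multipliers as described in sge_types: lowercase suffixes are powers
# of 1000, uppercase suffixes are powers of 1024.
MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}


def to_bytes(spec: str) -> float:
    """Hypothetical helper: convert e.g. '450m' or '433.707M' to bytes."""
    suffix = spec[-1]
    if suffix in MULTIPLIERS:
        return float(spec[:-1]) * MULTIPLIERS[suffix]
    # No recognised suffix: treat the value as plain bytes.
    return float(spec)


# Assumption: the e-mail's "Max vmem" figure is 1024-based.
usage = to_bytes("433.707M")

# Compare against both possible spellings of the soft limit.
for limit in ("450M", "450m"):
    exceeded = usage > to_bytes(limit)
    print(f"s_vmem={limit}: limit={to_bytes(limit):.0f} B, "
          f"usage={usage:.0f} B, exceeded={exceeded}")
```

Under these assumptions the usage is roughly 454.8 million bytes, which stays below a limit written as 450M but exceeds one written as 450m. If the limit was entered with a lowercase m, the soft memory limit could have been passed after all.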
