Re: [gridengine users] Jobs died through signal XCPU while not exceeding limit

Jérémie Dubois-Lacoste Fri, 19 Oct 2012 10:44:24 -0700

AFAIR, when the vmem limit is exceeded, the abort message says KILL,
not XCPU. In any case, 433M is below the limit (soft 450,
hard 480), so I don't think memory is involved here.
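
For what it's worth, here is a quick sanity check of the numbers above, as a minimal Python sketch. The CPU figures are copied from the job e-mail and the "qconf -sq <queue>" output quoted below. That the vmem limits (450 and 480) are in megabytes is an assumption, and the helper name hms_to_seconds is made up for this example.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# CPU usage from the job e-mail, limits from "qconf -sq <queue>".
cpu_used = hms_to_seconds("01:40:35")
s_cpu = hms_to_seconds("168:00:00")
h_cpu = hms_to_seconds("169:00:00")

# Assumption: the vmem limits quoted as "soft 450, hard 480" are megabytes.
max_vmem_mb = 433.707
s_vmem_mb = 450.0
h_vmem_mb = 480.0

# Report how much of each soft limit the job actually consumed.
print(f"CPU : {cpu_used} s of {s_cpu} s soft limit ({cpu_used / s_cpu:.1%})")
print(f"vmem: {max_vmem_mb} M of {s_vmem_mb} M soft limit "
      f"({max_vmem_mb / s_vmem_mb:.1%})")
```

Under those assumptions the job is far below s_cpu, while the reported peak vmem is below s_vmem but much closer to it than CPU time is to its limit.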


J

2012/10/19 Reuti <[email protected]>:
> Am 19.10.2012 um 19:01 schrieb Jérémie Dubois-Lacoste:
>
>> One user on our cluster is having this problem, which I've never
>> seen before. According to him there is some randomness: the
>> same job may succeed or fail from time to time.
>> When the job aborts, he gets this e-mail:
>>
>> Start Time       = 10/19/2012 15:25:17
>> End Time         = 10/19/2012 17:07:20
>> CPU              = 01:40:35
>> Max vmem         = 433.707M
>
> SIGXCPU is also sent if s_vmem is exceeded.
>
> -- Reuti
>
>
>> failed assumedly after job because:
>> job 5433573.1 died through signal XCPU (24)
>>
>> So the job ran for 1h40, then got killed.
>>
>> But the queue that he submitted to has a CPU time limit
>> of one week. From the output of "qconf -sq <queue>":
>> s_cpu                 168:00:00
>> h_cpu                 169:00:00
>>
>> Any idea?
>>
>> Jérémie
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>
>

