On 13.03.2012 at 12:03, Lars van der bijl wrote:

> On 13 March 2012 11:18, Reuti <[email protected]> wrote:
>> Hi,
>> 
>> On 13.03.2012 at 10:59, Lars van der bijl wrote:
>> 
>>> Hey everyone,
>>> 
>>> We're having the following problem.
>>> 
>>> Randomly, on some tasks, we start getting "CPU time limit exceeded". We
>> 
>> Do you see that in SGE's messages file on the execution host, or where 
>> else do you get the statement?
>> 
> 
> We get this in our stderr output.

Then I would say it's not a limit set by SGE. Can you set any time limit in the 
application itself?
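
"CPU time limit exceeded" is the text normally printed when a process 
receives SIGXCPU, i.e. when it runs past its RLIMIT_CPU soft limit. One way 
to see which CPU limit a task actually runs under, whoever set it, is to 
print the limit from inside the job. A minimal sketch, assuming the task can 
run a few lines of Python at startup; the function name describe_cpu_limit 
is made up for this example:

```python
import resource


def describe_cpu_limit():
    """Return the (soft, hard) CPU time limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)

    def fmt(value):
        # RLIM_INFINITY means no limit is in effect for this process.
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value}s"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_cpu_limit()
    # Goes to the job's stdout, so it can be compared between tasks.
    print(f"RLIMIT_CPU soft={soft} hard={hard}")
```

If the failing tasks report a finite value here and the others report 
"unlimited", the limit is being applied to the process from outside the 
application.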


>>> don't specify a time limit. We do specify h_vmem.
>>> This only happens on some tasks and not others, even between identical
>>> tasks from a batch on the same machine.
>> 
>> It could be a limit set in the queue definition (h_cpu), or one specified 
>> for some particular jobs (-l h_cpu=...).
>> 
>> The time for an SGE limit is usually mentioned in the messages file. Is it 
>> always the same time?
>> 
> 
> 03/13/2012 05:41:24|worker|nano|W|rescheduling job 61607.121
> 03/13/2012 05:41:24|worker|nano|W|job 61607.131 failed on host louie
> general rescheduling on application error because: 03/13/2012 05:41:23
> [0:10105]: exit_status of job start = 100

So, the job was rescheduled (do you know why?), but the restart failed and put 
the job in error status (because of exit code 100). Do you see this?

Can you elaborate in some way on what is going on there in detail - is it 
supposed to fail if it's just rescheduled without cleaning up any former 
files or so?
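
To check whether these entries always show up at the same time or for the 
same tasks, the messages file can be filtered by job. A rough sketch, 
assuming each entry has the pipe-separated layout seen in the quoted lines 
above (timestamp|component|host|level|message) and that lines without that 
layout are wrapped continuations of the previous entry. The names Entry, 
parse_messages and entries_for_job are made up for this example:

```python
from collections import namedtuple

Entry = namedtuple("Entry", "timestamp component host level message")

# The lines quoted earlier in this thread, without the mail quoting.
SAMPLE = [
    "03/13/2012 05:41:24|worker|nano|W|rescheduling job 61607.121",
    "03/13/2012 05:41:24|worker|nano|W|job 61607.131 failed on host louie",
    "general rescheduling on application error because: 03/13/2012 05:41:23",
    "[0:10105]: exit_status of job start = 100",
]


def parse_messages(lines):
    """Turn raw messages-file lines into Entry tuples."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("|", 4)
        if len(parts) == 5:
            entries.append(Entry(*parts))
        elif entries:
            # Assumed to be a wrapped continuation of the previous entry.
            last = entries[-1]
            entries[-1] = last._replace(message=last.message + " " + line.strip())
    return entries


def entries_for_job(entries, job_id):
    """Keep entries whose message mentions job_id (plain substring match)."""
    return [e for e in entries if job_id in e.message]


if __name__ == "__main__":
    for entry in entries_for_job(parse_messages(SAMPLE), "61607.131"):
        print(entry.timestamp, entry.level, entry.message)
```

The substring match is deliberately simple, so a shorter ID such as 
"61607.13" would also match "61607.131".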

-- Reuti


> Unless [0:10105] is the limit, I'm not sure.
> 
> 
> 
>> -- Reuti


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
