On 13.03.2012 at 12:03, Lars van der bijl wrote:

> On 13 March 2012 11:18, Reuti <[email protected]> wrote:
>> Hi,
>>
>> On 13.03.2012 at 10:59, Lars van der bijl wrote:
>>
>>> Hey everyone,
>>>
>>> We're having the following problem.
>>>
>>> Randomly, on some tasks, we start getting "CPU time limit exceeded". We
>>
>> Do you notice that in the messages file of SGE on the execution host, or where
>> do you get the statement?
>>
>
> We get this in our stderr output.
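For context: "CPU time limit exceeded" is the text Linux and the shell print for SIGXCPU, the signal the kernel sends when a process exceeds its soft CPU-time resource limit (RLIMIT_CPU). A minimal sketch, assuming a Linux execution host with Python 3.8+, of a check the job script could run just before the real application to log which CPU-time limit the task actually inherits. The helper name `describe_cpu_limit` is hypothetical, not part of SGE or of the original thread.

```python
import resource
import signal


def describe_cpu_limit():
    """Return the (soft, hard) RLIMIT_CPU values of this process as text.

    Hypothetical helper: RLIMIT_CPU is counted in seconds of CPU time,
    and RLIM_INFINITY means no limit is set.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} s"

    return fmt(soft), fmt(hard)


soft, hard = describe_cpu_limit()
# Log the inherited limits to stderr-adjacent output so they can be
# compared between tasks that fail and tasks that succeed.
print(f"RLIMIT_CPU soft={soft} hard={hard}")
# Text the platform uses for SIGXCPU (on Linux: "CPU time limit exceeded").
print(f"SIGXCPU means: {signal.strsignal(signal.SIGXCPU)}")
```

If a failing task reports a finite soft limit although no CPU limit was requested, something in that task's environment (queue configuration, a wrapper script, or the application itself) set it; the shell equivalent of the first check is `ulimit -t`.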
Then I would say it's not a limit imposed by SGE. Could a time limit be set somewhere in the application itself?

>>> don't specify a time limit. We do specify h_vmem.
>>> This only happens on some tasks and not on others, even between identical tasks
>>> from a batch on the same machine.
>>
>> It could be a limit set in the queue definition (h_cpu), or one specified for some
>> particular jobs (-l h_cpu=...).
>>
>> The time for an SGE limit is usually mentioned in the messages file. Is it
>> always the same time?
>>
>
> 03/13/2012 05:41:24|worker|nano|W|rescheduling job 61607.121
> 03/13/2012 05:41:24|worker|nano|W|job 61607.131 failed on host louie
> general rescheduling on application error because: 03/13/2012 05:41:23
> [0:10105]: exit_status of job start = 100

So the job was rescheduled (do you know why?), but the restart failed and put the job into error state (because of exit code 100).

Do you see this? Can you explain in some detail what is going on there? Is it supposed to fail if it's just rescheduled without cleaning up any earlier files or similar?

-- Reuti

> Unless [0:10105] is the limit, I'm not sure.
>
>> -- Reuti

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
