Those were inputs for debugging.

job 1058200.1 failed on host assumedly after job because: job 1058200.1 died through signal USR2 (12)
09/26/2012 17:47:02|worker|E|denied: job "1058200" does not exist

50 out of 80 batch jobs were killed in a similar way, and one of the jobs in the queue was also killed. Does qmaster need a reboot?

On Thu, Sep 27, 2012 at 9:39 PM, Reuti <[email protected]> wrote:

> On 26.09.2012 at 13:48, Vamsi Krishna wrote:
>
> > *Exit code 140:* The job exceeded the "wall clock" time limit, h_rt is
> > set to infinity
> > submit with -notify by default.
>
> Is this a statement or a question? There can be more reasons for SIGUSR2
> like a passed memory limit as a result of -notify, or it can only be warned
> as someone killed the job with a `qdel`.
>
> How can it run into h_rt when it's set to infinity?
>
> -- Reuti
>
> > --PVK
> >
> > On Wed, Sep 26, 2012 at 12:46 PM, Reuti <[email protected]> wrote:
> >
> > > On 26.09.2012 at 08:53, Vamsi Krishna wrote:
> > >
> > > > some of the batch jobs are killed and qacct -j of the job id
> > > >
> > > > failed       100 : assumedly after job
> > > > exit_status  140
> > >
> > > It's 128 + 12 = SIGUSR2. So what can cause this signal to be generated?
> > >
> > > Something in your job?
> > >
> > > You submit with -notify?
> > >
> > > -- Reuti
> > >
> > > > what could be the reason.
> > > >
> > > > Regards
> > > > PVK
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > [email protected]
> > > > https://gridengine.org/mailman/listinfo/users
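
For anyone reading the archive later, the "128 + 12" arithmetic quoted above can be checked with a few lines of Python. This is only a sketch: the helper names signal_from_exit_status and signal_name are made up for illustration and are not part of Grid Engine, and it assumes the usual shell convention that an exit status above 128 means "128 plus the signal number". Signal numbering is platform dependent; 12 is SIGUSR2 on common Linux systems, but may map to a different signal elsewhere.

```python
import signal


def signal_from_exit_status(exit_status):
    """Return the signal number encoded in a shell-style exit status.

    Assumes the convention exit_status = 128 + signal number.
    Returns None when the status does not look like a signal death.
    """
    if exit_status > 128:
        return exit_status - 128
    return None


def signal_name(signum):
    """Look up the local platform's name for a signal number."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "unknown signal %d" % signum


# exit_status 140 is the value reported by qacct -j in this thread.
signum = signal_from_exit_status(140)
print(signum, signal_name(signum))
```

The number printed for 140 is 12; the name shown next to it depends on the machine where the snippet runs, so it is best run on the execution host itself.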
