On 14 February 2013 12:50, Reuti <[email protected]> wrote:

> On 13.02.2013 at 16:05, Lars van der bijl wrote:
>
> > On 13 February 2013 15:35, Reuti <[email protected]> wrote:
> > On 13.02.2013 at 15:16, Lars van der bijl wrote:
> >
> > > Hey everyone,
> > >
> > > We always set a v_smem value and catch this so that tasks don't use too
> > > much memory, but we want to make sure they fall into an error state
> > > because of dependencies.
> > >
> > > With SGE 8.1.2 we are seeing a lot of our machines not doing this
> > > properly.
> > >
> > > $ qacct -j 10970
> > > ...
> > > failed 100 : assumedly after job
> > > exit_status 152
> >
> > 152 = 128 + 24, and signal 24 is SIGXCPU.
> >
> > So this works.
> >
> > > So we catch the 152 and raise a 100 ourselves, but the jobs still get
> > > removed from the grid and their dependencies start. Does anyone have
> > > any idea what could cause this?
> >
> > How do you catch the signal and raise the error? Were the jobs submitted
> > with DRMAA? A simple job like:
> >
> > We are not using DRMAA, just qsub.
> > We have a prolog script that checks the exit status of the task and
> > raises its own.
>
> You mean epilog - right?
>
You're right, epilog.

> > exit_status=`grep "exit_status" $SGE_JOB_SPOOL_DIR/usage | cut -d'=' -f 2`
>
> It looks like you can't put a job into error state once it exited by a
> signal (an `exit 152` doesn't block putting it into error state though).
>
> Can you add a line:
>
> trap 'exit 152' xcpu
>
> to your scripts?
>

I could, but would that make the epilog run on the task correctly? Isn't that what is happening now, since qacct shows an exit status of 152 and my epilog raises a 100? I could understand adding trap 'exit 100' xcpu working, because it's run in the main thread.

> -- Reuti
>
> > We then have a Python script that checks the number of retries and exits
> > with 99 or 100 based on that.
> >
> > #!/bin/sh
> > trap 'echo got it; exit 100' xcpu
> > kill -xcpu $$
> >
> > is working as expected?
> >
> > This worked as expected.
> >
> > -- Reuti
> >
> > > Lars
> > > _______________________________________________
> > > users mailing list
> > > [email protected]
> > > https://gridengine.org/mailman/listinfo/users
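
For reference, the epilog logic described in this thread could be sketched roughly as below. This is only a sketch under assumptions: the function name epilog_code, its arguments, and the way the retry count is passed in are hypothetical (the thread only says that a Python script checks the retries), and the usage file is assumed to hold key=value lines such as exit_status=152, as the cut -d'=' -f 2 call suggests. In Grid Engine, exit code 99 asks for a requeue and 100 for the error state. As Reuti points out, this may still have no effect when the job itself was killed by a signal.

```sh
#!/bin/sh
# Hypothetical helper, not part of Grid Engine: print the exit code an
# epilog could return for a finished task.
#   $1 = path to the usage file (assumed to contain key=value lines)
#   $2 = retries so far, $3 = maximum number of retries (assumed inputs)
epilog_code() {
    usage_file=$1
    retries=$2
    max_retries=$3

    # Anchor the pattern so that only the exit_status line is matched.
    exit_status=$(grep '^exit_status=' "$usage_file" | cut -d'=' -f 2)

    if [ "${exit_status:-0}" -eq 0 ]; then
        echo 0      # task succeeded, nothing to raise
    elif [ "$retries" -lt "$max_retries" ]; then
        echo 99     # ask Grid Engine to requeue the job
    else
        echo 100    # ask Grid Engine to put the job into error state
    fi
}
```

An epilog could then end with something like exit "$(epilog_code "$SGE_JOB_SPOOL_DIR/usage" "$retries" 3)", where $retries and the limit of 3 are placeholders.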
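
The arithmetic behind exit status 152 can be spelled out with a small shell sketch. The function name describe_status is made up for illustration; it relies only on the shell convention that a status above 128 means the process was ended by signal (status - 128).

```sh
#!/bin/sh
# Hypothetical helper: turn a shell-style exit status into a short
# description, using the "128 + signal number" convention.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"
    else
        echo "exit $status"
    fi
}

describe_status 152
describe_status 100
```

For 152 this prints "signal 24", which is SIGXCPU on Linux (compare the output of kill -l); for 100 it prints "exit 100".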
