On 13.02.2013 at 15:16, Lars van der bijl wrote:

> Hey everyone,
>
> we always set a v_smem value and catch this so that tasks don't use too much
> memory, but we want to make sure they fall into an error state because of
> dependencies.
>
> With SGE 8.1.2 we are seeing a lot of our machines not doing this properly.
>
> $ qacct -j 10970
> ...
> failed       100 : assumedly after job
> exit_status  152

152 = 128 + 24, and signal 24 is SIGXCPU. So this works.

> So we catch the 152 and raise a 100 ourselves, but the jobs still get removed
> from the grid and their dependencies start. Does anyone have any idea what
> could cause this?

How do you catch the signal and raise the error? Were the jobs submitted with DRMAA?

Does a simple job like this work as expected?

#!/bin/sh
trap 'echo got it; exit 100' xcpu
kill -xcpu $$

-- Reuti

> Lars
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
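
For reference, the "catch the 152 and raise a 100" step discussed above could look roughly like the sketch below. This is not Lars's actual wrapper, which the thread does not show. The function names signal_name and run_and_map are hypothetical, and the sketch assumes a POSIX shell whose `kill -l` accepts an exit status, on a platform where SIGXCPU is signal 24 (as in the qacct output above).

```sh
#!/bin/sh
# Sketch only: signal_name and run_and_map are hypothetical helper names,
# not part of Grid Engine.

# Print the name of the signal encoded in an exit status (128 + N),
# or "none" if the status does not indicate termination by a signal.
signal_name() {
    if [ "$1" -gt 128 ]; then
        # POSIX: "kill -l exit_status" prints the signal name without "SIG".
        kill -l "$1"
    else
        echo none
    fi
}

# Run a command; if it ended with a status that decodes to SIGXCPU,
# return 100 instead. Any other status is passed through unchanged.
run_and_map() {
    "$@"
    rc=$?
    if [ "$(signal_name "$rc")" = "XCPU" ]; then
        return 100
    fi
    return "$rc"
}
```

A job script could then end with something like `run_and_map ./my_task; exit $?`, where ./my_task is a placeholder for the real command. Note that this only covers the case where the child process is the one that receives the signal; if the job shell itself is signalled, a trap such as the one in the test job above is needed.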
