On 13.02.2013 at 15:16, Lars van der bijl wrote:

> hey everyone,
> 
> we always set an s_vmem value and catch the signal so that tasks don't use too much 
> memory. but we want to make sure they fall into an error state because of 
> their dependencies.
> 
> with SGE 8.1.2 we are seeing a lot of our machines not doing this properly.
> 
> $ qacct -j 10970
> ... 
> failed       100 : assumedly after job
> exit_status  152                 

152 = 128 + 24, and signal 24 is SIGXCPU.

So this works.
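
To spell out the arithmetic, here is a minimal sketch. It assumes the usual shell convention that a process killed by signal N is reported with exit status 128 + N; the function name signal_from_status is only made up for illustration.

```shell
#!/bin/sh
# Assumption: exit status = 128 + signal number (common shell convention).
# signal_from_status is a hypothetical helper, not part of SGE.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Signal-style status: print the signal number.
        echo $((status - 128))
    else
        # Ordinary exit status: print nothing and report failure.
        return 1
    fi
}

signal_from_status 152   # prints 24
```

The number behind a signal name differs between platforms, so the mapping from 24 to SIGXCPU should be checked on the execution hosts themselves.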

> so we catch the 152 and raise a 100 ourselves, but the jobs still get removed 
> from the grid and their dependencies start. does anyone have any idea what could 
> cause this?

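How do you catch the signal and raise the error? Were the jobs submitted with 
DRMAA? Does a simple job like this one work as expected?

#!/bin/sh
# On SIGXCPU, print a message and exit with status 100.
trap 'echo got it; exit 100' XCPU
# Send SIGXCPU to this shell itself.
kill -s XCPU $$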
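
If the 152 is instead picked up from the exit status of a child process, the remapping to 100 could be sketched roughly like this. This is only a guess at what such a wrapper might look like, not your actual setup; run_and_map and the task name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command and turn exit status 152
# (128 + 24, i.e. killed by SIGXCPU under the assumptions above) into 100.
run_and_map() {
    "$@"
    status=$?
    if [ "$status" -eq 152 ]; then
        return 100
    fi
    # Pass every other exit status through unchanged.
    return "$status"
}

# Usage inside a job script (my_task.sh is a placeholder name):
#   run_and_map ./my_task.sh
#   exit $?
```

Note that this only covers the case where the child reports 152. If the job script itself receives the signal, it would still need a trap like the one in the simple test job.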

-- Reuti


> Lars
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

