On 14 February 2013 12:50, Reuti <[email protected]> wrote:

> On 13.02.2013 at 16:05, Lars van der bijl wrote:
>
> > On 13 February 2013 15:35, Reuti <[email protected]> wrote:
> > On 13.02.2013 at 15:16, Lars van der bijl wrote:
> >
> > > hey everyone,
> > >
> > > We always set an s_vmem value and catch the resulting signal so that
> > > tasks don't use too much memory, but we want to make sure they fall
> > > into an error state because of dependencies.
> > >
> > > With SGE 8.1.2 we are seeing a lot of our machines not doing this
> > > properly.
> > >
> > > $ qacct -j 10970
> > > ...
> > > failed       100 : assumedly after job
> > > exit_status  152
> >
> > 152 = 128 + 24, and signal 24 is SIGXCPU.
> >
> > So this works.
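
The arithmetic above can be checked with a few lines of sh. This is only a sketch: `describe_status` is a made-up helper name, and the mapping of 24 to XCPU assumes a typical Linux host, since signal numbers differ between platforms.

```shell
#!/bin/sh
# Decode a shell-style exit status. By convention, a value above 128
# means the process was killed by signal number (status - 128).
# describe_status is a hypothetical helper, not part of SGE.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # "kill -l <number>" prints the signal name without the SIG prefix.
        echo "signal $signum ($(kill -l "$signum"))"
    else
        echo "exit code $status"
    fi
}

describe_status 152
```

The same helper can be pointed at whatever exit_status value qacct reports for a job.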
> >
> > > So we catch the 152 and raise a 100 ourselves, but the jobs still get
> > > removed from the grid and their dependencies start. Does anyone have
> > > any idea what could cause this?
> >
> > How do you catch the signal and raise the error? Were the jobs submitted
> > with DRMAA? A simple job like:
> >
> > We are not using DRMAA, just qsub.
> > We have a prolog script that checks the exit status of the task and
> > raises its own.
>
> You mean epilog - right?
>

You're right, epilog.


>
>
> > exit_status=`grep "exit_status" $SGE_JOB_SPOOL_DIR/usage | cut -d'=' -f 2`
>
> It looks like you can't put a job into error state once it exited by a
> signal (an `exit 152` doesn't block putting it into error state though).
>
> Can you add a line:
>
> trap 'exit 152' xcpu
>
> to your scripts?
>

I could, but would that make the epilog run on the task correctly? Isn't
that what is happening now, since qacct shows an exit status of 152 and my
epilog raises a 100?
I could understand adding

trap 'exit 100' xcpu

working, because it runs in the main thread.
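
To see what exit status such a trap produces outside the grid, something like the following could be run by hand. It is only a local sketch: the job script is written to a temporary file and signals itself, which stands in for SGE delivering SIGXCPU, and nothing here touches qsub or the epilog.

```shell
#!/bin/sh
# Write a throwaway job script that turns SIGXCPU into exit code 100.
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/sh
trap 'echo "caught XCPU" >&2; exit 100' XCPU
# Stand-in for the soft limit: the script sends the signal to itself.
kill -XCPU $$
echo "not reached"
EOF

# Run it and report the exit status the parent sees.
sh "$job"
status=$?
rm -f "$job"
echo "job exit status: $status"
```

One thing to keep in mind with a real job: sh only runs a trap after the current foreground command has returned, so a long-running child process delays the handler.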



>
> -- Reuti
>
>
> > We then have a Python script that checks the number of retries and
> > exits with 99 or 100 based on that.
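
The decision described above might look roughly like this in sh. It is a sketch under assumptions: `MAX_RETRIES`, both function names and the way the retry count is passed in are hypothetical, and the usage file contents are made up for the demo. Only the `exit_status=` line and the 99/100 codes come from this thread.

```shell
#!/bin/sh
# Hypothetical retry budget; the real script tracks retries its own way.
MAX_RETRIES=3

# Pull the value of the exit_status=... line out of a usage file.
read_exit_status() {
    grep '^exit_status=' "$1" | cut -d'=' -f2
}

# Print the code the epilog would exit with:
# 0 for a clean job, 99 while retries remain, 100 once they are used up.
choose_epilog_code() {
    job_status=$1
    retries=$2
    if [ "$job_status" -eq 0 ]; then
        echo 0
    elif [ "$retries" -lt "$MAX_RETRIES" ]; then
        echo 99
    else
        echo 100
    fi
}

# Demo with a stand-in usage file (contents invented for illustration).
usage_file=$(mktemp)
printf 'wallclock=12\nexit_status=152\n' > "$usage_file"
status=$(read_exit_status "$usage_file")
echo "retry 0: $(choose_epilog_code "$status" 0)"
echo "retry 3: $(choose_epilog_code "$status" 3)"
rm -f "$usage_file"
```

This only covers choosing the code. As noted earlier in the thread, exiting 100 from the epilog did not put the job into error state once the job itself had been killed by a signal.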
> >
> >
> > #!/bin/sh
> > trap 'echo got it; exit 100' xcpu
> > kill -xcpu $$
> >
> > is working as expected?
> >
> > this worked as expected.
> >
> >
> >
> > -- Reuti
> >
> >
> > > Lars
> > > _______________________________________________
> > > users mailing list
> > > [email protected]
> > > https://gridengine.org/mailman/listinfo/users
> >
> >
>
>