Yes, still the same question, I'm trying to get a proper exit code for
"qsub -sync y" :)
By graceful shutdown, I mean only the slaves. It really seems to me that
whatever happens, if the slave tasks are not cleanly shut down, qsub will
always show this "Unable to run job" message and return 0.
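
A possible workaround for the exit code, independent of the shutdown
problem, is to wrap qsub and treat the "Unable to run job" message as a
failure even when the return code is 0. The following is only a minimal
sketch and has not been tested against SGE 6.2u5. The function name
run_checked is hypothetical, and it is an assumption that the message
always contains exactly this text; stdout and stderr are merged because
the thread does not say which stream qsub writes it to.

```python
import subprocess
import sys

# Message quoted in this thread; the exact wording may differ between versions.
FAILURE_MARKER = "Unable to run job"


def run_checked(cmd, marker=FAILURE_MARKER):
    """Run cmd, echo its output, and return a corrected exit code.

    A non-zero exit code from cmd is passed through unchanged. If cmd
    exits with 0 but its output contains the marker, 1 is returned.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
    )
    # Output is only echoed once the command has finished.
    sys.stdout.write(proc.stdout)
    if proc.returncode != 0:
        return proc.returncode
    return 1 if marker in proc.stdout else 0


# Hypothetical usage (job.sh and the PE name "mype" are placeholders):
#   sys.exit(run_checked(["qsub", "-sync", "y", "-pe", "mype", "4", "job.sh"]))
```

Matching on the message text is fragile, so this is a stopgap rather
than a substitute for qsub reporting the failure itself.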

2012/9/21 Reuti <[email protected]>

> Am 21.09.2012 um 16:13 schrieb Julien Nicoulaud:
>
> > I tried to implement the -notify + trap USR2 solution, but could not
> > get it to work. I can trap the USR2 signal in the qmaster PE script,
> > but as soon as it is sent, the slave tasks get killed, leaving my
> > application no time to cleanly shut them down. The qmaster log displays:
>
> Is this a new question? Originally you wanted to get a proper exit code
> for -sync y, now to gracefully shut down.
>
> -- Reuti
>
>
> > tightly integrated parallel task 61969.1 task 1.computeXX failed - killing job
> >
> > The queue is configured with "notify 00:00:60", so that should leave
> > at least one minute. I also tried to trap USR2 in the PE script and not
> > forward it at all to child processes, but slave tasks still get killed.
> > Is there something else specific to do to avoid this?
> >
> > 2012/9/19 Julien Nicoulaud <[email protected]>
> > Yes, that's what I meant. For me, if control_slaves is FALSE, qsub
> > returns with a non-zero exit code after h_rt is elapsed.
> >
> >
> > 2012/9/19 Reuti <[email protected]>
> > Hi,
> >
> > Am 19.09.2012 um 14:36 schrieb Julien Nicoulaud:
> >
> > > On SGE 6.2u5, I submit jobs with -sync y and h_rt. When the job gets
> > > killed after the time has elapsed, qsub prints an "Unable to run job"
> > > message but exits with code 0. I tried to trap the KILL signal inside
> > > the job script, but it does not seem to affect the qsub return code.
> > > Is it possible to make it return 1?
> > >
> > > Note: it only behaves this way for jobs running in a tightly
> > > integrated parallel environment. In a loosely integrated PE, qsub
> > > returns 1 in this case...
> >
> > You mean the setting of "control_slaves"? For me it's always 0 if I
> > request a PE.
> >
> > -- Reuti
> >
> >
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
