Thanks for your answers.

@Dave: The crashes seem to occur on RHEL/CentOS 5. I'm using CentOS 6, and so
far I have not had any crashes.

The linked changeset seems to indicate that there is no way to get this
working properly on 6.2u5, but I'm still wondering: if this is driven by some
regular polling, what defines its period? Maybe it's configurable. I can
afford more load on the qmaster host.


2012/9/3 Reuti <[email protected]>

> Hi,
>
> On 03.09.2012 at 17:11, Julien Nicoulaud wrote:
>
> > I'm using SGE 6.2u5. I built a tightly integrated parallel environment
> > for my application, using "qrsh -inherit". Everything works fine, but at
> > the end of every job using the PE, there is a long delay (approx. 2
> > minutes) between the moment the PE script returns and the moment the
> > parent qsub returns.
> >
> > The only case where it returns quickly is when I send a SIGINT to the
> > parent qsub. In every other configuration, there is this delay. This
> > happens whatever the result of the PE script is, and whether or not the
> > qrsh processes are cleanly shut down before returning.
> >
> > My PE does not have any stop_proc_args, and my queue has no epilog.
> >
> > I can't find any relevant trace in the logs of what happens in the
> > meantime.
> >
> > Is this normal behavior?
>
> Yes.
>
>
> > Is there some kind of polling mechanism?
>
> It's a safety precaution to be sure all tasks ended.
>
> It's fixed in some follow-up versions of SGE IIRC, but I can't find
> any links about it.
>
> -- Reuti
>
>
> > Regards,
> > Julien
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>
>
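For readers unfamiliar with the setup described in the quoted question: in a tightly integrated PE, the job script typically reads the host list that SGE writes to the file named by `$PE_HOSTFILE` and starts each remote task with `qrsh -inherit <host> <command>`, so that the execution daemons can track those tasks. The Python sketch below illustrates that pattern under stated assumptions: it assumes the usual `$PE_HOSTFILE` line format (`host slots queue processor-range`), that `qrsh` is on the `PATH` inside the job, and that the PE allows slave tasks (`control_slaves TRUE`). The helpers `parse_pe_hostfile` and `launch_tasks` and the default `hostname` command are hypothetical names for illustration, not part of SGE. It only shows how tasks are started and awaited; it is not a workaround for the end-of-job delay discussed in this thread.

```python
import os
import subprocess
import sys
from typing import List, Tuple


def parse_pe_hostfile(text: str) -> List[Tuple[str, int]]:
    """Parse $PE_HOSTFILE content into (host, slots) pairs.

    Assumes each non-empty line looks like:
        <hostname> <slots> <queue> <processor range>
    Only the first two fields are used.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        entries.append((fields[0], int(fields[1])))
    return entries


def launch_tasks(entries: List[Tuple[str, int]], command: List[str]) -> int:
    """Start one `qrsh -inherit` per slot, then wait for all of them.

    Returns 0 if every task exited with status 0, otherwise 1.
    """
    procs = []
    for host, slots in entries:
        for _ in range(slots):
            procs.append(subprocess.Popen(["qrsh", "-inherit", host] + command))
    # Wait for every task so the PE script only returns once all are done.
    statuses = [p.wait() for p in procs]
    # Non-zero covers both failed tasks and tasks killed by a signal.
    return 0 if all(s == 0 for s in statuses) else 1


def main() -> int:
    with open(os.environ["PE_HOSTFILE"]) as f:
        entries = parse_pe_hostfile(f.read())
    # Hypothetical default command; pass your application's binary instead.
    command = sys.argv[1:] or ["hostname"]
    return launch_tasks(entries, command)


# Only act when running inside an SGE parallel job.
if __name__ == "__main__" and "PE_HOSTFILE" in os.environ:
    sys.exit(main())
```

Inside a job submitted with something like `qsub -pe <pe_name> 4 job.sh`, `job.sh` could call `python3 launch.py ./my_app`, where `launch.py` and `./my_app` are placeholder names. Waiting on every `qrsh -inherit` child before returning keeps the PE script's exit aligned with the end of its tasks.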