Hi *,

we are using a qlogin wrapper script, as mentioned below. It looks as though this setup prevents SGE from reaching the terminate_method.
Bert

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Wiegers, Bert
> Sent: Tuesday, December 03, 2013 9:01 AM
> To: [email protected]
> Subject: Re: [gridengine users] qlogin with ssh
>
> Hi Reuti,
>
> The process tree looks like this:
> root     20939  0.0  0.0 1242552 5892 ?     Sl   Nov14 18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
> root     33874 99.7  0.0   34164 2828 ?     R    08:47  0:22  \_ sge_shepherd-18003 -bg
> root     33882  0.0  0.0   98156 3836 pts/1 Ss+  08:47  0:00      \_ sshd: xxxxxx [priv]
> xxxxxx   33884  0.0  0.0   98156 2044 pts/1 S+   08:47  0:00          \_ sshd: xxxxxx@pts/2
> xxxxxx   33885  1.1  0.0   14556 3260 pts/2 SNs  08:47  0:00              \_ -tcsh
> It stays the same as long as I am logged on to the node.
>
> The job is still listed in qstat.
>
> In the messages of the scheduler I find these hints:
> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
>
> When I log out afterwards I see in the messages:
> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
>
> Bert
>
> > -----Original Message-----
> > From: Reuti [mailto:[email protected]]
> > Sent: Monday, December 02, 2013 6:43 PM
> > To: Wiegers, Bert
> > Cc: [email protected]
> > Subject: Re: [gridengine users] qlogin with ssh
> >
> > Hi,
> >
> > On 02.12.2013 at 18:28, Wiegers, Bert wrote:
> >
> > > we are running SGE 8.1.6.
> > > We have configured some interactive queues and use qlogin with the
> > > wrapper-script (... /usr/bin/ssh -Y -p $PORT $HOST).
> > > In our setup the user is forced to use the h_rt variable.
> > > Unfortunately qlogin does not care if the walltime is overdue.
> > > The shepherd seems to be unable to kill the qlogin sessions when the
> > > user is still connected to the node.
> > > Does anyone have a solution or a workaround for this?
> > >
> > Is the `sshd` a child of the `shepherd`, i.e. something like:
> >
> > $ ps -e f
> > ...
> >  6656 ?        Sl    56:23 /usr/sge/bin/lx24-x86/sge_execd
> >  9391 ?        S      0:00  \_ sge_shepherd-10502 -bg
> >  9392 ?        Ss     0:00      \_ sshd: reuti [priv]
> >  9398 ?        S      0:00          \_ sshd: reuti@pts/2
> >  9405 pts/2    Ss     0:00              \_ -bash
> >
> > What does the process tree look like after "h_rt" expired - did the job
> > vanish from `qstat` too?
> >
> > -- Reuti
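
Reuti's question, whether the `sshd` is a descendant of the shepherd, can also be checked mechanically instead of reading the tree by eye. The following is only a minimal sketch in Python, not something from the thread: the function names are made up for illustration, and the parent PIDs in `ROWS` are an assumption read off the tree drawing in Bert's reply (the listing itself shows no PPID column). On a live node the rows could presumably be filled from `ps -e -o pid=,ppid=,args=`.

```python
from collections import defaultdict


def descendants(rows, root_pid):
    """Return the PIDs of all processes below root_pid, breadth-first."""
    children = defaultdict(list)
    for pid, ppid, _cmd in rows:
        children[ppid].append(pid)
    found, queue = [], [root_pid]
    while queue:
        for child in children[queue.pop(0)]:
            found.append(child)
            queue.append(child)
    return found


def sshd_under_shepherd(rows):
    """Map each sge_shepherd PID to the sshd PIDs found beneath it."""
    cmd_of = {pid: cmd for pid, _ppid, cmd in rows}
    result = {}
    for pid, _ppid, cmd in rows:
        if cmd.startswith("sge_shepherd"):
            result[pid] = [d for d in descendants(rows, pid)
                           if cmd_of[d].startswith("sshd")]
    return result


# (pid, ppid, command); PPIDs are assumed from the tree layout, not measured.
ROWS = [
    (20939, 1, "sge_execd"),
    (33874, 20939, "sge_shepherd-18003 -bg"),
    (33882, 33874, "sshd: xxxxxx [priv]"),
    (33884, 33882, "sshd: xxxxxx@pts/2"),
    (33885, 33884, "-tcsh"),
]

print(sshd_under_shepherd(ROWS))
```

An empty list for a shepherd would mean that no `sshd` sits beneath it, which is the case Reuti's question is trying to rule out.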
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
