By the way, on the nodes where I am logged in via qlogin, the CPU usage of the shepherd is always at 100%.
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Wednesday, December 04, 2013 5:33 PM
> To: Wiegers, Bert
> Cc: [email protected]
> Subject: Re: [gridengine users] qlogin with ssh
>
> On 04.12.2013 at 17:19, Wiegers, Bert wrote:
>
> > Hi *,
> >
> > we are using a qlogin wrapper script, as mentioned below.
> > It looks like this setup prevents SGE from reaching the
> > terminate_method.
>
> You defined a custom "terminate_method"? Can you please post it?
>
> -- Reuti
>
> > Bert
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]]
> >> On Behalf Of Wiegers, Bert
> >> Sent: Tuesday, December 03, 2013 9:01 AM
> >> To: [email protected]
> >> Subject: Re: [gridengine users] qlogin with ssh
> >>
> >> Hi Reuti,
> >>
> >> The process tree looks like this:
> >> root     20939  0.0  0.0 1242552 5892 ?     Sl   Nov14 18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
> >> root     33874 99.7  0.0   34164 2828 ?     R    08:47  0:22  \_ sge_shepherd-18003 -bg
> >> root     33882  0.0  0.0   98156 3836 pts/1 Ss+  08:47  0:00      \_ sshd: xxxxxx [priv]
> >> xxxxxx   33884  0.0  0.0   98156 2044 pts/1 S+   08:47  0:00          \_ sshd: xxxxxx@pts/2
> >> xxxxxx   33885  1.1  0.0   14556 3260 pts/2 SNs  08:47  0:00              \_ -tcsh
> >> It stays the same as long as I am logged on to the node.
> >>
> >> The job is still listed in qstat.
> >>
> >> In the messages of the scheduler I find these hints:
> >> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
> >>
> >> When I log out afterwards, I see in the messages:
> >> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
> >> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
> >>
> >> Bert
> >>
> >>> -----Original Message-----
> >>> From: Reuti [mailto:[email protected]]
> >>> Sent: Monday, December 02, 2013 6:43 PM
> >>> To: Wiegers, Bert
> >>> Cc: [email protected]
> >>> Subject: Re: [gridengine users] qlogin with ssh
> >>>
> >>> Hi,
> >>>
> >>> On 02.12.2013 at 18:28, Wiegers, Bert wrote:
> >>>
> >>>> we are running SGE 8.1.6.
> >>>> We have configured some interactive queues and use qlogin with the
> >>>> wrapper script (... /usr/bin/ssh -Y -p $PORT $HOST).
> >>>> In our setup the user is forced to use the h_rt variable.
> >>>> Unfortunately, qlogin does not care if the walltime is overdue.
> >>>> The shepherd seems to be unable to kill the qlogin sessions when the
> >>>> user is still connected to the node.
> >>>> Does anyone have a solution or a workaround for this?
> >>>
> >>> Is the `sshd` a child of the `shepherd`, i.e. something like:
> >>>
> >>> $ ps -e f
> >>> ...
> >>>  6656 ?        Sl    56:23 /usr/sge/bin/lx24-x86/sge_execd
> >>>  9391 ?        S      0:00  \_ sge_shepherd-10502 -bg
> >>>  9392 ?        Ss     0:00      \_ sshd: reuti [priv]
> >>>  9398 ?        S      0:00          \_ sshd: reuti@pts/2
> >>>  9405 pts/2    Ss     0:00              \_ -bash
> >>>
> >>> What does the process tree look like after "h_rt" expired - did the job
> >>> vanish from `qstat` too?
> >>>
> >>> -- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
