On 04.12.2013 at 17:19, Wiegers, Bert wrote:

> Hi *,
>
> we are using a qlogin wrapper script, as mentioned below.
> It looks like this setup prevents SGE from reaching the terminate_method.

You defined a custom "terminate_method"? Can you please post it?

-- Reuti

> Bert
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Wiegers, Bert
>> Sent: Tuesday, December 03, 2013 9:01 AM
>> To: [email protected]
>> Subject: Re: [gridengine users] qlogin with ssh
>>
>> Hi Reuti,
>>
>> The process tree looks like this:
>>
>> root   20939  0.0 0.0 1242552 5892 ?     Sl   Nov14 18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
>> root   33874 99.7 0.0   34164 2828 ?     R    08:47  0:22  \_ sge_shepherd-18003 -bg
>> root   33882  0.0 0.0   98156 3836 pts/1 Ss+  08:47  0:00      \_ sshd: xxxxxx [priv]
>> xxxxxx 33884  0.0 0.0   98156 2044 pts/1 S+   08:47  0:00          \_ sshd: xxxxxx@pts/2
>> xxxxxx 33885  1.1 0.0   14556 3260 pts/2 SNs  08:47  0:00              \_ -tcsh
>>
>> It stays the same as long as I am logged on to the node.
>>
>> The job is still listed in qstat.
>>
>> In the scheduler messages I find this hint:
>> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
>>
>> When I log out afterwards, I see in the messages:
>> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
>> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
>>
>> Bert
>>
>>> -----Original Message-----
>>> From: Reuti [mailto:[email protected]]
>>> Sent: Monday, December 02, 2013 6:43 PM
>>> To: Wiegers, Bert
>>> Cc: [email protected]
>>> Subject: Re: [gridengine users] qlogin with ssh
>>>
>>> Hi,
>>>
>>> On 02.12.2013 at 18:28, Wiegers, Bert wrote:
>>>
>>>> we are running SGE 8.1.6.
>>>> We have configured some interactive queues and use qlogin with the
>>>> wrapper script (... /usr/bin/ssh -Y -p $PORT $HOST).
>>>> In our setup the user is forced to specify the h_rt limit.
>>>> Unfortunately, qlogin does not care if the walltime is overdue.
>>>> The shepherd seems to be unable to kill the qlogin sessions when the
>>>> user is still connected to the node.
>>>> Does anyone have a solution or a workaround for this?
>>>
>>> Is the `sshd` a child of the `shepherd`, i.e. something like:
>>>
>>> $ ps -e f
>>> ...
>>>  6656 ?     Sl  56:23 /usr/sge/bin/lx24-x86/sge_execd
>>>  9391 ?     S    0:00  \_ sge_shepherd-10502 -bg
>>>  9392 ?     Ss   0:00      \_ sshd: reuti [priv]
>>>  9398 ?     S    0:00          \_ sshd: reuti@pts/2
>>>  9405 pts/2 Ss   0:00              \_ -bash
>>>
>>> What does the process tree look like after "h_rt" expired? Did the job
>>> vanish from `qstat` too?
>>>
>>> -- Reuti
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
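
The ancestry check discussed in this thread (is the `sshd` or login shell a descendant of `sge_shepherd`?) can also be scripted instead of read off the `ps` tree by eye. The following is only a minimal sketch under stated assumptions: `under_shepherd` is a hypothetical helper name, not part of Grid Engine, and it expects "PID PPID COMMAND" lines on stdin, such as those printed by the POSIX invocation `ps -e -o pid= -o ppid= -o comm=`.

```shell
#!/bin/sh
# Hypothetical helper, not part of Grid Engine.
# Reads "PID PPID COMMAND" lines on stdin and reports whether the process
# given as $1 has an ancestor whose command name starts with "sge_shepherd".
under_shepherd() {
    awk -v start="$1" '
        # Build lookup tables: parent PID and command name for each PID.
        { parent[$1] = $2; name[$1] = $3 }
        END {
            pid = start
            # Walk up the parent chain; the step limit guards against loops
            # in malformed input.
            for (steps = 0; steps < 64 && (pid in parent); steps++) {
                pid = parent[pid]
                if (name[pid] ~ /^sge_shepherd/) {
                    print "yes " pid
                    exit 0
                }
            }
            print "no"
            exit 1
        }'
}
```

On an execution host this could be called as `ps -e -o pid= -o ppid= -o comm= | under_shepherd 33885`, where 33885 is the PID of the `-tcsh` login shell from the process tree quoted above. For that tree it would print `yes 33874`, the PID of the shepherd.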
