Hi,

On 04.12.2013 at 22:47, Wiegers, Bert wrote:
> I haven't tried this yet, because I can't find the right location for the
> needed patch in the openssh sources:
>
> patch:
> in main():
> init_rng();
> #ifdef SGESSH_INTEGRATION
> sgessh_readconfig();
> #endif
>
> Changelog from openssh
> 20110909
> - (dtucker) [entropy.h] Bug #1932: remove old definition of init_rng. From
> Colin Watson.
>
> Has anyone done it?

Comparing the older and the current source, it has to be put right after:

__progname = ssh_get_progname(av[0]);

(untested)

-- Reuti


> execd_params ENABLE_ADDGRP_KILL=TRUE
> is already there.
>
> Bert
>
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Wednesday, December 04, 2013 10:30 PM
>> To: Wiegers, Bert
>> Cc: [email protected]
>> Subject: Re: [gridengine users] qlogin with ssh
>>
>> On 04.12.2013 at 21:59, Wiegers, Bert wrote:
>>
>>> According to the man-page of queue_conf
>>> the kill -9 command should have been sent by default (we tried this first).
>>> This killscript below was an attempt to fix the problem.
>>> Both don't work.
>>
>> Then it might be promising to get a tight SSH integration:
>>
>> http://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
>>
>> section "SSH TIGHT INTEGRATION". I wonder why I forgot to mention there that
>> it needs
>> "execd_params ENABLE_ADDGRP_KILL=TRUE" in SGE's configuration.
>>
>> -- Reuti
>>
>>
>>> Bert
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:[email protected]]
>>>> Sent: Wednesday, December 04, 2013 6:28 PM
>>>> To: Wiegers, Bert
>>>> Cc: [email protected]
>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>
>>>> On 04.12.2013 at 17:47, Wiegers, Bert wrote:
>>>>
>>>>> our setup is
>>>>>
>>>>> sge_conf:
>>>>> qlogin_command
>>>>> /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh
>>>>>
>>>>> cat /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh
>>>>> #!/bin/sh
>>>>> HOST=$1
>>>>> PORT=$2
>>>>> /usr/bin/ssh -Y -p $PORT $HOST
>>>>>
>>>>>
>>>>> queue_conf:
>>>>> terminate_method
>>>>> /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh \
>>>>> $job_pid $job_owner
>>>>
>>>> What was the motivation to have a custom method?
>>>>
>>>> The default is to send a kill to the complete process group, i.e.
>>>> something like
>>>>
>>>> kill -9 -- -$1
>>>>
>>>> in your setup.
>>>>
>>>>
>>>>> cat /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh
>>>>> #!/bin/bash
>>>>>
>>>>> if [ $# -ne 2 ] ; then
>>>>>     echo "Usage:" $0 job_pid job_owner
>>>>>     exit 1
>>>>> fi
>>>>>
>>>>> job_pid=$1
>>>>> job_owner=$2
>>>>>
>>>>> # try and kill the session group - the group leader is the shell
>>>>> # executing the job script
>>>>> pkill -s $job_pid
>>>>> if [ $? -ne 0 ] ; then
>>>>>     kill $job_pid
>>>>
>>>> AFAICS the sid can be different from the pid or pgrp. And even when
>>>> they are the same: it's the sid of the sshd, not the shell.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> fi
>>>>>
>>>>> # cleanup grace period
>>>>> sleep 10
>>>>> pkill -9 -s $job_pid
>>>>> if [ $? -ne 0 ] ; then
>>>>>     kill -9 $job_pid
>>>>> fi
>>>>>
>>>>>
>>>>>
>>>>> Bert
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Reuti [mailto:[email protected]]
>>>>>> Sent: Wednesday, December 04, 2013 5:33 PM
>>>>>> To: Wiegers, Bert
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>>>
>>>>>> On 04.12.2013 at 17:19, Wiegers, Bert wrote:
>>>>>>
>>>>>>> Hi *,
>>>>>>>
>>>>>>> we are using a qlogin wrapper script, as mentioned below.
>>>>>>> It looks like this setup prevents SGE from reaching the
>>>>>>> terminate_method.
>>>>>>
>>>>>> You defined a custom "terminate_method"? Can you please post it?
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> Bert
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: [email protected]
>>>>>>>> [mailto:[email protected]] On Behalf Of Wiegers, Bert
>>>>>>>> Sent: Tuesday, December 03, 2013 9:01 AM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>>>>>
>>>>>>>> Hi Reuti,
>>>>>>>>
>>>>>>>> The process tree looks like this
>>>>>>>> root   20939  0.0  0.0 1242552 5892 ?     Sl   Nov14 18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
>>>>>>>> root   33874 99.7  0.0   34164 2828 ?     R    08:47  0:22  \_ sge_shepherd-18003 -bg
>>>>>>>> root   33882  0.0  0.0   98156 3836 pts/1 Ss+  08:47  0:00      \_ sshd: xxxxxx [priv]
>>>>>>>> xxxxxx 33884  0.0  0.0   98156 2044 pts/1 S+   08:47  0:00          \_ sshd: xxxxxx@pts/2
>>>>>>>> xxxxxx 33885  1.1  0.0   14556 3260 pts/2 SNs  08:47  0:00              \_ -tcsh
>>>>>>>> it stays the same as long as I am logged on to the node.
>>>>>>>>
>>>>>>>> The Job is still listed in qstat.
>>>>>>>>
>>>>>>>> In the messages of the scheduler I find these hints:
>>>>>>>> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
>>>>>>>>
>>>>>>>> When I log out afterwards I see in the messages
>>>>>>>> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
>>>>>>>> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
>>>>>>>>
>>>>>>>> Bert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Reuti [mailto:[email protected]]
>>>>>>>>> Sent: Monday, December 02, 2013 6:43 PM
>>>>>>>>> To: Wiegers, Bert
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 02.12.2013 at 18:28, Wiegers, Bert wrote:
>>>>>>>>>
>>>>>>>>>> we are running the SGE 8.1.6.
>>>>>>>>>> We have configured some interactive queues and use qlogin with the
>>>>>>>>>> wrapper-script (... /usr/bin/ssh -Y -p $PORT $HOST).
>>>>>>>>>> In our setup the user is forced to use the h_rt variable.
>>>>>>>>>> Unfortunately qlogin does not care if the walltime is overdue.
>>>>>>>>>> The shepherd seems to be unable to kill the qlogin sessions, when the
>>>>>>>>>> user is still connected to the node.
>>>>>>>>>> Has anyone a solution or a workaround for this?
>>>>>>>>>
>>>>>>>>> Is the `sshd` a child of the `shepherd`, i.e. something like:
>>>>>>>>>
>>>>>>>>> $ ps -e f
>>>>>>>>> ...
>>>>>>>>>  6656 ?        Sl    56:23 /usr/sge/bin/lx24-x86/sge_execd
>>>>>>>>>  9391 ?        S      0:00  \_ sge_shepherd-10502 -bg
>>>>>>>>>  9392 ?        Ss     0:00      \_ sshd: reuti [priv]
>>>>>>>>>  9398 ?        S      0:00          \_ sshd: reuti@pts/2
>>>>>>>>>  9405 pts/2    Ss     0:00              \_ -bash
>>>>>>>>>
>>>>>>>>> What does the process tree look like after "h_rt" expired - did the
>>>>>>>>> job vanish from the `qstat` too?
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> [email protected]
>>>>>>>> https://gridengine.org/mailman/listinfo/users
