Hi,

On 04.12.2013 at 22:47, Wiegers, Bert wrote:

> I haven't tried this yet, because I can't find the right location for the 
> needed patch in the openssh sources:
> 
> patch:
>              in main():
>                     init_rng();
>                     #ifdef SGESSH_INTEGRATION
>                     sgessh_readconfig();
>                     #endif
> 
> Changelog from openssh
> 20110909
> - (dtucker) [entropy.h] Bug #1932: remove old definition of init_rng.  From
>   Colin Watson.
> 
> Has anyone done it?

Comparing the older and the current source, it has to be put right after:

__progname = ssh_get_progname(av[0]);

(untested)
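
To find that spot in a given source tree, a small grep helper can print the line number. This is only a sketch: the helper name find_progname_line is made up for illustration, and sshd.c as the file to search is an assumption on my side.

```shell
# Hypothetical helper: print the line number of the first __progname
# assignment in the given source file (assumed to be sshd.c of OpenSSH).
find_progname_line() {
    # -n prints line numbers, -F treats the pattern as a fixed string
    grep -nF '__progname = ssh_get_progname(av[0]);' "$1" | head -n 1 | cut -d: -f1
}

# Usage (the path is an assumption):
#   find_progname_line sshd.c
```

The SGESSH_INTEGRATION block from the patch would then go on the line after the one reported.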

-- Reuti


> 
> execd_params ENABLE_ADDGRP_KILL=TRUE
> is already there.
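
To double-check that the setting is really part of the active configuration, the output of `qconf -sconf` can be filtered. A minimal sketch, where the function name has_addgrp_kill is hypothetical and the match is on the exact spelling used in this thread:

```shell
# Hypothetical check: succeed if the configuration text on stdin has an
# execd_params line containing ENABLE_ADDGRP_KILL=TRUE (exact spelling).
has_addgrp_kill() {
    grep -E '^execd_params[[:space:]]' | grep -q 'ENABLE_ADDGRP_KILL=TRUE'
}

# Usage:
#   qconf -sconf | has_addgrp_kill && echo "ENABLE_ADDGRP_KILL is set"
```
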
> 
> Bert
> 
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Wednesday, December 04, 2013 10:30 PM
>> To: Wiegers, Bert
>> Cc: [email protected]
>> Subject: Re: [gridengine users] qlogin with ssh
>> 
>> On 04.12.2013 at 21:59, Wiegers, Bert wrote:
>> 
>>> According to the man page of queue_conf,
>>> the kill -9 command should have been sent by default (we tried this first).
>>> The kill script below was an attempt to fix the problem.
>>> Neither works.
>> 
>> Then it might be promising to get a tight SSH integration:
>> 
>> http://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
>> 
>> section "SSH TIGHT INTEGRATION". I wonder why I forgot to mention there that
>> it needs "execd_params ENABLE_ADDGRP_KILL=TRUE" in SGE's configuration.
>> 
>> -- Reuti
>> 
>> 
>>> Bert
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Reuti [mailto:[email protected]]
>>>> Sent: Wednesday, December 04, 2013 6:28 PM
>>>> To: Wiegers, Bert
>>>> Cc: [email protected]
>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>> 
>>>> On 04.12.2013 at 17:47, Wiegers, Bert wrote:
>>>> 
>>>>> our setup is
>>>>> 
>>>>> sge_conf:
>>>>> qlogin_command               /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh
>>>>> 
>>>>> cat /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh
>>>>> #!/bin/sh
>>>>> HOST=$1
>>>>> PORT=$2
>>>>> /usr/bin/ssh -Y -p $PORT $HOST
>>>>> 
>>>>> 
>>>>> queue_conf:
>>>>> terminate_method      /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh \
>>>>>                    $job_pid $job_owner
>>>> 
>>>> What was the motivation to have a custom method?
>>>> 
>>>> The default is to send a kill to the complete process group, i.e. 
>>>> something like
>>>> 
>>>> kill -9 -- -$1
>>>> 
>>>> in your setup.
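
Written out as a function, such a process group kill could look roughly like this. It is only a sketch: the name kill_process_group is made up for illustration, and it assumes the argument is the pid of the process group leader (what SGE hands over as $job_pid). It uses `kill -s KILL`, the portable spelling of `kill -9`.

```shell
# Hypothetical helper: send SIGKILL to every process in the group led by $1.
kill_process_group() {
    pgid=$1
    # Accept only plain decimal numbers.
    case $pgid in
        ''|*[!0-9]*) echo "usage: kill_process_group pgid" >&2; return 2 ;;
    esac
    # Refuse 0 and 1: "-0" would hit our own group, "-1" every process
    # we are allowed to signal.
    if [ "$pgid" -le 1 ]; then
        echo "kill_process_group: refusing pgid $pgid" >&2
        return 2
    fi
    # The leading minus addresses the whole process group; "--" stops
    # option parsing so the negative number is not read as an option.
    kill -s KILL -- "-$pgid"
}

# Usage inside a terminate_method script:
#   kill_process_group "$1"
```
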
>>>> 
>>>> 
>>>>> cat /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh
>>>>> #!/bin/bash
>>>>> 
>>>>> if [ $# -ne 2 ] ; then
>>>>> echo "Usage:" $0 job_pid job_owner
>>>>> exit 1
>>>>> fi
>>>>> 
>>>>> job_pid=$1
>>>>> job_owner=$2
>>>>> 
>>>>> # try and kill the session group - the group leader is the shell
>>>>> # executing the job script
>>>>> pkill -s $job_pid
>>>>> if [ $? -ne 0 ] ; then
>>>>>      kill $job_pid
>>>> 
>>>> AFAICS the sid can be different from the pid or pgrp. And even when
>>>> they are the same: it's the sid of the sshd, not the shell.
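
The three ids can be compared directly with ps. A minimal sketch, assuming a ps that knows the "sid" output column (procps on Linux does); the helper name show_ids is hypothetical:

```shell
# Hypothetical helper: print "pid pgid sid command" for one process,
# without a header line.
show_ids() {
    ps -o pid= -o pgid= -o sid= -o comm= -p "$1"
}

# Usage: run it once for the sshd and once for the login shell, then
# compare the third column (the session id) of both lines.
#   show_ids <pid of sshd>
#   show_ids <pid of shell>
```

If the session ids differ, `pkill -s $job_pid` matches only the processes in the session whose id equals $job_pid, and the shell is not among them.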
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> fi
>>>>> 
>>>>> # cleanup grace period
>>>>> sleep 10
>>>>> pkill -9 -s $job_pid
>>>>> if [ $? -ne 0 ] ; then
>>>>>      kill -9 $job_pid
>>>>> fi
>>>>> 
>>>>> 
>>>>> 
>>>>> Bert
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Reuti [mailto:[email protected]]
>>>>>> Sent: Wednesday, December 04, 2013 5:33 PM
>>>>>> To: Wiegers, Bert
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>>> 
>>>>>> On 04.12.2013 at 17:19, Wiegers, Bert wrote:
>>>>>> 
>>>>>>> Hi *,
>>>>>>> 
>>>>>>> we are using a qlogin wrapper script, as mentioned below.
>>>>>>> It looks like this setup prevents SGE from reaching the
>>>>>>> terminate_method.
>>>>>> 
>>>>>> You defined a custom "terminate_method"? Can you please post it?
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> 
>>>>>>> Bert
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: [email protected] 
>>>>>>>> [mailto:[email protected]] On Behalf Of Wiegers, Bert
>>>>>>>> Sent: Tuesday, December 03, 2013 9:01 AM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>>>>> 
>>>>>>>> Hi Reuti,
>>>>>>>> 
>>>>>>>> The process tree looks like this:
>>>>>>>> root     20939  0.0  0.0 1242552 5892 ?        Sl   Nov14  18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
>>>>>>>> root     33874 99.7  0.0  34164  2828 ?        R    08:47   0:22  \_ sge_shepherd-18003 -bg
>>>>>>>> root     33882  0.0  0.0  98156  3836 pts/1    Ss+  08:47   0:00      \_ sshd: xxxxxx [priv]
>>>>>>>> xxxxxx 33884  0.0  0.0  98156  2044 pts/1    S+   08:47   0:00          \_ sshd: xxxxxx@pts/2
>>>>>>>> xxxxxx 33885  1.1  0.0  14556  3260 pts/2    SNs  08:47   0:00              \_ -tcsh
>>>>>>>> It stays the same as long as I am logged on to the node.
>>>>>>>> 
>>>>>>>> The Job is still listed in qstat.
>>>>>>>> 
>>>>>>>> In the messages of the scheduler I find these hints:
>>>>>>>> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
>>>>>>>> 
>>>>>>>> When I log out afterwards I see in the messages:
>>>>>>>> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
>>>>>>>> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
>>>>>>>> 
>>>>>>>> Bert
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Reuti [mailto:[email protected]]
>>>>>>>>> Sent: Monday, December 02, 2013 6:43 PM
>>>>>>>>> To: Wiegers, Bert
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> On 02.12.2013 at 18:28, Wiegers, Bert wrote:
>>>>>>>>> 
>>>>>>>>>> we are running SGE 8.1.6.
>>>>>>>>>> We have configured some interactive queues and use qlogin with the
>>>>>>>>>> wrapper script (... /usr/bin/ssh -Y -p $PORT $HOST).
>>>>>>>>>> In our setup the user is forced to set the h_rt limit.
>>>>>>>>>> Unfortunately, qlogin does not care whether the walltime has been exceeded.
>>>>>>>>>> The shepherd seems to be unable to kill the qlogin sessions while the
>>>>>>>>>> user is still connected to the node.
>>>>>>>>>> Does anyone have a solution or a workaround for this?
>>>>>>>>> 
>>>>>>>>> Is the `sshd` a child of the `shepherd`, i.e. something like:
>>>>>>>>> 
>>>>>>>>> $ ps -e f
>>>>>>>>> ...
>>>>>>>>> 6656 ?        Sl    56:23 /usr/sge/bin/lx24-x86/sge_execd
>>>>>>>>> 9391 ?        S      0:00  \_ sge_shepherd-10502 -bg
>>>>>>>>> 9392 ?        Ss     0:00      \_ sshd: reuti [priv]
>>>>>>>>> 9398 ?        S      0:00          \_ sshd: reuti@pts/2
>>>>>>>>> 9405 pts/2    Ss     0:00              \_ -bash
>>>>>>>>> 
>>>>>>>>> What does the process tree look like after "h_rt" expired - did the
>>>>>>>>> job vanish from `qstat` too?
>>>>>>>>> 
>>>>>>>>> -- Reuti
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> [email protected]
>>>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>> 
>>> 
>>> 
> 
> 


