On 04.12.2013 at 17:47, Wiegers, Bert wrote:

> our setup is
> 
> sge_conf:
> qlogin_command               /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh
> 
> cat /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh
> #!/bin/sh
> HOST=$1
> PORT=$2
> /usr/bin/ssh -Y -p $PORT $HOST
> 
> 
> queue_conf:
> terminate_method      /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh \
>                      $job_pid $job_owner

What was the motivation to have a custom method?

The default is to send a kill to the complete process group, i.e. something like

kill -9 -- -$1
 
in your setup.
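
If the grace period was the reason for the custom script, the same idea can be applied to the process group instead of the session. The following is only a minimal sketch, not the built-in SGE behaviour: the function name terminate_pgrp and the 10 second default are made up for illustration, and it assumes that $job_pid is the leader of the process group that should go away.

```shell
#!/bin/bash
# Hypothetical terminate_method sketch: signal the whole process group,
# wait for a grace period, then force-kill whatever is left.

terminate_pgrp() {
    pgid=$1
    grace=${2:-10}    # seconds to wait before SIGKILL (assumed default)

    # Polite request first. The leading "-" addresses the process group.
    # If the group is already gone there is nothing left to do.
    kill -s TERM -- "-$pgid" 2>/dev/null || return 0

    # Poll once per second until the group is empty or the grace period ends.
    while [ "$grace" -gt 0 ]; do
        kill -s 0 -- "-$pgid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Still alive after the grace period: force it.
    kill -s KILL -- "-$pgid" 2>/dev/null
    return 0
}

# Called as: script job_pid [grace_seconds]
if [ $# -ge 1 ]; then
    terminate_pgrp "$1" "${2:-10}"
fi
```

With the queue_conf entry quoted above, $job_pid would be passed as the first argument; the second argument ($job_owner) would have to be dropped or the script adjusted accordingly.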


> cat /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh
> #!/bin/bash
> 
> if [ $# -ne 2 ] ; then
>  echo "Usage:" $0 job_pid job_owner
>  exit 1
> fi
> 
> job_pid=$1
> job_owner=$2
> 
> # try and kill the session group - the group leader is the shell 
> # executing the job script 
> pkill -s $job_pid
> if [ $? -ne 0 ] ; then
>        kill $job_pid

AFAICS the sid can be different from the pid or pgrp. And even when they
are the same: it's the sid of the sshd, not the shell.
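
To check which IDs are really in play, the process group and session of any PID from the process tree can be read on the exec host. A small sketch for Linux only; show_ids is a hypothetical helper, not part of SGE, and it relies on the field order of /proc/PID/stat:

```shell
#!/bin/bash
# Print PID, process group ID and session ID of one process (Linux only).
# show_ids is a hypothetical helper for illustration.

show_ids() {
    pid=$1
    stat=$(cat "/proc/$pid/stat" 2>/dev/null) || return 1

    # The fields after "(comm)" are: state ppid pgrp session ...
    # Strip everything up to the last ") " so that a command name
    # containing spaces or parentheses cannot shift the fields.
    rest=${stat##*) }
    set -- $rest
    echo "pid=$pid pgid=$3 sid=$4"
}

# Example: inspect the current shell.
show_ids $$
```

Running it for the sshd processes and for the login shell below them shows whether the sid handed to pkill -s actually covers the shell.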

-- Reuti


> fi
> 
> # cleanup grace period
> sleep 10
> pkill -9 -s $job_pid
> if [ $? -ne 0 ] ; then
>        kill -9 $job_pid
> fi
> 
> 
> 
> Bert
> 
> 
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Wednesday, December 04, 2013 5:33 PM
>> To: Wiegers, Bert
>> Cc: [email protected]
>> Subject: Re: [gridengine users] qlogin with ssh
>> 
>> On 04.12.2013 at 17:19, Wiegers, Bert wrote:
>> 
>>> Hi *,
>>> 
>>> we are using a qlogin wrapper script, as mentioned below.
>>> It looks like this setup prevents SGE from reaching the
>>> terminate_method.
>> 
>> You defined a custom "terminate_method"? Can you please post it?
>> 
>> -- Reuti
>> 
>> 
>>> Bert
>>> 
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Wiegers, Bert
>>>> Sent: Tuesday, December 03, 2013 9:01 AM
>>>> To: [email protected]
>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>> 
>>>> Hi Reuti,
>>>> 
>>>> The process tree looks like this:
>>>> root     20939  0.0  0.0 1242552 5892 ?        Sl   Nov14  18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
>>>> root     33874 99.7  0.0  34164  2828 ?        R    08:47   0:22  \_ sge_shepherd-18003 -bg
>>>> root     33882  0.0  0.0  98156  3836 pts/1    Ss+  08:47   0:00      \_ sshd: xxxxxx [priv]
>>>> xxxxxx   33884  0.0  0.0  98156  2044 pts/1    S+   08:47   0:00          \_ sshd: xxxxxx@pts/2
>>>> xxxxxx   33885  1.1  0.0  14556  3260 pts/2    SNs  08:47   0:00              \_ -tcsh
>>>> It stays the same as long as I am logged on to the node.
>>>> 
>>>> The Job is still listed in qstat.
>>>> 
>>>> In the messages of the scheduler I find these hints:
>>>> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
>>>> 
>>>> When I log out afterwards, I see in the messages:
>>>> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
>>>> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
>>>> 
>>>> Bert
>>>> 
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: Reuti [mailto:[email protected]]
>>>>> Sent: Monday, December 02, 2013 6:43 PM
>>>>> To: Wiegers, Bert
>>>>> Cc: [email protected]
>>>>> Subject: Re: [gridengine users] qlogin with ssh
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> On 02.12.2013 at 18:28, Wiegers, Bert wrote:
>>>>> 
>>>>>> we are running SGE 8.1.6.
>>>>>> We have configured some interactive queues and use qlogin with the
>>>>>> wrapper script (... /usr/bin/ssh -Y -p $PORT $HOST).
>>>>>> In our setup the user is forced to use the h_rt variable.
>>>>>> Unfortunately, qlogin does not care if the walltime is exceeded.
>>>>>> The shepherd seems to be unable to kill the qlogin sessions while the
>>>>>> user is still connected to the node.
>>>>>> Does anyone have a solution or a workaround for this?
>>>>> 
>>>>> Is the `sshd` a child of the `shepherd`, i.e. something like:
>>>>> 
>>>>> $ ps -e f
>>>>> ...
>>>>> 6656 ?        Sl    56:23 /usr/sge/bin/lx24-x86/sge_execd
>>>>> 9391 ?        S      0:00  \_ sge_shepherd-10502 -bg
>>>>> 9392 ?        Ss     0:00      \_ sshd: reuti [priv]
>>>>> 9398 ?        S      0:00          \_ sshd: reuti@pts/2
>>>>> 9405 pts/2    Ss     0:00              \_ -bash
>>>>> 
>>>>> What does the process tree look like after "h_rt" expired - did the job
>>>>> vanish from `qstat` too?
>>>>> 
>>>>> -- Reuti
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users

