On 04.12.2013 at 17:19, Wiegers, Bert wrote:

> Hi *,
> 
> we are using a qlogin wrapper script, as mentioned below.
> It looks like this setup prevents SGE from reaching the terminate_method.

You defined a custom "terminate_method"? Can you please post it?
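
In case it helps with posting it, here is a minimal sketch for pulling that setting (and the related "notify" value) out of a queue configuration. The function name `show_terminate_method` is made up for this example and the queue name is a placeholder; it only filters the text that `qconf -sq` prints.

```shell
# Hypothetical helper: reads a queue configuration (as printed by
# `qconf -sq <queue_name>`) from stdin and prints the notify and
# terminate_method entries as key=value.
show_terminate_method() {
    awk '$1 == "terminate_method" || $1 == "notify" {
        key = $1
        $1 = ""              # drop the key, keep the rest of the line
        sub(/^ +/, "")       # strip the leading blank left behind
        print key "=" $0
    }'
}
```

On a host with SGE installed this would be used as `qconf -sq <queue_name> | show_terminate_method`, with the name of your interactive queue filled in.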

-- Reuti


> Bert
> 
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Wiegers, Bert
>> Sent: Tuesday, December 03, 2013 9:01 AM
>> To: [email protected]
>> Subject: Re: [gridengine users] qlogin with ssh
>> 
>> Hi Reuti,
>> 
>> The process tree looks like this:
>> root     20939  0.0  0.0 1242552 5892 ?        Sl   Nov14  18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
>> root     33874 99.7  0.0  34164  2828 ?        R    08:47   0:22  \_ sge_shepherd-18003 -bg
>> root     33882  0.0  0.0  98156  3836 pts/1    Ss+  08:47   0:00      \_ sshd: xxxxxx [priv]
>> xxxxxx 33884  0.0  0.0  98156  2044 pts/1    S+   08:47   0:00          \_ sshd: xxxxxx@pts/2
>> xxxxxx 33885  1.1  0.0  14556  3260 pts/2    SNs  08:47   0:00              \_ -tcsh
>> It stays the same as long as I am logged on to the node.
>> 
>> The Job is still listed in qstat.
>> 
>> In the messages of the scheduler I find these hints:
>> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
>> 
>> When I log out afterwards, I see this in the messages:
>> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
>> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
>> 
>> Bert
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: Reuti [mailto:[email protected]]
>>> Sent: Monday, December 02, 2013 6:43 PM
>>> To: Wiegers, Bert
>>> Cc: [email protected]
>>> Subject: Re: [gridengine users] qlogin with ssh
>>> 
>>> Hi,
>>> 
>>> On 02.12.2013 at 18:28, Wiegers, Bert wrote:
>>> 
>>>> we are running SGE 8.1.6.
>>>> We have configured some interactive queues and use qlogin with the
>>>> wrapper script (... /usr/bin/ssh -Y -p $PORT $HOST).
>>>> In our setup the user is forced to specify h_rt.
>>>> Unfortunately, qlogin does not care if the walltime is exceeded.
>>>> The shepherd seems to be unable to kill the qlogin sessions while the
>>>> user is still connected to the node.
>>>> Does anyone have a solution or a workaround for this?
>>> 
>>> Is the `sshd` a child of the `shepherd`, i.e. something like:
>>> 
>>> $ ps -e f
>>> ...
>>> 6656 ?        Sl    56:23 /usr/sge/bin/lx24-x86/sge_execd
>>> 9391 ?        S      0:00  \_ sge_shepherd-10502 -bg
>>> 9392 ?        Ss     0:00      \_ sshd: reuti [priv]
>>> 9398 ?        S      0:00          \_ sshd: reuti@pts/2
>>> 9405 pts/2    Ss     0:00              \_ -bash
>>> 
>>> What does the process tree look like after "h_rt" expired - did the job
>>> vanish from `qstat` too?
>>> 
>>> -- Reuti
>> 
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users