Our setup is:

sge_conf:
qlogin_command /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh

cat /export/opt/SGE-8.1.6/utilbin/lx-amd64/qlogin_wrapper.sh
#!/bin/sh
HOST=$1
PORT=$2
/usr/bin/ssh -Y -p $PORT $HOST
queue_conf:
terminate_method /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh \
$job_pid $job_owner
cat /export/opt/SGE-8.1.6/scripts/case_terminate_method.sh
#!/bin/bash
if [ $# -ne 2 ] ; then
echo "Usage:" $0 job_pid job_owner
exit 1
fi
job_pid=$1
job_owner=$2
# try and kill the session group - the group leader is the shell
# executing the job script
pkill -s $job_pid
if [ $? -ne 0 ] ; then
kill $job_pid
fi
# cleanup grace period
sleep 10
pkill -9 -s $job_pid
if [ $? -ne 0 ] ; then
kill -9 $job_pid
fi
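
To check which processes `pkill -s $job_pid` in the script above would actually match, a small helper along these lines might be useful. This is only a sketch, assuming a Linux execution host with /proc mounted; the function names `session_of` and `session_members` are made up for illustration and are not part of SGE.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SGE), Linux only: they read /proc/<pid>/stat.

# Print the session ID of the given PID; return 1 if the process is gone.
session_of() {
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Drop everything up to the last ")" so that a command name containing
    # spaces or parentheses cannot shift the remaining fields.
    rest=${stat##*) }
    # Remaining fields are: state ppid pgrp session ...
    set -- $rest
    echo "$4"
}

# Print every PID whose session ID equals the given one. These are the
# processes that "pkill -s <sid>" would select.
session_members() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # A process may exit while we iterate; skip it in that case.
        sid=$(session_of "$pid") || continue
        if [ "$sid" = "$1" ]; then
            echo "$pid"
        fi
    done
    return 0
}
```

Run on the node while the qlogin session is open, `session_members "$(session_of <pid>)"` with the PID that SGE passes as `$job_pid` would show whether the sshd processes and the user's login shell share one session or not.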
Bert
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Wednesday, December 04, 2013 5:33 PM
> To: Wiegers, Bert
> Cc: [email protected]
> Subject: Re: [gridengine users] qlogin with ssh
>
> Am 04.12.2013 um 17:19 schrieb Wiegers, Bert:
>
> > Hi *,
> >
> > we are using a qlogin wrapper script, as mentioned below.
> > It looks like this setup prevents SGE from reaching the
> > terminate_method.
>
> You defined a custom "terminate_method"? Can you please post it?
>
> -- Reuti
>
>
> > Bert
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]]
> >> On Behalf Of Wiegers, Bert
> >> Sent: Tuesday, December 03, 2013 9:01 AM
> >> To: [email protected]
> >> Subject: Re: [gridengine users] qlogin with ssh
> >>
> >> Hi Reuti,
> >>
> >> The process tree looks like this:
> >> root 20939 0.0 0.0 1242552 5892 ? Sl Nov14 18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
> >> root 33874 99.7 0.0 34164 2828 ? R 08:47 0:22 \_ sge_shepherd-18003 -bg
> >> root 33882 0.0 0.0 98156 3836 pts/1 Ss+ 08:47 0:00 \_ sshd: xxxxxx [priv]
> >> xxxxxx 33884 0.0 0.0 98156 2044 pts/1 S+ 08:47 0:00 \_ sshd: xxxxxx@pts/2
> >> xxxxxx 33885 1.1 0.0 14556 3260 pts/2 SNs 08:47 0:00 \_ -tcsh
> >> It stays the same as long as I am logged on to the node.
> >>
> >> The Job is still listed in qstat.
> >>
> >> In the messages of the scheduler I find these hints:
> >> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
> >>
> >> When I log out afterwards, I see in the messages:
> >> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
> >> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
> >>
> >> Bert
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Reuti [mailto:[email protected]]
> >>> Sent: Monday, December 02, 2013 6:43 PM
> >>> To: Wiegers, Bert
> >>> Cc: [email protected]
> >>> Subject: Re: [gridengine users] qlogin with ssh
> >>>
> >>> Hi,
> >>>
> >>> Am 02.12.2013 um 18:28 schrieb Wiegers, Bert:
> >>>
> >>>> we are running SGE 8.1.6.
> >>>> We have configured some interactive queues and use qlogin with the
> >>>> wrapper script (... /usr/bin/ssh -Y -p $PORT $HOST).
> >>>> In our setup the user is forced to request an h_rt limit.
> >>>> Unfortunately, qlogin does not care if the walltime is exceeded.
> >>>> The shepherd seems to be unable to kill the qlogin sessions while the
> >>>> user is still connected to the node.
> >>>> Does anyone have a solution or a workaround for this?
> >>>
> >>> Is the `sshd` a child of the `shepherd`, i.e. something like:
> >>>
> >>> $ ps -e f
> >>> ...
> >>> 6656 ? Sl 56:23 /usr/sge/bin/lx24-x86/sge_execd
> >>> 9391 ? S 0:00 \_ sge_shepherd-10502 -bg
> >>> 9392 ? Ss 0:00 \_ sshd: reuti [priv]
> >>> 9398 ? S 0:00 \_ sshd: reuti@pts/2
> >>> 9405 pts/2 Ss 0:00 \_ -bash
> >>>
> >>> What does the process tree look like after "h_rt" expired - did the job
> >>> vanish from `qstat` too?
> >>>
> >>> -- Reuti
> >>
> >> _______________________________________________
> >> users mailing list
> >> [email protected]
> >> https://gridengine.org/mailman/listinfo/users
