On Wed, 13 May 2015 07:44:08 +0000 "[email protected]" <[email protected]> wrote:
> Hi Reuti,
>
> The value in /opt/sge/default/spool/active_jobs/8143543.1/addgrpid is not
> there in /proc/
>
> But the child processes of the job are available in /proc/.
>
> Can you please suggest a solution.
>
There are two possible solutions specified in the section of the man page
Reuti referred you to. Either compile and use a patched sshd or configure
your existing sshd to use the pam_sge-qrsh-setup PAM module.

> Regards,
> Sudha
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Tuesday, May 12, 2015 8:53 PM
> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> Cc: [email protected]; [email protected]
> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>
>
> > On 12.05.2015 at 17:03, <[email protected]>
> > <[email protected]> wrote:
> >
> > Hi Reuti,
> >
> > In the link suggested by you
> > (https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html) it
> > is mentioned as below
> >
> > "To have a tight integration of SSH into SGE, the started sshd needs an
> > additional group ID to be attached."
> >
> > Checked the configuration from our side and the addgrpid is generated
> >
> > /opt/sge/default/spool/active_jobs/8143543.1 : ls addgrpid
>
> Yes, but not attached to all processes. Processes running in a tight
> integration need them attached, like something in /proc:
>
> reuti@node:/proc/24989> cat status
> ...
> Groups: 20082 24000 25000
>
> And the 20082 is the additional one.
>
> -- Reuti
>
>
> > Regards,
> > Sudha
> >
> > -----Original Message-----
> > From: Reuti [mailto:[email protected]]
> > Sent: Monday, May 11, 2015 2:08 AM
> > To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> > Cc: [email protected]; [email protected]
> > Subject: Re: [gridengine users] grid jobs not visible with qstat
> > output
> >
> >
> > On 10.05.2015 at 19:30, <[email protected]>
> > <[email protected]> wrote:
> >
> >> Hi Reuti,
> >>
> >> The startup mechanism is as below
> >>
> >> qlogin_daemon /usr/sbin/sshd -i
> >> qlogin_command /gridapl1/HWEE_ge6/new/qssh
> >
> > Then it's most likely that the `ssh` is not tightly integrated into SGE.
> > Please have a look at:
> >
> > https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
> >
> > section "SSH TIGHT INTEGRATION".
> >
> > -- Reuti
> >
> >
> >> Regards,
> >> Sudha
> >>
> >> -----Original Message-----
> >> From: Reuti [mailto:[email protected]]
> >> Sent: Friday, May 08, 2015 10:50 PM
> >> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> >> Cc: [email protected]; [email protected]
> >> Subject: Re: [gridengine users] grid jobs not visible with qstat
> >> output
> >>
> >>
> >>> On 08.05.2015 at 16:57, [email protected] wrote:
> >>>
> >>> Hi Zhang,
> >>>
> >>> Please find the o/p
> >>>
> >>> 32682 61457200 27020 karppa 32682 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spo
> >>> 32734 61457200 27020 karppa 32734 \_ /bin/ksh ./run_it_file.vcs
> >>> 33043 61457200 27020 karppa 32734 \_ /bin/ksh ./vcs.start.dh.no_gui
> >>> 33059 61457200 27020 karppa 32734 \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+
> >>> 38048 61457200 27020 karppa 32734 \_ [target.bin] <defunct>
> >>> 5049 61457200 27020 karppa 5049 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spoo
> >>> 5101 61457200 27020 karppa 5101 \_ /bin/ksh ./run_it_file.vcs
> >>> 5408 61457200 27020 karppa 5101 \_ /bin/ksh ./vcs.start.dh.no_gui
> >>> 5424 61457200 27020 karppa 5101 \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+a
> >>> 9089 61457200 27020 karppa 5101 \_ [target.bin] <defunct>
> >>
> >> The problem seems to be that the `qrsh_starter` is no longer bound to the
> >> "sge_shepherd". Was this after the job? What does it look like while SGE
> >> still knows about the job? What is the startup mechanism:
> >>
> >> $ qconf -sconf
> >> ...
> >> qlogin_command builtin
> >> qlogin_daemon builtin
> >> rlogin_command builtin
> >> rlogin_daemon builtin
> >> rsh_command builtin
> >> rsh_daemon builtin
> >>
> >> -- Reuti
> >>
> >>
> >>> Regards,
> >>> Sudha
> >>>
> >>> -----Original Message-----
> >>> From: Feng Zhang [mailto:[email protected]]
> >>> Sent: Friday, May 08, 2015 7:35 PM
> >>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> >>> Subject: Re: [gridengine users] grid jobs not visible with qstat
> >>> output
> >>>
> >>> Sudha,
> >>>
> >>> Can you run "ps -e f -o pid,ppid,command", which can show more details?
> >>>
> >>> On Fri, May 8, 2015 at 4:09 AM, <[email protected]> wrote:
> >>>> Hi Reuti,
> >>>>
> >>>> The processes are not bound to sge_shepherd anymore.
> >>>>
> >>>> Below are the qrsh_starter processes still running
> >>>>
> >>>> 5049 ? 00:00:00 qrsh_starter
> >>>> 5101 ? 00:00:00 run_it_file.vcs
> >>>> 5408 ? 00:00:00 vcs.start.dh.no
> >>>> 5424 ? 8-20:57:02 simv
> >>>> 9089 ? 00:00:00 target.bin <defunct>
> >>>> 16868 ? 00:00:00 sshd
> >>>> 16913 pts/9 00:00:00 bash
> >>>> 17371 pts/9 00:00:00 ps
> >>>> 32682 ? 00:00:00 qrsh_starter
> >>>> 32734 ? 00:00:00 run_it_file.vcs
> >>>> 33043 ? 00:00:00 vcs.start.dh.no
> >>>> 33059 ? 8-21:19:03 simv
> >>>> 38048 ? 00:00:00 target.bin <defunct>
> >>>>
> >>>> Regards,
> >>>> Sudha
> >>>>
> >>>> -----Original Message-----
> >>>> From: Reuti [mailto:[email protected]]
> >>>> Sent: Thursday, May 07, 2015 9:52 PM
> >>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> >>>> Cc: [email protected]; [email protected]
> >>>> Subject: Re: [gridengine users] grid jobs not visible with qstat
> >>>> output
> >>>>
> >>>> Are the processes still bound to the sge_shepherd or did they jump out
> >>>> of the process tree? By what method were they started by qrsh_starter:
> >>>> "builtin" or by defining `ssh`?
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>
> >>>>> On 07.05.2015 at 18:00, <[email protected]>
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> No, the slots are not being used anymore.
> >>>>>
> >>>>> That according to qstat I seem not to have any jobs at host. However,
> >>>>> there are my processes running in that specific host (launched by
> >>>>> qrsh_starter) that are altogether consuming 200% of CPU and licenses.
> >>>>> The problem here is that the processes have been running there over a
> >>>>> week and I haven't been aware of those. I've thought that the processes
> >>>>> were killed when the job was killed with qdel.
> >>>>>
> >>>>> What could be the reason for this?
> >>>>>
> >>>>> Regards,
> >>>>> Sudha
> >>>>>
> >>>>> From: Srirangam Addepalli [mailto:[email protected]]
> >>>>> Sent: Wednesday, May 06, 2015 7:52 PM
> >>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> >>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat
> >>>>> output
> >>>>>
> >>>>> That would be strange. Do the slots on the host show as being used?
> >>>>>
> >>>>> qhost -j -h hostname should list the jobs that Grid Engine is aware of,
> >>>>> unless qrsh somehow spawned a process that is not bound by sge_execd.
> >>>>> On the client/execution host, what info do you have in the active_jobs
> >>>>> and jobs directories?
> >>>>> It is more likely that the qrsh session is
> >>>>> terminated but left resident processes.
> >>>>>
> >>>>> Rangam
> >>>>>
> >>>>> On Wed, May 6, 2015 at 9:05 AM, <[email protected]> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I noticed that I've had two grid jobs running over a week on a machine
> >>>>> of which I haven't been aware of. Both of the jobs have been launched
> >>>>> with qrsh but they are not visible with qstat thus for a reason or
> >>>>> another they are no longer included in grid book-keeping. This issue
> >>>>> will cause that grid resources are wasted for ghost jobs as for example
> >>>>> both of my jobs seem to consume 100% CPU on the host.
> >>>>>
> >>>>> Can anyone please explain on this.
> >>>>>
> >>>>> Regards,
> >>>>> Sudha
> >>>>>
> >>>>> The information contained in this electronic message and any
> >>>>> attachments to this message are intended for the exclusive use of
> >>>>> the addressee(s) and may contain proprietary, confidential or
> >>>>> privileged information. If you are not the intended recipient, you
> >>>>> should not disseminate, distribute or copy this e-mail. Please
> >>>>> notify the sender immediately and destroy all copies of this
> >>>>> message and any attachments. WARNING: Computer viruses can be
> >>>>> transmitted via email. The recipient should check this email and
> >>>>> any attachments for the presence of viruses. The company accepts
> >>>>> no liability for any damage caused by any virus transmitted by
> >>>>> this email. www.wipro.com
> >>>>>
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> [email protected]
> >>>>> https://gridengine.org/mailman/listinfo/users
> >>>
> >>> --
> >>> Best,
> >>>
> >>> Feng

--
William Hay <[email protected]>
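A minimal diagnostic sketch, not part of Grid Engine (the script name check_addgrpid.py and every function name in it are hypothetical). Following Reuti's point that processes in a tight integration must carry the job's additional group ID in the Groups: line of /proc/<pid>/status, it reads that GID from the job's addgrpid file and lists processes that carry the GID, descend from an sge_shepherd, or are named qrsh_starter. It assumes a Linux /proc, that the addgrpid file holds a single decimal GID, and that the shepherd's Name: field begins with "sge_shepherd".

```python
#!/usr/bin/env python3
"""Hypothetical helper: compare a job's addgrpid with running processes.

Assumptions (not taken from Grid Engine itself): Linux /proc, the addgrpid
file holds one decimal GID, and shepherd processes report a Name: field
starting with "sge_shepherd".
"""
from __future__ import annotations

import sys
from pathlib import Path
from typing import Dict, Iterator, Optional, Set, Tuple


def parse_groups(status_text: str) -> Set[int]:
    """Supplementary group IDs from the 'Groups:' line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(gid) for gid in line.split()[1:]}
    return set()


def parse_name_ppid(status_text: str) -> Tuple[Optional[str], Optional[int]]:
    """Command name (Name:) and parent PID (PPid:) from /proc/<pid>/status."""
    name, ppid = None, None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "PPid":
            ppid = int(value.strip())
    return name, ppid


def under_shepherd(pid: int, statuses: Dict[int, str]) -> bool:
    """Walk the PPid chain upwards; True if an sge_shepherd is an ancestor."""
    seen: Set[int] = set()
    while pid and pid not in seen:
        seen.add(pid)
        text = statuses.get(pid)
        if text is None:  # ancestor already gone or unreadable
            return False
        name, ppid = parse_name_ppid(text)
        if name and name.startswith("sge_shepherd"):
            return True
        pid = ppid or 0  # PPid 0 ends the walk
    return False


def read_statuses(proc_root: Path = Path("/proc")) -> Dict[int, str]:
    """Snapshot /proc/<pid>/status for every process that can be read."""
    statuses: Dict[int, str] = {}
    for entry in proc_root.iterdir():
        if entry.name.isdigit():
            try:
                statuses[int(entry.name)] = (entry / "status").read_text()
            except OSError:  # process exited between listing and reading
                continue
    return statuses


def report(addgrpid: int,
           statuses: Dict[int, str]) -> Iterator[Tuple[int, str, bool, bool]]:
    """Yield (pid, name, has_addgrpid, under_shepherd) for job-like processes."""
    for pid, text in sorted(statuses.items()):
        name, _ = parse_name_ppid(text)
        has_gid = addgrpid in parse_groups(text)
        shepherded = under_shepherd(pid, statuses)
        # Skip unrelated processes to keep the output short.
        if has_gid or shepherded or name == "qrsh_starter":
            yield pid, name or "?", has_gid, shepherded


if __name__ == "__main__":
    if len(sys.argv) == 2:
        gid = int(Path(sys.argv[1]).read_text().strip())
        for pid, name, has_gid, shepherded in report(gid, read_statuses()):
            print(f"{pid:>7} {name:<16} addgrpid={has_gid!s:<5} "
                  f"under_shepherd={shepherded}")
    else:
        print("usage: check_addgrpid.py <active_jobs/JOB.TASK/addgrpid>",
              file=sys.stderr)
```

Run it on the execution host while SGE still knows about the job, e.g. python3 check_addgrpid.py /opt/sge/default/spool/active_jobs/8143543.1/addgrpid. A qrsh_starter tree reported with addgrpid=False and under_shepherd=False would match the symptom described in this thread.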
