> On 12.05.2015 at 17:03, <[email protected]> <[email protected]> wrote:
>
> Hi Reuti,
>
> In the link suggested by you
> (https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html ) it is
> mentioned as below
>
> "To have a tight integration of SSH into SGE, the started sshd needs an
> additional group ID to be attached."
>
> Checked the configuration from our side and the addgrpid is generated
>
> /opt/sge/default/spool/active_jobs/8143543.1 : ls
> addgrpid

Yes, but it is not attached to all processes. Processes running in a tight
integration need it attached, as can be seen in /proc, for example:

reuti@node:/proc/24989> cat status
...
Groups: 20082 24000 25000

Here 20082 is the additional one.

-- Reuti

>
> Regards,
> Sudha
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Monday, May 11, 2015 2:08 AM
> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> Cc: [email protected]; [email protected]
> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>
> > On 10.05.2015 at 19:30, <[email protected]> <[email protected]> wrote:
>
>> Hi Reuti,
>>
>> The startup mechanism is as below
>>
>> qlogin_daemon    /usr/sbin/sshd -i
>> qlogin_command   /gridapl1/HWEE_ge6/new/qssh
>
> Then it's most likely that the `ssh` is not tightly integrated into SGE.
> Please have a look at:
>
> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
>
> section "SSH TIGHT INTEGRATION".
>
> -- Reuti
>
>
>> Regards,
>> Sudha
>>
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Friday, May 08, 2015 10:50 PM
>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>> Cc: [email protected]; [email protected]
>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>
>>
>>> On 08.05.2015 at 16:57, [email protected] wrote:
>>>
>>> Hi Zhang,
>>>
>>> Please find the o/p
>>>
>>> 32682 61457200 27020 karppa 32682
>>> /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter
>>> /gridapl1/HWEE_ge6/default/spo
>>> 32734 61457200 27020 karppa 32734 \_ /bin/ksh ./run_it_file.vcs
>>> 33043 61457200 27020 karppa 32734 \_ /bin/ksh ./vcs.start.dh.no_gui
>>> 33059 61457200 27020 karppa 32734 \_
>>> ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+
>>> 38048 61457200 27020 karppa 32734 \_ [target.bin] <defunct>
>>> 5049 61457200 27020 karppa 5049
>>> /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter
>>> /gridapl1/HWEE_ge6/default/spoo
>>> 5101 61457200 27020 karppa 5101 \_ /bin/ksh ./run_it_file.vcs
>>> 5408 61457200 27020 karppa 5101 \_ /bin/ksh ./vcs.start.dh.no_gui
>>> 5424 61457200 27020 karppa 5101 \_
>>> ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+a
>>> 9089 61457200 27020 karppa 5101 \_ [target.bin] <defunct>
>>
>> The problem seems to be that the `qrsh_starter` is no longer bound to the
>> "sge_shepherd". Was this output taken after the job? What does it look like
>> while SGE still knows about the job? What is the startup mechanism:
>>
>> $ qconf -sconf
>> ...
>> qlogin_command               builtin
>> qlogin_daemon                builtin
>> rlogin_command               builtin
>> rlogin_daemon                builtin
>> rsh_command                  builtin
>> rsh_daemon                   builtin
>>
>> -- Reuti
>>
>>
>>> Regards,
>>> Sudha
>>>
>>> -----Original Message-----
>>> From: Feng Zhang [mailto:[email protected]]
>>> Sent: Friday, May 08, 2015 7:35 PM
>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>
>>> Sudha,
>>>
>>> Can you run "ps -e f -o pid,ppid,command", which can show more details?
>>>
>>> On Fri, May 8, 2015 at 4:09 AM, <[email protected]> wrote:
>>>> Hi Reuti,
>>>>
>>>> The processes are not bound to sge_shepherd anymore.
>>>>
>>>> Below are the qrsh_starter processes that are still running
>>>>
>>>>  5049 ?        00:00:00 qrsh_starter
>>>>  5101 ?        00:00:00 run_it_file.vcs
>>>>  5408 ?        00:00:00 vcs.start.dh.no
>>>>  5424 ?      8-20:57:02 simv
>>>>  9089 ?        00:00:00 target.bin <defunct>
>>>> 16868 ?        00:00:00 sshd
>>>> 16913 pts/9    00:00:00 bash
>>>> 17371 pts/9    00:00:00 ps
>>>> 32682 ?        00:00:00 qrsh_starter
>>>> 32734 ?        00:00:00 run_it_file.vcs
>>>> 33043 ?        00:00:00 vcs.start.dh.no
>>>> 33059 ?      8-21:19:03 simv
>>>> 38048 ?        00:00:00 target.bin <defunct>
>>>>
>>>> Regards,
>>>> Sudha
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:[email protected]]
>>>> Sent: Thursday, May 07, 2015 9:52 PM
>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>> Cc: [email protected]; [email protected]
>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>>
>>>> Are the processes still bound to the sge_shepherd or did they jump out of
>>>> the process tree? By what method were they started by qrsh_starter:
>>>> "builtin" or by defining `ssh`?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> On 07.05.2015 at 18:00, <[email protected]> <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> No, the slots are not being used anymore.
>>>>>
>>>>> According to qstat I don't seem to have any jobs on the host. However,
>>>>> my processes (launched by qrsh_starter) are running on that specific
>>>>> host and are altogether consuming 200% of CPU as well as licenses. The
>>>>> problem here is that the processes have been running there for over a
>>>>> week and I wasn't aware of them. I thought that the processes were
>>>>> killed when the job was killed with qdel.
>>>>>
>>>>> What could be the reason for this?
>>>>>
>>>>> Regards,
>>>>> Sudha
>>>>>
>>>>> From: Srirangam Addepalli [mailto:[email protected]]
>>>>> Sent: Wednesday, May 06, 2015 7:52 PM
>>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>>>
>>>>> That would be strange. Do the slots on the host show as being used?
>>>>>
>>>>> qhost -j -h hostname should list the jobs that Grid Engine is aware of,
>>>>> unless qrsh somehow spawned a process that is not bound by sge_execd.
>>>>> On the client/execution host, what info do you have in the active_jobs
>>>>> and jobs directories? It is more likely that the qrsh session is
>>>>> terminated but left resident processes.
>>>>>
>>>>> Rangam
>>>>>
>>>>> On Wed, May 6, 2015 at 9:05 AM, <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I noticed that I've had two grid jobs running for over a week on a
>>>>> machine without being aware of them. Both of the jobs were launched
>>>>> with qrsh but they are not visible with qstat, so for one reason or
>>>>> another they are no longer included in the grid bookkeeping. This
>>>>> causes grid resources to be wasted on ghost jobs; for example, both of
>>>>> my jobs seem to consume 100% CPU on the host.
>>>>>
>>>>> Can anyone please explain this?
>>>>>
>>>>> Regards,
>>>>> Sudha
>>>>>
>>>>> The information contained in this electronic message and any attachments
>>>>> to this message are intended for the exclusive use of the addressee(s)
>>>>> and may contain proprietary, confidential or privileged information. If
>>>>> you are not the intended recipient, you should not disseminate,
>>>>> distribute or copy this e-mail. Please notify the sender immediately and
>>>>> destroy all copies of this message and any attachments. WARNING: Computer
>>>>> viruses can be transmitted via email. The recipient should check this
>>>>> email and any attachments for the presence of viruses. The company
>>>>> accepts no liability for any damage caused by any virus transmitted by
>>>>> this email. www.wipro.com
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
>>>
>>> --
>>> Best,
>>>
>>> Feng
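
The check Reuti describes at the top of this thread (looking for the
additional group ID in the Groups line of /proc/<pid>/status) could be
scripted roughly as below. This is only a sketch under assumptions: the
function names parse_groups and has_additional_gid are made up for
illustration, the sample text is modeled on the output quoted above, and the
GID to look for has to be supplied by whoever runs it.

```python
from pathlib import Path


def parse_groups(status_text: str) -> list[int]:
    """Return the supplementary group IDs found in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # The first token is the "Groups:" label, the rest are GIDs.
            return [int(gid) for gid in line.split()[1:]]
    return []


def has_additional_gid(pid: int, gid: int, proc_root: str = "/proc") -> bool:
    """Check whether process `pid` carries `gid` among its supplementary groups.

    `proc_root` is a hypothetical parameter so the check can be tried
    against a copy of the status file instead of the live /proc.
    """
    status = Path(proc_root, str(pid), "status").read_text()
    return gid in parse_groups(status)


# Sample modeled on the Groups line quoted in this thread.
sample = "Name:\tsimv\nGroups:\t20082 24000 25000\n"
print(parse_groups(sample))
```

A process that was started under a tight integration would be expected to
return True for the job's additional GID; the leftover qrsh_starter children
discussed here would presumably return False.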
