On 10.05.2015 at 19:30, <[email protected]> <[email protected]> wrote:
> Hi Reuti,
>
> The startup mechanism is as below:
>
> qlogin_daemon /usr/sbin/sshd -i
> qlogin_command /gridapl1/HWEE_ge6/new/qssh

Then it's most likely that `ssh` is not tightly integrated into SGE. Please have a look at:

https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html

section "SSH TIGHT INTEGRATION".

-- Reuti


> Regards,
> Sudha
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Friday, May 08, 2015 10:50 PM
> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> Cc: [email protected]; [email protected]
> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>
>
>> On 08.05.2015 at 16:57, [email protected] wrote:
>>
>> Hi Zhang,
>>
>> Please find the output:
>>
>> 32682 61457200 27020 karppa 32682 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spo
>> 32734 61457200 27020 karppa 32734  \_ /bin/ksh ./run_it_file.vcs
>> 33043 61457200 27020 karppa 32734  \_ /bin/ksh ./vcs.start.dh.no_gui
>> 33059 61457200 27020 karppa 32734  \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+
>> 38048 61457200 27020 karppa 32734  \_ [target.bin] <defunct>
>>  5049 61457200 27020 karppa  5049 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spoo
>>  5101 61457200 27020 karppa  5101  \_ /bin/ksh ./run_it_file.vcs
>>  5408 61457200 27020 karppa  5101  \_ /bin/ksh ./vcs.start.dh.no_gui
>>  5424 61457200 27020 karppa  5101  \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+a
>>  9089 61457200 27020 karppa  5101  \_ [target.bin] <defunct>
>
> The problem seems to be that the `qrsh_starter` is no longer bound to the
> "sge_shepherd". Was this after the job? How does it look while SGE
> still knows about the job? What is the startup mechanism:
>
> $ qconf -sconf
> ...
> qlogin_command builtin
> qlogin_daemon builtin
> rlogin_command builtin
> rlogin_daemon builtin
> rsh_command builtin
> rsh_daemon builtin
>
> -- Reuti
>
>
>> Regards,
>> Sudha
>>
>> -----Original Message-----
>> From: Feng Zhang [mailto:[email protected]]
>> Sent: Friday, May 08, 2015 7:35 PM
>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>
>> Sudha,
>>
>> Can you run "ps -e f -o pid,ppid,command"? It can show more details.
>>
>> On Fri, May 8, 2015 at 4:09 AM, <[email protected]> wrote:
>>> Hi Reuti,
>>>
>>> The processes are not bound to sge_shepherd anymore.
>>>
>>> Below are the qrsh_starter processes that are still running:
>>>
>>>  5049 ?        00:00:00 qrsh_starter
>>>  5101 ?        00:00:00 run_it_file.vcs
>>>  5408 ?        00:00:00 vcs.start.dh.no
>>>  5424 ?      8-20:57:02 simv
>>>  9089 ?        00:00:00 target.bin <defunct>
>>> 16868 ?        00:00:00 sshd
>>> 16913 pts/9    00:00:00 bash
>>> 17371 pts/9    00:00:00 ps
>>> 32682 ?        00:00:00 qrsh_starter
>>> 32734 ?        00:00:00 run_it_file.vcs
>>> 33043 ?        00:00:00 vcs.start.dh.no
>>> 33059 ?      8-21:19:03 simv
>>> 38048 ?        00:00:00 target.bin <defunct>
>>>
>>> Regards,
>>> Sudha
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:[email protected]]
>>> Sent: Thursday, May 07, 2015 9:52 PM
>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>> Cc: [email protected]; [email protected]
>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>
>>> Are the processes still bound to the sge_shepherd or did they jump out of
>>> the process tree? By what method were they started by qrsh_starter:
>>> "builtin" or by defining `ssh`?
>>>
>>> -- Reuti
>>>
>>>
>>>> On 07.05.2015 at 18:00, <[email protected]> <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> No, the slots are not being used anymore.
>>>>
>>>> According to qstat, I don't seem to have any jobs on the host.
>>>> However, some of my processes (launched by qrsh_starter) are still running
>>>> on that specific host, altogether consuming 200% of CPU and licenses. The
>>>> problem here is that the processes have been running there for over a week
>>>> and I wasn't aware of them. I thought the processes were killed when the
>>>> job was killed with qdel.
>>>>
>>>> What could be the reason for this?
>>>>
>>>> Regards,
>>>> Sudha
>>>>
>>>> From: Srirangam Addepalli [mailto:[email protected]]
>>>> Sent: Wednesday, May 06, 2015 7:52 PM
>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>>
>>>> That would be strange. Do the slots on the host show as being used?
>>>>
>>>> `qhost -j -h hostname` should list the jobs that Grid Engine is aware of,
>>>> unless qrsh somehow spawned a process that is not bound by sge_execd. On
>>>> the client/execution host, what info do you have in the active_jobs and
>>>> jobs directories? It is more likely that the qrsh session was terminated
>>>> but left resident processes behind.
>>>>
>>>> Rangam
>>>>
>>>> On Wed, May 6, 2015 at 9:05 AM, <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I noticed that I've had two grid jobs running for over a week on a machine
>>>> without being aware of them. Both jobs were launched with qrsh, but they
>>>> are not visible with qstat, so for one reason or another they are no
>>>> longer included in the grid bookkeeping. As a result, grid resources are
>>>> wasted on ghost jobs; for example, both of my jobs seem to consume 100% CPU
>>>> on the host.
>>>>
>>>> Can anyone please explain this?
>>>>
>>>> Regards,
>>>> Sudha
>>>>
>>>> The information contained in this electronic message and any attachments
>>>> to this message are intended for the exclusive use of the addressee(s) and
>>>> may contain proprietary, confidential or privileged information.
>>>> If you are not the intended recipient, you should not disseminate,
>>>> distribute or copy this e-mail. Please notify the sender immediately and
>>>> destroy all copies of this message and any attachments. WARNING: Computer
>>>> viruses can be transmitted via email. The recipient should check this
>>>> email and any attachments for the presence of viruses. The company accepts
>>>> no liability for any damage caused by any virus transmitted by this email.
>>>> www.wipro.com
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
>>
>>
>> --
>> Best,
>>
>> Feng
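Reuti's first question in the thread was whether the processes are still bound to the sge_shepherd or have jumped out of the process tree; Sudha answered it by reading the `ps` forest by eye. The same check can be scripted. Below is a minimal Python sketch, a hypothetical helper and not part of any Grid Engine tooling, that reads `ps -e -o pid= -o ppid= -o comm=` output and lists `qrsh_starter` processes with no `sge_shepherd` process among their ancestors. It assumes a procps-style `ps` and that the shepherd's command name begins with `sge_shepherd` (it may carry a suffix); all function names are made up for illustration.

```python
"""Hypothetical helper: flag qrsh_starter processes that are no longer
descendants of an sge_shepherd process (candidate orphaned qrsh jobs)."""

import subprocess
from typing import Dict, List, Tuple

# Process table: {pid: (ppid, command name)}
ProcTable = Dict[int, Tuple[int, str]]


def snapshot() -> str:
    """Capture pid, ppid and command name of all processes (procps-style ps).

    Giving every column an empty header ("pid=" etc.) suppresses the header line.
    """
    return subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_ps(text: str) -> ProcTable:
    """Parse 'pid ppid comm' lines into a process table."""
    procs: ProcTable = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        procs[int(parts[0])] = (int(parts[1]), parts[2].strip())
    return procs


def has_shepherd_ancestor(pid: int, procs: ProcTable) -> bool:
    """Walk up the parent chain looking for a command named sge_shepherd*."""
    seen = set()
    current = procs.get(pid, (0, ""))[0]
    while current in procs and current not in seen:
        seen.add(current)
        ppid, comm = procs[current]
        # Assumption: the shepherd's name starts with "sge_shepherd".
        if comm.startswith("sge_shepherd"):
            return True
        current = ppid
    return False


def orphaned_starters(procs: ProcTable) -> List[int]:
    """Return sorted PIDs of qrsh_starter processes with no sge_shepherd ancestor."""
    return sorted(
        pid for pid, (_, comm) in procs.items()
        if comm == "qrsh_starter" and not has_shepherd_ancestor(pid, procs)
    )
```

On an execution host, `orphaned_starters(parse_ps(snapshot()))` returns the candidate PIDs. They are only candidates: cross-check them against `qstat` and `qhost -j -h hostname` before killing anything.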

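Reuti's diagnosis depends on the remote startup settings reported by `qconf -sconf`. In his example all six parameters are `builtin`, while Sudha's site uses `/usr/sbin/sshd -i` and a `qssh` wrapper for qlogin. According to his reply, a setting like that makes the "SSH TIGHT INTEGRATION" section of remote_startup(5) relevant. The minimal Python sketch below is a hypothetical helper, not an SGE tool. It scans `qconf -sconf` text and reports which of those six parameters are not `builtin`. It assumes one `name value` pair per line and ignores backslash-continued values.

```python
"""Hypothetical helper: report which remote-startup parameters in
`qconf -sconf` output are set to something other than "builtin"."""

from typing import Dict

# The six parameters listed in Reuti's example configuration.
STARTUP_KEYS = frozenset({
    "qlogin_command", "qlogin_daemon",
    "rlogin_command", "rlogin_daemon",
    "rsh_command", "rsh_daemon",
})


def non_builtin_startup(sconf_text: str) -> Dict[str, str]:
    """Map each listed startup parameter that is not 'builtin' to its value.

    Assumes one 'name value' pair per line; continuation lines are ignored.
    """
    result: Dict[str, str] = {}
    for line in sconf_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in STARTUP_KEYS:
            value = parts[1].strip()
            if value != "builtin":
                result[parts[0]] = value
    return result
```

Feed it the captured output of `qconf -sconf`. An empty result means all six parameters are `builtin`. A non-empty one is not a fault by itself, only a pointer to the tight-integration question discussed above.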