> On 08.05.2015 at 16:57, [email protected] wrote:
>
> Hi Zhang,
>
> Please find the output:
>
> 32682 61457200 27020 karppa 32682 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spo
> 32734 61457200 27020 karppa 32734  \_ /bin/ksh ./run_it_file.vcs
> 33043 61457200 27020 karppa 32734  \_ /bin/ksh ./vcs.start.dh.no_gui
> 33059 61457200 27020 karppa 32734  \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+
> 38048 61457200 27020 karppa 32734  \_ [target.bin] <defunct>
>  5049 61457200 27020 karppa  5049 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spoo
>  5101 61457200 27020 karppa  5101  \_ /bin/ksh ./run_it_file.vcs
>  5408 61457200 27020 karppa  5101  \_ /bin/ksh ./vcs.start.dh.no_gui
>  5424 61457200 27020 karppa  5101  \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+a
>  9089 61457200 27020 karppa  5101  \_ [target.bin] <defunct>

The problem seems to be that the `qrsh_starter` is no longer bound to the "sge_shepherd". Was this taken after the job had ended? What does it look like while SGE still knows about the job? What is the startup mechanism:

$ qconf -sconf
...
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin

-- Reuti

> Regards,
> Sudha
>
> -----Original Message-----
> From: Feng Zhang [mailto:[email protected]]
> Sent: Friday, May 08, 2015 7:35 PM
> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>
> Sudha,
>
> Can you run "ps -e f -o pid,ppid,command", which can show more details?
>
> On Fri, May 8, 2015 at 4:09 AM, <[email protected]> wrote:
>> Hi Reuti,
>>
>> The processes are not bound to sge_shepherd anymore.
>>
>> Below are the qrsh_starter processes that are still running:
>>
>>  5049 ?        00:00:00 qrsh_starter
>>  5101 ?        00:00:00 run_it_file.vcs
>>  5408 ?        00:00:00 vcs.start.dh.no
>>  5424 ?      8-20:57:02 simv
>>  9089 ?        00:00:00 target.bin <defunct>
>> 16868 ?        00:00:00 sshd
>> 16913 pts/9    00:00:00 bash
>> 17371 pts/9    00:00:00 ps
>> 32682 ?        00:00:00 qrsh_starter
>> 32734 ?        00:00:00 run_it_file.vcs
>> 33043 ?        00:00:00 vcs.start.dh.no
>> 33059 ?      8-21:19:03 simv
>> 38048 ?        00:00:00 target.bin <defunct>
>>
>> Regards,
>> Sudha
>>
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Thursday, May 07, 2015 9:52 PM
>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>> Cc: [email protected]; [email protected]
>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>
>> Are the processes still bound to the sge_shepherd or did they jump out of
>> the process tree? By what method were they started by qrsh_starter:
>> "builtin" or by defining `ssh`?
>>
>> -- Reuti
>>
>>
>>> On 07.05.2015 at 18:00, <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> No, the slots are not being used anymore.
>>>
>>> According to qstat I do not seem to have any jobs on the host. However,
>>> my processes are still running on that specific host (launched by
>>> qrsh_starter), and altogether they are consuming 200% of CPU as well as
>>> licenses. The problem here is that the processes have been running there
>>> for over a week and I wasn't aware of them. I had thought that the
>>> processes were killed when the job was killed with qdel.
>>>
>>> What could be the reason for this?
>>>
>>> Regards,
>>> Sudha
>>>
>>> From: Srirangam Addepalli [mailto:[email protected]]
>>> Sent: Wednesday, May 06, 2015 7:52 PM
>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>
>>> That would be strange. Do the slots on the host show as being used?
>>>
>>> qhost -j -h hostname should list the jobs that Grid Engine is aware of,
>>> unless qrsh somehow spawned a process that is not bound by sge_execd. On
>>> the client/execution host, what info do you have in the active_jobs and
>>> jobs directories? It is more likely that the qrsh session was terminated
>>> but left resident processes.
>>>
>>> Rangam
>>>
>>> On Wed, May 6, 2015 at 9:05 AM, <[email protected]> wrote:
>>> Hi,
>>>
>>> I noticed that two of my grid jobs have been running for over a week on a
>>> machine without my being aware of them. Both jobs were launched with qrsh
>>> but they are not visible with qstat, so for one reason or another they are
>>> no longer included in grid book-keeping. This wastes grid resources on
>>> ghost jobs; for example, both of my jobs seem to consume 100% CPU on the
>>> host.
>>>
>>> Can anyone please explain this?
>>>
>>> Regards,
>>> Sudha
>>>
>>> The information contained in this electronic message and any attachments to
>>> this message are intended for the exclusive use of the addressee(s) and may
>>> contain proprietary, confidential or privileged information. If you are not
>>> the intended recipient, you should not disseminate, distribute or copy this
>>> e-mail. Please notify the sender immediately and destroy all copies of this
>>> message and any attachments. WARNING: Computer viruses can be transmitted
>>> via email. The recipient should check this email and any attachments for
>>> the presence of viruses. The company accepts no liability for any damage
>>> caused by any virus transmitted by this email. www.wipro.com
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>
>
> --
> Best,
>
> Feng
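
For anyone hitting the same symptom, the check discussed in this thread (is each qrsh_starter still somewhere below an sge_shepherd in the process tree?) can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: it expects input in the shape of "ps -e -o pid,ppid,comm", the helper names (parse_ps, has_ancestor, orphaned_starters) are made up for illustration, matching the shepherd by the name prefix "sge_shepherd" is a guess about how it shows up in the comm column, and the sample table, including its PPID values, is hypothetical rather than taken from the affected host.

```python
# Hypothetical sample in the shape of `ps -e -o pid,ppid,comm` output.
# The PPID values are invented for illustration.
SAMPLE = """\
  PID  PPID COMMAND
    1     0 init
 2000     1 sge_execd
 2100  2000 sge_shepherd
 2101  2100 qrsh_starter
 5049     1 qrsh_starter
 5101  5049 run_it_file.vcs
32682     1 qrsh_starter
"""


def parse_ps(text):
    """Parse pid/ppid/comm rows into {pid: (ppid, comm)}."""
    procs = {}
    for line in text.splitlines():
        fields = line.split(None, 2)
        # Skip blank lines and the header row.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        procs[int(fields[0])] = (int(fields[1]), fields[2].strip())
    return procs


def has_ancestor(procs, pid, prefix):
    """Return True if any ancestor of pid has a command starting with prefix."""
    seen = set()  # guards against a malformed, cyclic table
    ppid = procs[pid][0]
    while ppid in procs and ppid not in seen:
        seen.add(ppid)
        if procs[ppid][1].startswith(prefix):
            return True
        ppid = procs[ppid][0]
    return False


def orphaned_starters(procs):
    """List qrsh_starter PIDs that have no sge_shepherd ancestor."""
    return sorted(
        pid
        for pid, (_, comm) in procs.items()
        if comm == "qrsh_starter"
        and not has_ancestor(procs, pid, "sge_shepherd")
    )


if __name__ == "__main__":
    print(orphaned_starters(parse_ps(SAMPLE)))
```

To try it on a real execution host, the SAMPLE text would be replaced by the captured output of "ps -e -o pid,ppid,comm". Any PID it reports is only a candidate for a leftover process and should be checked against qstat and qhost before anything is killed.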
