On 11.02.2014 at 23:37, Stephen Spencer wrote:

> I did swap them initially, sorry. 
> 
> Yes, "qrsh -q all.q@n20 hostname" returns the appropriate FQDN.

So, you can reach the troublesome hosts now?

Next step is:

$ qalter -w v <job_id>
$ qalter -w p <job_id>

with the waiting jobs.

-- Reuti


> 
> Best,
> Stephen
> 
> 
> On Tue, Feb 11, 2014 at 2:33 PM, Reuti <[email protected]> wrote:
> On 11.02.2014 at 23:20, Stephen Spencer wrote:
> 
> > The definition of "qconf -sconf" is as you expected: all "builtin".
> >
> > Could you please be specific as to the commands you'd like me to try from 
> > the next line?
> >
> > > Any output when you use the "-q ..." for `qrsh` too? In addition, you can 
> > > try "-w v" and "-w p" too.
> 
> I meant:
> 
> $ qrsh -q all.q@n20 hostname
> 
> (queue@host, did you swap them?)
> 
> -- Reuti
> 
> 
> >
> > I tried "qrsh -w v" and "qrsh -w p" and both returned "verification: found 
> > suitable queue(s)".
> > "qrsh -q all.q" gave me a shell, surprisingly, on one of the troublesome 
> > nodes. (Actually, it was three for three.)
> > All nodes have "BIP" for "qtype", so there are no limitations there.
> >
> > Best,
> > Stephen
> >
> >
> > On Tue, Feb 11, 2014 at 1:57 PM, Reuti <[email protected]> wrote:
> > Hi,
> >
> > On 11.02.2014 at 22:37, Stephen Spencer wrote:
> >
> > > I have a sixty-node cluster running SGE 6.2u5 (RHEL 6.5).
> > >
> > > The immediate issue is that a user has jobs in the "qw" state, and there 
> > > are idle nodes in the cluster which appear to be able to accept the jobs.
> > >
> > > What works and doesn't work?
> > >       • "qsub -q [email protected] job.sh" works - the job runs on "n20"
> > >       • Repeated invocations of "qrsh hostname" will not, however, result 
> > > in the job running on one of the troublesome hosts.
> >
> > What is the definition of:
> >
> > $ qconf -sconf
> > ...
> > qlogin_command               builtin
> > qlogin_daemon                builtin
> > rlogin_command               builtin
> > rlogin_daemon                builtin
> > rsh_command                  builtin
> > rsh_daemon                   builtin
> >
> > Any output when you use the "-q ..." for `qrsh` too? In addition, you can 
> > try "-w v" and "-w p" too.
> >
> >
> > > Things I've tried, and know, so far:
> > >       • I've restarted the troublesome nodes - no change.
> > >       • "sge_execd" is running on the troublesome nodes.
> > >       • The troublesome nodes are in the execution host list and the 
> > > submit host list.
> > >       • Most of the rest of the cluster's pretty busy.
> > >       • Interestingly, the troublesome nodes don't show up in the 
> > > "scheduling info" list produced as part of the "qstat -j <jobid>" 
> > > command's output.
> > > Short of restarting the entire cluster, I'm at a loss as to what to look 
> > > at next.
> >
> > Is "qtype INTERACTIVE" limited to certain nodes/queues?
> >
> > -- Reuti
> >
> >
> > > --
> > > Stephen Spencer
> > > [email protected]
> > > _______________________________________________
> > > users mailing list
> > > [email protected]
> > > https://gridengine.org/mailman/listinfo/users
> >
> >
> >
> >
> > --
> > Stephen Spencer
> > [email protected]
> 
> 
> 
> 
> -- 
> Stephen Spencer
> [email protected]
