GE users,

A while back, I made it mandatory for my users to specify h_rt when they
submit jobs. To prevent jobs from being queued but never running, I added

-w e

to my $SGE_ROOT/default/common/sge_request file.

An unintended side effect is that certain jobs are now rejected with
this error if they can't be run immediately:

Unable to run job: error: no suitable queues.
Exiting.

If I submit the same job, but specify '-w w' or '-w n', it will be
accepted and queued up.

According to the qsub man page, this should not happen. -w validates a
job assuming an empty system with no other jobs, so if a job is rejected
with '-w e', it certainly shouldn't be able to run without it, either!
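
The validation result can be checked without queuing anything. Here is a
minimal sketch, assuming qsub's '-w v' (verify only, no submission)
behaves as the man page describes and that qsub exits non-zero when
validation fails. try_validation is a hypothetical helper name, not part
of SGE:

```shell
#!/bin/bash
# Sketch only: try_validation is a hypothetical helper, not an SGE command.
# It runs a dry-run validation of a job script with 'qsub -w v', which is
# documented to verify the job without submitting it.
try_validation() {
    local script=$1
    # Assumption: qsub returns a non-zero exit status when validation fails.
    if qsub -w v "$script"; then
        echo "verify: accepted"
    else
        echo "verify: rejected"
    fi
}
```

Calling 'try_validation mpihello.sh' on a busy cluster and again on an
idle one should show whether the result depends on current load, which
it should not if validation really assumes an empty system.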

Here's an example of my problem using a simple MPI "Hello, World" program:

# My submit script

$ more mpihello.sh
#!/bin/bash
#$ -N mpihello
#$ -pe orte 1
#$ -cwd
#$ -V
#$ -R y
#$ -l "h_rt=00:05:00,exclusive=true,cuda=false"

MPI=/usr/local/openmpi/pgi/x86_64
PATH=${MPI}/bin:${PATH}
LD_LIBRARY_PATH=${MPI}/lib
mpirun ./mpihello

# My 'normal' qsub command

$ qsub mpihello.sh
Unable to run job: error: no suitable queues.
Exiting.

# Using qsub with '-w n'

$ qsub -w n mpihello.sh
Your job 1247254 ("mpihello") has been submitted

In this case, my job is failing because of the exclusive=true in the
submit script. All of my cluster nodes are busy at the moment, so I
can't get exclusive use of any node right now. I would expect the job to
be queued until a node becomes free, but instead it's being rejected.
I'm using SGE 6.2u3. Is this a bug?
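
For anyone who wants to reproduce this, a job accepted with '-w n' can
be re-checked after submission. This is a rough sketch, assuming the
submit message has the format shown above and that 'qalter -w v' and
'qstat -j' are available as documented. job_id_from_submit is a
hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: job_id_from_submit is a hypothetical helper, not an SGE command.
# It reads qsub's submit message on stdin, e.g.
#   Your job 1247254 ("mpihello") has been submitted
# and prints just the numeric job ID. Lines in any other format print nothing.
job_id_from_submit() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Possible usage (not run here, since it needs a live cluster):
#   jid=$(qsub -w n mpihello.sh | job_id_from_submit)
#   qalter -w v "$jid"   # re-validate the already queued job
#   qstat -j "$jid"      # show the job's details and any scheduling info
```

If 'qalter -w v' reports a suitable queue for the queued job while
'qsub -w e' rejects the same script, that would point at the validation
step rather than at the job request itself.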

-- 
Prentice 
