GE users,
A while back, I made it mandatory for my users to specify h_rt when they
submit jobs. To prevent jobs from being queued but never running, I added
-w e
to my $SGE_ROOT/default/common/sge_request.
An unintended side effect is that certain jobs are now rejected with this
error if they can't be run immediately:
Unable to run job: error: no suitable queues.
Exiting.
If I submit the same job, but specify '-w w' or '-w n', it will be
accepted and queued up.
According to the qsub man page, this should not happen. -w validates a job
assuming an empty system with no other jobs, so if a job won't run with
'-w e', it certainly shouldn't run without it, either!
Here's an example of my problem using a simple MPI "Hello, World" program:
# My submit script
$ more mpihello.sh
#!/bin/bash
#$ -N mpihello
#$ -pe orte 1
#$ -cwd
#$ -V
#$ -R y
#$ -l "h_rt=00:05:00,exclusive=true,cuda=false"
MPI=/usr/local/openmpi/pgi/x86_64
PATH=${MPI}/bin:${PATH}
LD_LIBRARY_PATH=${MPI}/lib
mpirun ./mpihello
# My 'normal' qsub command
$ qsub mpihello.sh
Unable to run job: error: no suitable queues.
Exiting.
# Using qsub with '-w n'
$ qsub -w n mpihello.sh
Your job 1247254 ("mpihello") has been submitted
In this case, my job is failing because of the exclusive=true in the
submit script. All of my cluster nodes are busy at the moment, so I
can't get exclusive use of any node right now. I would expect the job to
be queued until a node becomes free, but instead it's being rejected.
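
For anyone who wants to reproduce the check without submitting anything,
here is a minimal sketch. It assumes '-w v' behaves as the qsub man page
describes (validate against an empty cluster, print a report, do not
submit). The validate_job function and the QSUB override variable are
hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch only: dry-run validation of a submit script.
# QSUB is a hypothetical override so the function can be tried
# on a machine without Grid Engine installed.
QSUB="${QSUB:-qsub}"

validate_job() {
    local script="$1"
    # '-w v' asks qsub to verify the job and print a validation
    # report instead of submitting it.
    "$QSUB" -w v "$script"
}
```

Running 'validate_job mpihello.sh' on a submit host should show which
resource request the verifier objects to, which can then be compared with
what '-w e' rejects.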
I'm using SGE 6.2u3. Is this a bug?
--
Prentice