Hi,

I have a cluster with 168 slots (14 nodes). I submit a job script that
does nothing:

#!/bin/bash --login
#$ -S /bin/bash
#$ -j y
#$ -cwd
#$ -pe make 16-
#$ -q all.q
#$ -N test

Make PE definition:

pe_name            make
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

Although this job does nothing, it occupies all 168 slots and blocks the
queue for more than a minute:

plbadob@k:~/test> qsub ./test.sge ; date ; qstat
Your job 127 ("test") has been submitted
Mon Nov 14 10:33:44 CET 2011
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    127 0.00000 test       plbadob      qw    11/14/2011 10:33:44                                   16
plbadob@k:~/test> date ; qstat
Mon Nov 14 10:33:46 CET 2011
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    127 0.55500 test       plbadob      r     11/14/2011 10:33:44 [email protected]                168
plbadob@k:~/test> date ; qstat
Mon Nov 14 10:35:10 CET 2011
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    127 0.55500 test       plbadob      r     11/14/2011 10:33:44 [email protected]                168
plbadob@k:~/test> date ; qstat
Mon Nov 14 10:35:18 CET 2011
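
To put a number on "more than a minute", here is a rough sketch that
works out how long the job stayed visible in qstat, using the timestamps
from the session above. It assumes GNU date (for the -d option); the
function and variable names are only for illustration.

```bash
#!/bin/bash
# Seconds elapsed between two timestamps (assumes GNU date for -d).
elapsed_seconds() {
    local start end
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $(( end - start ))
}

submitted="2011-11-14 10:33:44"   # submit/start time shown by qstat
last_seen="2011-11-14 10:35:10"   # last qstat that still listed job 127
first_gone="2011-11-14 10:35:18"  # first qstat that came back empty

lower=$(elapsed_seconds "$submitted" "$last_seen")
upper=$(elapsed_seconds "$submitted" "$first_gone")
echo "job 127 held its slots for between ${lower}s and ${upper}s"
```

So the slots stayed allocated for roughly a minute and a half after the
job had already ended.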

qacct reports that the job ended in the same second it started, as it
should:

plbadob@k:~/test> qacct -j 127
==============================================================
qname        all.q
qsub_time    Mon Nov 14 10:33:44 2011
start_time   Mon Nov 14 10:33:44 2011
end_time     Mon Nov 14 10:33:44 2011

When I switch control_slaves to FALSE everything works as expected - the
job is removed from the queue immediately.
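
For reference, the switch can be scripted roughly as below. This is only
a sketch: it assumes the stock qconf options -sp (show PE) and -mattr
(modify attribute) and manager rights on the cluster. The two helper
function names are made up for illustration.

```bash
#!/bin/bash
# Print the current control_slaves line of a parallel environment.
show_control_slaves() {
    local pe_name="$1"
    qconf -sp "$pe_name" | grep '^control_slaves'
}

# Set control_slaves (TRUE or FALSE) on a parallel environment.
set_control_slaves() {
    local pe_name="$1" value="$2"
    qconf -mattr pe control_slaves "$value" "$pe_name"
}

# Example (needs manager rights on the cluster):
#   show_control_slaves make
#   set_control_slaves make FALSE
```

Setting the value back to TRUE brings the delay back.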

Any ideas what could be wrong? How can I fix this behavior so that the
cluster is not blocked doing nothing?

Cheers,
Bartek