Re: [gridengine users] how to reserve all cluster slots for maintenance?

Reuti Fri, 19 Apr 2013 02:46:40 -0700

Hi,

On 18.04.2013 at 17:37, Riccardo Murri wrote:


>>> However, large advance reservations seem to behave erratically. Our
>>> cluster comprises 576 nodes, each with 8 cores, distributed
>>> among queues as follows:
>>> 
>>>   # qstat -g c
>>>   CLUSTER QUEUE   CQLOAD   USED    RES  AVAIL  TOTAL aoACDPS  cdsuE
>>>   -----------------------------------------------------------------
>>>   long.q            1.88      8      0      0    496     112      0
>>>   med.q             0.96      8      0     16    960       0      0
>>>   short.q           0.81      8      0    264   1536      88     16
>>>   test.q            0.36      0      0     80     80       0      0
>>>   very-short.q      0.36      8      0     40     80       0      0
>>>   wide.q            1.23      8      0    616   1536     288      0
>>> 
>>> No two queues overlap, except for queue `test.q`, which is reserved for
>>> sysadmins and thus of no concern.
>>> 
>>> A reservation for 4608 slots (576 nodes with 8 cores each) for 12
>>> hours (from 08:00 to 20:00) fails:
>>> 
>>>   # qrsub -a 201305190800 -e 201305192000 -pe parastation 4608
>>>   queue "short.q@r06c01b12n01" is temporarily disabled
>>>   queue "short.q@r06c01b12n02" is temporarily disabled
>>>   advance_reservation: no suitable queues
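
As a quick sanity check on the numbers quoted above (a sketch only: I copied the TOTAL column by hand and left out test.q because it overlaps the other queues), the per-queue totals do add up to the 4608 slots requested:

```shell
#!/bin/sh
# TOTAL column per queue, copied by hand from the `qstat -g c` output above.
# test.q is deliberately left out because it overlaps the other queues.
totals="long.q:496 med.q:960 short.q:1536 very-short.q:80 wide.q:1536"

sum=0
for entry in $totals; do
    slots=${entry#*:}          # strip the "queue:" prefix, keep the number
    sum=$((sum + slots))
done

# Slot count implied by the hardware description: 576 nodes with 8 cores each.
nodes=576
cores_per_node=8
expected=$((nodes * cores_per_node))

echo "sum of queue totals:  $sum"
echo "nodes * cores:        $expected"
echo "difference to 4528:   $((sum - 4528))"
```

So the request covers every slot in the five non-overlapping queues, and 4528 is exactly 80 slots short of that. 80 is also the TOTAL of very-short.q (and of test.q), though that alone doesn't prove which queue is involved.
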
>> 
>> The PE is also attached to very-short.q?
> 
> Yes:
> 
>    murri@login2:~> qconf -sq very-short.q
>    qname                 very-short.q
>    ...
>    pe_list               intel-mpi2_mpd mpich1_rsh mpich2_rsh openmpi openmpi2 \
>                          parastation smp
> 
>>> However, trying to reserve the same number of slots for the next day
>>> fails again:
>>> 
>>>   # qrsub -a 201305200800 -e 201305202000 -pe parastation 4528
>>>   queue "short.q@r06c01b12n01" is temporarily disabled
>>>   queue "short.q@r06c01b12n02" is temporarily disabled
>>>   advance_reservation: no suitable queues
>> 
>> Maybe the problem is not the very-short.q, but the actual running jobs which 
>> have a longer h_rt defined.
> 
> The maximum runtime for any job is 3 days (in long.q; other queues have
> shorter limits), so that shouldn't affect reservations that are
> a month from now, should it?
> 
> However, some jobs only define {s,h}_cpu and no {s,h}_rt.  Could that
> be the cause?

In principle, yes: the default_duration is then used as the estimated
runtime of the job. But I must admit it's strange that the reservation is
blocked at a later point in time. I assume there is no other AR in the way.
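
Both points could be checked along these lines. This is only a sketch: the function name check_ar_blockers is made up for illustration, and I'm assuming that `qconf -ssconf` and `qrstat -u '*'` behave on your installation the way I remember them.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this mail): print the scheduler's
# default_duration and list the advance reservations of all users.
check_ar_blockers() {
    # Bail out early when the Grid Engine client tools are not in PATH.
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found; run this on a Grid Engine host" >&2
        return 1
    fi

    # default_duration is the runtime assumed for jobs without h_rt/s_rt.
    qconf -ssconf | grep default_duration

    # Assumption: '*' works as a user wildcard here, as it does for qstat.
    qrstat -u '*'
}
```

Calling check_ar_blockers on a submit host should print the setting followed by any existing ARs. If default_duration turns out to be INFINITY, jobs without h_rt or s_rt would, as far as I know, be treated as never finishing, which could block an AR even far in the future.
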

-- Reuti


>> Is there in addition any calendar defined?
> 
> No:
> 
>    murri@login2:~> qconf -scall
>    no calendar defined
> 
> Thanks,
> Riccardo


